Voice anti-spoofing data-set built from Latin American Spanish accents implementing voice conversion and text-to-speech techniques

Tamayo Flórez, Pablo Andrés

dc.contributor	Manrique Piramanrique, Rubén Francisco
dc.contributor	Núñez Castro, Haydemar María
dc.contributor	Pacheco Páramo, Diego Felipe
dc.contributor	FLAG
dc.creator	Tamayo Flórez, Pablo Andrés
dc.date.accessioned	2023-01-30T15:31:59Z
dc.date.accessioned	2023-09-06T23:20:26Z
dc.date.available	2023-01-30T15:31:59Z
dc.date.available	2023-09-06T23:20:26Z
dc.date.created	2023-01-30T15:31:59Z
dc.date.issued	2022-11-23
dc.identifier	http://hdl.handle.net/1992/64313
dc.identifier	instname:Universidad de los Andes
dc.identifier	reponame:Repositorio Institucional Séneca
dc.identifier	repourl:https://repositorio.uniandes.edu.co/
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/8726422
dc.description.abstract	In this work, a voice anti-spoofing dataset was built from Latin American Spanish samples using voice conversion and text-to-speech algorithms. Anti-spoofing models trained on English samples were then tested on this dataset to observe how they behave on a language other than the one they were trained on.
dc.language	eng
dc.publisher	Universidad de los Andes
dc.publisher	Maestría en Ingeniería de Sistemas y Computación
dc.publisher	Facultad de Ingeniería
dc.publisher	Departamento de Ingeniería Sistemas y Computación
dc.relation	M. Dua, C. Jain, and S. Kumar, "Lstm and cnn based ensemble approach for spoof detection task in automatic speaker verification systems," Journal of Ambient Intelligence and Humanized Computing, 2021.
dc.relation	H. Dinkel, Y. Qian, and K. Yu, "Investigating raw wave deep neural networks for end-to-end speaker spoofing detection," IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 26, pp. 2002-2014, 2018.
dc.relation	Q. Fu, Z. Teng, J. White, M. Powell, and D. C. Schmidt, "Fastaudio: A learnable audio front-end for spoof speech detection," 2021.
dc.relation	T. Arif, A. Javed, M. Alhameed, F. Jeribi, and A. Tahir, "Voice spoofing countermeasure for logical access attacks detection," IEEE Access, vol. 9, pp. 162857-162868, 2021.
dc.relation	Y. Xie, Z. Zhang, and Y. Yang, "Siamese network with wav2vec feature for spoofing speech detection," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 6, pp. 4700-4704, 2021.
dc.relation	A. Gomez-Alanis, A. M. Peinado, J. A. Gonzalez, and A. M. Gomez, "A gated recurrent convolutional neural network for robust spoofing detection," IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 27, pp. 1985-1999, 2019.
dc.relation	X. Wang and J. Yamagishi, "A practical guide to logical access voice presentation attack detection," 2022.
dc.relation	ASVspoof Consortium, "Asvspoof 2019 evaluation plan," vol. 4, pp. 1-19, 2019.
dc.relation	Z. Wu, T. Kinnunen, N. Evans, J. Yamagishi, C. Hanilçi, M. Sahidullah, and A. Sizov, "Asvspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2015-Janua, pp. 2037-2041, 2015.
dc.relation	R. Reimao and V. Tzerpos, "For: A dataset for synthetic speech detection," 2019 10th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2019, 2019.
dc.relation	J. Schepens, T. Dijkstra, F. Grootjen, and W. van Heuven, "Cross-language distributions of high frequency and phonetically similar cognates," PloS one, vol. 8, p. e63006, 05 2013.
dc.relation	A. Guevara-Rukoz, I. Demirsahin, F. He, S. H. C. Chu, S. Sarin, K. Pipatsrisawat, A. Gutkin, A. Butryna, and O. Kjartansson, "Crowdsourcing latin american spanish for low-resource text-to-speech," LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings, pp. 6504-6513, 2020.
dc.relation	J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, and Z. Ling, "The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods," pp. 195-202, 2018.
dc.relation	S. S. Sribhashyam, M. S. Salekin, D. Goldgof, G. Zamzmi, M. Last, and Y. Sun, "Pattern recognition in vital signs using spectrograms," Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, pp. 1133-1138, 2021.
dc.relation	L. Wyse, "Audio spectrogram representations for processing with convolutional neural networks," vol. 1, pp. 37-41, 2017.
dc.relation	Y. Jia, X. Chen, J. Yu, L. Wang, Y. Xu, S. Liu, and Y. Wang, "Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network," Complex and Intelligent Systems, vol. 7, pp. 1749-1757, 8 2021.
dc.relation	M. Zhang, X. Wang, F. Fang, H. Li, and J. Yamagishi, "Joint training framework for text-to-speech and voice conversion using multi-source tacotron and wavenet," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2019-Septe, pp. 1298-1302, 2019.
dc.relation	S. Russell and P. Norvig, "Artificial neural networks," 2016.
dc.relation	F. Chollet, "What is deep learning?," 2021.
dc.relation	J. Krohn, G. Beyleveld, and A. Bassens, "Generative adversarial networks," 2019.
dc.relation	J. Krohn, G. Beyleveld, and A. Bassens, Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence. Addison-Wesley Professional, 1st ed., 2019.
dc.relation	J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," 2020.
dc.relation	K. S. Rao and M. K. E, Speech Recognition Using Articulatory and Excitation Source Features. Springer International Publishing, 2017.
dc.relation	B. Markovic, J. Galic, and M. Mijić, "Application of teager energy operator on linear and mel scales for whispered speech recognition," Archives of Acoustics, vol. 43, 01 2018.
dc.relation	Z. Wu, P. L. D. Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z. H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, and J. Yamagishi, "Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance," IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 24, pp. 768-783, 2016.
dc.relation	J. Yamagishi, C. Veaux, and K. MacDonald, "Cstr vctk corpus: English multi-speaker corpus for cstr voice cloning toolkit (version 0.92)," 2019.
dc.relation	H. Tak, M. Todisco, X. Wang, J.-W. Jung, J. Yamagishi, and N. Evans, "Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation,"
dc.relation	Z. Zhang, X. Yi, and X. Zhao, "Fake speech detection using residual network with transformer encoder," IH and MMSec 2021 - Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, pp. 13-22, 2021.
dc.relation	X. Wang and J. Yamagishi, "A comparative study on recent neural spoofing countermeasures for synthetic speech detection," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 6, pp. 4685-4689, 2021.
dc.relation	X. Wang and J. Yamagishi, "Investigating self-supervised front ends for speech spoofing countermeasures," 2021.
dc.relation	Y. Zhang, F. Jiang, and Z. Duan, "One-class learning towards synthetic voice spoofing detection," IEEE Signal Processing Letters, vol. 28, pp. 937-941, 2021.
dc.relation	Y. Ma, Z. Ren, and S. Xu, "Rw-resnet: A novel speech anti-spoofing model using raw waveform," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 5, pp. 3696-3700, 2021.
dc.relation	Y. Wang, M. Zhang, and Z. Zhu, "Detection of voice transformation disguise based on deep residual net," PervasiveHealth: Pervasive Computing Technologies for Healthcare, pp. 126-130, 2020.
dc.relation	A. Cohen, I. Rimon, E. Aflalo, and H. Permuter, "A study on data augmentation in voice anti-spoofing," 2021.
dc.relation	T. Kaneko and H. Kameoka, "Cyclegan-vc: Non-parallel voice conversion using cycle-consistent adversarial networks," European Signal Processing Conference, vol. 2018-Septe, pp. 2100-2104, 2018.
dc.relation	J. C. Chou and H. Y. Lee, "One-shot voice conversion by separating speaker and content representations with instance normalization," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2019-Septe, pp. 664-668, 2019.
dc.relation	C. C. Lo, S. W. Fu, W. C. Huang, X. Wang, J. Yamagishi, Y. Tsao, and H. M. Wang, "Mosnet: Deep learning-based objective assessment for voice conversion," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2019-Septe, pp. 1541-1545, 2019.
dc.relation	K. Qian, Y. Zhang, S. Chang, X. Yang, and M. Hasegawa-Johnson, "Autovc: Zero-shot voice style transfer with only autoencoder loss," 2019.
dc.relation	H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "Stargan-vc: Nonparallel many-to-many voice conversion using star generative adversarial networks," 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings, pp. 266-273, 2019.
dc.relation	W. Ping, K. Peng, A. Gibiansky, S. Ark, A. Kannan, S. Narang, J. Raiman, and J. Miller, "Deep voice 3: Scaling text-to-speech with convolutional sequence learning," 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, pp. 1-16, 2018.
dc.relation	V. Popov, I. Vovk, V. Gogoryan, T. Sadekova, M. Kudinov, and J. Wei, "Diffusion-based voice conversion with fast maximum likelihood sampling scheme," 9 2021.
dc.relation	S. Liu, Y. Cao, D. Su, and H. Meng, "Diffsvc: A diffusion probabilistic model for singing voice conversion," 5 2021.
dc.relation	K. Akuzawa, K. Onishi, K. Takiguchi, K. Mametani, and K. Mori, "Conditional deep hierarchical variational autoencoder for voice conversion," 12 2021.
dc.relation	O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," 5 2015.
dc.relation	X. Zhang, R. Zhao, J. Yan, M. Gao, Y. Qiao, X. Wang, and H. Li, "P2sgrad: Refined gradients for optimizing deep face models," 2019.
dc.relation	G. Lavrentyeva, S. Novoselov, A. Tseren, M. Volkova, A. Gorlanov, and A. Kozlov, "Stc antispoofing systems for the asvspoof2019 challenge," vol. 2019-September, pp. 1033-1037, International Speech Communication Association, 2019.
dc.relation	A. Kashkin, I. Karpukhin, and S. Shishkin, "Hifi-vc: High quality asr-based voice conversion," 3 2022.
dc.rights	Attribution-ShareAlike 4.0 International
dc.rights	http://creativecommons.org/licenses/by-sa/4.0/
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	http://purl.org/coar/access_right/c_abf2
dc.title	Voice anti-spoofing data-set built from Latin American Spanish accents implementing voice conversion and text-to-speech techniques
dc.type	Master's thesis

This item belongs to the following institution

Universidad de los Andes (Colombia)