Article
Perceptual Error Analysis of Human and Synthesized Voices
Date
2017
Registered in:
Journal Of Voice. New York, v. 31, n. 4, p. -, 2017.
0892-1997
10.1016/j.jvoice.2016.12.015
WOS:000406147000054
Author
Englert, Marina [UNIFESP]
Madazio, Glaucya
Gielow, Ingrid
Lucero, Jorge
Behlau, Mara [UNIFESP]
Institution
Abstract
Objective/Hypothesis. To assess the quality of synthesized voices through listeners' skill in discriminating between human and synthesized voices.
Study Design. Prospective study.
Methods. Eighteen human voices with different types and degrees of deviation (roughness, breathiness, and strain, each at three degrees of deviation: mild, moderate, and severe) were selected by three voice specialists. Synthesized samples with the same deviations as the human voices were produced by the VoiceSim system. The manipulated parameters were vocal frequency perturbation (roughness), additive noise (breathiness), and, for strain, increasing tension, subglottal pressure, and decreasing vocal fold separation. Two hundred sixty-nine listeners were divided into three groups: voice specialist speech-language pathologists (V-SLPs), general clinician SLPs (G-SLPs), and naive listeners (NLs). The SLP listeners also indicated the type and degree of deviation.
Results. The listeners misclassified 39.3% of the voices, including both synthesized (42.3%) and human (36.4%) samples (P = 0.001). V-SLPs had the lowest error percentage regarding the nature of the voice (34.6%). G-SLPs and NLs identified almost half of the synthesized samples as human (46.9% and 45.6%, respectively). The male voices were more susceptible to misidentification. The synthesized breathy samples generated greater perceptual confusion. The samples with severe deviation seemed to be more susceptible to errors. The synthesized female deviations were correctly classified. The male breathiness and strain were identified as roughness.
Conclusion. VoiceSim produced stimuli very similar to the voices of patients with dysphonia. V-SLPs were better able to classify human and synthesized voices. VoiceSim is better at simulating vocal breathiness and female deviations; the male samples need adjustment.