PROSODY IN THE LEGAL PROTECTION OF THE HUMAN VOICE
Abstract
A draft law on protecting the human voice as an object of personal non-property rights when the voice is generated by artificial neural networks set the objectives of this study: to identify the features of one of the main modalities of human speech, the prosody of demand, and to carry out its parametric analysis and digitization. The methodology rests on a systemic analysis of prosody as a legally significant parameter and on formulating and testing a hypothesis about which type of speech prosody is the most pronounced. It included finding a source of high-quality samples of this speech modality, parameterizing and digitizing the selected prosody, and comparing it with neutral examples of verbal communication.
The results include the conclusion that the legal category of voice needs to be specified in more detail in order to develop effective mechanisms for its protection and defense. A practical result was an experiment that measured and recorded the parameters forming the set of recurring features characteristic of the speech modality of demand, using unadapted, natural examples of individual utterances. The experimental results were assessed mathematically and interpreted linguistically. Based on the identified patterns and the experimental data, a software module for generating the speech prosody of demand was developed. The resulting prototype was experimentally verified for the quality of its target function.
The scientific novelty of the work lies in refining knowledge about the parameters of prosody that matter for its legislative regulation as an intangible benefit, and in demonstrating that human verbal expression can be reproduced automatically by software and hardware. The practical significance of the study lies in the applicability of its results to the legal protection of the human voice, to the deep-learning training of large language models in speech communication skills, and to imparting the characteristics of a particular prosody to verbal elements, including through integration of the developed prototype.
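The abstract does not list the parameters that were measured or the tools that were used. As a rough illustration only, the sketch below assumes three acoustic correlates commonly associated with prosody (fundamental frequency, signal level and duration) and compares a "demand" sample with a "neutral" one. Synthetic sine tones stand in for real recordings, and every function name here is hypothetical; this is not the authors' software module.

```python
import math

def tone(freq_hz, amplitude, seconds, rate=8000):
    """Synthetic stand-in for a recorded utterance (assumption: a pure sine tone)."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def estimate_f0(samples, rate, fmin=75.0, fmax=400.0):
    """Estimate fundamental frequency from the strongest autocorrelation lag."""
    shortest_lag, longest_lag = int(rate / fmax), int(rate / fmin)
    best_lag = max(
        range(shortest_lag, longest_lag + 1),
        key=lambda lag: sum(a * b for a, b in zip(samples, samples[lag:])),
    )
    return rate / best_lag

def level_db(samples):
    """Root-mean-square level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def prosody_features(samples, rate):
    """Collect the three assumed parameters for one utterance."""
    return {
        "f0_hz": estimate_f0(samples, rate),
        "level_db": level_db(samples),
        "duration_s": len(samples) / rate,
    }

def contrast(target, baseline):
    """Per-parameter difference between a target sample and a neutral baseline."""
    return {name: target[name] - baseline[name] for name in target}

if __name__ == "__main__":
    rate = 8000
    neutral = prosody_features(tone(200.0, 0.5, 0.10, rate), rate)
    demand = prosody_features(tone(250.0, 0.8, 0.12, rate), rate)
    for name, delta in contrast(demand, neutral).items():
        print(f"{name}: {delta:+.2f}")
```

With real recordings, a dedicated pitch tracker would replace the simple autocorrelation estimate, and the differences would be computed over many utterances rather than a single pair.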
Copyright (c) 2025 Works on Intellectual Property

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.