PROSODY IN THE LEGAL PROTECTION OF THE HUMAN VOICE

Artjom R. BUDNIK

doi:10.17323/tis.2025.27359

Artjom R. BUDNIK Bauman Moscow State Technical University, Moscow, Russia https://orcid.org/0009-0005-6623-9668

Keywords: legal protection of voice, personal right, prosody, prosody parameters, speech modality digitization, linguistic model, generative language model, artificial neural network, software product development

Abstract

A draft law on protecting the human voice as an object of personal non-property rights when it is generated by artificial neural networks set the objectives of this study: to identify the features of one of the main modalities of human speech, the prosody of demand, and to analyze it parametrically and digitize it. The methodology rests on a systemic analysis of prosody as a legally significant parameter, and on formulating and testing a hypothesis about which type of speech prosody is the most pronounced. It included a search for a source of high-quality samples of this speech modality, parameterization and digitization of the selected prosody, and its comparison with neutral examples of verbal communication.

The results include the conclusion that the category of voice needs to be detailed in law in order to develop effective mechanisms for its protection and defense. A further result was an experiment that measured and recorded the parameters forming a set of recurring features characteristic of the speech modality of demand, using unadapted, natural examples of individual utterances. The experimental results were assessed mathematically and interpreted linguistically. Based on the identified patterns and the experimental data, a software module for generating the prosody of demand was developed. The resulting prototype passed experimental verification of how well it performs its target function.

The scientific novelty of the work lies in specifying the parameters of prosody that matter for its legislative regulation as an intangible benefit, and in demonstrating that human verbal expression can be reproduced automatically by software and hardware. The practical significance of the study stems from the applicability of its results to the legal protection of the human voice, to the deep learning of large language models in speech communication skills, and to imparting the characteristics of a particular prosody to verbal elements, including through integration of the developed prototype.

Author Biography

Artjom R. BUDNIK, Bauman Moscow State Technical University, Moscow, Russia

A.R. Budnik is a third-year student at Bauman Moscow State Technical University, majoring in linguistics and neurotechnological linguistics.
