Chapter 9. Cross-linguistic comparison of automatic detection of speech breaks in read and narrated speech in four languages

Barbosa, Plínio A.

doi:10.1075/scl.94.09bar

Part of

In Search of Basic Units of Spoken Language: A corpus-driven approach
Edited by Shlomo Izre'el, Heliana Mello, Alessandro Panunzi and Tommaso Raso
[Studies in Corpus Linguistics 94] 2020
pp. 285–300


Plínio A. Barbosa | State University of Campinas, Institute for Language Studies, CNPq

This chapter tests an algorithm for the automatic detection of speech breaks in read and narrated speech in Brazilian Portuguese (BP), European Portuguese (EP), French, and German. The algorithm requires only the audio file: it does not depend on any prior transcription or linguistic analysis (syllable or phone labeling and segmentation). It operates in two stages. First, vowel onsets are detected. Second, the durations of the intervals between consecutive vowel onsets (V-to-V intervals) are normalized and smoothed to yield duration z-scores. Peaks above 2.5 in the smoothed z-score contour were taken as speech breaks. Compared with human segmentation, the proportion of hits was higher for reading (70%) than for narration (60%). Cross-linguistically, EP and French had the highest proportions of hits. A test with the English Navy audio file yielded a hit proportion similar to that for German.
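The second stage described above can be illustrated with a minimal Python sketch. It is not the chapter's SalienceDetector script, and it rests on several assumptions that the abstract does not confirm: vowel onset times are taken as already available, z-scores are computed from the mean and standard deviation of the V-to-V intervals in the same recording, smoothing is a 3-point weighted moving average, and each break is placed at the vowel onset that closes the peak interval. All function names are hypothetical.

```python
from statistics import mean, pstdev


def vv_durations(onsets):
    """Durations (s) between consecutive vowel onsets (V-to-V intervals)."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def zscores(durations):
    """Z-score each duration against the sample itself (an assumption)."""
    mu = mean(durations)
    sigma = pstdev(durations)
    if sigma == 0:
        return [0.0 for _ in durations]
    return [(d - mu) / sigma for d in durations]


def smooth(values, weights=(1, 2, 1)):
    """Weighted moving average; the window is truncated at the edges."""
    half = len(weights) // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        out.append(num / den)
    return out


def detect_breaks(onsets, threshold=2.5):
    """Return onset times closing V-to-V intervals whose smoothed
    z-score is a local peak above the threshold."""
    durs = vv_durations(onsets)
    if len(durs) < 2:
        return []
    z = smooth(zscores(durs))
    breaks = []
    for i, v in enumerate(z):
        left = z[i - 1] if i > 0 else float("-inf")
        right = z[i + 1] if i + 1 < len(z) else float("-inf")
        if v > threshold and v >= left and v > right:
            # Assumption: mark the break at the end of the long interval.
            breaks.append(onsets[i + 1])
    return breaks


# Toy input: vowel onsets every 0.2 s, with one 1.5 s gap after 4.0 s.
onsets = [0.2 * i for i in range(21)] + [5.5 + 0.2 * i for i in range(21)]
print(detect_breaks(onsets))
```

On this toy input the only interval whose smoothed z-score exceeds 2.5 is the 1.5 s gap, so the sketch reports a single break at 5.5 s. Smoothing lowers isolated peaks, so a lone long interval must stand out strongly to pass the threshold.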

Keywords: automatic speech segmentation, duration, prosodic boundary, cross-linguistic comparison

Article outline

1. Introduction
2. Methodology
- 2.1 Corpus
- 2.2 The SalienceDetector script
3. Results
- 3.1 Testing with English spontaneous speech
4. Conclusions
Notes
References

Published online: 18 June 2020

https://doi.org/10.1075/scl.94.09bar

