Chapter 9
Cross-linguistic comparison of automatic detection of speech breaks in read and
narrated speech in four languages
Plínio A. Barbosa
State University of Campinas, Institute for Language Studies, CNPq
This chapter tests an algorithm for the automatic detection of
speech breaks in read and narrated speech in Brazilian Portuguese (BP), European Portuguese
(EP), French, and German. The algorithm requires only the audio file: it does not depend on
prior transcription or linguistic analysis (syllable or phone labeling and segmentation).
It operates in two stages: vowel onsets are detected first, and the durations of the
vowel-to-vowel (V-to-V) intervals are then normalized to obtain smoothed duration z-scores.
Peaks above 2.5 in the smoothed z-score contour were considered speech breaks. Compared with
human segmentation, the proportion of hits was higher for reading (70%) than for narration
(60%). Cross-linguistically, EP and French had the highest proportions of hits. A test with
the English Navy audio file yielded a hit proportion similar to that obtained for German.
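The second stage described in the abstract can be illustrated with a minimal sketch. It is not the SalienceDetector script. It assumes that vowel onset times (in seconds) have already been obtained from the audio, and that duration z-scores are computed against a reference mean and standard deviation supplied by the caller. The function names, the parameters `ref_mean` and `ref_sd`, the example reference values, and the three-point (1, 2, 1) smoothing weights are all assumptions made for illustration. Only the 2.5 threshold comes from the abstract.

```python
def smoothed_z_scores(vowel_onsets, ref_mean, ref_sd):
    """Turn vowel onset times (s) into smoothed V-to-V duration z-scores."""
    # V-to-V intervals: time elapsed between consecutive vowel onsets.
    durations = [later - earlier
                 for earlier, later in zip(vowel_onsets, vowel_onsets[1:])]
    # Normalize each duration against the (assumed) reference statistics.
    z = [(d - ref_mean) / ref_sd for d in durations]
    # Assumed smoothing: weighted moving average with weights 1, 2, 1,
    # renormalized at the edges where a neighbor is missing.
    smoothed = []
    for i in range(len(z)):
        total = 0.0
        weight_sum = 0.0
        for offset, weight in zip((-1, 0, 1), (1, 2, 1)):
            j = i + offset
            if 0 <= j < len(z):
                total += weight * z[j]
                weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


def detect_breaks(vowel_onsets, ref_mean, ref_sd, threshold=2.5):
    """Return indices of V-to-V intervals whose smoothed z-score is a
    local peak above the threshold (2.5 in the abstract)."""
    s = smoothed_z_scores(vowel_onsets, ref_mean, ref_sd)
    breaks = []
    for i, value in enumerate(s):
        left = s[i - 1] if i > 0 else float("-inf")
        right = s[i + 1] if i + 1 < len(s) else float("-inf")
        if value > threshold and value > left and value > right:
            breaks.append(i)
    return breaks


# Hypothetical onsets: regular 0.2 s intervals with one 1.0 s interval.
onsets = [0.0, 0.2, 0.4, 0.6, 1.6, 1.8, 2.0]
print(detect_breaks(onsets, ref_mean=0.2, ref_sd=0.05))
```

In this toy input, the fourth V-to-V interval (index 3) is much longer than the others, so it is the only interval flagged. The intervals next to it also exceed the threshold after smoothing, but they are not local peaks and are therefore not reported.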
Article outline
- 1. Introduction
- 2. Methodology
- 2.1 Corpus
- 2.2 The SalienceDetector script
- 3. Results
- 3.1 Testing with English spontaneous speech
- 4. Conclusions
- Notes
- References
audio
bp_np_nr
bp_np_re
bp_ra_nr
bp_ra_re
ep_am_nr
ep_am_re
ep_ar_nr
ep_ar_re
fr_ca_nr
fr_ca_re
fr_ma_nr
fr_ma_re
ge_s5_nr
ge_s5_re
ge_s6_nr
ge_s6_re