Chapter 12
From pitch stylization to automatic tonal annotation of speech corpora
This chapter proposes a labeling scheme for pitch-related aspects of speech prosody and describes an automatic annotation system using this scheme.
In the labeling scheme, the fine-grained transcription provides labels indicating the pitch level and pitch movement of individual syllables. The pitch levels “bottom” and “top” indicate the boundaries of the speaker’s pitch range. Three additional levels – “low”, “mid”, “high” – are defined on the basis of pitch changes in the local context. For pitch movements, both simple and compound, the transcription indicates direction (rise, fall, level) and size (large or small melodic interval), using size categories adjusted to the speaker’s pitch range.
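As an illustration of how a single pitch movement might be labeled for direction and size, the following minimal sketch converts a movement to a melodic interval in semitones and compares it with the speaker's pitch range. The function names, the 1-semitone threshold for "level", and the 40% fraction separating "small" from "large" are hypothetical values chosen for the example; they are not taken from the chapter.

```python
import math


def semitones(f_start_hz, f_end_hz):
    """Signed melodic interval in semitones between two frequencies in Hz."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def classify_movement(f_start_hz, f_end_hz, range_st,
                      level_max_st=1.0, large_fraction=0.4):
    """Label a pitch movement by direction and size.

    range_st is the speaker's pitch range in semitones.
    level_max_st and large_fraction are illustrative assumptions,
    not values from the chapter.
    """
    interval = semitones(f_start_hz, f_end_hz)
    size = abs(interval)
    if size < level_max_st:
        # Movements below the threshold are treated as level.
        return ("level", None)
    direction = "rise" if interval > 0 else "fall"
    # The size category scales with the speaker's own pitch range.
    magnitude = "large" if size >= large_fraction * range_st else "small"
    return (direction, magnitude)


if __name__ == "__main__":
    print(classify_movement(100.0, 200.0, range_st=12.0))
    print(classify_movement(200.0, 180.0, range_st=12.0))
```

Expressing the size threshold as a fraction of the range is one simple way to make the categories speaker-dependent, so that the same interval can count as large for a narrow-range speaker and small for a wide-range one.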
The automatic tonal annotation system combines several processing steps: segmentation into syllabic nuclei, pause detection, pitch stylization, pitch range estimation, pitch movement classification, and pitch level assignment. It uses a rule-based procedure, which, unlike commonly used supervised learning techniques, does not require a labeled corpus to train a model.
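To make the pitch range estimation step more concrete, here is a minimal sketch of one possible approach: taking low and high percentiles of a speaker's pitch values as the "bottom" and "top" of the range. This summary does not describe the estimation method actually used, so the percentile approach, the function name `estimate_pitch_range`, and the 2nd/98th percentile defaults are all assumptions made for illustration.

```python
import math


def estimate_pitch_range(pitch_hz, low_pct=2.0, high_pct=98.0):
    """Estimate (bottom_hz, top_hz, span_in_semitones) for one speaker.

    pitch_hz: pitch values in Hz; values <= 0 are treated as unvoiced.
    low_pct and high_pct are hypothetical defaults, not from the chapter.
    """
    values = sorted(f for f in pitch_hz if f > 0)
    if not values:
        raise ValueError("no voiced pitch values")

    def percentile(p):
        # Nearest-rank percentile over the sorted voiced values.
        rank = max(1, math.ceil(p / 100.0 * len(values)))
        return values[rank - 1]

    bottom = percentile(low_pct)
    top = percentile(high_pct)
    span_st = 12.0 * math.log2(top / bottom)
    return bottom, top, span_st


if __name__ == "__main__":
    samples = [0.0] + [float(f) for f in range(100, 201)]
    print(estimate_pitch_range(samples))
```

Using percentiles rather than the raw minimum and maximum is a common way to reduce sensitivity to pitch tracking errors such as octave jumps.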
Article outline
- 1. Introduction
- 2. The proposed convention for tonal annotation
- 2.1 Pitch levels
- 2.2 Pitch intervals
- 2.3 Symbols used in the annotation
- 3. Automatic labeling of prosody
- 4. The procedure for automatic tonal annotation
- 4.1 Parameter extraction
- 4.2 Segmentation into syllabic nuclei
- 4.3 Detection of pauses
- 4.4 Pitch stylization
- 4.5 Automatic detection of the pitch range of a speaker
- 4.6 Syllable-internal pitch movements
- 4.7.1 Pitch level detection based on pitch span
- 4.7.2 Pitch level detection based on local pitch change
- 4.7.3 Pitch level inferred from intra-syllabic pitch movements
- 4.7.4 Extrapolating pitch level information
- 4.7.5 Pitch levels for plateaus
- 5. The resulting tonal annotation
- 6. Discussion and conclusion