Building a talking baby robot: A contribution to the study of speech acquisition and evolution

Serkhane, Jihène; Schwartz, Jean-Luc; Bessière, Pierre

doi:10.1075/is.6.2.06ser

Article published in:

Vocalize to Localize II
Edited by Christian Abry, Anne Vilain and Jean-Luc Schwartz
[Interaction Studies 6:2] 2005
pp. 253–286

Jihène Serkhane | ICP, Grenoble

Jean-Luc Schwartz | ICP, Grenoble

Pierre Bessière | Laplace-SHARP, Gravir, Grenoble

Speech is a perceptuo-motor system. A natural computational modeling framework is provided by cognitive robotics, or more precisely speech robotics, which is also based on embodiment, multimodality, development, and interaction. This paper describes the bases of a virtual baby robot, which consists of an articulatory model that integrates the non-uniform growth of the vocal tract, a set of sensors, and a learning model. The articulatory model delivers the sagittal contour, lip shape and acoustic formants from seven input parameters that characterize the configurations of the jaw, the tongue, the lips and the larynx. To simulate the growth of the vocal tract from birth to adulthood, a process modifies the longitudinal dimension of the vocal tract shape as a function of age. The auditory system of the robot comprises a “phasic” system for event detection over time and a “tonic” system to track formants. The model of visual perception specifies the basic lip characteristics: height, width, area and protrusion. The orosensorial channel, which provides tactile sensation on the lips, the tongue and the palate, is elaborated as a model for predicting tongue-palate contacts from articulatory commands. Learning relies on Bayesian programming, which has two phases: (i) specification of the variables, decomposition of the joint distribution and identification of the free parameters through exploration of a learning set, and (ii) utilization, which relies on questions put to the joint distribution.
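The two phases of Bayesian programming can be illustrated with a minimal sketch. It is not the authors' implementation: the single motor variable `M`, the single sensory variable `S`, the five-bin discretization, the decomposition P(M, S) = P(M) P(S | M) with uniform P(M), and the function `toy_vocal_tract` standing in for the articulatory model are all hypothetical choices made only for illustration.

```python
import random

# Hypothetical toy variables: M is a discretized motor command (e.g. jaw height),
# S is a discretized sensory outcome (e.g. first formant). Both take 5 values.
M_VALUES = range(5)
S_VALUES = range(5)


def toy_vocal_tract(m, rng):
    """Hypothetical stand-in for the articulatory model: a noisy motor-to-sensory map."""
    s = m + rng.choice([-1, 0, 0, 1])
    return min(max(s, 0), 4)  # clamp to the valid sensory bins


def learn(n_samples, seed=0):
    """Phase (i): identify the free parameters of P(S | M) by random exploration."""
    rng = random.Random(seed)
    # Start every cell at 1 (Laplace smoothing) so no probability is exactly zero.
    counts = {m: {s: 1.0 for s in S_VALUES} for m in M_VALUES}
    for _ in range(n_samples):
        m = rng.choice(list(M_VALUES))  # random motor exploration
        counts[m][toy_vocal_tract(m, rng)] += 1.0
    # Normalize each row of counts into a conditional distribution P(S | M = m).
    return {
        m: {s: c / sum(row.values()) for s, c in row.items()}
        for m, row in counts.items()
    }


def infer_motor(p_s_given_m, s_target):
    """Phase (ii): answer the question P(M | S = s_target), assuming uniform P(M)."""
    unnorm = {m: p_s_given_m[m][s_target] for m in M_VALUES}
    z = sum(unnorm.values())
    return {m: p / z for m, p in unnorm.items()}


model = learn(2000)
posterior = infer_motor(model, s_target=3)
best_m = max(posterior, key=posterior.get)
print(best_m, round(posterior[best_m], 2))
```

In this sketch, a question such as "which motor command most probably produces a given sound?" is answered by Bayesian inversion of the learned forward distribution, which is one simple way to frame imitation of a heard target.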

Two studies were performed with this system. Each focused on one of the two basic mechanisms that ought to be at work in the initial periods of speech acquisition, namely vocal exploration and vocal imitation. The first study attempted to assess infants’ motor skills before and at the beginning of canonical babbling. It used the model to infer the acoustic regions, the articulatory degrees of freedom and the vocal tract shapes most likely explored by actual infants, given their vocalizations. The second study aimed to simulate data reported in the literature on early vocal imitation, in order to test whether and how the robot was able to reproduce them and to gain some insight into the cognitive representations that might be involved in this behavior.

Speech modeling in a robotics framework should contribute to a computational approach to sensori-motor interactions in speech communication, which seems crucial for future progress in the study of speech and language ontogeny and phylogeny.

Keywords: sensori-motor exploration, Bayesian robotics, vocal imitation, speech robotics, speech development

Published online: 30 September 2005
