Properties of the physical world have shaped human evolution and given rise to physically grounded mental representations. These grounded representations provide the foundation for higher-level cognitive processes, including language. Most natural language processing machines to date lack such grounding. This paper advocates the creation of physically grounded language learning machines as a path toward scalable systems that can conceptualize and communicate about the world in human-like ways. As steps in this direction, two experimental language acquisition systems are presented.
The first system, CELL, learns acoustic word forms and associated shape and color categories from fluent, untranscribed speech paired with video camera images. In evaluations, CELL successfully learned from spontaneous infant-directed speech. A version of CELL has been implemented in a robotic embodiment that can verbally interact with human partners.
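To make the cross-modal learning problem concrete, the toy Python sketch below scores candidate pairings of speech segments and visual categories by how reliably they co-occur across observation episodes. It illustrates only the general cross-situational idea, not CELL's implementation: the episode data, segment names, and the simple pointwise mutual-information criterion are all invented for this example, whereas CELL operates on raw acoustic and visual input.

```python
from math import log2

# Toy data (invented for illustration): each episode pairs candidate
# speech segments, already discretized into symbolic IDs, with the
# visual categories observed in the accompanying camera image.
episodes = [
    ({"seg_ball", "seg_red"}, {"shape:sphere", "color:red"}),
    ({"seg_ball"},            {"shape:sphere", "color:blue"}),
    ({"seg_cup", "seg_red"},  {"shape:cup",    "color:red"}),
    ({"seg_cup"},             {"shape:cup",    "color:green"}),
]

def pmi(seg, vis):
    """Pointwise mutual information between a speech segment and a
    visual category across episodes: a stand-in for a cross-modal
    selection criterion."""
    n = len(episodes)
    p_s = sum(seg in s for s, _ in episodes) / n
    p_v = sum(vis in v for _, v in episodes) / n
    p_sv = sum(seg in s and vis in v for s, v in episodes) / n
    return log2(p_sv / (p_s * p_v)) if p_sv else float("-inf")

# Rank candidate (word form, visual category) pairings; reliable
# pairings such as seg_ball <-> shape:sphere score highest.
segments = {s for segs, _ in episodes for s in segs}
visuals = {v for _, vis in episodes for v in vis}
for score, s, v in sorted(
        ((pmi(s, v), s, v) for s in segments for v in visuals),
        reverse=True)[:4]:
    print(f"{s} <-> {v}: {score:.2f}")
```

Ranking all pairings by this score surfaces the reliable word-to-referent mappings (here, seg_ball with shape:sphere and seg_red with color:red) while penalizing accidental co-occurrences.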
The second system, DESCRIBER, acquires a visually grounded model of natural language which it uses to generate spoken descriptions of objects in visual scenes. Input to DESCRIBER’s learning algorithm consists of computer-generated scenes paired with natural language descriptions produced by a human teacher. DESCRIBER learns a three-level language model that encodes syntactic and semantic properties of phrases, word classes, and words. The system learns from a simple ‘show-and-tell’ procedure and, once trained, generates semantically appropriate, contextualized, and syntactically well-formed descriptions of objects in novel scenes.
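The sketch below illustrates, again in deliberately simplified form, how a class-structured grounded lexicon can drive description generation: each word carries a visual predicate, words are grouped into classes, and a phrase template orders the classes. All names, predicates, and the template here are hypothetical stand-ins; DESCRIBER learns all three levels from the show-and-tell data rather than taking them as given.

```python
import random

# Hypothetical grounded lexicon: each word is tagged with a word class
# and a visual predicate over object features (in the real system the
# word-level semantics are learned, not hand-coded as here).
WORDS = {
    "red":    ("color", lambda o: o["hue"] < 0.1),
    "green":  ("color", lambda o: 0.2 < o["hue"] < 0.5),
    "large":  ("size",  lambda o: o["area"] > 0.5),
    "small":  ("size",  lambda o: o["area"] <= 0.5),
    "square": ("shape", lambda o: o["sides"] == 4),
}

# A phrase template orders the word classes: a crude stand-in for the
# learned three-level model of phrases, word classes, and words.
PHRASE_TEMPLATE = ["size", "color", "shape"]

def describe(obj):
    """Generate a description by choosing, for each word class in the
    phrase template, a word whose grounded predicate is true of obj."""
    chosen = []
    for cls in PHRASE_TEMPLATE:
        candidates = [w for w, (c, pred) in WORDS.items()
                      if c == cls and pred(obj)]
        if candidates:
            chosen.append(random.choice(candidates))
    return "the " + " ".join(chosen)

print(describe({"hue": 0.05, "area": 0.7, "sides": 4}))
# -> "the large red square"
```

The real system additionally selects the features that best distinguish the target object from the others in the scene, which is what makes its descriptions contextualized; the sketch omits that selection step.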