Multi-modal referring expressions in human-human task descriptions and their implications for human-robot interaction
Stephanie Gross | Austrian Research Institute for Artificial Intelligence (OFAI)
Brigitte Krenn | Austrian Research Institute for Artificial Intelligence (OFAI)
Human instructors often refer to objects and actions involved in a task description using both linguistic and non-linguistic means of communication. Hence, for robots to engage in natural human-robot interactions, we need to better understand the relevant aspects of human multi-modal task descriptions. We analyse references to objects in a data collection comprising two object manipulation tasks (22 teacher-student interactions in Task 1 and 16 in Task 2) and find that 78.76% of all referring expressions to the objects relevant in Task 1 and 88.64% of all referring expressions in Task 2 are verbally underspecified. The data strongly suggest that a language processing module for robots must be genuinely multi-modal, allowing for seamless integration of information transmitted in the verbal and visual channels, with tracking of the speaker's eye gaze and gestures, as well as object recognition, as necessary preconditions.
Article outline
- 1. Introduction
- 2. Background and related work
- 2.1 Multi-modal reference resolution in human-human interaction
- 2.1.1 Variation in language
- 2.1.2 Gesture, gaze and language
- 2.2 Computational approaches to multi-modal reference resolution
- 3. Data collection experiments and research questions
- 3.1 Task 1
- 3.2 Task 2
- 3.3 Data collection
- 3.4 Participants and technical tools employed in data analysis
- 3.5 Research questions
- 4. Results
- 4.1 RQ1 – Variation of referring expressions per object
- 4.2 RQ2 – Underspecified verbal referring expressions
- 4.2.1 Verbal part of referring expressions
- 4.2.2 Verbal part of initial references
- 4.2.3 Pronoun resolution
- 4.3 RQ3 – Multi-modality of referring expressions
- 5. Analysis and challenges
- 5.1 Challenge 1 – variation of expressions referring to one specific object
- 5.2 Challenge 2 – underspecified verbal referring expressions
- 5.3 Challenge 3 – multi-modality of referring expressions
- 5.4 Lessons for agent design
- 5.4.1 Variation of expressions referring to one specific object
- 5.4.2 Underspecified verbal referring expressions
- 5.4.3 Multi-modality of referring expressions
- 6. Conclusion, limitations, and future work
- Acknowledgements
- Notes
- References