Neural Network Implementation of Gaze-Target Prediction for Human-Robot Interaction

Gaze cues, which initiate an action or behaviour, are necessary for responsive and intuitive interaction. Using gaze to signal intentions or to request an action during conversation is conventional. We propose a new approach to gaze estimation that uses a neural network architecture and accounts for the dynamic patterns of real-world gaze behaviour in natural interaction. The main goal is to provide a foundation for a robot or avatar to communicate with humans through natural multimodal dialogue. Current robotic gaze systems are reactive in nature, whereas our Gaze-Estimation framework performs unified gaze detection, gaze-object prediction and object-landmark heatmap generation in a single scene, which paves the way for a more proactive approach. We generated 2.4M gaze predictions covering various types of gaze in a more natural setting (GHIGaze). The predicted and categorised gaze data can be used to automate contextualised robotic gaze-tracking behaviour in interaction. We evaluate performance on a manually annotated dataset and on a publicly available gaze-follow dataset. Our model outperforms previously reported methods, achieving the angular error closest to that of a human annotator. As future work, we propose an implementable gaze architecture for a social robot from Furhat Robotics.
Type of Publication:
In Proceedings
Book title:
Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication