Human interaction is deceptively simple to engage in, yet surprisingly challenging to account for theoretically. Existing theories of language and cognition cannot fully account for the complex dynamics of verbal and non-verbal behaviours in interaction, a limitation that is becoming even more apparent with our increasing use of computer-mediated communication, such as the currently ubiquitous Zoom calls.

With DivCon, my vision is to transform our basic understanding of human interaction by showing how successful dialogue is driven by incremental, local and dynamic processes of mismatch management.

In our everyday interactions, we continuously make predictions about what will happen next, based on how our own and others' behaviour affects the world, and these predictions open up new possible courses of action. In dialogue, such predictions concern sounds, words, inferences and even non-speech actions such as gestures or eye gaze. If our expectations are not met, we have to ascertain whether the mismatching input can be resolved, or whether it can instead be integrated as a surprising but rewarding outcome (as in the case of humour).

DivCon will produce a suite of corpus and experimental data for exploring the timely issues raised by different forms of computer-mediated communication, including text-based chats, video calls and virtual reality meetings. To do this, the project will create a novel experimental platform for real-time, live multimodal interaction using avatars and virtual reality. The formal arm of the project will develop a precise theory of divergence and convergence in interaction which unites verbal and non-verbal dialogue phenomena, including gesture, gaze, feedback and laughter, using the core notions of prediction and underspecification. This model will be implementable in conversational AI, an important step towards genuinely adaptive conversational AI systems, which remain beyond the reach of researchers despite the promise of recent decades.