
If you have a change in your birthdate: Assessing counterfactual dialogue capabilities of large language models

Abstract:
Large language models (LLMs) are increasingly being used to provide reasons for decisions, but their ability to engage in human-like dialogue and use commonsense reasoning has been called into question. These aspects are important when using LLMs to assist with high-stakes decisions and assessments such as credit approval. For example, if a loan applicant is denied credit, a counterfactual explanation may state that the application would be granted if the applicant’s income increased to a certain amount. By injecting a decision-making algorithm into the LLM prompt and systematically probing and annotating responses for carefully chosen inputs, we study potential patterns in the model’s selection of counterfactual explanations. Specifically, we assess notions of actionability acquired by the LLM during pre-training and how such notions are applied by the LLM in natural-language dialogue. The studied notions encompass mutability (e.g. that income can be changed while country of birth cannot), monotonicity (e.g. that age can only change in one direction), and causal dependencies between features (e.g. that duration of residence cannot be increased without also increasing age). Results for the two most recent versions of GPT show that in one studied aspect (mutability), both versions are well-aligned, while in another aspect (monotonicity), only GPT-4 is well-aligned. Finally, in the third aspect (causal dependencies), neither version of GPT is well-aligned. The experiments also suggest that misalignments are primarily due to problems in language generation rather than inherent properties of the models.
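The three actionability notions from the abstract can be illustrated as machine-checkable constraints on a proposed counterfactual. The sketch below is purely illustrative: the feature names, thresholds, and constraint rules are assumptions for exposition, not the paper's actual annotation scheme or data.

```python
# Illustrative sketch (not the paper's actual setup): checking a proposed
# counterfactual explanation against actionability constraints of the three
# kinds studied in the paper. All feature names and rules are assumptions.

IMMUTABLE = {"country_of_birth"}       # mutability: can never change
MONOTONE_INCREASING = {"age"}          # monotonicity: can only increase

def check_counterfactual(original, counterfactual):
    """Return a list of actionability violations in a proposed counterfactual."""
    violations = []
    for feature, new_value in counterfactual.items():
        old_value = original[feature]
        if new_value == old_value:
            continue
        if feature in IMMUTABLE:
            violations.append(f"{feature} is immutable")
        if feature in MONOTONE_INCREASING and new_value < old_value:
            violations.append(f"{feature} can only increase")
    # Causal dependency (assumed rule): duration of residence cannot grow
    # by more than age grows.
    d_res = counterfactual["years_of_residence"] - original["years_of_residence"]
    d_age = counterfactual["age"] - original["age"]
    if d_res > d_age:
        violations.append("years_of_residence cannot increase more than age")
    return violations

# Example: the counterfactual raises residence duration without raising age,
# so the causal-dependency constraint is flagged.
applicant = {"age": 30, "years_of_residence": 5,
             "income": 40000, "country_of_birth": "SE"}
proposal = {"age": 30, "years_of_residence": 8,
            "income": 50000, "country_of_birth": "SE"}
print(check_counterfactual(applicant, proposal))
# → ['years_of_residence cannot increase more than age']
```

In the paper's setting, the LLM generates the counterfactual in dialogue; a checker of this general shape is one way to annotate whether the model's suggestion respects the studied constraints.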
Year:
2025
Type of Publication:
In Proceedings
Book title:
Human and Artificial Rationalities (HAR), Lecture Notes in Computer Science