Counterfactual reasoning capabilities of GPT: Preliminary findings

Large language models (LLMs) such as GPT have recently attracted considerable interest for their ability to engage in human-like dialogue and to use commonsense reasoning. We experimentally investigate two specific aspects of these abilities: counterfactual reasoning and explanations. These abilities are particularly important when LLMs are used to assist high-stakes decisions and assessments such as credit approval or medical diagnostics. For example, if a loan applicant is denied credit, a counterfactual explanation conveys the conditions under which the credit would have been granted. By injecting a decision-making algorithm into the model’s prompt and systematically probing and annotating its responses to carefully chosen inputs, we study potential patterns in GPT’s selection of counterfactual examples. Preliminary results indicate that when GPT-3.5 provides counterfactual explanations, it does not consider causal relations between variables in the way one would expect from a model with strong commonsense reasoning capabilities. We discuss potential implications of these results for real-world applications and future research.
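To make the loan example concrete, the following is a minimal sketch of what a counterfactual explanation looks like for a rule-based decision. The rule in `approve`, its thresholds, the feature names, and the candidate values are all hypothetical and invented for illustration; they are not the decision-making algorithm or the inputs used in the study. The search simply returns the approved variants that change the fewest features.

```python
from itertools import product

# Hypothetical credit rule, invented for illustration only.
def approve(applicant):
    return applicant["income"] >= 40000 and applicant["debt"] <= 10000

def minimal_counterfactuals(applicant, candidates):
    """Return the approved variants that change the fewest features.

    `candidates` maps each feature name to the values it may take.
    """
    names = list(candidates)
    approved = []
    # Enumerate every combination of candidate feature values.
    for values in product(*(candidates[n] for n in names)):
        variant = {**applicant, **dict(zip(names, values))}
        if approve(variant):
            changed = [n for n in names if variant[n] != applicant[n]]
            approved.append((len(changed), variant))
    if not approved:
        return []
    fewest = min(count for count, _ in approved)
    return [variant for count, variant in approved if count == fewest]

# A denied applicant: income is below the hypothetical threshold.
applicant = {"income": 30000, "debt": 8000}
candidates = {"income": [30000, 40000, 50000], "debt": [8000, 5000]}

for variant in minimal_counterfactuals(applicant, candidates):
    print("Credit would have been granted with:", variant)
```

A search of this kind treats every feature as independently adjustable, so it takes no account of causal relations between variables, which is the aspect of GPT's counterfactual explanations that the abstract examines.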
Type of Publication: In Proceedings
Book title: Proceedings of the 18th SweCog Conference