arxiv:2305.18654

Faith and Fate: Limits of Transformers on Compositionality

Published on May 29, 2023

Submitted by AK on May 31, 2023

#2 Paper of the day


Authors: Nouha Dziri, Melanie Sclar, Xiang Lorraine Li, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Jena D. Hwang, Soumya Sanyal, Allyson Ettinger, Yejin Choi

AI-generated summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): Transformers solve compositional tasks by reducing multi-step reasoning to linearized subgraph matching, leading to performance decay with increased complexity.

Abstract

Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This raises the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify Transformers, we investigate the limits of these models across three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that Transformers solve compositional tasks by reducing multi-step compositional reasoning to linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how Transformers' performance will rapidly decay with increased task complexity.
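To make the idea of "a compositional task as a computation graph" concrete, here is a minimal sketch for multi-digit multiplication. It is not the paper's construction: the node encoding (`input`/`mul`/`add` nodes) and every function name (`build_multiplication_graph`, `evaluate`, `depth`, `all_steps_correct_prob`) are hypothetical, and carries are folded into place-valued integer additions rather than modeled as separate nodes. The last function is a toy independence model in the spirit of the decay claim, assuming each intermediate step succeeds independently with a fixed probability; it is not the paper's theoretical argument.

```python
from typing import Dict, List, Tuple

# Hypothetical node encoding: name -> (op, parent names, stored value).
# Only "input" nodes use the stored value; "mul"/"add" nodes compute theirs.
Graph = Dict[str, Tuple[str, List[str], int]]


def build_multiplication_graph(a: int, b: int) -> Tuple[Graph, str]:
    """Build a simplified computation graph for a * b (non-negative ints).

    Each input digit node holds digit * 10**position, so carries are absorbed
    into ordinary integer additions instead of being separate nodes.
    Returns the graph and the name of the output node.
    """
    graph: Graph = {}
    a_digits = [int(ch) for ch in reversed(str(a))]  # least significant first
    b_digits = [int(ch) for ch in reversed(str(b))]
    for i, d in enumerate(a_digits):
        graph[f"a{i}"] = ("input", [], d * 10**i)
    for j, d in enumerate(b_digits):
        graph[f"b{j}"] = ("input", [], d * 10**j)

    rows: List[str] = []
    for j in range(len(b_digits)):
        # One single-digit product node for every (i, j) digit pair.
        products = []
        for i in range(len(a_digits)):
            graph[f"p{i}_{j}"] = ("mul", [f"a{i}", f"b{j}"], 0)
            products.append(f"p{i}_{j}")
        # Partial-product row j sums its digit products.
        graph[f"row{j}"] = ("add", products, 0)
        rows.append(f"row{j}")

    # Accumulate the rows one at a time, as in long multiplication.
    out = rows[0]
    for k in range(1, len(rows)):
        graph[f"acc{k}"] = ("add", [out, rows[k]], 0)
        out = f"acc{k}"
    return graph, out


def evaluate(graph: Graph, node: str, memo: Dict[str, int]) -> int:
    """Compute a node's value by recursively evaluating its parents."""
    if node not in memo:
        op, parents, value = graph[node]
        if op == "mul":
            value = evaluate(graph, parents[0], memo) * evaluate(graph, parents[1], memo)
        elif op == "add":
            value = sum(evaluate(graph, p, memo) for p in parents)
        memo[node] = value
    return memo[node]


def depth(graph: Graph, node: str, memo: Dict[str, int]) -> int:
    """Length of the longest path from any input node to `node`."""
    if node not in memo:
        parents = graph[node][1]
        memo[node] = (1 + max(depth(graph, p, memo) for p in parents)) if parents else 0
    return memo[node]


def all_steps_correct_prob(graph: Graph, p_step: float) -> float:
    """Toy model (assumption): each non-input node is computed correctly,
    independently, with probability p_step; return P(all steps correct)."""
    steps = sum(1 for op, _, _ in graph.values() if op != "input")
    return p_step ** steps


if __name__ == "__main__":
    for a, b in [(7, 8), (123, 45), (98765, 4321)]:
        g, out = build_multiplication_graph(a, b)
        assert evaluate(g, out, {}) == a * b
        print(f"{a} x {b}: nodes={len(g)}, depth={depth(g, out, {})}, "
              f"P(all correct | p=0.99)={all_steps_correct_prob(g, 0.99):.3f}")
```

Under these assumptions, both the node count and the depth grow with the number of digits, and the toy success probability shrinks geometrically in the number of steps, which is one simple way to picture why accuracy can fall off quickly as problems get larger.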