Artificial intelligence, since the breakthrough of ChatGPT two years ago, seems to reach new heights every day. AI solutions have multiplied (Claude 3 from Anthropic, Gemini from Google, the Apple Intelligence suite, etc.), and the models are increasingly coherent, creative, and efficient. Some systems can generate sound and video (such as Sora from OpenAI), help design drugs, optimize supply chains, or manage financial risk.
It is a revolutionary technology par excellence, on a par with the mastery of electricity in the 19th century or the invention of the wheel in the Neolithic period, some 5,500 years ago. However, an experiment carried out by a team of researchers led by Yuan Gao at Boston University tempers this enthusiasm. The researchers put the strategic reasoning capabilities of today's largest language models to the test, with surprisingly modest results. Their work was published on November 13 on arXiv.
The Shekel Game: A Window into the Depths of Strategic Thinking
The "11-20 game" stands out in the behavioral economist's arsenal as a remarkable tool for probing the mechanisms of strategic thinking. Its deceptively simple principle hides a veritable laboratory of decision-making dynamics: two participants each request an amount between 11 and 20 shekels (the Israeli currency), with the guarantee of receiving the requested sum. The crucial element is a 20-shekel bonus awarded to the player who requests exactly one shekel less than their opponent.
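To make the payoff structure concrete, here is a minimal sketch in Python (the function name and variable names are our own illustration, not taken from the study):

```python
def payoff(my_request: int, opponent_request: int) -> int:
    """Payoff in the 11-20 game: you always receive what you ask for,
    plus a 20-shekel bonus if you asked exactly one shekel less
    than your opponent."""
    assert 11 <= my_request <= 20 and 11 <= opponent_request <= 20
    bonus = 20 if my_request == opponent_request - 1 else 0
    return my_request + bonus

# Asking for 19 against an opponent who asks for 20 yields 19 + 20 = 39 shekels,
# versus only 20 for the naive choice of 20.
print(payoff(19, 20))  # 39
print(payoff(20, 20))  # 20
```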
This mechanism generates a pyramid of reasoning that game theorists call "level-k reasoning". At the most basic level (level 0), a player chooses 20 shekels without any strategic thinking. A level-1 player, anticipating this naive choice, opts for 19, thus securing the bonus. Level 2 pushes the reasoning further: anticipating that the opponent will play level 1 (19), the player chooses 18. This progression can theoretically continue down to 11, creating a spiral of deductions in which each level incorporates and outflanks the strategy of the previous one, as in the sketch below.
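The rule is easy to formalize. The following is our own illustration of level-k play as described above (the function name is hypothetical, not from the paper):

```python
def level_k_choice(k: int) -> int:
    """Level-k reasoning in the 11-20 game (illustrative sketch):
    a level-0 player naively asks for 20; each higher level undercuts
    the previous one by exactly one shekel to capture the bonus,
    bottoming out at the minimum allowed request of 11."""
    return max(11, 20 - k)

for k in range(4):
    print(f"level {k} -> asks for {level_k_choice(k)}")
# level 0 -> asks for 20
# level 1 -> asks for 19
# level 2 -> asks for 18
# level 3 -> asks for 17
```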
The beauty of this game lies in its ability to reveal the depth of human strategic reasoning. Players must not only anticipate their opponent's behavior but also gauge their opponent's level of strategic sophistication. Most humans naturally stop at intermediate levels (choosing 17 or 16), reflecting a balance between cognitive subtlety and pragmatism. This tendency reveals an intuitive understanding that pushing the reasoning too far can be counterproductive, because few opponents think that many levels deep.
AI models: superficial strategists
Yuan Gao's team subjected the latest AI models to a thousand rounds of this game under varying conditions. The results reveal a fundamental limitation: even the most advanced systems, such as GPT-4, stick to basic strategies.
While human players demonstrate an intuitive grasp of social dynamics by choosing intermediate values such as 17, the AIs remain stuck on the most basic choices (19 or 20), showing an inability to develop more sophisticated strategies. Another telling detail: their responses vary inconsistently with irrelevant factors, such as the language used in the instructions.
The persistent gap between imitation and understanding
What fundamental distinction between human and artificial intelligence does this study highlight? The human brain naturally integrates a multitude of factors: past experiences, emotions, social intuitions, the desire to win, and the ability to put oneself in the other's place.
In contrast, language models, despite their apparent sophistication, function essentially as text prediction systems, lacking any real understanding of strategic stakes. In this sense, they are not endowed with real intelligence; they are simply algorithms. Very powerful ones, certainly, but mechanical and based on statistical rules. The parrot metaphor used by the researchers illustrates this limitation perfectly: even though these birds can reproduce complex sentences, they do not understand their deeper meaning.
Currently, many companies are considering replacing traditional human panels with AI systems to test their products, advertising campaigns or market strategies. This transition would promise substantial savings: no more need to recruit, pay and coordinate hundreds of human participants for each study.
However, the results of the 11-20 game sound a warning. If the most advanced AI models fail to replicate the complexity of human reasoning in such a basic situation, how could they faithfully predict more complex behaviors? The hegemony of AI is probably not for tomorrow, a conclusion that echoes the work carried out by the Center for Applied Ethics at UMass Boston, which we covered in a previous article. Still, nothing is set in stone, and this picture could change very quickly given the progress the sector has made over the last decade.
- A study of the 11-20 game reveals that even the most advanced AI models, such as GPT-4, lack strategic depth and are limited to basic choices.
- Unlike humans, AIs do not understand the social and intuitive complexity of decisions.
- These models, for the moment, remain simple predictive tools devoid of understanding.