STEER: Assessing the Economic Rationality of Large Language Models

Narun Raman; Taylor Lundy; Samuel Amouyal; Yoav Levine; Kevin Leyton-Brown; Moshe Tennenholtz

Document record

Date

February 14, 2024

Discipline

Economics and finance

Document type

Printed texts

Scope

Publications

Identifier

2402.09552

Source

arXiv - Economics

Collection

arXiv

Organization

Cornell University

Keywords

Computer Science - Computation and Language; Economics - General Economics

Related subjects

Language (New words, slang, etc.) Pattern Model Economic theory Political economy

Cite this document

Narun Raman et al., "STEER: Assessing the Economic Rationality of Large Language Models", arXiv - Economics

Abstract

There is increasing interest in using LLMs as decision-making "agents." Doing so includes many degrees of freedom: which model should be used; how should it be prompted; should it be asked to introspect, conduct chain-of-thought reasoning, etc.? Settling these questions -- and more broadly, determining whether an LLM agent is reliable enough to be trusted -- requires a methodology for assessing such an agent's economic rationality. In this paper, we provide one. We begin by surveying the economic literature on rational decision making, taxonomizing a large set of fine-grained "elements" that an agent should exhibit, along with dependencies between them. We then propose a benchmark distribution that quantitatively scores an LLM's performance on these elements and, combined with a user-provided rubric, produces a "STEER report card." Finally, we describe the results of a large-scale empirical experiment with 14 different LLMs, characterizing both the current state of the art and the impact of different model sizes on models' ability to exhibit rational behavior.
