9 Comments

Hi Melanie, thank you for this analysis. I am only now getting into the prior literature on analogical reasoning and generalization, although I have used Gentner and Forbus's SME and SAGE systems within a larger cognitive reasoning framework.

I have a few questions about the experiment setup and evaluation:

1. A lot of the patterns don't provide enough information to induce a rule. A single example is not sufficient to determine which rule is the right one to answer a question (as you noted as well). Are there questions in which multiple examples constrain the hypothesis space so that there is no ambiguity? Can humans recognize that there are several applicable rules? Do you think that recognizing multiple hypotheses and asking for more information is a uniquely human skill?
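
To make the ambiguity point concrete, here is a toy sketch (my own construction, not from the post): with a small set of candidate rules, a single example can leave several hypotheses consistent, and a second example can prune them. The three rules are illustrative stand-ins, not Copycat's actual rule repertoire.

```python
import string

ALPHA = string.ascii_lowercase

def succ_last(s):
    # Replace the last letter with its alphabetic successor.
    return s[:-1] + ALPHA[(ALPHA.index(s[-1]) + 1) % 26]

def succ_all(s):
    # Replace every letter with its alphabetic successor.
    return "".join(ALPHA[(ALPHA.index(c) + 1) % 26] for c in s)

def reverse(s):
    # Reverse the string.
    return s[::-1]

RULES = {"succ_last": succ_last, "succ_all": succ_all, "reverse": reverse}

def consistent_rules(examples):
    # Keep only the rules that reproduce every (source, target) pair.
    return [name for name, rule in RULES.items()
            if all(rule(src) == tgt for src, tgt in examples)]

# One example can be ambiguous; a second example shrinks the hypothesis space.
print(consistent_rules([("a", "b")]))                # ['succ_last', 'succ_all']
print(consistent_rules([("a", "b"), ("mn", "mo")]))  # ['succ_last']
```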

2. The 'better than humans' claim is typically made by AI scientists who are looking at accuracy metrics. I was surprised to see similar claims from psychologists. I can see that several humans were much better than ChatGPT (especially in the story-analogy case); the average is lower, but the spread is really big. Is it standard to compare averages of human behavior?
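
To illustrate the averaging point with made-up numbers (entirely hypothetical, not data from the post): a model can beat the human mean while a sizable fraction of individual humans still beat the model.

```python
import statistics

# Hypothetical scores, chosen only to show a wide spread.
human_scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.35, 0.30]
model_score = 0.70

print("human mean:", round(statistics.mean(human_scores), 2))  # 0.62, below the model
print("humans above model:",
      sum(s > model_score for s in human_scores), "of", len(human_scores))  # 3 of 7
```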


Hi Melanie. Did you try prompts that encourage chain-of-thought reasoning or self-reflection on the initial answer? It seems like performance on some of these tasks might benefit from that.
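
Something along these lines is what I had in mind (a sketch of my own; the model name and client usage are assumptions based on the current openai Python package, not anything the post used):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROBLEM = "If the string mno changes to mnp, what does ijk change to?"

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content

# Plain prompt, no reasoning cue.
plain = ask([{"role": "user", "content": PROBLEM}])

# Chain-of-thought cue, then a self-reflection follow-up on the same answer.
history = [{"role": "user",
            "content": PROBLEM + " Think step by step before answering."}]
cot = ask(history)
history += [{"role": "assistant", "content": cot},
            {"role": "user",
             "content": "Re-examine your reasoning. Is the rule you applied "
                        "consistent with the given example? Revise if needed."}]
reflected = ask(history)

print(plain, cot, reflected, sep="\n---\n")
```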


I tried the letter-string analogies and got a wrong answer and a wrong explanation:

Here is a possible solution to the riddle:

mno -> mnp, ijk -> ilk

The pattern here is to change the last letter of the first word to the next letter in the alphabet, and to change the second letter of the second word to the next letter in the alphabet.
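
For contrast, here is a minimal sketch (mine, not from the post) of the rule most people induce from mno -> mnp, namely "replace the last letter with its alphabetic successor", which gives ijk -> ijl rather than ChatGPT's ilk:

```python
import string

def successor_of_last(s: str) -> str:
    # Replace the last letter with the next letter of the alphabet.
    alpha = string.ascii_lowercase
    return s[:-1] + alpha[(alpha.index(s[-1]) + 1) % 26]

assert successor_of_last("mno") == "mnp"  # reproduces the given example
print(successor_of_last("ijk"))           # ijl
```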


I read your book in French to better understand what AI can and cannot do; it was very interesting.

I am glad to read this detailed blog post. The Twitter posts were too short.


Really enjoyed this essay. We have so many articles discussing the technical details of autoregressive language models, as well as different metrics to evaluate training performance. However, there are still so many challenges around robust and common-sense benchmarks -- they are just hard to formalize. Anyway, I just wanted to say that this was a very refreshing read!
