Discussion about this post

Shiwali Mohan:

Hi Melanie, thank you for this analysis. I am only now getting into the prior literature on analogical reasoning and generalization, although I have used Gentner and Forbus's SME and SAGE systems within a larger cognitive reasoning framework.

I have a few questions about the experiment setup and evaluation:

1. A lot of the patterns don't provide enough information to induce a rule. A single example is not sufficient to determine which rule is the right one for answering a question (as you noted as well). Are there questions with multiple examples that constrain the hypothesis space so that there is no ambiguity? Can humans recognize that several rules are applicable? Do you think that recognizing multiple hypotheses and asking for more information is a uniquely human skill?

2. The 'better than humans' claim is typically made by AI scientists looking at accuracy metrics. I was surprised to see similar claims from psychologists. I can see that several humans were much better than ChatGPT (especially in the story-analogy case); the average is lower, but the spread is really big. Is it standard to compare averages of human behavior?

Slade Winstone:

"However, I did try one of my favorites: abc —>abd, xyz —> ? GPT-3 returned the strange answer xye."

I can't help wondering if GPT-3 was applying the “basic successor” three-letter-string pattern to the phonetic spelling of 'z' as 'zee' to arrive at 'e'.
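To make the puzzle concrete: the example abc → abd suggests the literal rule "replace the last letter with its alphabetic successor," which breaks down on xyz because 'z' has no successor. Here is a minimal sketch (not code from the post, and not a claim about how GPT-3 works) of that naive rule:

```python
def successor_rule(s: str) -> str:
    """Apply the literal rule suggested by abc -> abd:
    replace the last letter with its alphabetic successor."""
    last = s[-1]
    if last == "z":
        # The crux of the xyz variant: 'z' has no successor, so a
        # solver must generalize the rule rather than apply it literally.
        raise ValueError("'z' has no alphabetic successor")
    return s[:-1] + chr(ord(last) + 1)

print(successor_rule("abc"))  # -> abd
```

The interesting question is what a solver does when the literal rule fails, which is exactly where answers like xye (or the commenter's 'zee' hypothesis) come in.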

