A Fun Puzzle

Here’s a fun puzzle for you. I’ll give you six words in an alien language: saa, guu, ree, fii, hoo, and muo. Figure 1 gives a diagram showing how either single words or combinations of words result in combinations of colored circles. Given the example sentences, what combination of colored circles should result from the query sentence?
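To get a concrete feel for the kind of grammar such a puzzle rests on, here is a minimal sketch of an interpreter for a toy grammar of the same flavor. The actual word meanings in Figure 1 are not recoverable from the text above, so the color assignments and function-word semantics below are invented purely for illustration.

```python
# Toy interpreter for an invented grammar of the same flavor as the puzzle.
# The word-to-color mapping and function-word meanings are assumptions made
# for illustration; they are NOT the mapping shown in Figure 1.

PRIMITIVES = {"saa": "RED", "guu": "BLUE", "ree": "GREEN"}  # invented

def interpret(sentence: str) -> list[str]:
    """Map a sentence to a sequence of colored circles.

    Invented function words:
      fii -- repeat the sequence built so far
      hoo -- reverse the sequence built so far
      muo -- a separator that adds no circles of its own
    """
    circles: list[str] = []
    for token in sentence.split():
        if token in PRIMITIVES:
            circles.append(PRIMITIVES[token])
        elif token == "fii":
            circles = circles * 2
        elif token == "hoo":
            circles = circles[::-1]
        elif token == "muo":
            pass
        else:
            raise ValueError(f"unknown word: {token}")
    return circles

if __name__ == "__main__":
    print(interpret("saa guu fii hoo"))  # ['BLUE', 'RED', 'BLUE', 'RED']
```

Solving the puzzle amounts to inferring a small set of rules like these from the example sentences alone, then applying them to the query.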
As a layperson I really enjoy your articles and always find them so interesting! Thank you for breaking things down to a level that makes sense. As an aside to the reviewed paper, I found this one
https://arxiv.org/abs/2309.03886 interesting and of a similar flavor, if not theme, in that it leverages LLMs to figure out a black-box function. Was wondering if it might be on the reading group's short list?
Thanks for the kind words and for the paper link!
Nice discussion of the paper! I had a similar take. One additional issue with the approach is that they don't really handle productivity.
Thanks for the brilliant summary. I had the paper on my to-read list and have now marked it as done. There are lots of works training models to solve meta-problems using in-context generalization, such as TabPFN, the work on linear regression with transformers, or even work on causal discovery using a similar approach.
Can we obtain systematic generalization to all distributions we care about by widening the problem-generating distribution, or does that just push the problem one step back? And where do agency, embodiment, and causality fit into this way of solving generalization?
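For readers unfamiliar with that line of work, here is a minimal, hypothetical sketch of the data side of such meta-training: each training episode is a fresh task drawn from a problem-generating distribution (here, random linear regression), and the model must predict the query label from the in-context examples alone. The dimensions, noise level, and episode format are illustrative assumptions, not the setup of TabPFN or any specific paper.

```python
# Sketch of episode generation for meta-training on a problem-generating
# distribution (in the spirit of TabPFN / in-context linear regression).
# All sizes and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(n_context=10, dim=3, noise=0.1):
    """Draw one random linear-regression task and an in-context episode.

    Returns context pairs (X, y) plus one held-out query point; a meta-trained
    model must predict the query's y from the context alone, with no gradient
    updates at test time.
    """
    w = rng.normal(size=dim)                   # the task: a random weight vector
    X = rng.normal(size=(n_context + 1, dim))  # context points plus one query
    y = X @ w + noise * rng.normal(size=n_context + 1)
    return (X[:-1], y[:-1]), (X[-1], y[-1])

# A meta-learner sees millions of such episodes, so what it learns is the
# distribution over tasks rather than any single task.
(context_X, context_y), (query_x, query_y) = sample_episode()
print(context_X.shape, query_x.shape)  # (10, 3) (3,)
```

The commenter's question is whether widening this generating distribution ever yields generalization beyond it, or only moves the boundary.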
I deeply appreciate your excellent analysis. In many ways it parallels an analysis by Lachter and Bever (https://www.researchgate.net/profile/Thomas-Bever/publication/19806078_The_relation_between_linguistic_structure_and_associative_theories_of_language_learning-A_constructive_critique_of_some_connectionist_learning_models/links/5bf9c4f292851ced67d5f474/The-relation-between-linguistic-structure-and-associative-theories-of-language-learning-A-constructive-critique-of-some-connectionist-learning-models.pdf).
As you point out, the choice to train on 80%-correct training examples suggests that something unexpected was going on to produce this result. Lachter and Bever point out that the choice of representation is also not neutral. They were concerned with a connectionist assertion that the model would learn the past-tense forms of verbs, and they pointed out that the choice of representations for the language controlled the results.
I can conceive of the possibility that there really is something to this meta-learning potential, but I would remain skeptical until we can convincingly rule out that the way the problem was translated into an embedding and the way the model was structured were themselves sufficient to explain the results.
Maybe what was needed wasn't exactly 'symbolic machinery'; perhaps a 'heuristic' was encoded in the training set or emerged from the training. That 80% result makes me think a heuristic might be involved. Blame Yannic for making me think really, really hard with this useful 'prank' of his.
https://youtu.be/rUf3ysohR6Q?si=oMHAzEgMzvEELzQP
Very exciting! Fundamental theories of thinking and upper ontologies merging to make sense of things for us seems to be 2024’s theme.
"But given the explicit training [in human-like error modes], I didn’t understand why this would [be] surprising, and I didn’t see what insights such results provide."
You're putting that more diplomatically than I would :-)
In the examples given, the correct statement is not “this is the solution”, but “this is one of the possible solutions”.
Indeed, you're right. The solution I gave assumes a particular underlying grammar, which is also assumed in the Lake & Baroni paper. But in principle there could be an infinite number of possible solutions.
The problem domain of the puzzles seems to be limited, so there is a good chance that the model saw the test examples during training. If the authors did not explicitly control for this, then it's a big weak spot in their conclusions. Just supplying random examples from the meta-grammar is not enough, because with a limited problem domain a large number of training examples will cover most of the domain even if the examples are drawn at random.
There is a simple, unlimited problem domain that can be used for testing compositionality, systematicity, and productivity: arithmetic problems. As far as I know, LLMs still fail on even simple arithmetic tasks when large numbers are involved (there are many possible combinations of digits, which cannot all be memorized by the LLM during training). It puzzles me why LLM researchers keep avoiding this problem. I think it's a huge elephant in the LLM room.
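A minimal sketch of how such arithmetic probes could be generated follows; the digit count, prompt wording, and exact-match grading rule are arbitrary illustrative choices, not a published benchmark.

```python
# Generate large-number addition probes of the kind the comment suggests:
# numbers long enough that the answers cannot plausibly have been memorized.
# The digit count, prompt wording, and grading rule are illustrative choices.
import random

random.seed(0)

def make_addition_probe(n_digits=15):
    """Return a (prompt, expected_answer) pair for one large addition."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    prompt = f"What is {a} + {b}? Answer with the number only."
    return prompt, str(a + b)

def grade(model_output: str, expected: str) -> bool:
    """Score a raw model reply by exact match on its digits."""
    digits = "".join(ch for ch in model_output if ch.isdigit())
    return digits == expected

for prompt, answer in (make_addition_probe() for _ in range(3)):
    print(prompt, "->", answer)
```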
Thanks for your comment. The authors did check to make sure the test examples were not in the training data.
For the record, my IQ is around 140. The test is problematic because it fails to provide enough clear examples to effectively test pattern recognition skills. It's more about asking participants how they arrived at their conclusions and what logic makes sense to them, which deviates from its original purpose and leads to various interpretations.
As a small natural language model, I totally failed the test and didn't even get a clue what the hell it is about. I owe an apology to the entire human race.
Melanie, apologies if you've written about this and I missed it, but have you published a position and plan for the situation with Substack platforming and monetizing nazis?
Thank you!
Really fascinating paper. I'm trying to find where they compared GPT-4 and got the 58% accuracy result. What I'm curious about is whether the same few-shot prompting approach was used. Also, it would be interesting to see the results of fine-tuning GPT-4 on the meta-grammar examples to see whether that improves the output.
It's in the supplementary information: https://cims.nyu.edu/~brenden/papers/LakeBaroniNatureSI.pdf
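For what a few-shot prompting setup for this kind of puzzle could look like, here is a hypothetical sketch of assembling study examples into a single prompt. The example sentences, circle names, and formatting are invented (reusing the toy grammar sketched near the top); this is not the prompt used for the paper's GPT-4 comparison, whose details are in the linked supplement.

```python
# Hypothetical few-shot prompt construction for an alien-word puzzle.
# The study examples, query, and formatting are invented for illustration
# and are not the prompt used in the paper's GPT-4 comparison.

STUDY_EXAMPLES = [              # (sentence, circles) pairs -- invented
    ("saa", "RED"),
    ("guu", "BLUE"),
    ("saa fii", "RED RED"),
    ("saa guu hoo", "BLUE RED"),
]
QUERY = "guu fii hoo saa"       # invented query sentence

def build_prompt(examples, query):
    """Concatenate study examples and the query into one few-shot prompt."""
    lines = ["Translate each sentence into a sequence of colored circles."]
    for sentence, circles in examples:
        lines.append(f"Sentence: {sentence}\nCircles: {circles}")
    lines.append(f"Sentence: {query}\nCircles:")
    return "\n\n".join(lines)

print(build_prompt(STUDY_EXAMPLES, QUERY))
```

The resulting string would be sent to the model as a single completion-style prompt; fine-tuning on meta-grammar episodes, as the comment suggests, would instead bake this study/query format into the training data.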
While the puzzles and methods are nice, I think the real headline is that GPT-4 gets 58%! This is pretty good "symbolic behaviour without symbols". It's much closer to "human-like" than the proposed method, because it doesn't require any special training on 100k examples. And OK, 58% is less than 80%, but this gap will probably disappear with the next large general-purpose model.
Enjoyed your public lecture on the future of AI - would it be possible to get a copy of your slides?
Yes, please email me.
I would email you if I could find your address … mine is martin.antony.walker@gmail.com