20 Comments
Oct 21 · Liked by Melanie Mitchell

"I think this kind of debate is actually really good for the science of LLMs"

I would agree with that. It is interesting to me that so many clever people are so uncritical about something so important to them. I might put that down to a steady diet of sci-fi and computer code rather than a deep education in cognition, and indeed to the hype cycle that's been used to fund this research and monetize the results. We're probably all guilty of that at some level, I think, so perhaps it's best not to throw stones.

The really interesting issue to me is that the semantics captured by language turn out to be so amenable to a purely syntactic analysis using statistical patterns. We seem to have found the scale at which this starts to happen, which is very large but not infinite. Yet this is not so surprising, I think. The human mind is prodigious, but it's not infinite either.

But I think our minds are doing more than predicting tokens given a massive set of examples. Funnily enough, the question of what we are really doing when we think and talk remains fundamentally unanswered, even if we now know that we do these things using a tractable tool.


It's like when you didn't study for a test the night before, so you try to answer the multiple-choice questions by other tricks. For example, test makers usually put more thought into the wording of the right answer than of the wrong ones. Humans are very good at this kind of reasoning. I expect LLMs are even better at it.

18 hrs ago · Liked by Melanie Mitchell

Excellent article on one of the key limitations of LLMs (reasoning). The other (IMO) is the extremely shallow internal world model (required for genuine understanding of the real world) that is constructed by the LLM training process. Unless both of these problems (reasoning and understanding) can be robustly resolved, LLM cognition, and therefore the cognition of any agent or robot built on top of it, will be severely limited. It is extremely unlikely (IMO) that any LLM-based system will ever resolve these fundamental problems to the extent required for human-level AGI.

16 hrs ago · Liked by Melanie Mitchell

An excellent digest of an important topic. Many thanks!


Reasoning cannot be performed in one shot, just as one can't write code in one shot.

There has to be an iterative process. At each step a hypothesis is made. Some validation takes place. Another step takes place. At some point one may realize that this is a dead end and have to start anew.

A reasoning agent has to have some very good understanding of the environment it is searching. It is not unlike trying to find a treasure in a labyrinth without punching through walls.

It is plausible that a tool like GPT-o1 can be improved enough to reliably work through a problem, with the addition of a formal verifier, or strategies for grounding, checking work, reflection and evaluation.

This will require a lot of domain-specific modeling and lots of examples illustrating the relevant strategies.
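
To make the shape of that loop concrete, here is a minimal Python sketch; the `propose`, `verify`, `is_dead_end`, and `is_solved` callbacks are placeholders standing in for, say, an LLM proposer plus a formal verifier, not any existing system:

```python
# A minimal sketch of the iterative process described above: make a hypothesis,
# validate it, extend the partial solution, and start anew on a dead end.
# All four callbacks are assumed placeholders supplied by the caller
# (for instance, an LLM that proposes steps and a formal verifier that checks them).

def iterative_solve(problem, propose, verify, is_dead_end, is_solved, max_steps=100):
    path = []                                   # hypotheses accepted so far
    for _ in range(max_steps):
        if is_solved(problem, path):            # found the treasure in the labyrinth
            return path
        step = propose(problem, path)           # a hypothesis is made
        if step is None or is_dead_end(problem, path + [step]):
            path = []                           # realize it's a dead end: start anew
            continue
        if verify(problem, path, step):         # some validation takes place
            path.append(step)                   # another step takes place
    return None                                 # step budget exhausted
```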


o1 Preview didn't have any trouble working through the kiwi problem with the distractors when I tried it, or with the toy problem posed symbolically.

"Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

ChatGPT

Thought for 5 seconds

To determine the total number of kiwis Oliver has, we'll calculate the number he picked each day and then sum them up.

Friday: Oliver picked 44 kiwis.

Saturday: He picked 58 kiwis.

Sunday:

He picked double the number he did on Friday.

Number of kiwis picked on Sunday: 2 × 44 = 88 kiwis.

Note: Five of these kiwis were a bit smaller than average, but unless specified otherwise, they are still counted as kiwis.

Total kiwis: 44 (Friday) + 58 (Saturday) + 88 (Sunday) = 190 kiwis.

Answer: 190

To solve this problem symbolically, let’s represent each variable in a general equation to find how many bouncy balls were added. Here’s the step-by-step approach:

Given:

The bag of building blocks contains {x} blocks.

The bin of stuffed animals contains {y} stuffed animals.

The tower of stacking rings contains {z} rings.

The total number of toys after adding the bouncy balls is {total}.

Define the variables:

Let x be the number of blocks.

Let y be the number of stuffed animals.

Let z be the number of rings.

Let b be the number of bouncy balls.

Let total be the total number of toys after adding the bouncy balls.

Write the equation: The sum of the initial number of toys plus the number of bouncy balls is equal to the total number of toys:

x + y + z + b = total

Rearrange the equation to solve for b (the number of bouncy balls):

b = total − (x + y + z)

Summary of the solution:

Equation: b = total − (x + y + z)

To find the number of bouncy balls, subtract the sum of {x}, {y}, and {z} from {total}.

This symbolic equation gives you the number of bouncy balls required to reach the total number of toys.
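
For what it's worth, both parts of the transcript check out; the few lines of Python below (purely illustrative, with made-up numbers for the toy problem) reproduce the arithmetic and the symbolic formula:

```python
# Kiwi problem: the "smaller than average" clause doesn't change the count.
friday, saturday = 44, 58
sunday = 2 * friday                          # double Friday's pick
print(friday + saturday + sunday)            # 190

# Toy problem, symbolically: b = total - (x + y + z)
def bouncy_balls(x, y, z, total):
    """Number of bouncy balls needed to reach `total` toys."""
    return total - (x + y + z)

# Made-up numbers, purely to exercise the formula:
print(bouncy_balls(x=31, y=8, z=9, total=62))   # 14
```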

19 hrs ago · edited 19 hrs ago

Melanie mentions "deduction, induction, abduction, analogy, common sense, and other ‘rational’ or systematic methods for solving problems" as aspects of reasoning. In my AGI work, the three cognitive primitives are induction (the discovery of patterns), deduction, and abduction (where the latter two are derived from the idea of semantic consequence, arguably the most fundamental concept in logic). Reasoning by analogy may (I believe) be defined as a special case of induction, so if you get induction right it basically comes for free. Generic problem-solving (i.e. "other ‘rational’ or systematic methods for solving problems") may be constructed on top of induction, deduction, and abduction, as may continuous learning and continuous planning. Finally, common sense (knowledge) may be acquired via continuous learning. So basically (in AGI cognition, not necessarily human cognition) there's a hierarchy of cognitive primitives, operations, and processes, but induction, deduction, and abduction are the most fundamental. None of these things are trivial to design and implement, but nevertheless they are all "doable" given sufficient money and effort.
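
Purely as an illustration of that hierarchy, here is a minimal Python sketch; the interfaces and names are invented for this comment, not an actual design:

```python
# Illustrative only: the three cognitive primitives at the bottom, with
# composite processes (problem solving, learning, planning) built on top.
# The interface names are assumptions for this sketch, not a real system.
from abc import ABC, abstractmethod

class Induction(ABC):                 # the discovery of patterns
    @abstractmethod
    def find_patterns(self, percept_history): ...

class Deduction(ABC):                 # what follows from a theory (semantic consequence)
    @abstractmethod
    def entails(self, theory, claim): ...

class Abduction(ABC):                 # the theory that best explains the observations
    @abstractmethod
    def best_explanation(self, observations): ...

class Analogy(Induction):             # reasoning by analogy as a special case of induction
    pass

class ProblemSolver:
    """Generic problem-solving constructed on top of the three primitives."""
    def __init__(self, induction: Induction, deduction: Deduction, abduction: Abduction):
        self.induction, self.deduction, self.abduction = induction, deduction, abduction
```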


“Finally, common sense (knowledge) may be acquired via continuous learning.”

My understanding from this (see the "LLMs have a limited memory" section here: https://open.substack.com/pub/oneusefulthing/p/thinking-like-an-ai?r=lw3j2&utm_medium=ios) is that the training data set being fixed at training time, together with the limited and impermanent context window, precludes continuous learning.

A much larger and more fundamental issue, almost never addressed, is the question of creating from nothing vs. creating from something. At this level I define creating from nothing as creation, while creating from something isn't creation but rather a process of change. Things change. Nothing doesn't change.

Humanity doesn't yet have (and may never have) language for speaking nothing, but only language for speaking something. The instant objection to the previous sentence begins to point to the realm beyond language and comprehension.

It's total hubris to imagine the creation of thinking machines without accounting for creation from nothing. It's back to searching where there already is some light rather than confronting absolute nothing, or as Heidegger puts it, "the nothing."

I’m not saying that computers aren’t useful. I am saying it’s not time to bow and pray to the neon god we’ve made.


Not sure what you mean by "creation from nothing".


Not to be a smartass, and welcome to the club! Existentially speaking, there is no inherent meaning to anything at all. ALL meanings are invented, that is, brought into existence from nothing and then ascribed. Creation is this realm of "Bringing forth," "Generating," "Calling up," or "Languaging." The meaning of these words, what these words point to, is nothing except the meaning that has been given to them and then agreed to.

Nothing is paradoxically the simplest and, therefore, the most difficult abstraction to get. In fact, it is almost impossible to get, and most people aren't going to get it. Each one of us needs to decide whether they are going to be one of the few who get it or not. I have nothing to do with you getting it or not.

So, uncertainty about the meaning of nothing, sure, I get it. This right here is where the rubber meets the road. YOU create from nothing what it means to create from nothing, or you stay where you are and don't create from nothing what it means to create from nothing. That's what creating from nothing is like. It's a simple act in that it requires no behavior. The mind can't grasp it as it is before the mind. It's an act of self. Yourself, myself, the self, but not self as a thing: self as no thing.

At the level of truth, language becomes tautological and endlessly chases its tail. That points to the limits of language, not of truth. Truth exceeds language, is beyond language. As Wittgenstein said, "Whereof one cannot speak, thereof one must be silent."

What does all this mean? Nothing. It's not meant to leave you with something that you can use. The truth can't be used for anything. It just is what it is and that is not a thing that can be used. Computers are wonderful things that can be used in wonderful ways. Let's use them only for good.

15 hrs ago · edited 15 hrs ago

If you're basically saying that the physical universe (assuming that it exists) is inherently meaningless, then I completely agree; this is the conclusion I came to when considering an AGI (starting from a tabula rasa) perceiving the universe for the first time. The structure revealed by an AGI observing the universe via its sensors is inherently meaningless, i.e. it's just data. It's the intelligent agents (machines, humans, ...) themselves that assign meaning to the universe, e.g. by recognising (via induction) a hierarchy of patterns in their percept history. Given such a hierarchy of patterns, each intelligent agent then constructs (via abduction) their own internal model (theory) of the universe. But everyone's experience (hence percept history) is different, and changes over time, and so everyone's internal model of the universe is (a) personal to them, and (b) changes with experience.


I like your thinking. Too bad it's turtles all the way down: https://en.wikipedia.org/wiki/Turtles_all_the_way_down

There is no tabula rasa. A clean slate is meaningless without invented meaning describing clean and slate. Every move to get out of the trap only tightens it further. Think Chinese finger trap. Machines aren't going to observe the universe for the first time. Once you kick that pesky detail under the carpet, the rest is great! Humans observe. Machines don't. No matter how ingenious the lay of the falling dominoes, what patterns they produce, and what they fall into and set in motion, they're just inanimate objects without any trace of intellect.


When I referred to continuous learning, I wasn't referring to LLMs; I was referring to a different AGI paradigm, one where continuous learning is an explicit part of the design.


Thank you for this very helpful insight into how commonsense notions of what reasoning is are stretched by academic scrutiny of odd, curveball constructions. But then, we lovers of puzzles and brain teasers do it to ourselves!

Regarding the 44 kiwis example:

Grice's principle of cooperation in Conversation Analysis says that in normal discourse, people don't throw in random distractors just to challenge the listener. If someone says, "...but five of them were a bit smaller than average", then it's a reasonable inference that maybe these are supposed to be exceptions. So maybe that LLM is applying a different kind of smarts from a different direction.


That's true, but a human can be warned by adding something like "and by the way, some of these reasoning tasks may contain irrelevant information". I am a bit skeptical that adding that to the prompt would be enough for an LLM to overcome such an obstacle. Besides, a human could realize it themselves in one task and even go back to previous tasks with this new knowledge. That, I guess, would be difficult for an LLM even if all tasks were given consecutively in the same session.


My contribution to the debate:

https://open.substack.com/pub/earlboebert/p/can-chatgpt-reason-not-exactly?r=2adh4p&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Advice to researchers: move on to something stronger than shift ciphers; they are too simple to be of interest.


Excited to build some open-source o1-like systems next year to help us all answer this.


I think it says something about our own reasoning abilities that humans are so easily led astray by what we want to be true rather than what is measurably true, e.g. whether we want to believe that LLMs are generalising rather than memorising, given the small number of studies that support a meaningful conclusion either way. I wonder if LLMs will ever build such emotional attachments to arguments? (I guess it depends what’s in their training data!)


Interesting article about articles. How about integrating some formal system like predicate logic into LLMs for reasoning? I understand there are limitations to predicate logic, but maybe the research could go in that direction to get 100% correct results based on the information LLMs already have.
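
As one hedged sketch of that direction, assuming the third-party z3-solver package (an SMT solver rather than predicate logic in full generality, but in the same spirit): the LLM would draft the formalization, and the solver would do the actual checking.

```python
# Hypothetical sketch: let a formal solver, not the LLM, settle the arithmetic.
# Assumes the third-party z3-solver package (pip install z3-solver).
from z3 import Int, Solver, sat

total = Int("total")
s = Solver()
# Constraints an LLM might extract from the kiwi word problem above:
s.add(total == 44 + 58 + 2 * 44)

if s.check() == sat:
    print(s.model()[total])   # 190, checked independently of the LLM's own answer
```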

19 hrs ago · edited 19 hrs ago

Will AIs that include neural networks or other logic front ends (with LLMs just in the background drafting possible content) be able to reason soon?
