Discussion about this post

Herbert Roitblat

Thanks for this very clear exposition. Talking as if you are reasoning is different from reasoning. To claim that a system known to work by predicting the next word is reasoning is an extraordinary claim, and it should require extraordinary evidence; yet, as you point out, there is little critical evaluation. The public, governments, and some computer scientists are being bamboozled into thinking that these models do things they are incapable of doing, without considering the possibility that they are just following the statistical language patterns they have learned. This should be just basic science, but evidently it is not. Your critical thought is essential in the medium and long run. Thanks!

Eric Saund

"Reasoning" is the application of knowledge to a problem statement that results in making certain information explicit, namely, an answer that was always there, it just wasn't written down yet.

If the knowledge permits derivation of the result in one step of pattern matching---equivalent to constraint satisfaction---then we can expect many architectures to work. If, instead, multiple steps are required, then the problem becomes one of search. Search requires keeping track of intermediate results, along with information to navigate the search space.
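To make that bookkeeping concrete, here is a minimal breadth-first search sketch in Python. The `expand` and `is_goal` callables are hypothetical stand-ins for domain knowledge; the frontier and the visited set are exactly the intermediate state a multi-step reasoner has to hold somewhere.

```python
from collections import deque

def search(problem, expand, is_goal):
    """Generic breadth-first search. `expand(state)` yields
    (next_state, step) pairs; `is_goal(state)` tests for a solution.
    The frontier and `seen` set are the bookkeeping a multi-step
    reasoner must keep somewhere."""
    frontier = deque([(problem, [])])      # (state, steps taken so far)
    seen = {problem}
    while frontier:
        state, steps = frontier.popleft()
        if is_goal(state):
            return steps                   # the answer was always derivable
        for next_state, step in expand(state):
            if next_state not in seen:
                seen.add(next_state)
                frontier.append((next_state, steps + [step]))
    return None
```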

Where might an LLM hold this information?

In a transformer, in order to parse and emit natural language, the early and late layers probably must attend primarily to lexical and syntactic matters. Presumably the middle layers are the ones that can afford to represent semantics, including various forms of abstraction. Even so, the activation vectors must share different time scales of information as the residual stream gets modified: local linguistic patterns must be carried from input to output, but, through superposition, activations in the middle layers also carry longer-range pressures and constraints associated with the categories and manifolds underlying the structure of the problem domain and the problem statement.
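A toy sketch may help picture that sharing. The attention and MLP functions below are just random linear maps standing in for real blocks, so this is purely illustrative of how every layer writes into the same per-token residual stream:

```python
import numpy as np

def transformer_forward(x, layers):
    """Toy residual stream: each layer adds its contribution to the
    same (n_tokens, d_model) array, so early, middle, and late layers
    all superpose their information in the vectors that are finally
    decoded back into tokens."""
    for attn_fn, mlp_fn in layers:
        x = x + attn_fn(x)   # token mixing (lexical/syntactic or longer-range)
        x = x + mlp_fn(x)    # per-token feature writes (semantics, abstractions)
    return x

# illustrative usage: 4 layers of small random linear maps
rng = np.random.default_rng(0)
d_model, n_tokens = 16, 8
layers = [(lambda x, W=rng.normal(scale=0.05, size=(d_model, d_model)): x @ W,
           lambda x, W=rng.normal(scale=0.05, size=(d_model, d_model)): x @ W)
          for _ in range(4)]
stream = transformer_forward(rng.normal(size=(n_tokens, d_model)), layers)
```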

Does a transformer LLM have enough room to hold intermediate steps of a complex reasoning task?

The available activation capacity is roughly the depth of the network times the number of tokens in play. Chain-of-Thought expands that capacity by allocating tokens in the context vector to *procedural steps* in the reasoning process. Instead of having to internally shoehorn placeholders for its location in the search tree into a short context vector consisting of the problem statement and a compact output, the model now has the luxury of seeing where it is in the search space in writing (output tokens representing steps along the chain of thought). Crucially, intermediate results are made explicit in the context vector, which allows a large and complex reasoning process to be subdivided into smaller parts, each of which can be solved with a smaller, more tractable pattern-matching step.
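Here is a sketch of that externalization, with `next_step` as a hypothetical stand-in for a single model call (not any particular API). Each intermediate result is appended to the visible context, so every subsequent call only has to make one small pattern-matching move over an explicit trace.

```python
def chain_of_thought(problem, next_step, is_final, max_steps=16):
    """Each loop iteration asks the model for one step, then writes
    that step back into the context. The growing context is the
    external memory that holds the search state the model would
    otherwise have to carry internally."""
    context = [problem]
    for _ in range(max_steps):
        step = next_step(context)     # one tractable pattern-matching move
        context.append(step)          # intermediate result made explicit
        if is_final(step):
            return step, context      # answer plus the written-out trace
    return None, context
```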

Transformers are not the only LLM architecture. Other architectures, especially ones that hold a great deal of internal state that is not closely constrained to text tokens, might behave quite differently.
