Thanks for writing this. I must admit, however, that I am a bit confused. There are two hypotheses. One hypothesis is that the model learns the statistical properties of Othello moves. The second hypothesis is that the model learns properties of Othello that are not statistical. It seems to me that any analysis that attempts to distinguish between these two would have to find a situation where the output of the system could not be explained by statistical properties (the more parsimonious hypothesis). I do not see how finding that the activation state of the model correlates with the state of the board addresses that distinction. I would expect that the statistical properties of the moves would be perfectly correlated with the state of the board. Am I missing something? Any specific sequence of moves would result in a specific state of the board. Multiple sequences could result in the same state, but is that not a statistical relation?
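For concreteness, here is roughly what the probing experiments in question look like: a small classifier trained to read the contents of one board square off the model's hidden activations. This is only a sketch under assumed names; the arrays, file names, and square labels are hypothetical stand-ins, not the actual published Othello-GPT code.

```python
# Rough sketch of a board-state probe: a linear classifier trained to predict
# the contents of one Othello square from the model's hidden activations.
# The file names (activations.npy, square_d4_labels.npy) are hypothetical
# placeholders, not the published Othello-GPT setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

# activations: shape (n_sequences, d_model), hidden state after each move sequence
# labels:      shape (n_sequences,), contents of one square: 0 empty, 1 black, 2 white
activations = np.load("activations.npy")
labels = np.load("square_d4_labels.npy")

split = int(0.8 * len(labels))
probe = LogisticRegression(max_iter=1000)
probe.fit(activations[:split], labels[:split])
print("held-out probe accuracy:", probe.score(activations[split:], labels[split:]))

# High held-out accuracy is the reported evidence that the board state is
# (linearly) decodable from the activations. The worry raised above is that,
# because every move sequence determines a unique board state, statistics of
# the move sequences and the board state are confounded by construction, so
# this result alone may not separate the two hypotheses.
```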
Here is an example of a test close to what I mean for a language model. Embeddings are supposed to capture the meaning of a sentence and they do a reasonable job as a first approximation because of distributional semantics. But consider these three sentences:
1. Skinny weighed 297 pounds.
2. Edward weighed 297 pounds.
3. Skinny weighed 297 pounds of potatoes.
It seems to me that sentences 1 and 2 are very similar in meaning relative to sentence 3. They both describe a person's weight. Sentence 3 uses "weighed" to indicate an action performed on some potatoes; it does not refer to the state of a person. Yet the embeddings for sentences 1 and 3 were much closer to one another than either was to the embedding for sentence 2.
This one example does not "prove" anything, certainly, but it does illustrate the kind of contrast that I think we need to evaluate claims of cognitive functions.
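For anyone who wants to run the comparison themselves, here is a minimal sketch. The sentence-transformers package and the all-MiniLM-L6-v2 model are my assumptions; the original test does not say which embedding model was used, and different models may give different numbers.

```python
# Minimal sketch: embed the three sentences and compare cosine similarities.
# Assumes the sentence-transformers package; all-MiniLM-L6-v2 is a stand-in
# model, since the original comment does not say which embeddings were used.
from sentence_transformers import SentenceTransformer
import numpy as np

sentences = [
    "Skinny weighed 297 pounds.",              # 1: a person's weight
    "Edward weighed 297 pounds.",              # 2: a person's weight
    "Skinny weighed 297 pounds of potatoes.",  # 3: weighing some potatoes
]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(sentences, normalize_embeddings=True)  # unit-length vectors

def cos(a, b):
    return float(np.dot(a, b))  # cosine similarity, since vectors are normalized

print("sim(1, 2):", cos(emb[0], emb[1]))  # same kind of statement, different name
print("sim(1, 3):", cos(emb[0], emb[2]))  # shared surface form, different meaning
```

If the claim above holds, sim(1, 3) comes out higher than sim(1, 2), i.e. surface overlap beats meaning.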
For more on evaluating LLMs for general intelligence, see: http://arxiv.org/abs/2502.07828
I've always been confused by people who think LLMs can acquire a sophisticated world model from learning text alone. It seems obvious to me that words themselves are not enough, from a really basic Philosophy 101 "The Map is Not the Territory" and "This is Not a Pipe" level of reasoning. Text-based LLMs aren't really experiencing the world, just a shadows-on-a-cave-wall version of it. How can they have anything close to a complete world model?
Arriving at human-like reasoning does not imply a human-like path.
I would say that, if you're nearly timeless, you could get a good representation of the world just by looking at its shadows.
It feels to me, though, that human-like reasoning does require referring to physical reality, which LLMs don't seem to have access to. Other AI technologies might! All those Boston Dynamics robots and the like could easily be approaching that. But LLMs specifically probably don't.
There is something that bothers me about teaching an AI to recognize malignant lesions, or cats, or ladies crossing the road with a pram, a mobile phone, and a dog on a leash. If I were teaching a human how to perform one of these tasks, I would ask them how they recognized the malignant lesion, or the cat: "Because it looks dense," or "because it is covered in fur and purrs." If they said, "because there is a ruler in the picture," I'd know that we had a problem. I can't do this with an AI...
It was great to hear Ev Fedorenko on your podcast, Melanie. Her article last year explaining how language is primarily a tool for communicating thoughts really undercuts the notion that a model of language can develop world models or any other complex abstraction.
Question: Does the opening example about lesions with rulers in the image speak more about the failure of the algorithms or about the modelers who train them? In the example you cite, was the training redone with the rulers removed? If so, with what result, and is the process in use?
I have heard related stories before. Sometimes the problem is the background: one set of military vehicles were photographed on a sunny day, the other when it was overcast; or one set of animals were photographed in the wild (forest background), the other with a domestic background. Each time the system learned something which wasn't related to the goal. But maybe people do something similar in real life. We want people to learn to not break the law, but some folks learn not to be caught...
I encountered this example while taking Abu-Mostafa's online ML course at Caltech. As I recall, there are multiple lessons to be learned, including that it's not a bad thing that the net uses everything in the training set; that data sets may contain unrecognized biases; and that one must resist the temptation to do data snooping. In this instance it would not have been data snooping to have cropped the images as necessary to remove the rulers.
Your suggested approach might be considered data snooping and certainly is undesirable from the point of view of sample size.
I feel it's very important to expand the world-model discussion from 'LLMs' to 'AI'. For while training a big AI model on human language doesn't teach the model how to do physics or chemistry, training a big model on physics data or chemistry data DOES teach it physics or chemistry, in a practical sense at least. And in the case of old-fashioned Newtonian physics, AI learns it the way we learn it as children... not as equations with math, but as instincts based on experience. Example: "I walked on ice and slipped; next time I walked much slower and positioned my weight more directly over my foot as I stepped." Ideas like this one are honed through the hundreds or thousands of steps we take as a child, and extended with new knowledge such as "when I see a dark sheen on the sidewalk and it's cold, it may be ice." AI-enabled robots learn the same lessons in essentially the same way. It's not elegant, but it's practical and it works. And to my way of thinking, this is indeed an AI world model of a sort; not from LLMs but from other types of AI.
But what are world models generalized to? And what is the "causal knowledge about the wider world"? To understand what a tree is, one needs to know that there are forests in the world. To understand what a forest is, one needs to know what land, a territory, a countryside is. To understand what the countryside is, one needs to understand what a planet is. And so on, up to the entire universe. I suspect that we are going to encounter Gödel's incompleteness soon. Yet humans are able to deal with this without omniscience. I feel that this isn't the whole story. Something is still missing.
Interesting.
Can't wait to read Part 2. Should be a good one.