I think that, sooner or later, we will have to acknowledge that the word “intelligence” can't be separated so simplistically from consciousness, because true intelligence requires a semantic understanding of concepts, data, sensory perceptions of the world and the environment, and so on. Meaning (the semantic content, the real understanding of a symbol, picture, word, or sentence) isn't in the symbol, picture, word, or sentence itself. Even knowing all the relationships isn't sufficient, because meaning is intrinsically connected with subjective experience, with sentience, with some sort of sensation. You can't understand what a color means if you haven't had the experience of the redness of a tomato, the blueness of the sky, or the greenness of a plant. You can't really understand what light is if you are congenitally blind, other than talking about it through inferences from what others say. You can't understand what chocolate tastes like without having tasted chocolate. That's why it is so hard to build self-driving cars. I would never take a ride with a human who lacked semantic awareness of the environment and “saw” only numbers or streams of bits instead of experiencing the world and thereby acquiring a semantic understanding of what cars, pedestrians, bicycles, houses, etc. are. There is nothing in the machine that “understands” in the human sense; it is merely quite clever at imitating intelligence. That's why today's impressive LLMs can sound so human yet, suddenly and unexpectedly, furnish an utterly nonsensical answer (e.g., you ask how many rocks one should eat per day, and it recommends “at least one”). No consciousness, no semantic awareness, no intelligence.
Funny that I'm with you entirely on being unable to uncouple intelligence from consciousness, but your view on self-driving cars is the bit I recoil from. I strongly suspect that a human driver's consciousness causes more danger through distraction than a computer's lack of semantic understanding does. But then, I live in Los Angeles, and I seriously doubt that the awful drivers here have a semantic understanding of what cars, pedestrians, bicycles, houses, dogs, cats, chickens, trees, store windows, bus benches, stop signs, road markings, parks, hills, clouds, rain, and snow are, because they are too focused on money, weed, blow, prescriptions, restaurants, botox, Starbucks, routes to avoid traffic, dating apps, and the latest...
Marco... that is why it WAS, not "is", so hard to build self-driving cars. Having semantic awareness is a must, but too much awareness can lead to distraction, such as being disturbed by, e.g., a scantily clad lady on the sidewalk.
We have the ability to build in (code) what is relevant and what is not. I would say that preventing hallucination is very important. "I don't know!" should and must be part of AI!
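A minimal sketch of what "building it in" could look like (my own hypothetical illustration, with made-up class names and thresholds): a classifier that answers "I don't know!" whenever its top confidence falls below a cutoff, rather than hallucinating an answer.

```python
# Hypothetical sketch: abstention built into a classifier.
# The labels, logits, and threshold below are illustrative only.
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def predict_or_abstain(logits, labels, threshold=0.75):
    """Return a label only when the top probability clears the threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return "I don't know!"
    return labels[best]

labels = ["pedestrian", "bicycle", "shadow"]
print(predict_or_abstain([2.0, 1.9, 1.8], labels))  # "I don't know!" (low confidence)
print(predict_or_abstain([4.0, 0.5, 0.1], labels))  # "pedestrian" (high confidence)
```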
Level V self-driving cars won't become a reality anytime soon.
Well written, thought-provoking article.
As each year passes, Turing's thought experiment becomes less relevant and more moot as a measure of our current AI achievements. Let's just conclude that AI passes it already, and move on to "better" metrics.
But even more recent metrics such as ARC could be considered limited in some ways. For example, an AI that understands rotation, translation, shapes, and analogies could probably pass ARC, yet such an AI could still fail simple numeric tests. So we will still not have arrived. It's as if we need a multi-ARC spanning ten disciplines.
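As a toy illustration of that limitation (my own hypothetical sketch, not from the article): a solver that merely searches a small library of geometric transforms can fit many ARC-style grid puzzles while knowing nothing about arithmetic.

```python
# Hypothetical sketch: an "ARC solver" that only knows geometric
# transforms. It fits the training pairs, yet has no numeric ability.

def rotate90(grid):
    """Rotate a rectangular grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def mirror(grid):
    """Flip a grid left-to-right."""
    return [row[::-1] for row in grid]

CANDIDATES = {"rotate90": rotate90, "mirror": mirror,
              "identity": lambda g: [row[:] for row in g]}

# Two toy input/output example pairs whose hidden rule is rotation.
train = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[0, 2], [0, 0]], [[0, 0], [0, 2]]),
]

# Keep every transform consistent with all training pairs.
fits = [name for name, f in CANDIDATES.items()
        if all(f(x) == y for x, y in train)]
print(fits)  # ['rotate90'] -- yet ask this "solver" 7 * 8 and it has nothing.
```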
In any case, when the ARC metric is bested, we will likely be back here discussing its limitations, like we are now discussing those of the Turing test.
Thinking about this piece in relation to your note in Science back in March, I'm struck by your point that so many of our intuitions about intelligence are wrong. In dismissing common-sense definitions, the Turing Test becomes something like what Adam Mastroianni calls a proto-paradigm, meaning a way of doing things: “bundles of assumptions and practices that get handed down and spread around.” So, not a full-blown paradigm, just a way to get to work on an interesting problem without the messy effort of establishing fully developed models and definitions for what you’re trying to accomplish.
With the Turing Test's proto-paradigm of comparing machine intelligence to that of humans no longer useful, maybe we'll move on to other (better?) bundles of assumptions and practices for evaluating thinking machines.
I really like your notion of using astrobiology as a framework, because it treats machine intelligence as "something that has never been seen and might not even exist." Such a foundation treats intelligence as something we really do not understand, which seems right.
The “Turing Test” is more of a thought experiment, like the famous ones of Albert Einstein that were so trendy in the early 20th century.
Thanks for this article, and for explaining the Turing Test and human attempts at testing the capability of computers in the Science article (notwithstanding that the threshold of the test seems to have been simplified over time). One wonders whether there is an unconscious desire in humans to devolve authority to computers rather than take the time to understand, challenge, and ethically control computer capability.
Melanie M., thank you again for this fantastic article.
You mentioned that the Turing Test was offered by Turing as a philosophical thought experiment. Well, he avoided the question of what “thinking” is from the outset, so I don't find even philosophical value in this test. Thinking can now also be attributed to animals, and it has been shown that animals dream when they sleep. The Turing Test should, in my opinion, be just a historical checkpoint in AI, something we should know about without doing more with it, just as we do when learning about other historical facts. The hype around it, I believe, is just a marketing strategy for certain parties to make their work seem relevant while, most of the time, they are hiding nonsense. And I am sorry for the harsh words.
I find the last paragraph about language a pure artefact of wisdom; as always, you write with precision. Language can definitely not be the means to general intelligence, because of its dissociation from the other cognitive processes. However, with language we can reach a deep understanding and representation of what intelligence could be, because language acts as a proxy for, or a translator of, all those underlying processes. So the more we know about how language translates them, the more insight we can get into reasoning, learning, and understanding, to certain degrees. There is no being on earth that lives without using some language. Not all languages use words!!
So maybe the definition of language in AI should also be discussed.
Thank you so much for your article!!!
Waiting for the next one.
Great article, as always. We still have a lot to learn about the nature of intelligence. For me, this part says it all:
"It’s likely that the Turing Test will become yet another casualty of our shifting conceptions of intelligence. In 1950, Turing intuited that the ability for humanlike conversation should be firm evidence of 'thinking,' and all that goes with it. That intuition is still strong today. But perhaps what we have learned from ELIZA and Eugene Goostman, and what we may still learn from ChatGPT and its ilk, is that the ability to sound fluent in natural language, like playing chess, is not conclusive proof of general intelligence."
Melanie, I thoroughly enjoy your articles and work.
I have never been concerned about the proliferation of AI, though I do have reservations about where HI (Human Intelligence) leads us at times.
I am very worried about the anthropomorphization of AI by humans. It should be used as a tool - nothing else. Accountability when using AI, in any shape or form, should fall on the user.
Example: a law firm writes an important letter to a client that is spellchecked by a Word program. If the legal meaning changes because a comma or some wording was changed by the spell check (or translation program), the fault falls squarely on the person who prepared the text.
Unfortunately, we hear of and see mistakes, accidents, and catastrophes where the blame is placed on the program.
The accountability for the use of AI should be no different from the use of a hammer.
Great read Melanie, thanks!
The only point where I'm not with you is in considering that the Turing test still has much to deliver.
You can check my explanation in the post: "Why the Turing Test Became Obsolete" (Medium friend link: https://medium.com/towards-data-science/why-the-turing-test-became-obsolete-efe941cb7aec?sk=100ed2ba85b68f6533161675ad2e5200)
By the way, I found out that Eugene Goostman participated in one of the Loebner Prize contests, ending in second place. Also, the secretary of ELIZA's creator was convinced that ELIZA had something like a soul...
Thanks! I enjoyed your Medium post.
Melanie, I am still puzzled by these pronouncements when AI has barely mastered visual and auditory sensory intelligence. You have explained the nuances of the Turing test and what AI is able to do today. Thanks for sharing.
Turing's proposed game was an important landmark in computer history, but it suffers from actually being a test of deception; it encourages faking. In my book I encourage the industry to develop tests for understanding, not just knowledge or riddle solving. Consciousness can't be measured from a third-person standpoint, but it is not necessary (in the human sense) to demonstrate behavior indicative of understanding ("It's understanding, Jim, but not as we know it"). It is understanding in AI co-workers that we need so as to integrate them into our future.
As a seasoned NLP practitioner with a recent PhD (2020), my observation, particularly regarding the stunning progress in machine translation—once considered the holy grail of AI—leads me to hypothesize that many language processing tasks require little reasoning. They are primarily System 1 tasks, fast and intuitive, as described by Daniel Kahneman, who sadly passed away recently. The “Turing test”, which is really a thought experiment, is too heavily based on NLP to serve as a broader test for AGI.
This realization should lead us to reconsider the nature of human intelligence and its place within the broader spectrum of natural and artificial intelligence. That said, just as a plane doesn't fly like a bird, a computer won't necessarily think like a human.
Establishing an efficient and reliable protocol to determine whether an AI system is "truly intelligent", "AGI level", or "conscious" is a challenging problem—perhaps even an ill-posed or intractable one without any satisfactory solution. By the way, I'm a big fan of François Chollet's ARC tests.
AI will undoubtedly change the world! But this immense power must be wielded with great caution.
If intelligence becomes artificial, let's hope it frees us from our natural stupidity—particularly our current inaction in the face of the environmental crisis.
The ability of today’s AI systems to pass as human and go undetected is real and already possible. No, it says nothing about the intelligence of such systems, but it says everything about their ability to imitate and to a certain degree deceive.
ARC-AGI comes a lot closer to being a real test of intelligence and of how AI holds up against humans.
If we are to revise the Turing Test at all, we need a well-considered (meaning bullet-proof) definition of intelligence. People like Sammy should NOT be able to define intelligence this way or that way. People are using terms like “think,” “reason,” “understand,” and so on in the most informal and undisciplined way. We should begin with the science and go from there, not invent a new definition for the sake of calling something intelligent. So long as we let OpenAI and Google and all of the data scientists use these terms as they wish, we will continue to be unsuccessful, and in the most precarious position, when it comes to defining intelligence. After working for more than half a century trying to understand what “to understand” means, I will not accept any definition chosen or used by these computational early primates.
You (and pretty much everyone else) have described the Turing test incorrectly.
Turing asks the computer not to fool people, but to do as well as a human would in playing a particular, and quite subtle*, game. Thus it asks, "Can a computer _function as well as, or similarly to, a human_ in a certain situation?"
Please. Go back and read Turing's paper. Carefully. What he was saying was actually quite sensible. Simply fooling people isn't a sensible test at all. A test that asks the computer to _function_ as a human is a much more interesting idea.
*: The subtlety in Turing's proposal is related to his sensitivity as to what it means to be male, or female, or gay.
Roger Schank figured this out (there's a Yale AI paper on this). Daniel Dennett claimed (in response to a comment on 3 Quarks Daily) to be the one who told Roger this idea, but that didn't prevent Dennett from saying silly things about the incorrect idea of the Turing test anyway.
Thank you for a very lucid exposé, Melanie. Clearly, this isn't the solution to defining AGI. So what is?
I am presently thinking of the following, and, as an AI Expert on Stuart Russell's OECD AI team, would appreciate the latitude to discuss it with you:
From my draft:
"A stepped-only structure implies a linear path to AGI, which does not accurately represent the complex, interconnected development of various cognitive abilities. A better way to conceptualize a roadmap to AGI would be a hierarchical network of different types of tasks (intellectual, social, embodied, etc.) that are necessary for success in a given domain/occupation.
With that matrix of steps vs types of tasks, one could design:
* A series of tasks that AI is capable of performing, rank-ordered by their Complexity, in three representative occupations. Occupations have the merit of being better-defined than other life endeavors (for instance, “political engagement”), even if their definitions are far from tight.
** A series of tasks rank-ordered by their Usefulness, across three representative occupations. Usefulness here means “actionable” and “worthwhile”.
In both cases, there should be a **solidly reasoned** justification in choosing the occupations for their representativeness, as well as defining what “complexity” and “usefulness” mean. This can be achieved by expert consensus, as imperfect as that may be."
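A rough sketch of how that matrix could be represented in code (the occupations, tasks, and ranks below are hypothetical placeholders, not from my draft):

```python
# Hypothetical sketch of the proposed matrix of occupations vs. task
# types, with tasks rank-ordered by Complexity and Usefulness.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    kind: str        # intellectual / social / embodied, etc.
    complexity: int  # expert-consensus rank within the occupation
    usefulness: int  # "actionable" and "worthwhile" rank

matrix = {
    "paralegal": [
        Task("summarize deposition", "intellectual", 2, 3),
        Task("negotiate filing deadline", "social", 3, 2),
    ],
    "electrician": [
        Task("read wiring diagram", "intellectual", 1, 2),
        Task("rewire breaker panel", "embodied", 3, 3),
    ],
}

# Rank tasks within each occupation by Complexity, as proposed above.
for occupation, tasks in matrix.items():
    ordered = sorted(tasks, key=lambda t: t.complexity)
    print(occupation, [t.name for t in ordered])
```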
Although the Turing Test, as originally formulated, might have seemed at the time to be a perfectly plausible test of human-level intelligence, its recent application to actual systems has shown it to be too weak a test to be "the ultimate milestone of AI". We need, in effect, a much stronger, more demanding test, such as "would a large number of profit-motivated businesses, after extensive evaluation, irreversibly choose to employ this system (or these systems) rather than a human?" I therefore propose the Turner-Turing Test, as follows: When, in the absence of any confounding disruption such as a global pandemic, human unemployment in the G7 countries (including the EU) exceeds 30% for more than 5 years, then we will know that human-level AI has been achieved.