I really enjoyed this article. I support OpenAI's vision of scaling these models with size and data, and I enjoy the fact that they are testing the hypothesis that scale is all we need. However, I also do not want to be misled by hallucinations. Before reading your article, I believed the claim that ChatGPT had passed an MBA exam. That is not cool.
Thanks for pointing out the lack of “common sense knowledge of the world,” which is a vast domain. Spoken like a phenomenologist! It’s a good thing, in a way, that we have these more elaborate examples of how AI falls short of what we call “understanding,” which is more than assembling a grammatical and mostly relevant set of words from a review of existing texts.
This is brilliant! Thank you for this great recap of why AI models lack the ability to generalize like humans. Such an important concept that I feel many people don't quite understand; they succumb to the media hype and overestimate model abilities.
If we don't figure out how to differentiate humans from bots in the upcoming era of deep fakes and "generative AI", democratic society is in even more trouble than we thought. I put together a prototype cryptographic "identity proof" (proving common ownership of digital social handles); a rough sketch of the idea follows below. The same could apply to ownership of content. Here's the proof of concept. Try clicking 'Review cryptographic proof'.
(not sure I can paste url links so...)
whosum dot com slash t slash scafaria
Good luck, fellow humans.
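For the curious, a minimal sketch of the general idea as I understand it (my own toy Python version under an assumed design, not the actual whosum implementation): one keypair signs a statement listing the handles, the signature is posted from each handle, and anyone can verify that the same key vouched for all of them.

```python
# Toy sketch of a "common ownership" proof (assumed design, not whosum's code).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

statement = b"I control @alice on SiteA and @alice_a on SiteB (2024-01-01)"

# The owner signs the statement once with a single keypair...
key = ed25519.Ed25519PrivateKey.generate()
signature = key.sign(statement)
public_key = key.public_key()

# ...then posts (statement, signature) from each handle and publishes the
# public key.  A verifier checks every post against that one key; valid
# signatures on the same statement imply common control of the accounts.
try:
    public_key.verify(signature, statement)
    print("signature valid: the handles share a controller")
except InvalidSignature:
    print("signature invalid")
```

The hard part, of course, is binding the public key to a persistent identity; the sketch only covers the signing and verification step.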
Dr. Mitchell, I founded a non-profit focused on building toward a human-centric future (positive sum dot net). Please let me know at "vince at positivesum dot net" if you think there could be interest from groups you know and trust. (I enjoyed your Complexity book very much, by the way.)
AI is breaking the standard g correlations! Fun to see. Inventing an airplane was a great intellectual achievement but that doesn’t mean birds are the new Wright Brothers.
How could we define the underlying ability that brains have and that transformers lack?
It has some knowledge of the world, and some sense of how concepts interact.
Is it that it fails to apply consistent inference rules?
Here, if it had learned to associate numerical values with units (which could well be encoded into a few dimensions in the embedding space), it would not have inferred the incorrect unit for its result.
I wonder if the limited number of self-attention heads constrains the inference abilities to too few rules, so that some rules end up being followed only some of the time.
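To make the unit idea concrete, here is a purely hypothetical toy sketch (my own illustration, not a claim about how ChatGPT actually represents quantities): reserve a few embedding dimensions for a one-hot unit code, carry the numeric value separately, and let each operation set the unit of its result explicitly. A model with such a representation would not label a count of items as dollars.

```python
# Hypothetical "unit-aware" quantity embedding: a few reserved dimensions
# hold a one-hot unit code, and the last dimension holds the numeric value.
import numpy as np

UNITS = ["dollars", "items"]                      # toy unit vocabulary

def embed(value: float, unit: str) -> np.ndarray:
    """Encode a quantity as [one-hot unit dims | numeric value]."""
    vec = np.zeros(len(UNITS) + 1)
    vec[UNITS.index(unit)] = 1.0
    vec[-1] = value
    return vec

def divide(total: np.ndarray, per_item: np.ndarray) -> np.ndarray:
    """Dollars divided by dollars-per-item yields a count in 'items'."""
    return embed(total[-1] / per_item[-1], "items")

inventory_value = embed(100_000, "dollars")       # made-up numbers
price_per_piece = embed(500, "dollars")
result = divide(inventory_value, price_per_piece)
print(UNITS[int(np.argmax(result[:-1]))], result[-1])   # -> items 200.0
```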
I received a very different response when I tried the "world antiques" problem: it told me I needed "$6666.67 worth of inventory." Personally, I wouldn't trust ChatGPT with elaborate problems. For now it's good at fairly straightforward, non-trick questions that don't involve much math.
That's an interesting answer it gave to your "Boll" question: it got to the right answer, but then ... kept going and ended up wrong in the end. One hypothesis is that training on a large number of question-answer pairs, where the answer text represents logical reasoning, yields an LLM that approximately encodes many rules of the form [reasoning step a_i -> reasoning step a_j] or [reasoning step a_i -> stop]. That could make over- or under-"reasoning" quite common in question answering at test time if the LLM never learned to constrain its "stop" timing precisely based on the original question.
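To caricature that hypothesis (my own sketch, not a claim about the real architecture): if the decision to stop is a roughly fixed per-step probability rather than something conditioned on the question, the chain will routinely overshoot or undershoot the number of steps the question actually needs.

```python
# Toy model of "reasoning step -> next step | stop" with an unconditioned
# stop rule; the question here notionally requires exactly 3 steps.
import random

def answer_length(stop_prob: float, max_steps: int = 10) -> int:
    """Number of reasoning steps emitted before the chain happens to stop."""
    steps = 0
    while steps < max_steps:
        steps += 1
        if random.random() < stop_prob:   # stop rule ignores the question
            break
    return steps

random.seed(0)
needed = 3
runs = [answer_length(stop_prob=0.3) for _ in range(10_000)]
on_time = sum(n == needed for n in runs) / len(runs)
too_long = sum(n > needed for n in runs) / len(runs)
print(f"stopped exactly on time: {on_time:.0%}, kept 'reasoning' too long: {too_long:.0%}")
```

With these made-up numbers the chain stops at exactly the right step only about 15% of the time, which has the flavor of the over- and under-"reasoning" described above.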