
As a very old human, I propose that the ultimate role of any machine is to be a customizable assistant. PDAs, for example, were widely adopted because they allowed the end user to enter and track notes, meeting details, or contacts without the need for a secretary. At Air Products and Chemicals in the mid-1980s, managers and engineers who had no interest in computing became enthusiastic users of email because it provided a time-saving alternative to phone calls and meetings. An app powered by AI, or a "dog" that can go where humans cannot, will have as many uses as the people who direct them. On the other hand, our Roomba regularly gets into a spot where it has to call for help.


The single most common theme in the "understanding" debate in AI is the idea of levels of understanding. Some people tend to think of understanding as a kind of absolute.


Would it be possible to create ARC tests similar to the ones currently available, but designed so that humans cannot exploit the advantage of their visual apparatus to solve the problems? Maybe tasks in dimensions beyond 2D/3D, for example 4D, 5D, or 6D?
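As a sketch of what I mean, here is a toy generator in Python for an ARC-style task posed on a 4D grid, which no one can solve by eye; the wrap-around shift rule, the grid size, and all the names here are my own illustrative assumptions, not anything from the actual ARC suite:

```python
import numpy as np

def make_task(n_dims=4, size=3, rng=None):
    """One input/output pair: the hidden rule shifts the single
    colored cell by +1 along axis 0, wrapping around the edge."""
    rng = rng or np.random.default_rng()
    grid = np.zeros((size,) * n_dims, dtype=int)
    cell = tuple(rng.integers(0, size, size=n_dims))
    grid[cell] = int(rng.integers(1, 10))   # a "color" from 1..9
    return grid, np.roll(grid, shift=1, axis=0)

inp, out = make_task()
print(inp.shape)   # (3, 3, 3, 3): a 4D grid, opaque to the human eye
```

The same rule could be posed in 2D, so the abstraction being tested is identical; only the visual shortcut is gone.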


I really like the idea behind ARC and ConceptARC, but I feel like the actual problems have a lot of extraneous complexity. Having different input and output sizes, having colors that are sometimes relevant and sometimes not, having structural elements like "there's a grey line separating two logical inputs" that only exist in one of the test cases: these all make it harder for an AI system to learn logical patterns, and from a software engineering point of view they bog things down too.

But abstract reasoning shouldn't really require these things, right? We should be able to learn abstract reasoning on a much lower-dimensional input space. Something like sequences of tokens that are transformed into other sequences of tokens: the pqr -> pqs type of thing, or maybe even a space that doesn't involve alphabetical ordering. Aren't there problems like this that are also beyond the reach of any current learning models?
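To make the idea concrete, here is a minimal sketch in Python of the kind of generator I have in mind, in the style of Hofstadter's Copycat letter-string analogies; the "successor" rule and the function names are illustrative assumptions on my part:

```python
import string

ALPHA = string.ascii_lowercase

def successor(s):
    """Apply the rule 'replace the last letter with its successor'."""
    return s[:-1] + ALPHA[(ALPHA.index(s[-1]) + 1) % 26]

def make_problem(source="abc", target="pqr"):
    """Return a (prompt, answer) pair for a letter-string analogy."""
    prompt = f"{source} -> {successor(source)}; {target} -> ?"
    return prompt, successor(target)

print(make_problem())   # ('abc -> abd; pqr -> ?', 'pqs')
```

The whole input space is one dimension of tokens, yet the underlying abstraction (find and transfer the rule) is the same thing ARC is after.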

Basically it feels like ARC is three steps beyond what's possible. I'm tempted to work on it, but it just seems out of reach. We need a task like recognizing handwritten digits was in 1995: as simple as possible while still being out of reach of current technology.

[Author]

I think the open-endedness of ARC is essential. The tasks we created tried to balance reducing the difficulty with maintaining this open-endedness. So maybe the tasks are two steps (rather than three) beyond what can be done now? :-)

[Comment deleted, May 15, 2023]

Counting problems do seem interesting. In theory, an LLM should be able to count things it can recognize, at least up to the depth of its network. But an LLM would probably need far more examples of counting to learn the pattern than a human would. Then again, a modern human generally solves hundreds of counting problems before hitting first grade, so maybe it does just take a lot of practice.
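As a rough illustration, here is a toy Python generator for the kind of counting probe I mean; the prompt format and all names are just assumptions for the sake of example:

```python
import random

def make_counting_problem(max_count=10, rng=random):
    """Build a toy 'count the Xs' prompt with a known answer."""
    n = rng.randint(1, max_count)
    items = ["apple"] * n + ["pear"] * rng.randint(0, max_count)
    rng.shuffle(items)
    return f"How many times does 'apple' appear? {', '.join(items)}", n

prompt, answer = make_counting_problem()
print(prompt, "->", answer)
```

With a generator like this you could chart accuracy against the true count and see where a model's counting breaks down.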

[Comment deleted]

Hmm, really? This reminds me of something I was reading about, the Pirahã language:

https://en.wikipedia.org/wiki/Pirah%C3%A3_language

I guess it's hard to say, since we don't run many tests that keep children isolated from society until they become adults and then quiz them on abstract reasoning skills...
