Discussion about this post

Art Keller

Great article. I posit that reluctance to reproduce results and examine failures is why true progress in AI research gets sidelined in favor of the elusive search for clever new ways to hack benchmarks. There is no percentage in asking, "Did this model get the correct results for the wrong reasons?" if you work for one of the frontier labs. It's a common-sense question, and over a beer I'm sure a lot of researchers would agree it's a worthwhile thought experiment. In practice, you'd find research teams reluctant to carry it out, precisely BECAUSE of the notion that everything is moving so fast. In a culture of manic FOMO, they don't want to be slowed down by what they'd frame as "side issues," when those systemic eval flaws are actually central to ongoing performance failures. IMO, one reason frontier labs avoid digging into failure modes too much is that doing so would point to architecture issues that are not easily fixed. All of which comes down to the old saying, "It is hard to get somebody to understand something (e.g. the need for curiosity about both success and failure) when his job depends on NOT understanding it." I also suspect the data contamination that quite conveniently allows models to score higher on benchmarks is not accidental, but a kind of contamination that is avidly sought for training data. Meta admitted as much.

Alif Wahid

I always enjoy your articles. This is a great read and the recorded keynote presentation is wonderful to watch.

I remember from your 2019 book a fantastic chapter called "Metaphors We Live By," which you noted was based on the eponymous 1980 book by Lakoff & Johnson. They say at the beginning of their book that the "argument is war" metaphor can be replaced with a different one: "argument is a dance."

So regarding your sixth principle and trying to promote publication of more negative results - I wonder if we need a new metaphor to fundamentally change the discussion?

It seems to me that the singular focus on novelty and positive results in the science publication industry is a version of the "argument is war" metaphor. But real progress only happens, as you point out, when we have a figurative situation where "argument is a dance."

Just thinking out loud :) Anyway, thanks for your great writing!

Cheers.
