Discussion about this post

rif a saurous

I'm wondering if the quick appearance and disappearance of Galactica (e.g., https://twitter.com/ylecun/status/1593293058174500865) changes your view. It's not precisely defined, but this looks like a case where a (smart) human with domain-specific knowledge could create vastly better output than what Galactica was putting out. In the language of "it can attempt basically all tasks a human with access to a text input, text output console and nothing more could and make a reasonable go at them", I'd say Galactica's output was not "a reasonable go"; it was instead a mix of some good details and highly confident nonsense.

rif a saurous

(I work at Google Research, not on but somewhat adjacent to large language models.) I have a different objection, which is essentially that the commonsense benchmarks are far too easy and that large language models don't demonstrate even a modicum of common sense. Kocijan et al.'s "The Defeat of the Winograd Schema Challenge" (https://arxiv.org/abs/2201.02387) is worth a read.
