7 Comments
Brian B.'s avatar
3d · Edited

Speaking as someone overall deeply opposed to LLMs -- the sort of person who'd ban them beyond a spring-2022-level size limit given the chance -- I nonetheless think that, while we're stuck with them, this project seems a brilliant use of their talents. Like, I don't think the potential societal goods of LLMs could ever justify the urgent harms they do, but they're great at sifting through large quantities of data, and apparently, somehow, that even includes pattern-recognition about good writing. Cool!

If LLMs have shown this level of talent at ranking entries the way Astral Codex Ten readers do -- and the finalist entries those readers pick tend to be excellent reading -- we should let 'em use that talent to find good unknown writers and bring them to the attention of would-be interested readers. Because we sure don't have a good framework for that sort of treasure-sifting at the moment. And we should.

Philosophy bear's avatar

Thanks for your kind words. I am going to set up a contest, and you are most welcome to join! Pick your best two essays.

Results will only be published for the top 20 percent, so no need to fear the publication of a bad result. Will publish details soon.

Bob Bobberson's avatar

I'm not a huge fan of the whole "fucking around with the form" thing. It kinda seems like it's just a method to hijack the contest to get more exposure for something you wanted to write anyway. Perhaps that's a lowbrow prole opinion, but lowbrow proles keep society running, so have a little respect.

I think this project could have a ton of merit though, as long as people don't abuse it to set up one particular standard as some kind of "objectively correct" ranking system. I think having different AIs with subtly different standards is a great way to do this, actually. I could mostly listen to the Haiku ratings but also occasionally check the top Opus ratings to see if there's something great I would have overlooked. Sort of like fractional distillation of different categories of writing out of one gigantic disorganized slush pile.

Jesse Amano's avatar

The “extraction bug” bothers me — was that actually random? It sounds like the files themselves were concatenated somehow, but that makes no sense if they were obtained by some method like running a simple script to crawl the site hosting the contest submissions. I think I’d reluctantly agree that it’s more likely than not it didn’t bias or contaminate the samples in any particular direction (and the measure was “how likely is the AI to agree with human readers” instead of “which ideas are meritorious” or whatever, so corruption effects would be limited), but I think I have somewhat greater reservations than you do about this part.

Character limits and context budgets seem like a pretty big limitation, too: aren’t larger samples exactly the ones where a human reader would want to know if it’s worth their time before starting?

I observed in myself an ephemeral disgust reaction on seeing the cost — “that’s where my subscription money went?” — but you generally do good work and I still want to support it/you. I’ve likely caused far more indirect funding to go toward LLM usage just by working for arbitrary tech companies and doing things for them that “save” money or make processes “more efficient”.

Kenny Easwaran's avatar

I think the files were originally hosted in a single big Google Doc. If someone used a “header 3” where they should have used a “header 2”, it could cause this kind of conjunction of documents.

Jesse Amano's avatar

Oh, that makes sense. Thanks.

Philosophy bear's avatar

Confirming Kenny is exactly right. The automated extraction was based on things like matching strings to titles and header levels, and it turns out that's very easy to stuff up, especially because I foolishly used Sonnet for that part.
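
For anyone curious how a single wrong header level can glue two entries together, here is a minimal sketch. It is not the actual extraction script: it assumes the big document was exported to Markdown-style text, that every entry is supposed to start with a level-2 header, and the function name `split_entries` and the sample essays are made up for illustration.

```python
import re

# Assumption: headers look like Markdown ("## Title"); this stands in for
# the header levels of the original Google Doc.
HEADING = re.compile(r"^(#+)\s+(.*)$")


def split_entries(text, entry_level=2):
    """Split text into {title: body}, starting a new entry only at
    headers of exactly `entry_level`."""
    entries = {}
    title = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) == entry_level:
            # A header at the expected level opens a new entry.
            title = match.group(2).strip()
            entries[title] = []
        elif title is not None:
            # Everything else, including headers at other levels,
            # is appended to the current entry's body.
            entries[title].append(line)
    return {t: "\n".join(body).strip() for t, body in entries.items()}


# "Essay B" was mistakenly given a level-3 header instead of level-2.
doc = """## Essay A
Text of essay A.
### Essay B
Text of essay B.
## Essay C
Text of essay C.
"""

entries = split_entries(doc)
print(list(entries))
print(entries["Essay A"])
```

With this input the splitter finds only "Essay A" and "Essay C", and the whole of "Essay B" ends up inside the body of "Essay A", which is the kind of conjunction Kenny described.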