
I have my own challenge: I think LLMs can do everything that a human can do and typically way better if the context required for the problem can fit in 10,000 tokens.

For now this challenge is text only.

Can we think of anything that LLMs can’t do?




> For now this challenge is text only.

That is like saying, "My program is better than any human, but binary inputs only!"

Even restricted to text, LLMs are not better than a human who is an expert in a domain. Try talking to one about any topic. Even on topics where I am not an expert, the responses from LLMs quickly become bland, uninteresting, and devoid of additional information.

This is true even for tech topics: after a certain point I stop talking to it and search for a blog post/SO answer written by a human which, if found, immediately breaks the plateau of progress I was hitting with the LLM.


This is a "no true Scotsman" challenge. People are going to say LLMs can't do certain things and you are going to say they can.

Not very interesting.


Let’s ask in good faith. Can you suggest something that it can’t do? Functional things. I’ll reply in good faith and consider it.

Say I suggest something: play a valid game of chess at club level (Elo ~1200, say) using algebraic notation (a sketch of how to test this follows below).

Then you’re either going to say it can or you’re going to say that requires more than 10000 tokens.

This isn’t an interesting conversation and I don’t think you are presenting this challenge in good faith for the reason I gave above.
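For what it's worth, the "valid game in algebraic notation" part of that test is easy to script. Here is a minimal sketch, assuming python-chess for legality checking; get_llm_move() is a hypothetical placeholder for whatever model call you'd use, and verifying the Elo ~1200 part would additionally need an engine opponent (e.g. Stockfish at a reduced skill level), which isn't shown here.

    import chess  # pip install python-chess

    def get_llm_move(board: chess.Board) -> str:
        # Hypothetical: send the move history / FEN to the model and
        # return its reply as a SAN move such as "Nf3".
        raise NotImplementedError("plug in your model call here")

    def play_and_validate(max_moves: int = 200) -> bool:
        """Return True if the model produces only legal moves until the game ends."""
        board = chess.Board()
        for _ in range(max_moves):
            if board.is_game_over():
                return True
            san = get_llm_move(board)
            try:
                board.push_san(san)  # raises ValueError on illegal/unparseable SAN
            except ValueError:
                print(f"Illegal move {san!r} at position {board.fen()}")
                return False
        return True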


https://chessbenchllm.onrender.com

There are several models with greater than 1200 Elo.

Also https://dubesor.de/chess/chess-leaderboard


I'll admit that's better than I expected, but these ratings also imply there are plenty of humans who will beat LLMs at chess.

Every few months I like to ask ChatGPT to do the "thinking" part of my job (scientist) and see how the responses stack up.

At the beginning of 2022 it was useless because the output was garbage (hallucinations and fake data).

Nowadays it's still useless, but for different reasons: it just regurgitates things already known and published and is unable to come up with novel hypotheses and mechanisms, or ways to test them. Which makes sense, given how I understand LLMs operate.


I am also a scientist and came to the same conclusion. I just use it to summarize papers, occasionally write boilerplate, and sometimes for simple Google-search-style lookups if it's an easy question.

It is already being used in pure math research.

Sadly, it looks like seanhunter was correct. Shame.

He was literally wrong about chess.

I said “say I said they couldn’t play chess, you will say they can” and you did. That’s literally not wrong.

LLMs without tooling cannot do most tasks. It's the tools and skills that enable them to do things.

So you may as well ask, "is there anything a Python script can't do?" It's just not a meaningful question.


Sure. “Tell me a joke”

They can't beat even a mediocre chess player at chess.

* code

* write interesting prose

* generate realistic images


It can do all of them. I also said text only.

> Only really dumb people think that. Or maybe you are an LLM.

You deleted it, but still, come on. Why would you even think to write that?




