I have my own challenge: I think LLMs can do everything a human can do, and typically much better, provided the context required for the problem fits in 10,000 tokens.
That is like saying, "My program is better than any human, but it accepts binary inputs only!"
Even restricted to text, LLMs are not better than a human who is an expert in a domain. Try talking to one about any topic. Even on topics where I am not an expert, the responses from LLMs quickly become bland, uninteresting, and devoid of additional information.
This is true even for technical topics: after a certain point, I stop talking to the LLM and search for a blog post or Stack Overflow answer written by a human, which, if found, immediately breaks the plateau of progress I was hitting with the LLM.
Every few months I like to ask ChatGPT to do the "thinking" part of my job (I'm a scientist) and see how the responses stack up.
At the beginning of 2022 it was useless because the output was garbage (hallucinations and fake data).
Nowadays it's still useless, but for different reasons: it just regurgitates things already known and published, and it is unable to come up with novel hypotheses and mechanisms, or ways to test them. Which makes sense, given how I understand LLMs operate.
I am also a scientist and reached the same conclusion. I just use it to summarize papers, occasionally write boilerplate, and sometimes answer Google-search-level queries if it's an easy question.
For now, this challenge is text only.
Can we think of anything that LLMs can’t do?