
> I further predict that this will spark a creative gold rush among talented amateurs to train similar models and adapt them to a variety of purposes, including: mock news, “researched journalism”, advertising, politics, and propaganda.

The first mention of 'Elon Musk' (who left the board) and this sentence alone tipped me off that GPT-3 had generated that (and the whole blog), and its follow-on prediction makes no sense.

Sure, it may be used for nefarious purposes, but no-one can train GPT-3 in any acceptable time except for those with access to large GPU/ASIC compute power (OpenAI, Microsoft, Google, NVIDIA, etc.). Without the model, it is not possible to adapt it to any other purpose, unless OpenAI does it for them. Without a detection mechanism, it is very dangerous.

Nice try and a great GPT-3 hype experiment (mostly by friends of OpenAI). I look forward to the day that GPT-3 gets proper scrutiny from the actual wider tech industry, so that we can use it safely with detection methods.



> no-one can train GPT-3 in any acceptable time except

Gwern, who has probably spent as much time with GPT-3 and GPT-2 as any 'amateur' out there, is publicly saying that for most use cases, GPT-3 plus creative use of prompts gets you better results than GPT-2 with finetuning.

That's an amazing capability that Gwern elaborates on more here:

> A new programming paradigm? The GPT-3 neural network is so large a model in terms of power and dataset that it exhibits qualitatively different behavior: you do not apply it to a fixed set of tasks which were in the training dataset, requiring retraining on additional data if one wants to handle a new task (as one would have to retrain GPT-2); instead, you interact with it, expressing any task in terms of natural language descriptions, requests, and examples, tweaking the prompt until it “understands” & it meta-learns the new task based on the high-level abstractions it learned from the pretraining. This is a rather different way of using a DL model, and it’s better to think of it as a new kind of programming, where the prompt is now a “program” which programs GPT-3 to do new things.

https://www.gwern.net/GPT-3#prompts-as-programming

Almost all the GPT-3 results you see on Twitter are via the OpenAI API - no finetuning, only prompting.

That implies...

> Without the model, it is not possible to adapt it to any other purpose, unless OpenAI does it for them

... that we're actually very far from plumbing the possible ranges of behavior of GPT-3 with different sorts of prompting.
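
To make that concrete, here is roughly what "prompting as programming" looks like against the circa-2020 OpenAI completion API (the translation task, engine name, and parameter values below are purely illustrative):

    import openai  # assumes `pip install openai` and an API key configured elsewhere

    # The "program" is just a prompt: a task description plus a few examples.
    prompt = (
        "Translate English to French.\n\n"
        "English: Where is the library?\n"
        "French: Où est la bibliothèque ?\n\n"
        "English: I would like two croissants, please.\n"
        "French: Je voudrais deux croissants, s'il vous plaît.\n\n"
        "English: The train leaves at noon.\n"
        "French:"
    )

    # No finetuning anywhere: the pretrained model picks the task up from the
    # examples in the prompt and simply completes the pattern.
    response = openai.Completion.create(
        engine="davinci",   # illustrative engine name
        prompt=prompt,
        max_tokens=32,
        temperature=0.3,
        stop="\n",
    )
    print(response.choices[0].text.strip())

Swap out the examples and you have "reprogrammed" the same frozen weights to do sentiment, Q&A, summarization, and so on.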

This is a new ballgame, folks. The old rules don't quite apply here.


I don't really understand your post (and don't know much about GPT-3). Are you suggesting that the model is stateful, in that it can continue learning from successive prompts?

Maybe you can elaborate on this:

>instead, you interact with it, expressing any task in terms of natural language descriptions, requests, and examples, tweaking the prompt until it “understands” & it meta-learns the new task based on the high-level abstractions it learned from the pretraining.

Or are you suggesting that it has such a deep network of abstractions that, once a user starts to map it out, the mileage they can extract back out of the model via prompts is very exciting?


The model is not stateful, but you can emulate state (certainly with GPT-3, but also with other language models) by simply feeding back earlier output.

For example, to simulate a chatbot, you start with a prompt. You then successively feed longer and longer chunks of the full chat back to the model, taking each incrementally generated line as the AI's next reply.
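
A minimal sketch of that loop, using GPT-2 from the Hugging Face transformers library as a stand-in (the opening line and generation settings are just illustrative):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # The model itself is stateless; the "state" is just the growing transcript
    # that gets fed back in as the prompt on every turn.
    transcript = "The following is a chat with a helpful AI.\n"

    for _ in range(5):
        transcript += "You: " + input("You: ") + "\nAI:"
        ids = tok(transcript, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=40, do_sample=True,
                             pad_token_id=tok.eos_token_id)
        reply = tok.decode(out[0][ids.shape[1]:]).split("\n")[0].strip()
        transcript += " " + reply + "\n"
        print("AI:", reply)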

This is essentially how some of the 'use GPT-2 as a chatbot' front ends out there work. The same trick is extended to make things like AI Dungeon work: you can force the model to keep context within its attention window by providing a good summary in the prompt.

To speculate a bit on why this seems to work: these models are massive and have read millions of texts in their corpus. Instead of 'retraining' on text the model has probably already seen, the prompt is nudging the model to identify where in its own weights it already encoded that knowledge.


I don't think the claim is that the model is "stateful" in that it continues to learn from prompts. I think it's that the model no longer requires retraining for different situations; instead it has "learned" a set of lower (higher?) level abstractions from which those same (and possibly new) situations can be constructed dynamically from the input prompt.


> no-one can train GPT-3 in any acceptable time except for those with access to large GPU/ASIC compute power (OpenAI, Microsoft, Google, NVIDIA, etc.)

Any state actor has access to large compute power


I thought it cost about $10 million to train. Honestly, seems fairly cheap all things considered.

If it could be 100x smarter for only 100x the cost (whatever handwavey thing 'smarter' really means), it would be a steal, considering the same model could be reused by thousands of companies without retraining.


But it might be hard to attract talented AI engineers to live in, say, North Korea


"... it might be hard to attract talented AI engineers to live in, say, North Korea ..."

There are other ways to bring talent into North Korea:

"... Choi was abducted and taken to North Korea by the order of Kim Jong-il. While searching for Choi after her abduction, Shin was also abducted and taken to North Korea soon after."

"... In North Korea, Choi and Shin were remarried, at Kim's recommendation.[5] Kim had them make films together ..."

https://en.wikipedia.org/wiki/Choi_Eun-hee#Abduction_and_yea...


How much of a choice do they have? I can't imagine many of the top engineers in North Korea are allowed to leave


When most of your population is borderline starving, the number of top engineers in NK can be counted on very few hands.


Still enough to build a nuclear arsenal, so that's probably enough to build up AI talent.


Their nuclear arsenal is built with Chinese and Russian brains, not NK's native technology.


That being said, North Korea has a fairly decent cyberespionage program from what I hear.


Koreans can become talented AI engineers.

The DPRK has a history of simply abducting people with the talents that the DPRK requires. A plausible (although unlikely) scenario is that the DPRK abducts a handful of AI experts and forces them to train Koreans in AI.

Alternatives are for Koreans to learn abroad or for Koreans to learn online.


I'm not saying that North Koreans can't be good engineers. I'm saying that it's going to be an uphill struggle for NK to compete for tech talent when US/Chinese companies have so much more of the capital required to build these models.


The DPRK doesn't have to compete for tech talent in the same way most countries do, because other states probably won't be able to poach tech talent away from the DPRK.

And how much talent would the DPRK really need to do impactful work with GPT-3 (assuming that it really can be used for nefarious purposes)?


Yes, as if in North Korea they had access to the Google Maps API, or AWS, or at least GitHub, just to mention a few. What you mention is a bad example. How about a less biased one?


What about North Koreans desperate to be part of the privileged elite? They can take courses and learn just like anyone else.


I’m genuinely curious how accessible that sort of material is to the average North Korean.


Propaganda, advertising, and politics (arguably all the same thing) absolutely have access to the funds for this.

Facebook and Google, the biggest investors in state-of-the-art ML, are advertising companies. Their customers have the funds to do this independently.


Giving certain people who are highly visible on social media pre-public access to the model, and letting them cherry-pick their completions to post without the prompt or the number of tries, is a smart form of propaganda/hype building/PR management that we have come to expect from "GPT-2 is too dangerous to release" OpenAI.

Sometimes I forget that, while this model was created by scientists, and released with a scientific paper, it is essentially a for-profit business product, and such cheap tricks deserve harsh criticism.


> Sometimes I forget that, while this model was created by scientists, and released with a scientific paper, it is essentially a for-profit business product, and such cheap tricks deserve harsh criticism.

Sure, but this is akin to seeing bad science journalism and tarring the science itself with the same brush. GPT-3 still factually has certain properties, independently of anyone making grandiose assertions about those properties.

What those properties are, we can only partially say—e.g. we know it's capable of generating certain texts eventually, among an unbounded corpus of other texts it may have generated that were then human-discarded. But the fact that it can generate those texts at all—faster than brute-force, I mean—is an interesting fact on its own, worthy of scrutiny independent of whatever airier claims are being made.


It is certainly impressive, and I don't want to dismiss GPT-3. I'm just critiquing the (smart) release: make a select few feel special by giving them API access, and watch your product dominate the tech and news cycles for weeks. You'll have VC money in the bank before showing actual worth or business value.

Maybe a bit simplistic, but I view GPT as a Markov chain text generator, operating on word vectors instead of word tokens, and having a larger look-back. It's like a child copying a joke, because she heard adults laughing about it, but she does not understand the punchline. You wouldn't say that child understands or even displays humor, despite substituting "horse" with "donkey" when retelling the joke.
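
For reference, the kind of word-token Markov chain generator being compared against fits in a few lines (a toy sketch, included only for contrast):

    import random
    from collections import defaultdict

    def train_markov(text, order=2):
        # Map each run of `order` words to the words observed to follow it.
        words = text.split()
        table = defaultdict(list)
        for i in range(len(words) - order):
            table[tuple(words[i:i + order])].append(words[i + order])
        return table

    def generate_markov(table, order=2, length=30):
        state = random.choice(list(table))
        out = list(state)
        for _ in range(length):
            followers = table.get(tuple(out[-order:]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

The contrast with GPT: instead of a literal table keyed on the last couple of words, the "look-back" spans hundreds of tokens of learned, dense representations.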


If you want to play with GPT-3, you can do so right now.

Go to https://play.aidungeon.com Make an account, and select the "Dragon" model. That's GPT-3.

I've spent ten hours playing with it over the last two days. It isn't perfect, and it feels short of the hype it's generating about itself, but it's an amazing leap nonetheless. It really seems to have an understanding of causality, biology, all sorts of fictional themes...

It isn't perfect. You frequently have to back it up and try again. Unless you make good use of the site's long-term memory function, it'll forget anything that happened over a page ago, and a lot of the time its idea of what should happen next doesn't match the plot I had in mind. I'm getting better at that.

However, as a writer myself, I can say that this is just as true of human writers. For every final draft you see, there are ten discarded ones and a hundred that never made it to paper.

Viewed that way, GPT-3 is actually much better at the core part of writing than I am! It's more creative, it uses English better, it's better at matching the narration to the characters than I am...

It's just that this isn't enough. It's missing a full model of the world, and it doesn't know how to look at what it's written and decide if it matches its intent, or whether it'll break consistency or get in the way later.

It doesn't have an intent. It doesn't know about consistency.

But that's also true for that part of me.

GPT-3 isn't a human-level writer. What I've determined, however, is that it's a huge part of one, and it's more than good enough to fulfill the role of that part already. Now we just need the other nine tenths.


> it doesn't know how to look at what it's written and decide if it matches its intent, or whether it'll break consistency or get in the way later.

And we can build other models specifically for this. We don't need to add this stuff to GPT-3; GPT-3 can literally act as a part, a component. GPT-3 can serve the role in a larger model that "imagination" does in a human brain—being fed inputs; having corresponding outputs scavenged through by the rest of the model; and then being "fed back" with input that relates to the scavenged outputs.

One thing I'd be very curious to see tried is to get a system consisting of GPT-3 as "writer" and some other (summarization?) model as "editor" to attempt to dramatize, or adapt into prose fiction, a machine-readable sequence of events (e.g. a machinima recording of a stage-play enacted within an MMO game).

We already have models that turn machine-readable sequences of events directly into prose; see e.g. baseball news reporting. Such models can work just as well in reverse, summarizing in-domain prose back into machine-readable facts.

So if you take such a prose-to-factual-assertions "reading comprehension" model, and feed it GPT-3's output; and then measure the distance between the set of events comprehended by the "reading comprehension" model from GPT-3's output, and the source data (which is also in the form of a set of factual assertions), then you can iterate GPT-3 — maybe even one additional line of prose at a time — to find a story that is a consistent adaptation of the source. In this sense, GPT-3 is acting as a programmer, and the "reading comprehension" model as a compiler — with the compiler reaching out and erasing any line that doesn't compile.
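
A rough sketch of that loop, with every component left as a hypothetical stand-in (writer, comprehend, and distance are not real APIs; this is just the shape of the iteration being described):

    # Hypothetical components, none of which exist as-is:
    #   writer(story)          -> candidate next line of prose (e.g. GPT-3 behind an API)
    #   comprehend(prose)      -> set of factual assertions extracted from the prose
    #   distance(facts, goal)  -> how far those facts are from the source events

    def adapt(source_events, writer, comprehend, distance, max_lines=200):
        story = ""
        best = distance(comprehend(story), source_events)
        for _ in range(max_lines):
            candidate = story + writer(story) + "\n"
            facts = comprehend(candidate)
            score = distance(facts, source_events)
            # "Compiler" step: keep the new line only if it doesn't drift from the source.
            if score <= best:
                story, best = candidate, score
            # Otherwise discard it and let the writer sample a different continuation.
            if facts >= source_events:   # every source event covered (set inclusion)
                break
        return story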

Of course, you're limited in this by the "reading level" of the reading-comprehension model. But this is also true of humans; you can't get out a literary classic if the writer's editor and alpha-readers were five-year-olds.


The domain is play.aidungeon.io, and the GPT-3-based version is only available to sponsors right now.

After seeing that the domain name didn't work, I thought for a moment that your post was GPT-3 output -- imaginary URLs are a good GPT-2 tell -- but some research shows that there actually is a GPT-3 version:

https://medium.com/@aidungeon/ai-dungeon-dragon-model-upgrad...


It's only $10 to get access.


> no-one can train GPT-3 in any acceptable time except for those with access to large GPU/ASIC compute power

That's not true.

a) Premium AWS customers, for example, can request to have instance limits removed, which then gives you access to all of the AWS GPU-enabled instances available worldwide.

b) People ramble on about GPUs, but Intel DLBoost/AVX-512-enabled CPUs can get you performance comparable to a mid-range GPU in many situations. That then opens the door to training across all of the cloud and VPS providers.

Money is the limiting factor here, not available compute resources.


Do you have any links I can read on the latter claim? Using AVX-512 for AI with performance comparable to a GPU?


Uh-huh. Except anyone can verify the results themselves, sans specific ones like code generation, through the (paid) version of AI Dungeon.

The number of people confidently posting bullshit on Hacker News is astounding. Reacting like everything we know about GPT is just a bunch of tech demos, and everyone who has access to the API is supposedly just a shill. Eh.


I just spent an hour or so playing around with the paid version of AI Dungeon, and was super unimpressed. It's pretty fun for a moment, sure, and I assume some really heroic work went into building it. I'm not saying the creators did a poor job so much as the task is really hard and the final result is...lacking.

The "Dragon" (GPT-3) engine responds reasonably to any particular input, but clearly lacks a coherent state of the world. Objects appear and disappear; plot cues are given and then can never be summoned again if not immediately grabbed, environments change dramatically without explanation, etc.

Do you feel otherwise?


Right. It has some type of language ability, but no world modeling. So overall it really doesn't make sense.

But since that is so obvious, I assume many people are trying to figure out how to improve it. So I am excited to see if they can make progress in the next few years.

It is going to be quite difficult though. I think it might require integrating a totally different type of subsystem, if it is possible at all.

But the ability to produce realistic-sounding language seems to me like a step forward.


Why do you believe they won't release the model? It's Open AI after all.

Google, NVIDIA and Microsoft also make their models freely available. Google already trained a model which is bigger than GPT-3.

There might be some delay, but no fundamental problem with it.

Fine-tuning the entire GPT-3 is impossible on a single GPU. But it's still possible to fine-tune specific layers. Plus, I'm sure somebody will release a distilled version of it which is more manageable.
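
For a sense of what fine-tuning only specific layers looks like in practice, here's a sketch with the Hugging Face transformers library, using GPT-2 as a stand-in since GPT-3's weights aren't public (freezing all but the last two blocks is an arbitrary illustrative choice):

    import torch
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Freeze every parameter...
    for param in model.parameters():
        param.requires_grad = False

    # ...then unfreeze only the last two transformer blocks and the LM head, so
    # only a small fraction of the weights needs gradients and optimizer state.
    for block in model.transformer.h[-2:]:
        for param in block.parameters():
            param.requires_grad = True
    for param in model.lm_head.parameters():
        param.requires_grad = True

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=5e-5
    )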


>Why do you believe they won't release the model? It's Open AI after all.

Because they want to commercialize the model "to cover the costs of research"; it's stated in their blog post FAQ [0].

>Google already trained a model which is bigger than GPT-3.

Source? A fast Google search yielded no results and I'm curious.

[0] https://openai.com/blog/openai-api/


They commercialize the API. The model is so large you can't run it on a single GPU, AFAIK. OpenAI developed an infrastructure to run the model efficiently. So many companies would rather use API than deploy the model on their own hardware.

They will probably release the baseline model, not model optimized for deployment. There are many optimizations possible such as precision reduction, pruning, distillation, and they don't have to share these optimizations.

> a fast Google search yielded no results and I'm curious.

List of pre-trained models on huggingface: https://huggingface.co/models

You can see some of them are prefixed with "google". Also ALBERT is from Google Research.

That's just natural language processing; I dunno what they have in other fields.


They trained a >600B parameter translation model (GPT-3 is 175B parameters).

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (https://arxiv.org/abs/2006.16668)



