
My hive mind connection must be good because I literally finished this course yesterday.

It was very satisfying to learn how transformers work and to finally turn the obscure glyphs of the research papers into real code, but I think transformers are too big for what I can do on my own computer. The author mentioned that the toy transformer he was building in the final video took 15 minutes to train on his A100 (a $10,000 GPU), and the results weren't even that good: the transformer was spelling words correctly using character-level tokens. I guess that's something, but it's not GPT-4.
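Character-level tokenization (as opposed to the subword/BPE tokens that models like GPT-4 use) is simple enough to sketch in a few lines. This is just the general idea, not the course's exact code:

```python
# Minimal character-level tokenizer, a sketch of the idea used in the
# course's toy transformer (not the author's actual code).
text = "hello transformer"

# Build the vocabulary: one token id per unique character.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(s):
    return [stoi[ch] for ch in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"
print(len(chars), ids)
```

The vocabulary is tiny (a dozen or so symbols here, ~65 for the Shakespeare dataset), which is why a small model can at least learn to spell, but each token carries much less information than a subword token.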

Even so, there were a lot of good tips to pick up along the way. This is a great series that I'm thankful to have. The "Backprop Ninja" video was hard work: you manually calculate the gradients and then compare your calculations against PyTorch's. It's great to have instant feedback telling you whether your gradients are correct or not.
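You can do the same kind of check without PyTorch: derive a gradient by hand with the chain rule, then compare it against a finite-difference estimate. A minimal sketch of that idea (the video itself checks against PyTorch's autograd instead):

```python
import math

# Toy scalar function: f(x) = tanh(x^2)
def f(x):
    return math.tanh(x * x)

# Manual gradient via the chain rule:
# df/dx = (1 - tanh(x^2)^2) * 2x
def manual_grad(x):
    t = math.tanh(x * x)
    return (1.0 - t * t) * 2.0 * x

# Numerical gradient via central finite differences
def numerical_grad(x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
exact, approx = manual_grad(x), numerical_grad(x)
print(exact, approx)
assert abs(exact - approx) < 1e-6  # instant feedback that the math is right
```

The instant pass/fail feedback is the point: a sign error or a dropped chain-rule factor shows up immediately instead of silently wrecking training.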



I made a smaller GPT model, starting from Andrej's code, that converges to a decent loss in a short amount of time on an A100 -- just under 2.5 minutes or so: https://github.com/tysam-code/hlb-gpt

With the original hyperparameters it took 30-60 minutes; with a pruned-down network and adjusted hyperparameters, about 6 minutes; a variety of optimizations beyond that brought it down further.

If you want a version that's basically feature-identical to nanoGPT (but pruned down), release 0.0.0 at ~6 minutes or so is your best bet.

You can get A100s cheaply and securely through Colab or LambdaLabs.


> his A100 GPU (a $10,000 GPU)

These are available to rent per hour at much lower costs. The author mentions this in the video description.


True, as much as I enjoy owning and controlling my own hardware, buying an A100 and then letting it sit idle while I procrastinate and play video games probably isn't the best use of resources. He did say "my GPU" (or similar) at one point, and I thought maybe he does enough ML stuff that he bought his own.


If you have an NVIDIA gaming GPU you can train reasonable transformers.


Approximately 40 cents USD for 15 minutes from cursory research.


I'm completely unfamiliar with this market. Do you rent these on AWS? Or where?


https://jarvislabs.ai/pricing/

$1.29 per hour for a 40GB A100, apparently

https://lambdalabs.com/service/gpu-cloud#pricing

$1.10 per hour


If you're ok with 24GB you can use a 3090 for $0.70/h: https://www.genesiscloud.com/pricing
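For a sense of scale, the 15-minute training run works out to pocket change at the hourly rates quoted in this thread (rates as posted; they will drift):

```python
# 15-minute cost at the hourly rates quoted in this thread
rates = {
    "Jarvis Labs A100 40GB": 1.29,
    "Lambda Labs A100": 1.10,
    "Genesis Cloud 3090 24GB": 0.70,
}
for name, per_hour in rates.items():
    cost = per_hour * (15 / 60)  # 15 minutes = a quarter hour
    print(f"{name}: ${cost:.2f}")
```

Even the priciest option comes in around a third of a dollar per run, which is why renting beats buying unless the GPU is busy most of the day.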




