
My hive mind connection must be good because I literally finished this course yesterday.

It was very satisfying to learn how transformers work and to finally turn the obscure glyphs of the research papers into real code, but I think transformers are too big for what I can do on my own computer. The author mentioned that the toy transformer he was building in the final video took 15 minutes to train on his A100 (a $10,000 GPU), and the results weren't even that good: the transformer was spelling words correctly using character-level tokens. I guess that's something, but it's not GPT-4.
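Character-level tokenization (as opposed to the subword/BPE tokens that models like GPT-4 use) is simple enough to sketch in a few lines. This is just the general idea, not the course's exact code:

```python
# Minimal character-level tokenizer, a sketch of the idea used in the
# course's toy transformer (not the author's actual code).
text = "hello transformer"

# Build the vocabulary: one token id per unique character.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(s):
    return [stoi[ch] for ch in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"
print(len(chars), ids)
```

The vocabulary is tiny (a dozen or so symbols here, ~65 for the Shakespeare dataset), which is why a small model can at least learn to spell, but each token carries much less information than a subword token.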

Even so, there were a lot of good tips to pick up along the way. This is a great series that I'm thankful to have. The "Backprop Ninja" video was hard work: you manually calculate the gradients and then compare your calculations against PyTorch's. It's great to have instant feedback telling you whether your gradients are correct or not.
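You can do the same kind of check without PyTorch: derive a gradient by hand with the chain rule, then compare it against a finite-difference estimate. A minimal sketch of that idea (the video itself checks against PyTorch's autograd instead):

```python
import math

# Toy scalar function: f(x) = tanh(x^2)
def f(x):
    return math.tanh(x * x)

# Manual gradient via the chain rule:
# df/dx = (1 - tanh(x^2)^2) * 2x
def manual_grad(x):
    t = math.tanh(x * x)
    return (1.0 - t * t) * 2.0 * x

# Numerical gradient via central finite differences
def numerical_grad(x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
exact, approx = manual_grad(x), numerical_grad(x)
print(exact, approx)
assert abs(exact - approx) < 1e-6  # instant feedback that the math is right
```

The instant pass/fail feedback is the point: a sign error or a dropped chain-rule factor shows up immediately instead of silently wrecking training.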



I made a smaller GPT model, starting from Andrej's code, that converges to a decent loss in a short amount of time on an A100 -- just under 2.5 minutes or so: https://github.com/tysam-code/hlb-gpt

With the original hyperparameters it took 30-60 minutes; with a pruned-down network and adjusted hyperparameters, about 6 minutes; a variety of optimizations beyond that brought it down further.

If you want a version that's basically feature-identical to nanoGPT (but pruned down), release 0.0.0 at ~6 minutes or so is your best bet.

You can get A100s cheaply and securely through Colab or LambdaLabs.


> his A100 GPU (a $10,000 GPU)

These are available to rent per hour at much lower costs. The author mentions this in the video description.


True, as much as I enjoy owning and controlling my own hardware, buying an A100 and then letting it sit idle while I procrastinate and play video games probably isn't the best use of resources. He did say "my GPU" (or similar) at one point, and I thought maybe he does enough ML stuff that he bought his own.


If you have an NVIDIA gaming GPU you can train reasonable transformers.


Approximately 40 cents USD for 15 minutes from cursory research.


I'm completely unfamiliar with this market. Do you rent these on AWS? Or where?


https://jarvislabs.ai/pricing/

$1.29 per hour for a 40GB A100, apparently

https://lambdalabs.com/service/gpu-cloud#pricing

$1.10 per hour


If you're ok with 24GB you can use a 3090 for $0.70/h: https://www.genesiscloud.com/pricing
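For a sense of scale, the 15-minute training run works out to pocket change at the hourly rates quoted in this thread (rates as posted; they will drift):

```python
# 15-minute cost at the hourly rates quoted in this thread
rates = {
    "Jarvis Labs A100 40GB": 1.29,
    "Lambda Labs A100": 1.10,
    "Genesis Cloud 3090 24GB": 0.70,
}
for name, per_hour in rates.items():
    cost = per_hour * (15 / 60)  # 15 minutes = a quarter hour
    print(f"{name}: ${cost:.2f}")
```

Even the priciest option comes in around a third of a dollar per run, which is why renting beats buying unless the GPU is busy most of the day.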




