The way things are going, we'll see faster and more efficient ways to run transformer architectures on edge devices, but I'm afraid we're approaching the limit: you can't just Rust your way out of the VRAM requirements, which are the main bottleneck in loading large-enough models. One might say "small models are getting better, look at Mistral vs. Llama 2", but small models are also approaching their capacity (there's only so much you can put in 7B parameters).
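
To put rough numbers on the VRAM point, here's a back-of-the-envelope sketch. The helper `weight_memory_gib` is something I made up for illustration, not a library function, and it assumes memory is just parameter count times bits per parameter. It counts the weights only and ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GiB.

    Assumption: every parameter is stored at the same precision and
    nothing else (KV cache, activations, buffers) is counted.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Illustrative sizes only; 7B matches the model size mentioned above.
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        for bits in (16, 8, 4):
            gib = weight_memory_gib(params, bits)
            print(f"{name} at {bits}-bit: ~{gib:.1f} GiB for weights alone")
```

Even with aggressive quantization, the weights still have to sit somewhere, and a faster runtime doesn't change that arithmetic.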

I don't know, man. This approach to AI doesn't "feel" like it'll lead to AGI; it's too inefficient.