That's a rather odd comparison to make. First of all, OP, like llama.cpp, doesn't use the GPU – in contrast to most Python ML code. It's not hard to write Python code that "optimally exploits" the GPU. You might call the GPU a "specialized environment to build and run" but it's arguably much better suited to the problem.
Second, OP, like llama.cpp, produced efficient and highly specialized code after it was clear the model being specialized for (StableDiffusion / LLaMa / …) works well. Where Python shines, though, is the prototyping phase when you have yet to find an appropriate model. We have yet to see this sort of easy & convenient prototyping in C++.
Now, this is not to take anything away from the fantastic work being done by the llama.cpp people (among whom I also count OP) in the "ML on a CPU" space. But the problems being solved are entirely different.
> Where Python shines, though, is the prototyping phase when you have yet to find an appropriate model. We have yet to see this sort of easy & convenient prototyping in C++.
+1.
Producing a highly optimized C/C++ kernel that uses the CPU to the fullest extent requires a tremendous amount of talent and expertise. For example, not everyone can write a hand-vectorized kernel with AVX2 intrinsics (outside a few specialized applications like 3D graphics, media encoding, and the like), and even fewer people can exploit the underlying features of the algorithm for optimization, such as producing usable output at greatly reduced numerical precision. The power of LLMs provides strong motivation for countless programmers all over the world to put their brainpower into doing just that. New techniques are proposed and implemented on a monthly basis, with people devising and applying every possible trick to the LLM optimization problem. In this regard, moving from Python to C is totally reasonable.
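To make "usable output at greatly reduced numerical precision" concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is not taken from llama.cpp or any other project; the function names and the sample weights are made up for illustration, and real kernels use more elaborate block-wise schemes.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest step means each value is off by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four, at the cost of an error bounded by half the step size. Whether that error is acceptable depends on the model, which is exactly the kind of algorithm-specific knowledge mentioned above.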
In comparison, right now I'm working on optimizing a niche open-source scientific simulation kernel with a naive C codebase. Before me, there were hardly any contributors in the last decade.
Python has its place because not every field has resources and expertise comparable to ML's. In particular, when the bulk of a Python script's data processing is done in a function call to a C++ or FORTRAN kernel, as with scipy, the difference between naive C and naive Python code (or Julia code if you're following the trend) is not that large, especially when it's a one-off project for publishing a single paper.
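A rough way to see this for yourself, as a sketch only: the snippet below computes the same dot product twice, once with an interpreted loop and once by handing the loop to C-implemented built-ins. The built-ins are a standard-library stand-in for a real compiled kernel such as scipy, and the comments about C assume CPython. The function names are made up, and the timings will vary by machine.

```python
import operator
import timeit


def dot_naive(a, b):
    """Pure-Python loop: every multiply and add goes through the interpreter."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a, b):
    """Same result, but the loop runs inside built-ins implemented in C (CPython)."""
    return sum(map(operator.mul, a, b))


n = 100_000
a = list(range(n))
b = list(range(n, 2 * n))

# Integers are used so both versions agree exactly.
assert dot_naive(a, b) == dot_builtin(a, b)

t_naive = timeit.timeit(lambda: dot_naive(a, b), number=20)
t_builtin = timeit.timeit(lambda: dot_builtin(a, b), number=20)
print(f"naive loop: {t_naive:.3f}s, built-ins: {t_builtin:.3f}s")
```

The more of the work that sits behind a single call into compiled code, the less the speed of the surrounding glue language matters.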
Yeah, I make a living in the GPU space. I think my comment comes from colleagues having to hold my hand to set up their ML / Python environments with all of their peccadilloes. In fact it's bad enough that I have to use Docker to create an insular environment tailored to their specific setup. And Python is like 1000 times slower when it's not using other libs like numpy.
Everyone has their own way to do this. Every step is broken by some unfamiliar dependency that requires special arcane knowledge to fix. Part of me is a grumpy old man that doesn’t gravitate to the shiny new tools that come out every week that the younger devs keep up with :)
pip and venv are neither shiny nor new; they've been the standard way of doing things for a while. I am an outsider to Python and am incredibly thankful for this standardization, because I agree that getting a Python env set up correctly before venv was a huge pain.
If your guys aren't on this I'd suggest you get them on it; it dramatically simplifies setup.
Here is a tiny excerpt from trying to get dvc to work just so I could get the training weights for deployment ... remember I don't develop much w/ Python...
$ dvc pull
Command 'dvc' not found, but can be installed with:
sudo snap install dvc
$ sudo snap install dvc
error: This revision of snap "dvc" was published using classic confinement and thus may perform
arbitrary system changes outside of the security sandbox that snaps are usually confined to,
which may put your system at risk.
If you understand and want to proceed repeat the command including --classic.
OK, I got dvc installed somehow; I don't remember how. Time to get the weights...
$ python3 -m dvc pull
ERROR: unexpected error - Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
Finally I just had my colleague manually copy the weights. This kind of thing went on for hours.
Thanks… I know my colleague uses it a lot. I generally use his models and don't do much ML development yet. At some point I need to properly learn all of this. It seems ML tools are only for developers, not for those who simply want to deploy and use the resulting NN.
> Are they not using venvs or something? It should be as simple as python -m venv venv; source venv/bin/activate; pip install -r requirements.txt
In most cases it would be possible to do something close to that, but it is extremely common in the AI/ML space to run into things distributed with install instructions that don't include it. Instead they tell you to have a global install of a certain Python version and then to pip install the dependencies (and globally install any non-Python package dependencies). So even if they'd work in a venv, you have to (1) independently know you should be doing that, and (2) translate the instructions, which, where (1) applies, is usually trivial if all the dependencies are proper Python packages, but can be more involved otherwise.
So, yeah, I can see that a lot of the time the path of least resistance is just to create an isolated container environment for it.
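For anyone who hasn't done the translation before, here is a minimal sketch of the venv route using only the standard library. In a POSIX shell the equivalent is `python3 -m venv .venv`, then `source .venv/bin/activate`, then `pip install -r requirements.txt`. The helper names, the `.venv` directory, and the `requirements.txt` filename are illustrative choices, not something a particular project prescribes.

```python
import os
import subprocess
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts/ on Windows and bin/ elsewhere.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_requirements(env_python, requirements="requirements.txt"):
    """Install into the environment by calling its own interpreter; no activation needed."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )


# Hypothetical usage, run from the project directory:
# python = create_env(".venv")
# install_requirements(python)
```

Calling the environment's own interpreter with `-m pip` sidesteps the "did I activate the right thing" question entirely. It does nothing for non-Python dependencies, which still have to be handled separately.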
Unfortunately it's not that simple, especially for the NVIDIA driver and CUDA install. That's why we usually use conda, which can handle the CUDA install, but even then sometimes it works flawlessly and sometimes it doesn't.
>You might call the GPU a "specialized environment to build and run" but it's arguably much better suited to the problem.
I feel like the person you're replying to knows that the GPU is better suited than the CPU to do this task, and your argument doesn't really make sense. I think they were referring to the python venv environment with all the library dependencies as the "specialized environment"
The point is that, as awesome as this repo is, it doesn't do much to wean the "ML folks" off of Python, since it doesn't provide the flexibility and GPU support that people designing and training DL systems rely on.
I don't disagree that Python environments are a mess. I'm actually a developer on quite a prominent large scale neural network training library and a DL researcher that uses said library. With my developer hat on I like to have minimal dependencies and keep Python scripting as decoupled as possible from the CUDA C++ implementation. With my researcher hat on I don't want to be slowed down by C++ development every time I want to change my model or training pipeline. At least for me, C++ development is slower and more error prone than modifying Python.
Obviously doing any heavy lifting in Python is a bad idea. But as a scripting language I think it's good, especially if you keep the environment simple. I don't think the answer for DL training is to dump Python entirely and start over in pure C/C++/Rust/Julia/whatever. Learning C/C++ is too big of an ask for everyone working on the model design and training side, and it would slow down progress significantly; most of that work is actually data munging and targeted model tweaks. But I do think there's still a lot that can be done to decouple Python from the underlying engine and yield networks where inference can be run in a minimal-dependency environment. There are lots of great people working on all these things.
>That's a rather odd comparison to make. First of all, OP, like llama.cpp, doesn't use the GPU
When was the last time you looked at llama.cpp? It has supported GPU, GPU+CPU, and distributed inference using OpenMPI for a while now. It also supports training, as well as negative prompting and grammars! The ease of getting llama.cpp running on just about anything has already spurred innovation.
Not sure what "It's not hard to write Python code that "optimally exploits" the GPU" means exactly, but Python is so far from exploiting GPU resources, even with C/C++ bindings, that it's not even funny. I am sure the HPC folks would have migrated away from FORTRAN and C/C++ a long time ago if it were that easy.
I wasn't trying to claim that Python is great at fully exploiting GPU resources on generic GPU tasks. But in ML applications it often does, at least in my experience.