Interesting that they cut the memory usage in half. That would address what is IMO the biggest problem with local LLMs: only a limited number of parameters fit in memory, which results in answers that are not very good.
Also it's funny that they are saying that Llama 4 Maverick performs about the same as GPT-4.1 Nano.