

14 days ago

I’ve had better luck with llama.cpp for opencode. I’m guessing it does formatting better for tool use.




Gemma4 doesn’t do TurboQuant, but it is leaner on the KV cache.
Edit: looks like there are forks that do TurboQuant already.

I’ll bring two theories to the table:
a) They got caught distilling for their own models.
b) They re-sold their $200/mo plans as APIs.