In search for a new self-hosted LLM

Tanka@lemmy.ml · edit-2 2 months ago

SuspciousCarrot78@lemmy.world · edit-2 28 days ago

deleted by creator

ejs@piefed.social · 2 months ago

I suggest looking at LLM arena leaderboards filtered by open-weight models. They offer very complete, statistically detailed benchmarks, and they're usually quite up to date when new models come out. The new Gemma that just came out might be the best for 1x GPU, and if you have a bunch of VRAM, check out the larger Chinese models.

James R Kirk@startrek.website · 2 months ago

Just curious, what does “some automation” entail? I thought LLMs could only work with text, like summarizing documents and that sort of thing.

SuspciousCarrot78@lemmy.world · edit-2 28 days ago

deleted by creator

James R Kirk@startrek.website · 2 months ago

That’s cool, it just… does those things? How does it connect to those apps? I can’t even get Gemini to set a reminder and that’s on a Google device.

SuspciousCarrot78@lemmy.world · edit-2 28 days ago

deleted by creator

James R Kirk@startrek.website · 2 months ago

That was actually super helpful, thank you.

SuspciousCarrot78@lemmy.world · edit-2 28 days ago

deleted by creator

Jozzo@lemmy.world · 2 months ago

It’s done by software using an LLM, not just a raw LLM. LLMs do only work with text, but you can get one to output the text “get_weather(mylocation)”, and instead of just outputting that directly to the user, the software running on top of the LLM runs a “get_weather” function that calls some weather API. The result of that function is then output to the user.

Any time you see an “AI” taking “actions”, this is what happens in the background for every action.
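
To make that concrete, here is a minimal sketch of the wrapper idea in Python. Everything in it is an assumption for illustration: the `TOOLS` registry, the `handle_model_output` helper, and the stubbed `get_weather` (which returns canned text instead of calling a real weather API). Real frameworks usually have the model emit structured JSON rather than a bare function-call string, but the dispatch step looks roughly like this.

```python
import re

def get_weather(location: str) -> str:
    # Hypothetical stand-in for a real weather API call; returns canned text.
    return f"Sunny in {location}"

# Hypothetical registry: the only functions the wrapper is allowed to run.
TOOLS = {"get_weather": get_weather}

# Matches text shaped like name(argument), e.g. get_weather(mylocation).
CALL_PATTERN = re.compile(r"^\s*(\w+)\((.*)\)\s*$")

def handle_model_output(text: str) -> str:
    """Run a registered tool if the model's text looks like a call to one.

    Anything else (plain prose, unknown tool names) is passed through as-is.
    """
    match = CALL_PATTERN.match(text)
    if match is None:
        return text
    name, raw_arg = match.groups()
    tool = TOOLS.get(name)
    if tool is None:
        return text
    return tool(raw_arg.strip())

# Pretend these strings came back from the LLM.
print(handle_model_output("get_weather(mylocation)"))
print(handle_model_output("Hello there"))
```

The point of the registry is that the model never executes anything itself: it only produces text, and the surrounding software decides which of its own functions that text is allowed to trigger.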

SuspciousCarrot78@lemmy.world · edit-2 28 days ago

deleted by creator

iceberg314@slrpnk.net · 2 months ago

I also recommend gemma4 or qwen3.5. Both have been super solid in my experience for how lightweight they are.

NoFun4You@lemmy.world · 2 months ago

Still can’t get my gemma to give me complete unbuggy components

iceberg314@slrpnk.net · 2 months ago

I guess I have been using gemma4 more for role-playing games. Qwen3.5 seems to be a better coder, actually.

nutbutter@discuss.tchncs.de · 2 months ago

Have you tried the new gemma4 models? The e4b fits in 12 GB of memory and is pretty good. Or you can use the 31b too, if you’re okay with offloading to CPU.
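
As a rough back-of-the-envelope check on why a 31b model would need CPU offloading with 12 GB, here is a small Python sketch. It assumes the "31b" in the name means about 31 billion parameters and that the weights are quantized to 4 bits each; both are assumptions, not something stated in the thread. It also ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    # Billions of parameters * bits per weight / 8 bits per byte = gigabytes
    # (decimal GB) needed for the weights alone.
    return params_billions * bits_per_weight / 8

def gpu_fraction(params_billions: float, bits_per_weight: float, vram_gb: float) -> float:
    # Share of the weights that could sit in GPU memory; the rest would have
    # to be offloaded to CPU RAM. Capped at 1.0 when everything fits.
    return min(1.0, vram_gb / weights_size_gb(params_billions, bits_per_weight))

size = weights_size_gb(31, 4)      # assumed: 31B parameters at 4 bits each
share = gpu_fraction(31, 4, 12)    # assumed: 12 GB of GPU memory
print(f"weights: {size:.1f} GB, share that fits on the GPU: {share:.0%}")
```

Under those assumptions the weights alone exceed 12 GB, which is consistent with the suggestion that the larger model only works if part of it lives in CPU memory.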