LLMs fail in 8 out of 10 early differential diagnosis cases

supersquirrel@sopuli.xyz · 12 days ago

LLMs fail in 8 out of 10 early differential diagnosis cases

Meron35@lemmy.world · 12 days ago

This is commonly done for the purposes of replicability, but is not at all how these models are deployed in practice.

Larger institutions, especially those with strict data privacy requirements, are deploying locally hosted models permanently RAGed to their own internally vetted documentation.

It would’ve been much more interesting to see how much RAG setups fail, contrary to their marketed promises.

From experience, RAGs do help reduce hallucinations, but LLMs still do dumb things, like jumble up numbers. There were many cases where the LLM confidently presented some numerical results, but the number existed somewhere else entirely, like a footnote on the same page.