A research team led by Harvard medical student Arya Rao this week published a study in JAMA Network Open that tested 21 leading off-the-shelf AI models on 29 standardized clinical vignettes. The bots all did fairly well when given a full portfolio of medical information and asked to make a final diagnosis, with leading models correct 91 percent of the time. Early differential diagnosis, where clinicians weigh various possibilities while trying to rule out certain conditions, is where that failure rate of more than 80 percent comes in.

“Every model we tested failed on the vast majority of cases,” Rao told The Register in an email. “That’s the stage where uncertainty matters most, and it’s where these systems are weakest.”

In other words, it’s the midnight anxiety-fueled WebMD rabbit hole of yesterday all over again, just supercharged with AI that’s probably even more likely to get things wrong than you are without it.