What's New in Surgical AI: 11/19/22

AI Tools for Research, and Reevaluating how we Evaluate Surgery

Dhiraj J Pangal

and

Daniel Donoho

Nov 19, 2022

Welcome back!

This week, we focus on two topic areas and highlight one paper of the week in surgical artificial intelligence.

🧬A.I. Based Research Tools ~~are~~ were here
🤗 How might AI assistants for medical writing actually work?
🏆 Best paper(s): What does US News and World Report have in common with “The Wire”, and what can surgeons learn from smoking cessation programs?

🪐 Galactica: The hype cycle of LLMs claims another victim

Meta is laying off 11,000 employees, but didn’t stop the production machine that is Meta A.I. Their most recent endeavor ~~is~~ was Galactica, a large language model (LLM) trained on over 100 billion tokens of scientific text (including nearly 50 million papers). The model was touted as outperforming most state-of-the-art language models at summarizing scientific text, writing code, and suggesting citations.

Papers with Code @paperswithcode

🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights: galactica.org

But, as you may have guessed, not all that glitters is gold.

Right on cue, Galactica was tested by users, and the “hallucinatory” abilities of LLMs came into conflict with the need for precision and accuracy in scientific and biomedical contexts.

Michael Black @Michael_J_Black

I asked #Galactica about some things I know about and I'm troubled. In all cases, it was wrong or biased but sounded right and authoritative. I think it's dangerous. Here are a few of my experiments and my analysis of my concerns. (1/9)

What is LLM hallucination, you might ask? Natural language generation models can produce provocative and lifelike answers, but they are also known to produce authoritative-sounding text that is not faithful to the source data. For example, a model might generate a new article about a “Meryl Streep - Jerry Seinfeld theorem” even though no such collaboration ever occurred (aka Fake News).

Using specific words or phrases such as “think step by step” or “cite your sources” can sometimes reduce these errors, but the risks are just too high. In biomedical text generation, LLM hallucinations create wild mixtures of fact and fiction that can only be described as dangerous bullshit (more). In 2022, using an LLM to substantially assist with scientific writing is like closing your eyes and trusting your car to drive itself through rush hour. It might get you there, but you might end up in New Jersey by mistake.

It all comes down to the specific use case. There is no such thing as a generalizable LLM (SOTA 2022). The best LLMs are those that are trained and designed for use within a specific community to perform a specific function, such as copywriting for websites that sell consumer products.

🚨Academic Medical Use Cases for LLMs 🚨

What does this all mean? tl;dr: A.I.-based research tools are coming, but will be developed and optimized individually for specific use cases.

One of the first use cases that might impact our lives in academic medicine is literature search and review. Tools like Elicit can rapidly search and summarize the corpus of PubMed data better than PubMed’s native search.

For example: A paper you know well just eludes your PubMed searches. Or you know the [insert University here] group just published on this topic, but you don’t recall where. Or was it JAMA or JAMA Surgery you read that review in? No way to find that, but with some assistance from an Elicit-type engine, you might just have a shot.

Another use case is in helping with manuscript formatting for publications. We all know the pain of just missing the correct reference format, page widths, abstract format, etc. An LLM could actually be helpful here - “translate from Nature: Digital Medicine to Operative Neurosurgery format” with one click (of course, the better answer is to junk all of the formatting requirements and let the editors handle it after acceptance, but hey…)

I would pay for that, wouldn’t you?

It may also help brainstorm new areas of interest: “has anyone investigated the impact of a surgical data science newsletter on interests in AI?” (the answer is no)

What does that mean for us?

Get smart. Play with these tools, with the understanding that they may not make your life much easier right now. But familiarity with them will allow for ease of adaptation once they become better integrated into the research workflow.
Proofreading citations? The Galactica example demonstrated the dark side of these tools. It will not be long until a nonexistent paper makes its way into your very existent bibliography via an enterprising graduate student or resident hoping to streamline their citations.
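
One low-tech guard against the nonexistent-paper problem is to compare every entry in a draft bibliography against a list of references you have personally verified. Here is a minimal Python sketch of that idea. It assumes both lists are plain citation strings; the function names and sample entries are hypothetical placeholders for illustration, not real papers or a real tool.

```python
import re


def normalize(citation: str) -> str:
    """Lowercase and drop punctuation and extra spaces so minor formatting differences are ignored."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", citation.lower())
    return " ".join(cleaned.split())


def find_unverified(bibliography: list[str], verified: list[str]) -> list[str]:
    """Return the bibliography entries that match nothing on the verified list."""
    known = {normalize(ref) for ref in verified}
    return [entry for entry in bibliography if normalize(entry) not in known]


# Placeholder entries for illustration only; these are not real papers.
verified_refs = ["Author A. Example Title One. Journal X. 2021."]
draft_bibliography = [
    "Author A. Example title one. Journal X, 2021",
    "Author B. A Paper That Does Not Exist. Journal Y. 2022.",
]

flagged = find_unverified(draft_bibliography, verified_refs)
for entry in flagged:
    print("Check by hand:", entry)
```

A match only means the entry is on your verified list. It says nothing about whether the paper actually supports the sentence it is attached to, so a human still has to read it.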

🏆Paper(s) of the Week: US News and World Report Rankings, The Wire, and Value-Based Care

Two pieces in JAMA Surgery caught our attention this week, and get at one of the central themes of this newsletter.

US News and World Report (a defunct newsmag from the late ’90s) has successfully created a borderline-unethical arms race by allowing “reputation” to drive their rankings schema.

Whistleblowers at Cedars-Sinai highlighted the flawed metrics used by the US News and World Report (USNWR) rankings to judge hospitals, namely:

Marketing campaigns undertaken by hospital systems to accrue votes from board-certified physicians
A lack of correlation between rankings and objective metrics
Reliance on the opinions of specialists thousands of miles away, which are at best worthless (more likely, harmful) in assessing hospital quality
Incentives for people to “upvote” their own training programs and “not vote” for competing hospitals

Still, these rankings MATTER. Last year, I had a patient choose to come to my clinic rather than a competitor simply because we were ranked 10 spots higher in USNWR. And like all rankings that matter, the incentives to game the rankings are stronger than the incentives of the publisher to maintain their “accuracy” (whatever that even means). David Simon taught us this lesson in “The Wire” (season 4 episode 9 amongst many), and we still haven’t learned.

Low-value care persists within surgery. Fixing it requires changing habits; more data isn’t the (only) solution

Even after surgical procedures are shown to deliver minimal value to patients, surgeons rarely change practices.

Simply providing surgeons with “the evidence” doesn’t improve care - perhaps not shocking to anyone who has encountered human beings, but perhaps quite shocking to those who haven’t seen the other side of the surgical drapes.

Instead, “extinguishing” (“extincting?”) these low-value practices requires a process-oriented approach, just as we might expect from every other arena of human life (think smoking cessation, weight loss, etc.)

(Pitt and Dossett, 2022, *JAMA Surgery*)

We are given a few examples of successful practice-changing interventions in surgery:

“Clinician education and audit and feedback were successful in reducing the use of unnecessary electrocardiogram before cataract surgery from 97% to 13%
…A strategy leveraging clinician education, local practice champions, and peer comparison reduced the use of low-value sentinel lymph node biopsy in older women with breast cancer
…a pay-for-performance program reduced unnecessary postacute care after joint replacement.”

In short, surgeons are humans too and their habits are driven by 1) their knowledge on a topic, 2) motivation to change from people they trust, 3) comparison to their peers and 4) the right incentives.

We think data and A.I. can help drive this process. As operating room metrics and advanced analytics become increasingly prevalent (we are working on this …), they provide an opportunity to identify low-value practices. Ultimately, surgeons want to provide high-value care. And patients should know their surgeon can deliver high-value care.

Our current system does a poor job of both. However, improving surgical care depends on more than “just the facts”. We will have to develop advanced human-centered systems for improving care in conjunction with the data that we generate.

Who’s ready for that challenge?

Feeling inspired? Drop us a line and let us know what you liked.

Like all surgeons, we are always looking to get better. Send us your M&M style roastings or favorable Press Ganey ratings by email at ctrl.alt.operate@gmail.com

Ctrl-Alt-Operate
