What’s New in Surgical AI: 6/18 Edition
State of the A.I. Union: What has actually been implemented?
Welcome back! If you’re new to ctrl-alt-operate, we do the work of keeping up with AI, so you don’t have to. We’re grounded in our clinical-first context, so you can be a discerning consumer and developer. We’ll help you decide when you’re ready to bring A.I. into the clinic, hospital or O.R.
This week, medicine x AI was featured in front-page articles in the NYT and WSJ, which we will recap below. We’ll even give you a small mission statement based on some of our earlier lessons learned in AI and ML. Our deeper dive builds on the news updates with a more incisive look at AI use today.
Table of Contents
📰 The News: From the front page of the NYT to the reference list of Nature, AI is everywhere
🤿 Deep Dive: …OK so Where is AI in the hospital?
🪦 Best of Twitter: RIP
📰The News: From the front page of the NYT to the reference list of Nature, AI is everywhere
Here’s a quick rundown of the top articles in mainstream media this week in AIxmedicine:
Are physicians using chatGPT to be more … humane? AI doesn’t get tired and doesn’t have the same counter-transference problems as humans (but it very well may have its own hidden biases). Caveats abound.
On the other hand, hospitals are misusing AI to overrule human clinical decisions. It should surprise NO ONE that healthcare IT systems’ implementations of AI are at high risk of making things worse for patients, caregivers, and hospital systems. Here are the underlying laws that we should consider:
Goodhart’s Law (once a metric becomes a target, it ceases to be a good measure of the underlying thing)
Smith’s Law (“Murphy was an optimist”)
Arthur C. Clarke’s three laws (especially #3: any sufficiently advanced technology is indistinguishable from magic)
the Dilbert/Peter principle (people are promoted to their level of incompetence, so management tends toward incompetence)
Brandolini’s law (the effort required to refute “BS” is an order of magnitude greater than that required to produce it … and it took all the GPUs to create the latest shiniest LLM)
Shirky’s principle (“Institutions preserve the problem to which they are the solution.”)
So where does this pessimistic view take us? Healthcare AI will provide magic-sounding solutions that are flawed but irrefutable, attracting minimally competent leaders and mis-incentivized organizations targeting useless metrics, and everything that can go wrong, will.
Rather than giving up, we propose that:
we, the people in the arena,
undertake unglamorous and difficult work that
demystifies AI/ML technologies,
to continually implement and re-align AI/ML applications
with their actual effects on human health.
That’s why we are here.
Ever wanted to know how much doctors make? A guy with no coding experience built a data aggregator from physician postings in one month using chatGPT. Click here to learn how he did it. Is it accurate? No. Is it useful? YES! This is a demonstration of going from 0 experience → web app, and an example of the power of aggregating information only present in disparate corners of the web! You could do this (better).
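We haven’t seen his code, but for a flavor of what a scrape-and-aggregate project like that looks like under the hood, here is a minimal, hypothetical Python sketch. The postings, specialties, and dollar figures below are invented for illustration, not his actual data or site:

```python
import re
from collections import defaultdict
from statistics import median

# Hypothetical job-posting snippets; a real aggregator would pull these
# from physician job boards rather than hard-coding them.
postings = [
    "Neurosurgery - Academic Center - $650,000 base salary",
    "Neurosurgery - Community Hospital - $780,000 base salary",
    "Family Medicine - Rural Clinic - $240,000 base salary",
    "Family Medicine - Urban Group - $265,000 base salary",
]

salaries = defaultdict(list)
for post in postings:
    specialty = post.split(" - ")[0]
    match = re.search(r"\$([\d,]+)", post)  # pull out the dollar figure
    if match:
        salaries[specialty].append(int(match.group(1).replace(",", "")))

for specialty, values in sorted(salaries.items()):
    print(f"{specialty}: median ${median(values):,.0f} (n={len(values)})")
```

The point is less the parsing than the pattern: collect scattered postings, extract one structured field, and summarize it somewhere people can actually see it.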
The eyes are a mirror: From high-quality images of eyes, neural nets can reconstruct, from the reflections, what the subject is looking at behind the camera. Watch out!
The Citation Question - what you need to know in June 2023. With ChatGPT inching towards mainstream adoption, major journals and publishers have begun to develop policies, style guides, and citation formats regarding the use of generative AI in text, images, audio, and video.
Generative AI (such as large language models (LLMs) and Generative Pretrained Transformers) is a family of algorithms that create a “most likely” output to follow the inputs. The output might be more text, an image, an audio file, a video, or some other type of data, depending on the underlying algorithm. For example: If you type a question (called a “prompt”) into chatGPT-4 (3.14.2023, no plug-ins), it will read your prompt and generate a series of characters in response. Owing to the process through which the algorithm was created, it is overwhelmingly likely that you, a human, will recognize its output as a response to the question.
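For the curious, this is roughly what “sending a prompt” looks like in code: a minimal sketch using the OpenAI Python library as it existed in mid-2023. Model names and the exact interface change over time, and the API key is a placeholder:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; never hard-code a real key

# Send a prompt and get back the model's "most likely" continuation.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain Goodhart's law in one sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```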
Nature says: “no” to images and video, “yes” to text. In a recent position paper, the Nature publishing group says that it will not publish images and video created using generative AI. Text, for now, is permitted as long as it is appropriately identified as such.
APA provides a style sheet. For those of you using the APA publication style, here’s how to cite LLMs. And here’s the AMA policy.
What about Grammarly, gMail, and MS Office? We’ve all lived in a world where spell-check and grammar-check are baked into our word-processing systems, and all of these products are increasingly incorporating generative AI. This article, along with most of what I write, is written using Grammarly. But do its recommendations come from a generative model or a “different” AI? And is it so obvious that one is acceptable, and the other isn’t? Furthermore, these proprietary details are obscured in the user interface. There is a separate AI product that is being sold for explicit text generation (“Write an email to Mike telling him he’s fired, be polite, reference the Noodle Incident and the Salamander Incident.”) Should I report the entire “tech stack” to Nature in my next submission?
🤿Deep Dive: How is A.I. Being Used Today?
We (this newsletter included) often discuss A.I. as a monolith that will one day “arrive,” as if it were a space shuttle we expect to land somewhere. In reality, it is better treated like a class of drugs, such as immunologic therapies. Advances in technology and research enabled the development of AI, much as they enabled immunotherapy. As more therapies are developed, some will be duds and others runaway hits that become mainstays of treatment. And as the years go by, immunotherapy (and AI) will be integrated into the care continuum, so it’s no longer a question of “is it here yet?”
But, for now, let’s dive into where we are and discuss how AI has actually been used in the hospital setting. We're kicking off with a study from our neurosurgical colleagues at NYU, published in Nature, which is perhaps one of the best AI in medicine papers we’ve read.
In this study, the researchers harnessed the power of unstructured clinical notes from electronic health records. They took their own patients’ notes and pre-trained and fine-tuned a large language model on a number of operational tasks, including predicting 30-day readmission, in-hospital mortality, and length of stay.
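To demystify what “fine-tuning on notes” means in practice, here is a generic sketch using the Hugging Face transformers library. This is not the NYU team’s code or their NYUTron model; the toy notes, labels, and base model below are stand-ins for illustration only:

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Toy stand-in for de-identified discharge notes with a 30-day readmission label.
data = Dataset.from_dict({
    "text": ["Discharge summary: POD 3 after lumbar fusion, afebrile ...",
             "Discharge summary: CHF exacerbation, third admission this year ..."],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Turn free-text notes into fixed-length token sequences.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="readmit-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```

The real work, of course, is everything around this loop: curating years of notes, defining the labels, and wiring the predictions back into the EMR.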
The team not only evaluated the model’s performance but also deployed it into their EMR, in front of real clinicians, for a trial period. The beauty of this work is that it isn’t just about building a model; it’s about creating a framework that can be developed and deployed. That takes effort and significant institutional buy-in.
Now, not all that glitters is gold: the positive predictive value of the model was quite poor, meaning it over-called things, and many of the patients it flagged were never readmitted. But, as we have argued time and time again, the standard should not be 100%; it should be the ability of clinicians today. How good would you be at predicting whether a patient would be readmitted just by reading their notes? What about if you took care of them?
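A quick back-of-the-envelope calculation shows why a low PPV is almost baked in when the event is uncommon. The sensitivity, specificity, and prevalence below are illustrative, not the paper’s numbers:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value = true positives / all positive calls."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

# Illustrative numbers: a decent screen applied to a fairly uncommon event.
print(f"PPV = {ppv(sensitivity=0.80, specificity=0.90, prevalence=0.10):.0%}")  # ~47%
```

Even with respectable sensitivity and specificity, at that prevalence the model raises roughly one false alarm for every true catch.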
Impressively, the team then deployed this model prospectively and obtained qualitative feedback from clinicians regarding the readmissions the model caught. From the figure in the paper, it’s clear that a significant portion of these were predicted, unplanned, and penalizable (by Medicare) readmissions.
The paper also hits on an interesting concept of bias: a model may be trained on data that is not representative of the data it is deployed on. In this case, the training and deployment data could hardly be better matched; they are all patients within one health system. Is this a better paradigm: creating shared frameworks and structures, but training small, local, non-generalizable models?
But what about the Operating Room? A.I. has already made strides here as well. Recently, SAGES announced an extended partnership with Theator, an A.I. computer vision company providing data analytics in the OR. They're striving to enhance surgical outcomes by providing real-time insights and identifying areas for improvement. The ambition? More precise surgeries, reduced surgical times, and ultimately, better patient outcomes. What remains to be seen is how these insights are not only delivered back to the people in the OR, but actually taken up. As we’ve seen, “things we do for no reason” is a common refrain in medicine, and changing practice may take much more than data alone.
Separately, Proprio recently received FDA clearance for their spine navigation system. They are leveraging computer vision to circumvent the typical constraints of navigation techniques. Their system enables real-time visualization of complex anatomical structures, reducing the variability of navigation once the surgery begins.
But, the integration of these tools is not without its own tough questions. A few for you to consider:
Do we ever mandate the use of AI?
How do we reconcile differences when seasoned clinicians disagree with A.I. recommendations?
The recent piece in the Wall Street Journal discussed the conflict between nurses looking after patients and an AI model telling them the patient was septic. It brings these questions into sharp relief, highlighting the dichotomy of decision-making in a health crisis: human versus machine.
Many physicians might balk at the idea of allowing an AI model to sway their decision-making process. But, as always, our radiology colleagues are leading the way in this field and have shown that, at all skill levels, we are biased toward believing the A.I.
If anything, the sepsis algorithm vs. nurse story tells us that not all A.I. is created equal. It’s widely known that the Epic sepsis alerts are at best horribly annoying and, more likely, actually detrimental to care. But it’s also widely known in the community that the landscape of A.I. in 2023 is starkly different from when the sepsis algorithm was introduced. Would a sepsis alert 2.0, built on the backs of GPT-4 (or NYUTron), do better?
Ultimately, we must answer two fundamental questions:
At what point (Positive Predictive Value, Specificity, Sensitivity) do we allow A.I. models to enter the patient care pathway?
At what higher threshold do we actually mandate their use?
A.I. isn’t a fad or some far-away concept which one day may “arrive.” It has been slowly working its way into our workflows for the past few years and will continue to do so with more force.
As described in our mission above, it’s our job as those in the arena to demystify and adjudicate and ask these tough questions. We owe this to our colleagues, our profession, and to our patients.
🪦 Best of Twitter
Any suggestions for what you’d like to see here instead?
Feeling inspired? Drop us a line and let us know what you liked.
Like all surgeons, we are always looking to get better. Send us your M&M-style roastings or favorable Press Ganey ratings by email at ctrl.alt.operate@gmail.com