What's New in Surgical AI: 11/12
Vol 4: AI vs Amateurs, Lawsuits, and how to build a new scrub tech
This week, we focus on two topic areas and highlight one paper of the week in surgical artificial intelligence.
Table of Contents
🤖 KataGo, the world’s best (AI) Go player, loses to an amateur
💼 We told you this was coming - class action lawsuits against AI products
🏆 Paper of the week: How can cooking-AI forecast a more efficient OR for you?
Hot Seat 🔥💺: Alpha-Gos?
KataGo, one of AlphaGo’s younger, more hip cousins, can similarly beat professional Go players. Both AlphaGo and KataGo use a type of learning called reinforcement learning, in which a system learns playing strategies by playing against itself. However, a vulnerability was recently exposed when KataGo lost to human amateurs who uncovered an unusual (for experts) move combination that was difficult for the algorithm to foresee.
Unusual scenarios that lie outside the scope of the training data the model had “seen” lead to dire underperformance.
Let’s bring this into the OR. One of the key proposed use cases for AI in surgery is to assist with routine tasks. However, much like our adversarially minded Go players, the patient’s disease process may create unrecognized and unusual situations, leading to rare complications.
One of the hallmark features of a rare complication is its unexpectedness, making the KataGo experience rather sobering. The answer relies on a central premise:
Data is king.
Two points on this:
If we train our model using only expected events, we cannot expect it to perform adequately during unexpected events. Large datasets focused on unexpected events (in our field, complications) are difficult to accrue but vital to the longevity of these models.
Distribution of data: think of the amateurs (in this example) as a specific patient population. Models trained on large, population-level data will do fine on benchmarks, but will repeatedly fail when applied to specific patient populations they have not been fine-tuned on. This is a challenge the medical AI community has faced (and failed at) in the past.
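To make the second point concrete, here is a minimal sketch of how a good headline number can hide a failing subgroup. Everything in it is an assumption for illustration: the records, the group names, and the single-threshold "model" are made up, not drawn from any real dataset or product.

```python
from collections import defaultdict

def accuracy_by_group(records, predict):
    """Return (overall accuracy, {group: accuracy}) for labeled records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, value, label in records:
        totals[group] += 1
        hits[group] += int(predict(value) == label)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group

def predict(value):
    # Hypothetical "model": one threshold that suits the majority population.
    return value > 5.0

# Made-up toy records: (patient group, measurement, true outcome).
records = [
    ("common", 6.0, True), ("common", 7.2, True), ("common", 8.1, True),
    ("common", 6.6, True), ("common", 3.0, False), ("common", 2.5, False),
    ("common", 4.1, False), ("common", 1.9, False),
    # A small subgroup in which the outcome occurs at lower values.
    ("rare", 3.2, True), ("rare", 4.0, True),
]

overall, per_group = accuracy_by_group(records, predict)
print(f"overall accuracy: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

The overall score looks respectable because the small subgroup barely moves the average, which is why reporting performance per population matters as much as the benchmark figure.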
The next version of KataGo will almost undoubtedly be trained on a larger volume of amateur Go players.
In chess, some of the best performing systems use centaur approaches, where humans work with an AI to choose the best strategy; many of these centaurs can beat even the best AI.
A.I. Needs Malpractice Insurance
This week we saw the first class action lawsuit against an AI product, specifically GitHub Copilot - the A.I. code-writing assistant which uses natural language to help write, document and debug code. Don’t say we didn’t warn you. Here is an interview with the lawyers filing the suit.
We’re not here to opine on the legal framework behind mass scraping of data, from open-source to copyrighted, and how AI models fit into that milieu. What this does demonstrate, though, is the following: we (collectively) need to build data sharing collaboratives where building AI models is the open and transparent goal 📈
This brings us to one of the chief missions for starting this newsletter:
Data should be provided for model training within a data contribution structure that is fair, equitable and with transparent credit (+/- monetization) strategies. This path enables collaboration, and prevents a degradation of trust between the institutions developing AI technologies and the general public.
This is especially important for surgeons building AI models, and even more so for medical device companies and digital health companies building models which scrape large volumes of medical data. Looking at you, AI scribe companies 🔎
More food for thought: what about data that is automatically and systematically de-identified before being used for training? Radiology without identifiable labels, operative video of internal organs only, images of your colon polyps, might technically be personal data, but it’s hardly personal, right?
What do you think?
🏆 Paper of the Week: Computer Vision to Predict Steps of a Chocolate Chip Cookie Recipe… and your most likely next surgical instrument?
For this week’s deep dive, we discuss the preprint by Sener and colleagues looking at zero-shot anticipation for procedural actions - in the kitchen👨🏼🍳! The authors evaluated a model’s ability to predict the next action in a recipe using only video as the input. The model was trained on a large corpus of text recipes, and a much smaller corpus of labeled videos. The catch - the model had never seen the video it was being evaluated on. In short, the model would:
Be fed (pun intended) video of someone adding flour to a bowl to make chocolate chip cookies.
Be responsible for predicting the next step: “Add butter to the bowl”
To do this, recognize the key steps and key ingredients of the procedure from novel video, match them to the textual data it had trained on, and produce the most likely next step.
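The matching idea in these steps can be sketched in a few lines. This is a toy under stated assumptions, not the authors' method: the recipe corpus is made up, the video is assumed to have already been turned into a short caption by some upstream recognizer (not shown), and plain word overlap stands in for the learned video and text representations a real system would use.

```python
# Hypothetical text corpus: recipe name -> ordered steps.
RECIPES = {
    "chocolate chip cookies": [
        "add flour to the bowl",
        "add butter to the bowl",
        "add sugar to the bowl",
        "mix in chocolate chips",
        "bake",
    ],
    "pancakes": [
        "whisk eggs and milk",
        "pour batter onto the griddle",
        "flip the pancake",
    ],
}

def word_overlap(a, b):
    """Jaccard similarity between the word sets of two step descriptions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

def predict_next_step(observed_caption, recipes):
    """Find the corpus step most similar to what was observed and return
    the step that follows it in that recipe."""
    best_score, best_next = -1.0, None
    for steps in recipes.values():
        # Only steps that have a successor can yield a prediction.
        for current, following in zip(steps, steps[1:]):
            score = word_overlap(observed_caption, current)
            if score > best_score:
                best_score, best_next = score, following
    return best_next

# Caption assumed to come from a video recognizer for a never-before-seen clip.
print(predict_next_step("pour flour into a bowl", RECIPES))
```

The caption never appears verbatim in the corpus, yet it lands on the closest known step, which is the sense in which the text corpus does most of the work for an unseen video.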
Let’s bring this into the OR
This is a scrub tech’s (or junior resident’s) dream. Imagine a model, built and trained on a large volume of cases so it knows every surgical instrument in the book, but fine-tuned on your case log, so the predictions it makes for the “next step” are based on what you like to use.
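A rough sketch of that personalization idea follows. It is not neural fine-tuning: it simply counts which instrument follows which, then lets one surgeon's case log outweigh the population counts. The instrument sequences, the function names, and the `personal_weight` parameter are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

def transition_counts(cases):
    """Count how often each instrument is followed by each other instrument."""
    counts = defaultdict(Counter)
    for case in cases:
        for current, following in zip(case, case[1:]):
            counts[current][following] += 1
    return counts

def predict_next(current, population, personal, personal_weight=3.0):
    """Blend population counts with up-weighted counts from one surgeon's log."""
    scores = Counter()
    for instrument, n in population.get(current, {}).items():
        scores[instrument] += n
    for instrument, n in personal.get(current, {}).items():
        scores[instrument] += personal_weight * n
    if not scores:
        return None  # this instrument was never seen with a successor
    return scores.most_common(1)[0][0]

# Made-up instrument sequences, for illustration only.
population_cases = [
    ["scalpel", "bovie", "retractor"],
    ["scalpel", "bovie", "suction"],
    ["scalpel", "bovie", "retractor"],
    ["scalpel", "forceps", "retractor"],
]
my_cases = [
    ["scalpel", "forceps", "suction"],
    ["scalpel", "forceps", "retractor"],
]

population = transition_counts(population_cases)
personal = transition_counts(my_cases)

generic_guess = predict_next("scalpel", population, transition_counts([]))
personal_guess = predict_next("scalpel", population, personal)
print(generic_guess, personal_guess)
```

With no case log the population's most common choice wins; two personal cases are enough to flip the prediction here only because of the chosen weight, which in practice would need tuning against held-out cases.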
It is always discussed - “we need the A-team for this case”. What goes into the “A-Team” from a scrub tech and circulator standpoint? It centers on an intimate knowledge of operator preference, procedural anticipation, knowledge of systems, and of course an element of technical capability.
No AI system will outperform the team dynamic of a well-oiled, familiar team of experts who can practically finish each other’s sentences. These types of procedural assistance systems may help speed up the training process for new OR staff, may elevate the performance of under-performers, or may simply allow efficient teams to operate even more efficiently.
The need for these models to be specific cannot be overstated. While large-scale models may show promising results, they must be fine-tuned so that the recommendations made for an individual surgeon are exactly that: individual. This is a daunting and computationally intensive task.
Best of Twitter 🟦
Unsurprisingly, DARPA cares a lot about AI-Human teaming
Midjourney v4 is really good.
Deep-fake CXRs with pleural effusions created with Stable Diffusion look pretty real. Real enough to be detected as pleural effusions by another AI model…
Feeling inspired? Drop us a line and let us know what you liked.
Like all surgeons, we are always looking to get better. Send us your M&M style roastings or favorable Press Ganey ratings by email at firstname.lastname@example.org