The AI Went Live. Nobody Was Watching.

The AI went live. The committee celebrated. The radiologists went back to reading. Nobody was assigned to watch what happened next.

This is not a technology problem. It is a governance problem. And it is nearly universal across imaging departments that have deployed AI in the last several years.

The post-deployment monitoring conversation in radiology almost always starts and ends in the same place: accuracy metrics. False positives. False negatives. Sensitivity and specificity measured against a benchmark dataset, usually supplied by the vendor, usually derived from a population that may bear limited resemblance to the one walking through your doors. That framing is not wrong. It is just the smallest and most tractable part of a much larger problem, and organizations that stop there have confused measuring one dimension of performance with actually governing a clinical tool.

What genuine post-deployment governance requires is harder to package and considerably easier to ignore.

The Audit Trail Nobody Built

Every AI recommendation that influenced a clinical decision needs to be traceable. Not in principle. In practice, with infrastructure that actually captures it. Which version of the model was running at the time of the read. What the confidence score was. What the radiologist did with the output and why. Whether a flagged finding was dismissed, acted on, or noted and deferred. Without this infrastructure, you cannot reconstruct the clinical decision sequence when something goes wrong. You cannot answer the question a plaintiff's attorney will eventually ask, which is: what did the AI say, what did the radiologist do with it, and why.

Most radiology departments deploying AI do not have this infrastructure. The AI output appears in the workflow. The radiologist reads. The report is generated. What happened between the AI recommendation and the final report lives nowhere retrievable. That gap is not an inconvenience. It is a liability exposure that has not yet been tested in court at scale, but will be.
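The record described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the class name `AIReadAuditRecord`, the field names, and the three `Disposition` values are assumptions chosen to mirror the questions in this section (which version, what confidence, what the radiologist did and why). How the record would be wired into a PACS or reporting system is left out.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from enum import Enum
import json


class Disposition(Enum):
    """What the radiologist did with the AI output (hypothetical categories)."""
    ACTED_ON = "acted_on"
    DISMISSED = "dismissed"
    DEFERRED = "noted_and_deferred"


@dataclass(frozen=True)
class AIReadAuditRecord:
    """One immutable entry per AI recommendation shown to a radiologist."""
    study_id: str
    model_name: str
    model_version: str   # version running at the time of the read
    finding: str
    confidence: float    # model confidence score, assumed to lie in [0, 1]
    disposition: Disposition
    rationale: str       # the radiologist's stated reason
    recorded_at: str     # UTC timestamp, ISO 8601


def make_record(*, study_id: str, model_name: str, model_version: str,
                finding: str, confidence: float, disposition: Disposition,
                rationale: str) -> AIReadAuditRecord:
    """Validate the inputs and stamp the record with the current UTC time."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if not rationale.strip():
        raise ValueError("a rationale is required for every disposition")
    return AIReadAuditRecord(
        study_id=study_id, model_name=model_name, model_version=model_version,
        finding=finding, confidence=confidence, disposition=disposition,
        rationale=rationale,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


def to_json_line(record: AIReadAuditRecord) -> str:
    """Serialize the record as one JSON line for an append-only log."""
    payload = asdict(record)
    payload["disposition"] = record.disposition.value
    return json.dumps(payload, sort_keys=True)
```

Requiring a non-empty rationale is a design choice in this sketch: it forces the "and why" to be captured at the time of the read rather than reconstructed later.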

The Medicolegal Landscape Is Shifting Faster Than Most Organizations Realize

The legal standard for what a radiologist should have done is not static. It is set by the professional community, which means it moves as practice moves. As AI becomes more prevalent in imaging workflows, the standard of care argument in a malpractice case begins to incorporate AI. Not immediately, not uniformly, but directionally and irreversibly.

This creates an asymmetric liability problem that almost no organization has thought through carefully. Acting on an AI recommendation creates liability. So does ignoring one. A radiologist who overrides a flagged finding and turns out to be wrong faces a different legal argument than a radiologist who never had AI assistance at all. A radiologist who follows a recommendation that turns out to be a false positive faces a different argument still. The presence of AI in the workflow does not reduce the radiologist's accountability. In some scenarios, research suggests it increases perceived culpability when the radiologist and the AI disagree and the radiologist is wrong.

There is also the question of disclosure. If AI error rates are not communicated, jurors tend to perceive AI as infallible. Studies examining mock jury responses to radiology malpractice scenarios have found that disclosing AI error rates materially shifts juror judgment in the radiologist's favor. Organizations that deploy AI without thinking through how that deployment will be characterized in a legal proceeding are making a significant and avoidable mistake.

Drift, Versioning, and the Problem of the Invisible Update

A model validated on a curated dataset will behave differently in your department, on your scanners, with your patient population. This is not a theoretical concern. It is a documented phenomenon, and it tends to worsen over time as the gap between the training environment and the real-world deployment environment widens. Scanner software is updated. Technologist acquisition practices shift. Patient demographics change. Referral patterns evolve. Each of these variables moves the real-world input distribution further from the conditions under which the model was validated.

What makes this harder to manage than it first appears is the vendor update problem. Vendors revise their models. Sometimes these updates are announced with formal change notifications. Sometimes they are not. A model performing at an acceptable level in January may be a materially different model in July, with different performance characteristics across different finding types or patient subpopulations. Without version tracking and change control on the clinical side, an organization cannot determine whether a change in observed performance represents drift in the input data, a change in the model itself, or an interaction between the two. The liability implications of that ambiguity are significant. If performance has degraded and harm has occurred, the question of which version of the model was running and whether the organization knew it had changed becomes central to the legal narrative.
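One way to make that distinction tractable is to record the model version with every read and break performance down by version. The sketch below assumes each read has already been adjudicated against some ground truth, which is itself a substantial undertaking. The `Read` tuple and both function names are hypothetical, not part of any vendor's API.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    read_date: str        # ISO date, e.g. "2024-01-10"
    model_version: str    # version recorded at the time of the read
    ai_flagged: bool      # did the AI flag the finding?
    truth_positive: bool  # adjudicated ground truth (assumed available)


def version_changes(reads: list[Read]) -> list[tuple[str, str, str]]:
    """Return (date, old_version, new_version) for each observed version switch."""
    changes = []
    previous = None
    for read in sorted(reads, key=lambda r: r.read_date):
        if previous is not None and read.model_version != previous:
            changes.append((read.read_date, previous, read.model_version))
        previous = read.model_version
    return changes


def sensitivity_by_version(reads: list[Read]) -> dict[str, float]:
    """Sensitivity (flagged positives / all positives), computed per model version."""
    hits: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for read in reads:
        if read.truth_positive:
            positives[read.model_version] += 1
            if read.ai_flagged:
                hits[read.model_version] += 1
    return {version: hits[version] / positives[version] for version in positives}
```

If sensitivity shifts at the same date a version change is detected, the model itself is the first suspect. If it shifts within a single version, the input data is. Neither conclusion is possible when the version was never recorded.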

Monitoring this requires more than a vendor-supplied dashboard. It requires clinical ownership, defined performance thresholds, a process for investigating when those thresholds are crossed, and a clear accountability structure for what happens next. It requires someone who understands both the clinical and technical dimensions of what they are looking at, not someone who can read a chart but cannot interpret what a change in performance means for a specific diagnostic task.
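A defined threshold with a named owner can be expressed very simply. The following is a sketch under stated assumptions: the `Threshold` fields, the minimum case count, and the three status strings are illustrative, and the actual threshold values would have to be set by the clinical owners for each diagnostic task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str      # e.g. "sensitivity" for a specific finding type
    minimum: float   # lowest acceptable value, set by clinical owners
    min_cases: int   # below this count, the estimate is too noisy to act on
    owner: str       # the person accountable for the investigation


def evaluate(threshold: Threshold, true_positives: int, false_negatives: int) -> str:
    """Classify one review period as ok, investigate, or insufficient_data."""
    total = true_positives + false_negatives
    if total < threshold.min_cases:
        return "insufficient_data"
    sensitivity = true_positives / total
    return "investigate" if sensitivity < threshold.minimum else "ok"
```

The `insufficient_data` status matters as much as the other two: a low-volume finding type can look fine or alarming purely by chance, and the review process should say so explicitly rather than report a number.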

The Workflow Effects That Accuracy Metrics Cannot Capture

When AI is introduced into a radiology reading workflow, radiologist behavior changes. Some of these changes are intended and beneficial. Many are not, and most go undocumented.

Attention allocation shifts in ways that are not always clinically appropriate. Findings flagged by AI receive more scrutiny. Findings not flagged by AI may receive less. Research has shown that radiologists can be less accurate interpreting cases with incorrect AI feedback than with no AI at all, and that this effect is stronger when radiologists believe the AI output is being recorded in the patient's file. The cognitive load of adjudicating AI recommendations is not neutral. It adds a layer of decision-making to every read, and that layer interacts with fatigue, volume pressure, and case complexity in ways that have not been systematically studied in most deployment environments.

None of this shows up in the accuracy metrics an organization typically tracks after deployment. It requires a different kind of monitoring, one that treats the radiologist-AI interaction as a clinical process with its own quality indicators, not simply as an input-output system where the output is the report and everything in between is invisible.

The Downtime Problem Nobody Planned For

There is one more exposure that almost no deployment plan addresses, and it may be the most operationally immediate of all.

When the AI goes down, what happens?

Radiology AI systems experience downtime. Servers fail. Integration points break. Vendor maintenance windows occur. In a department that has incorporated AI output into its standard reading workflow, an AI outage is not simply an inconvenience. It is a workflow disruption that carries clinical and medicolegal weight that most organizations have not assigned to it.

If radiologists have come to rely on AI triage to prioritize their queue, an outage changes their prioritization process. If AI flags are part of the cognitive framework a radiologist uses to approach a study, their absence changes how the study gets read. If AI output is normally documented in the workflow record and it is absent, does that absence look, retrospectively, like something was skipped or ignored?

These are not hypothetical questions. They are operational questions that should have been answered before go-live and almost never were. The contingency plan for AI downtime is as much a part of responsible deployment as the monitoring plan for AI performance. The organizations that have thought this through are a small minority.
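Part of a downtime protocol is simply being able to show, after the fact, that the AI was unavailable when a study was read. A minimal sketch, assuming outage windows are logged somewhere retrievable; the `Outage` tuple and `ai_status_at` function are hypothetical names for illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Outage(NamedTuple):
    start: datetime
    end: Optional[datetime]  # None while the outage is still ongoing
    reason: str


def ai_status_at(read_time: datetime, outages: list[Outage]) -> str:
    """Return the AI availability status to document alongside a read."""
    for outage in outages:
        if outage.start <= read_time and (outage.end is None or read_time < outage.end):
            return f"unavailable: {outage.reason}"
    return "available"
```

Stamping each read with this status means a missing AI output is documented as an outage rather than left to look like an omission.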

What Responsible Deployment Actually Looks Like

Post-deployment governance in imaging AI is not an IT function. It is a clinical quality function that requires radiologist ownership, defined performance metrics, version tracking, a formal process for investigating performance changes, clear documentation standards for AI-assisted reads, a disclosure framework for medicolegal purposes, and a contingency protocol for downtime.

This is not an exotic standard. It is the minimum that responsible deployment of a clinical tool with this level of consequence should require. The fact that most organizations have not built it is not a reflection of bad intentions. It is a reflection of how AI has been sold and adopted in healthcare: as a procurement decision rather than a clinical governance decision, with the hard work assumed to happen after go-live and then quietly never happening at all.

Go-live is not the finish line. For most organizations, it is where the real work should have started.
