A radiologist signs a report. Six months later, a lawyer asks what an AI tool told them before they signed it. Nobody in the organization can answer.
This is not a hypothetical. It is the position most radiology organizations deploying AI are already in, whether they know it or not. The report itself, the one document everyone agrees matters, was never built to hold this kind of information. It captures a conclusion. It does not capture what an AI generated, what was on screen, what was changed, or why. AI introduced a new category of evidence into a workflow that has no place to put it.
Most conversations about AI governance in radiology skip past this and go straight to accuracy: sensitivity, specificity, false-positive rates, the questions vendors are built to answer well. Those numbers describe whether a model is good. They say nothing about whether an organization can reconstruct, after the fact, what actually happened in a specific case. That second question is the one that matters when a case is challenged, and it is the one almost nobody has answered before deployment.
This article lays out what answering it actually requires.
Why Radiology Is a Distinct Governance Problem
Radiology AI governance is not the same problem as general enterprise AI governance, or even broader healthcare AI governance. The differences matter for how the framework gets built.
Radiology reporting is fast, language-dense, and legally signed. That signature creates a specific evidentiary artifact, one that captures the radiologist’s final conclusion and nothing about how the AI shaped it. The gap between what the AI generated and what got signed is the core governance problem, and it exists in every AI-assisted read until an organization deliberately closes it.
Radiology also differs structurally from most other clinical AI contexts. Radiologists work at high volume with limited time per case, and they are frequently exposed to several AI tools at once: triage algorithms, CAD overlays, PE detection, nodule detection, fracture detection, prior-report summarization, impression generation. This is not one AI making one recommendation. It is a radiologist navigating a complex AI information environment, case by case, often in seconds.
A governance framework that adds significant friction will fail in practice. One that adds no structure will fail under legal or regulatory scrutiny. The right design works mostly through passive, automated capture, with structured input reserved for the events that are actually clinically material.
The Decision Chain: What Governance Must Reconstruct
Every AI-influenced report should be reconstructable through a complete decision chain. That is the minimum standard for defensible governance, not an aspirational one.
The chain starts with input context: what data the AI had access to at inference time, including structured findings, dictation, clinical indication, prior reports, and any retrieved context. The input matters because it defines what the AI could have known.
Next is model identity. A specific output came from a specific model in a specific configuration state, and that identity needs to be logged alongside the output. Knowing the vendor’s product name is not enough. The record needs the deployed model identifier, version, configuration package, and deployment timestamp, because AI systems change, and only that level of detail shows which version produced a given result.
Instruction set identity is just as important and far more often overlooked. Impression-generation tools run under prompts or configuration parameters that shape expected behavior, things like preserving uncertainty, avoiding unsupported diagnoses, or flagging critical findings. Prompt changes can alter clinical behavior without ever triggering a model version change, so the prompt package version needs its own place in the record.
Display state has to be captured separately from generation. An output the model produced but the radiologist never saw is not equivalent to one that appeared in the workflow. A radiologist cannot be said to have accepted or ignored something they were never shown.
Radiologist interaction with the output needs a level of granularity that most systems don’t capture today. Accepted verbatim, partially accepted, edited, deleted, contradicted, ignored: these are materially different events, and the record needs to show what changed between the AI-generated text and the final signed text at a level that identifies diagnosis changes, added or removed critical findings, follow-up changes, laterality corrections, and shifts in confidence.
The final signed report, with its timestamp and signing identity, closes the chain, and the governance record needs to link that closing event to everything that came before it.
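As a minimal sketch of what that chain might look like as a data structure, the Python below models one record per AI-influenced report. Every class, field, and identifier in it (`DecisionChain`, `ModelIdentity`, `Interaction`, `missing_links`, the accession and version strings) is hypothetical and illustrative, not a vendor schema or a standard. The useful idea is that a record can be checked mechanically for missing links long before anyone needs it in a dispute.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Interaction(Enum):
    # Hypothetical labels mirroring the interaction types named above.
    ACCEPTED_VERBATIM = "accepted_verbatim"
    PARTIALLY_ACCEPTED = "partially_accepted"
    EDITED = "edited"
    DELETED = "deleted"
    CONTRADICTED = "contradicted"
    IGNORED = "ignored"


@dataclass(frozen=True)
class ModelIdentity:
    model_id: str
    version: str
    config_package: str
    deployed_at: str  # ISO 8601 deployment timestamp


@dataclass
class DecisionChain:
    accession: str                      # hypothetical study identifier
    input_context: dict                 # indication, priors, dictation, retrieved context
    model: Optional[ModelIdentity] = None
    prompt_package_version: Optional[str] = None
    ai_output: Optional[str] = None
    displayed_at: Optional[str] = None  # None means generated but never shown
    interaction: Optional[Interaction] = None
    signed_text: Optional[str] = None
    signed_by: Optional[str] = None
    signed_at: Optional[str] = None

    def missing_links(self) -> List[str]:
        """Names of chain elements that would block a full reconstruction."""
        gaps = []
        if not self.input_context:
            gaps.append("input_context")
        if self.model is None:
            gaps.append("model")
        if self.prompt_package_version is None:
            gaps.append("prompt_package_version")
        if self.ai_output is None:
            gaps.append("ai_output")
        # An interaction is only meaningful for output that was actually displayed.
        if self.interaction is not None and self.displayed_at is None:
            gaps.append("displayed_at")
        if self.displayed_at is not None and self.interaction is None:
            gaps.append("interaction")
        if None in (self.signed_text, self.signed_by, self.signed_at):
            gaps.append("signed_report")
        return gaps


chain = DecisionChain(
    accession="A123",
    input_context={"indication": "chest pain"},
    model=ModelIdentity("impression-gen", "2.4.1", "cfg-17", "2024-01-15T00:00:00Z"),
    ai_output="No acute cardiopulmonary process.",
)
print(chain.missing_links())  # ['prompt_package_version', 'signed_report']
```

Keeping `displayed_at` separate from `ai_output` is what lets the record distinguish output that was generated from output the radiologist actually saw.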
What “Audit” Actually Requires
The word audit gets used loosely in clinical AI discussions. For governance purposes, the real standard is not operational logging. It is legal defensibility, and those are different things.
Operational logs are built for debugging and performance monitoring. They are often editable by administrators, overwritten after a retention window, and stored inside vendor-controlled systems that may not export in complete form. None of that was designed to function as evidence, and most AI vendor dashboards today were not built to satisfy a forensic challenge.
A legally defensible audit trail looks different. Records need to be append-only or tamper-evident, written once and not modifiable after creation. Hash-chaining, where each record links cryptographically to the one before it, provides tamper evidence without requiring anything as heavy as blockchain. Timestamps should come from an external or independently verified source rather than the application itself, since self-generated timestamps are easier to challenge. The audit system needs its own access controls and its own log of who accessed it and why. And the record needs to exist somewhere the client controls, not only inside the vendor’s infrastructure, because vendor business failure, contract termination, or litigation-related disputes can all put a vendor-only record out of reach exactly when it matters most.
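Hash-chaining is simple enough to show directly. The sketch below, assuming JSON-serializable event payloads and SHA-256 from Python's standard `hashlib`, has each record store the previous record's hash plus a hash over its own content, and `verify` reports the first record where the chain breaks. `HashChainLog`, `verify`, and the event names are illustrative, and timestamps are passed in rather than generated, on the assumption that they come from an external time source.

```python
import copy
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict, timestamp: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) so identical content
    # always produces an identical hash.
    body = json.dumps(
        {"prev": prev_hash, "payload": payload, "ts": timestamp},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class HashChainLog:
    """Append-only log: each record commits to the hash of the one before it."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def append(self, payload: dict, timestamp: str) -> dict:
        # `timestamp` is assumed to come from an external, independently
        # verified time source; this sketch simply records what it is given.
        prev = self._records[-1]["hash"] if self._records else GENESIS
        record = {
            "prev": prev,
            "payload": copy.deepcopy(payload),
            "ts": timestamp,
            "hash": _digest(prev, payload, timestamp),
        }
        self._records.append(record)
        return copy.deepcopy(record)

    def export(self) -> List[dict]:
        # Export the underlying records with their hashes, not a summary view.
        return copy.deepcopy(self._records)


def verify(records: List[dict]) -> int:
    """Return the index of the first record that fails verification, or -1."""
    prev = GENESIS
    for i, r in enumerate(records):
        if r["prev"] != prev or r["hash"] != _digest(prev, r["payload"], r["ts"]):
            return i
        prev = r["hash"]
    return -1


log = HashChainLog()
log.append({"event": "ai_output_generated", "accession": "A123"}, "2024-01-15T10:00:00Z")
log.append({"event": "ai_output_displayed", "accession": "A123"}, "2024-01-15T10:00:05Z")
exported = log.export()
print(verify(exported))  # -1: chain intact
exported[0]["payload"]["event"] = "edited_after_the_fact"
print(verify(exported))  # 0: first record no longer matches its hash
```

One caveat worth stating plainly: a chain like this proves alteration only if the most recent hash is also held somewhere the writer cannot rewrite, for example periodically anchored in a client-controlled store or with an independent timestamping service. Anyone able to rewrite the entire log could otherwise recompute every hash.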
Organizations also need legal hold functionality, meaning the ability to preserve records beyond normal retention without manual intervention, along with genuine export capability. A dashboard view is not an export. A PDF summary generated after the fact is not a primary record. What gets exported needs to include the underlying event objects, timestamps, identifiers, and integrity evidence, not just a polished readout.
Model and Configuration Governance
AI systems change. That is not an exception to plan around. It is inherent to the technology, and most radiology organizations have no mechanism to detect when it happens.
A model registry, a controlled record of what is deployed, when, by whose approval, and with what validation, is the foundation here. Every output in the audit trail should reference the model, version, and configuration state that produced it, because that is the only way to know months later whether a given output came from a validated configuration or one nobody signed off on.
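A registry earns its keep when it can answer, for any logged output, whether that exact model, version, and configuration package was approved at the moment the output was generated. The following is a minimal in-memory sketch of that lookup. `ModelRegistry`, `RegistryEntry`, `approval_for`, and the `validation_ref` field are hypothetical names, and a production registry would live in controlled, audited storage rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RegistryEntry:
    model_id: str
    version: str
    config_package: str
    approved_by: str
    validation_ref: str            # pointer to validation evidence (hypothetical)
    effective_from: datetime
    retired_at: Optional[datetime] = None


class ModelRegistry:
    """Controlled record of which exact configurations were approved, and when."""

    def __init__(self) -> None:
        self._entries: List[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def approval_for(self, model_id: str, version: str, config_package: str,
                     generated_at: datetime) -> Optional[RegistryEntry]:
        """Return the approval covering this exact configuration at that moment, if any."""
        for e in self._entries:
            if (e.model_id, e.version, e.config_package) != (model_id, version, config_package):
                continue
            if e.effective_from <= generated_at and (
                e.retired_at is None or generated_at < e.retired_at
            ):
                return e
        return None


registry = ModelRegistry()
registry.register(RegistryEntry(
    "impression-gen", "2.4.1", "cfg-17", "clinical-owner@example.org", "VAL-0042",
    effective_from=datetime(2024, 1, 1, tzinfo=timezone.utc),
))
when = datetime(2024, 3, 1, tzinfo=timezone.utc)
print(registry.approval_for("impression-gen", "2.4.1", "cfg-17", when) is not None)  # True
# A silent vendor update to 2.4.2 has no approval on record:
print(registry.approval_for("impression-gen", "2.4.2", "cfg-17", when))  # None
```

Matching on the full model, version, and configuration triple, rather than the product name, is the point of the design: a silent vendor update surfaces as an output with no covering approval.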
Vendor change control is a specific risk worth naming directly. AI vendors can update models or configurations without giving clients adequate clinical detail, sometimes without notice at all. A solid governance framework requires vendor disclosure of planned changes, a way to classify changes by materiality, validation evidence for material changes, a mechanism to accept or reject a change before deployment, rollback capability, and notification to clinical leadership before anything goes live. The FDA’s Predetermined Change Control Plan framework points in this direction for AI medical devices, and the underlying principle applies even to tools that aren’t currently regulated that way. No organization should discover that its AI tool changed only after a liability event forces the question.
Prompt adherence monitoring belongs in this category too. If a tool runs under an instruction set, that instruction set defines what it’s supposed to do. Systematic deviation from it is a configuration failure, and an observability layer should be watching for output patterns that drift from expected instruction adherence at the population level, not just the case level.
Override Governance and Clinical Delta Capture
In medical coding AI, override is simple: accepted, modified, or rejected. In radiology, the decision space is wider and the stakes are higher.
A radiologist might accept an AI impression verbatim, accept it with minor edits, accept part and delete the rest, change one diagnosis while keeping others, contradict a finding entirely, ignore the AI and dictate independently, add a critical finding the AI missed, or amend the report after the fact. Treating all of these as one undifferentiated “override” category erases exactly the distinctions that matter.
The audit layer needs text-level deltas between the AI-generated impression and the final signed one, with particular attention to differences that are clinically material: diagnosis changes, added or removed critical findings, added or removed follow-up recommendations, laterality changes, and shifts in expressed certainty. Minor stylistic edits matter less, and a well-calibrated system should surface meaningful divergence without treating every word change as a governance event.
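To make the difference between a material delta and a stylistic edit concrete, here is a rough sketch that uses Python's standard `difflib` for a word-level diff, plus keyword checks for a few of the categories named above. The vocabularies in the hypothetical `MATERIAL_VOCAB` are far too small for clinical use, and keyword matching cannot reliably detect diagnosis changes or negation ("no pneumothorax" and "pneumothorax" both contain the term). Treat this as an illustration of the output shape, not a method.

```python
import difflib
import re
from typing import Dict, Set

# Hypothetical, deliberately tiny vocabularies. Real deployments would need
# clinically validated terminology and NLP, not keyword matching.
MATERIAL_VOCAB: Dict[str, Set[str]] = {
    "laterality": {"left", "right", "bilateral"},
    "certainty": {"possible", "probable", "likely", "cannot exclude", "suspicious for"},
    "follow_up": {"recommend", "follow-up", "ct in", "mri in"},
    "critical_finding": {"pneumothorax", "pulmonary embolism", "hemorrhage", "free air"},
}


def _terms_present(text: str, vocab: Set[str]) -> Set[str]:
    lowered = text.lower()
    # Word boundaries keep "ct in" from matching inside "neglect in".
    return {t for t in vocab if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def clinical_delta(ai_text: str, signed_text: str) -> dict:
    """Word-level diff plus flags for categories whose key terms changed."""
    ai_words, signed_words = ai_text.split(), signed_text.split()
    matcher = difflib.SequenceMatcher(None, ai_words, signed_words)
    changes = [
        (tag, " ".join(ai_words[i1:i2]), " ".join(signed_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    flags = [
        category
        for category, vocab in MATERIAL_VOCAB.items()
        if _terms_present(ai_text, vocab) != _terms_present(signed_text, vocab)
    ]
    return {"similarity": round(matcher.ratio(), 3), "changes": changes, "material_flags": flags}


material = clinical_delta(
    "Possible nodule in the left upper lobe.",
    "Nodule in the right upper lobe. Recommend CT in 3 months.",
)
print(material["material_flags"])  # ['laterality', 'certainty', 'follow_up']
stylistic = clinical_delta("No acute findings.", "No acute findings identified.")
print(stylistic["material_flags"])  # []
```

The second call still records a change in `changes`, so nothing is lost; it simply is not flagged as a governance event.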
Override patterns at the population level are themselves a quality signal. A subspecialty with a systematically higher edit rate than the rest of the practice is worth investigating. A jump in override rate right after a vendor update is worth investigating immediately. An observability layer should be built to surface these patterns before they become a clinical risk, not after.
Observability: The Prospective Layer
Auditability looks backward. It answers what happened in a specific case. Observability looks forward. It answers whether something is changing in how the system behaves, before a case-level harm occurs.
AI drift in radiology rarely shows up as a vendor alert. It shows up as rising override rates, deeper edits, inconsistent follow-up patterns, subspecialty-specific rejection trends, more frequent addenda, or growing divergence between AI output and signed reports across a practice. A governance framework that only operates at the case level will miss exactly this kind of practice-level deterioration.
That means tracking population-level metrics over time: override rate by modality, subspecialty, radiologist, and model version; edit depth; critical-finding agreement rate; addendum frequency correlated with AI exposure; and how well AI confidence calibrates against actual radiologist agreement. When a vendor pushes a model update, before-and-after metrics should be compared systematically. An organization that doesn’t run that comparison has a governance gap whether or not anything actually went wrong.
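A before-and-after comparison of override rates around a model update can be sketched with a standard two-proportion z-test. The code below assumes each event is a dict with a grouping field such as `model_version` and an `interaction` label, and it counts anything other than `accepted_verbatim` as an override. Both the field names and that definition of override are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def override_counts(events: Iterable[dict], key: str) -> Dict[str, Tuple[int, int]]:
    """Group displayed-output events by `key`; return (overrides, total) per group.

    Any interaction other than 'accepted_verbatim' counts as an override here,
    which is a simplifying assumption.
    """
    counts = defaultdict(lambda: [0, 0])
    for e in events:
        group = counts[e[key]]
        group[1] += 1
        if e["interaction"] != "accepted_verbatim":
            group[0] += 1
    return {k: (o, n) for k, (o, n) in counts.items()}


def compare_rates(x_before: int, n_before: int, x_after: int, n_after: int) -> dict:
    """Two-proportion z-test on override rates before and after a change."""
    p1, p2 = x_before / n_before, x_after / n_after
    pooled = (x_before + x_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p2 - p1) / se if se > 0 else 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"rate_before": p1, "rate_after": p2, "z": z, "p_value": p_value}


events = (
    [{"model_version": "2.4.1", "interaction": "accepted_verbatim"}] * 360
    + [{"model_version": "2.4.1", "interaction": "edited"}] * 40
    + [{"model_version": "2.4.2", "interaction": "accepted_verbatim"}] * 360
    + [{"model_version": "2.4.2", "interaction": "edited"}] * 90
)
counts = override_counts(events, "model_version")
print(counts)  # {'2.4.1': (40, 400), '2.4.2': (90, 450)}
result = compare_rates(*counts["2.4.1"], *counts["2.4.2"])
print(result["rate_before"], result["rate_after"])  # 0.1 0.2
```

The z-test treats every read as independent, which practice data is not: the same radiologists read many cases, and case mix shifts over time. A large z is a reason to investigate, not proof that the update degraded the model.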
Multi-AI Conflict and the Information Environment Problem
Most governance discussions still treat AI as a single tool making a single recommendation. That picture is already out of date.
A radiologist today might face a triage prioritization, a CAD overlay, a PE detection alert, a nodule detection recommendation, a prior-report summary, and an AI-generated impression draft, all on the same case. These tools can conflict. One may flag something another omits. A summarization tool may misrepresent a prior report in a way that quietly shapes the impression that follows.
The audit record needs to capture not just the final AI-generated impression but the full AI information environment visible to the radiologist at the moment of decision. If two tools disagreed and the radiologist followed one, that choice should be reconstructable. Candidly, this is a design challenge that has no fully solved implementation in wide clinical deployment today. But organizations adding a third or fourth AI tool should plan for multi-AI exposure capture before the information environment becomes too tangled to reconstruct after the fact.
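One partial approach, sketched under heavy assumptions, is to snapshot every AI output visible at signing time, normalize each to a shared finding label, and flag findings where tools explicitly disagreed. `ToolOutput`, `conflicts`, `sided_with`, and the tool and finding names are hypothetical. Normalizing findings across vendors is itself a hard problem, and this sketch does not catch omissions or a summary that misstates a prior report.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToolOutput:
    tool: str            # hypothetical tool identifier, e.g. "pe_detection"
    model_version: str
    finding: str         # normalized finding label, e.g. "pulmonary_embolism"
    present: bool


def conflicts(snapshot: List[ToolOutput]) -> Dict[str, Dict[bool, List[str]]]:
    """Findings on which visible tools explicitly disagreed, with the tools on each side."""
    by_finding: Dict[str, Dict[bool, List[str]]] = {}
    for out in snapshot:
        sides = by_finding.setdefault(out.finding, {True: [], False: []})
        sides[out.present].append(out.tool)
    return {f: s for f, s in by_finding.items() if s[True] and s[False]}


def sided_with(snapshot: List[ToolOutput], finding: str, signed_present: bool) -> List[str]:
    """Tools whose output on `finding` matches what the signed report concluded."""
    return [o.tool for o in snapshot if o.finding == finding and o.present == signed_present]


snapshot = [
    ToolOutput("pe_detection", "1.3", "pulmonary_embolism", True),
    ToolOutput("impression_draft", "2.4.1", "pulmonary_embolism", False),
    ToolOutput("nodule_detection", "0.9", "lung_nodule", True),
]
print(conflicts(snapshot))
# {'pulmonary_embolism': {True: ['pe_detection'], False: ['impression_draft']}}
print(sided_with(snapshot, "pulmonary_embolism", signed_present=True))  # ['pe_detection']
```

Storing the snapshot alongside the signed report, with each tool's model version, is what makes "the radiologist followed the PE alert over the draft impression" a recorded fact rather than a reconstruction from memory.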
Legal Survivability: The Hardest Test
Legal survivability is the standard every other piece of this framework should be measured against. The question isn’t whether records exist. It’s whether they would hold up to a forensic challenge, a regulatory investigation, or an adverse event review.
A forensic review will ask who created each record and when, whether the timestamp can be altered, who can edit or delete it, what happens during a system upgrade, what happens if the vendor won’t cooperate or the contract ends, what survives a platform migration, whether a legal hold mechanism exists, whether the full record can be exported, and whether the audit log itself can be audited. Most vendor dashboards and most operational logs cannot answer these questions today. The governance framework needs to specify, before deployment, where the authoritative record lives, who controls it, how it’s protected, how long it’s retained, and what format makes it usable as evidence.
Security and Privacy as Governance Requirements
An audit layer capturing PHI, report text, clinical history, prompt packages, and user behavior is itself a sensitive data environment. A system built to reduce AI risk can become its own liability if it isn’t secured properly.
The baseline includes encryption at rest and in transit, scoped access control across read, write, and administrative roles, BAA execution for HIPAA-covered deployments, data residency options where provincial health information law applies, SOC 2 Type II certification or equivalent for the audit infrastructure operator, a defined retention and disposal policy, administrator-access logging, and a response plan for a breach of the audit system itself. PHI minimization matters too: the audit layer should hold the minimum identifiable information needed for evidentiary reconstruction, not function as a second clinical repository, and access should be scoped to quality assurance, governance, legal, and incident-response roles rather than general administrative or vendor staff.
Organizational Governance: Who Owns What
Architecture alone doesn’t make any of this work. Without defined ownership, an audit layer just sits there.
Every deployment needs four roles assigned explicitly. A clinical owner, the radiologist or physician leader who judges whether AI output is clinically appropriate and whether a model change needs clinical review before going live. A technical owner, the IT or informatics function responsible for integration, logging, security, and system integrity. A vendor owner, the named point of contact for change notifications, validation documentation, and incident response. And a legal and privacy owner, responsible for HIPAA, provincial law, and incident response from a data governance standpoint.
The gap between these roles is exactly where governance fails in practice. IT assumes the vendor owns clinical validation. The vendor assumes the client owns clinical risk. Clinical leadership assumes IT has the logging covered. Six months later, under challenge, nobody can reconstruct the chain, because everyone assumed someone else had built it.
Pre-Deployment Diligence as Governance Practice
Governance works best before deployment, not after an adverse event forces the question. A radiology organization should be able to demonstrate, before a tool goes live, that it can reconstruct the complete decision chain for that tool.
If it can’t demonstrate that, it isn’t ready to deploy yet. If the vendor can’t support it, the vendor isn’t ready for clinical use either, and that conclusion has immediate value at the procurement stage rather than after a contract is signed. A solid pre-deployment review covers five areas: the vendor’s audit capability; the contractual terms around change control, retention, export rights, and legal hold; a clinical workflow review confirming that the governance layer doesn’t impose unacceptable friction; a security review of the audit infrastructure; and confirmation that clinical, technical, legal, and vendor roles are assigned and documented. After deployment, the observability layer keeps watching for drift and configuration change, and the same diligence process should run again whenever a model update lands. Governance here isn’t a one-time gate. It’s an ongoing practice.
Where This Leaves Most Organizations
None of this is exotic. It’s the minimum a clinical tool with this level of consequence should require, and most organizations haven’t built it, not because of bad intentions but because AI in radiology has largely been adopted as a procurement decision rather than a clinical governance one. The hard work gets assumed to happen after go-live, and then quietly never happens at all.
The standard worth holding every deployment to is simple to state and harder to satisfy: can this organization reconstruct one AI-influenced report, completely and accurately, months after the fact? Organizations that can’t answer that question are not ungoverned because they were unaware of the risk. They are ungoverned in the specific sense that they cannot prove, after the fact, that the risk was managed. That distinction is the one that matters when a case is actually challenged.
Boridy Imaging Advisory works with imaging organizations and the investors who back them to build this kind of governance framework before deployment, not reconstruct it after a record has already failed under scrutiny.