If human oversight fails, how can we build AI systems that don’t?
Michael A Santoro argues that in high-stakes AI deployments, human overrides are unreliable; true safety comes from upfront design, explicit ethical trade-offs, and governance.

When AI systems are deployed by governments and non-profits, the default intuition is that higher-stakes environments require tighter human control. The more serious the consequences, the stronger the pull toward ‘human-in-the-loop’ intervention. That intuition is understandable, but it is also, in important respects, mistaken.
In routine functions, such as traffic management, permitting, and service delivery, errors are visible, distributed, and often reversible. In crisis environments, by contrast, decisions are compressed in time, stakes are elevated, and the margin for error is narrow.
In both settings, however, the critical management insight is the same: late-stage human intervention is not a reliable safeguard. It is often too slow, too inconsistent, and too dependent on the same biases and informational gaps that the system was meant to address. The more effective approach is not tighter downstream oversight, but stronger upstream design.
Consider a policing scenario in which an AI-supported dispatch system prioritises responses based on predicted risk. Following a series of incidents, it becomes apparent that the system is disproportionately directing police presence into minority neighbourhoods. The immediate reaction is often to intervene at the point of output: adjusting recommendations, second-guessing the system, or requiring human override before action is taken.
Yet this response addresses symptoms rather than causes. If the system is producing biased outputs, the issue lies in the model’s construction: the data it was trained on, the proxies it uses for risk, and the value judgements embedded in its optimisation criteria.
Effective guardrails, in this context, must operate at the level of model design. This includes scrutinising training data for historical bias, explicitly modelling disparate impact, and making normative choices about how to weigh competing objectives such as efficiency, fairness, and harm reduction.
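To make the idea concrete, a design-stage audit might compute a disparate impact ratio before the system ever reaches the street. The sketch below is illustrative only: the data, the district labels, and the 0.8 threshold (borrowed from the 'four-fifths' rule of thumb in US employment law) are assumptions, not a prescription from this article.

```python
# Illustrative sketch: a pre-deployment disparate impact check.
# All data, group labels, and thresholds here are hypothetical.

from collections import defaultdict

def disparate_impact_ratio(records, group_key, flagged_key):
    """Ratio of the lowest group's flag rate to the highest group's.

    records: iterable of dicts, e.g. {"group": "district_a", "flagged": True}.
    A ratio well below 1.0 signals that one group is being
    flagged far more often than another.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[group_key]] += 1
        flagged[r[group_key]] += int(r[flagged_key])
    rates = {g: flagged[g] / total[g] for g in total}
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical dispatch recommendations from a shadow deployment.
records = (
    [{"group": "district_a", "flagged": True}] * 30
    + [{"group": "district_a", "flagged": False}] * 70
    + [{"group": "district_b", "flagged": True}] * 12
    + [{"group": "district_b", "flagged": False}] * 88
)

ratio, rates = disparate_impact_ratio(records, "group", "flagged")
print(f"per-district flag rates: {rates}")
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # the 'four-fifths' rule of thumb, an assumption here
    print("Guardrail tripped: revisit training data and risk proxies.")
```

The point of such a check is its timing: it runs before deployment, where the trade-offs belong, rather than as a post hoc override at the point of output.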
The most important factor to keep in mind is the same one that applies whenever governments design policies to be implemented at the community level. These are not purely technical decisions; they are policy choices that require democratic accountability. Attempting to correct for them after the fact, in the midst of live deployment, is operationally more complicated than it might seem: a human override in one part of the system can create unintended inefficiencies for other communities. The right time to make the requisite trade-offs is at the outset of the process.
A second example arises in counter-terrorism or emergency response. Imagine a system tasked with identifying potential threats based on vehicle movement and behavioural signals. A white van, in isolation, is not a meaningful indicator; there are too many such vehicles for the signal to be useful. Human operators on the ground may rely on contextual cues, such as unusual driving patterns, location, timing, or combinations of behaviours, to identify which vehicles warrant attention. The common assumption is that these contextual judgements must remain with human responders.
But this assumption overlooks a critical point: if these cues are sufficiently systematic to guide human decision-making, they can and should be incorporated into the model itself. The goal is not to replace human judgement with a crude proxy, but to formalise and integrate that judgement into a system that can apply it consistently at scale. When these signals are left unmodelled, the system remains blunt, and the burden shifts back to human operators working under pressure, with all the attendant risks of inconsistency and bias.
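As an illustration of what formalising such judgement could look like, the sketch below encodes several weak cues as explicit, auditable features and combines them into a single score. The feature names, weights, and thresholds are all invented for this example; in practice they would be set and validated through the upfront governance process described here.

```python
# Illustrative sketch: encoding contextual cues as explicit,
# auditable model features. Feature names and weights are
# hypothetical and would need validation and governance sign-off.

from dataclasses import dataclass

@dataclass
class VehicleObservation:
    erratic_driving: bool   # unusual driving pattern
    restricted_zone: bool   # near a sensitive location
    off_hours: bool         # present at an atypical time
    repeated_passes: int    # loops past the same point

def attention_score(obs: VehicleObservation) -> float:
    """Combine cues that, individually, are weak signals.

    A white van alone scores nothing; it is the combination
    of behaviours that raises the score, mirroring the
    contextual judgement a human operator would apply.
    """
    score = 0.0
    score += 0.3 if obs.erratic_driving else 0.0
    score += 0.25 if obs.restricted_zone else 0.0
    score += 0.15 if obs.off_hours else 0.0
    score += min(obs.repeated_passes, 4) * 0.1
    return score

# One weak cue stays below a (hypothetical) review threshold.
print(attention_score(VehicleObservation(False, True, False, 0)))  # 0.25
# Several cues together cross it, warranting attention.
print(attention_score(VehicleObservation(True, True, True, 3)))    # 1.0
```

Because the cues are explicit, they can be audited, debated, and revised; left in operators' heads, they cannot.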
In both cases, the temptation is to treat AI systems as tools that require continuous human correction. A more effective framing is to treat them as public institutions whose behaviour must be shaped by human governance in advance. This requires moving from an oversight paradigm to a guardrails paradigm. Oversight assumes that errors will occur and must be caught; guardrails aim to prevent those errors from arising in the first place.
For practitioners in risk management and crisis response, this shift has practical implications. First, governance must begin before deployment, with structured processes for defining objectives, identifying trade-offs, and stress-testing models under realistic conditions. Second, evaluation must include not only accuracy metrics, but also distributional effects, particularly in communities that have historically borne disproportionate risk. Third, accountability mechanisms must be built into design decisions from the start. When a system fails, the question should be: why was the failure not anticipated in the design process?
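As a sketch of the second implication, an evaluation harness might report per-group error rates alongside aggregate accuracy. Everything below (the groups, labels, and predictions) is synthetic, chosen only to show how a respectable headline number can conceal a skewed false-positive rate in one community.

```python
# Illustrative sketch: stress-test evaluation that reports
# distributional effects, not just aggregate accuracy.
# Groups, labels, and predictions are synthetic examples.

def group_report(y_true, y_pred, groups):
    """Per-group accuracy and false-positive rate."""
    stats = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        negatives = [i for i in idx if y_true[i] == 0]
        false_pos = sum(y_pred[i] == 1 for i in negatives)
        stats[g] = {
            "accuracy": correct / len(idx),
            "false_positive_rate": (
                false_pos / len(negatives) if negatives else float("nan")
            ),
        }
    return stats

# Synthetic run: aggregate accuracy is 0.8, yet group "a" bears
# a 50% false-positive rate while group "b" sees none.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

for g, s in sorted(group_report(y_true, y_pred, groups).items()):
    print(g, s)
```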
None of this eliminates the role of human judgement. Managers must remain vigilant in monitoring results so that unintended consequences are caught early. Human expertise is most valuable in setting the parameters within which systems operate: in determining what counts as risk, in deciding how competing values are balanced, and in establishing which constraints are non-negotiable.
The paradox is that the highest-stakes applications of AI are precisely those in which we must rely less on ad hoc human intervention and more on disciplined, transparent system design. Trust in these systems will not be built through the promise of human override, but through demonstrable evidence that the systems and those who design them are worthy of trust.
Michael A Santoro is Professor of Management and Entrepreneurship at Santa Clara University, USA, specialising in business ethics, governance, and AI systems.