Ask a calculator for 17 × 23 and you will get 391 today, tomorrow, and on the day the company is sold. Ask a large language model to summarise the same meeting twice and you may get two different summaries, both plausible, neither identical. That gap is not a bug to be patched. It is the difference between a deterministic system and a probabilistic one, and once you can see it, half the confusion around AI at work falls away.

The quick version

  • Deterministic = same input, same output, every time. A spreadsheet formula, a payroll calculation, a tax rule. Predictable and auditable; you test it by checking the answer is right.
  • Probabilistic = the system produces a best guess drawn from likelihoods, so the same input can give different outputs. Recommendation engines, fraud scores, and every generative-AI tool live here. You test these by checking the answer is good enough, often enough.
  • The skill is not preferring one over the other; it is knowing which kind each part of your process needs, and never letting a probabilistic tool do a job that demands a deterministic guarantee.
  • Most failed AI projects are really a category error: a guessing machine wired into a place that needed an exact answer.

The idea in depth

Strip away the jargon and the distinction is about repeatability. A deterministic system follows fixed rules: given the same starting conditions, it always lands in the same place. Feed a hash function, a sorting algorithm, or an IF statement in a spreadsheet identical inputs and it returns identical outputs forever. A probabilistic system instead models uncertainty directly. It does not "know" the answer; it estimates the most likely answer from patterns in data and hands you that, often with the variability baked in (RudderStack's primer frames the split cleanly for non-engineers).
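To make the repeatability point concrete, here is a minimal Python sketch. The hash function is a real deterministic rule; the "summariser" is a hypothetical stand-in that simply draws from made-up likelihoods, so the candidate summaries and their weights are assumptions for illustration, not how any particular product works.

```python
import hashlib
import random


def deterministic_fingerprint(text: str) -> str:
    """Fixed rule: the same text always yields the same SHA-256 digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical likelihoods for three candidate summaries (illustrative only).
CANDIDATES = ["summary A", "summary B", "summary C"]
WEIGHTS = [0.6, 0.3, 0.1]


def probabilistic_summary(text: str, rng: random.Random) -> str:
    """Best guess drawn from likelihoods; the input alone does not pin the output."""
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]


meeting = "Q3 planning meeting notes"
rng = random.Random()  # unseeded on purpose, so runs can differ

same_every_time = {deterministic_fingerprint(meeting) for _ in range(100)}
varies = {probabilistic_summary(meeting, rng) for _ in range(100)}

print(len(same_every_time))  # always 1 distinct output
print(len(varies))           # usually more than 1 distinct output
```

A hundred calls to the hash give one distinct answer; a hundred calls to the sampler will almost always give several.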

None of this is new, which is the reassuring part: the engineering disciplines for handling it already exist. Software teams have fought non-determinism for decades. In his 2011 essay "Eradicating Non-Determinism in Tests", Martin Fowler describes a non-deterministic test as one that "passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment", and warns that such tests are "useless," with an "infectious quality" that can "completely destroy the value of an automated regression suite." The historical instinct was to hunt non-determinism down and remove it: isolate it, quarantine it, make the system behave the same way twice.

Generative AI flips that instinct on its head. With an LLM you are not trying to eliminate variability; variability is the product. The job becomes containing it: working out where a confident guess earns its keep, and building a hard, deterministic boundary around the places where it doesn't.

flowchart LR
    I("Same input") --> D("Deterministic system")
    I --> P("Probabilistic system")
    D --> DO(["One fixed output, every time"])
    P --> PO1(["Likely output A"])
    P --> PO2(["Plausible output B"])
    P --> PO3(["Less-likely output C"])

One input, two worlds: a fixed answer versus a distribution of likely answers. (Source: Leaders Loop)

Why generative AI is unavoidably probabilistic

An LLM generates text one token at a time, each time drawing the next token from a probability distribution over what is likely to come next. Andrej Karpathy, former director of AI at Tesla and a founding member of OpenAI, described these models in his June 2025 talk "Software Is Changing (Again)" as "stochastic simulations of people," and, pointedly, as "simultaneously superhuman in some ways, but also fallible in many others," prone to "hallucinations, inconsistency, and poor memory." That fallibility is not a defect to be debugged out of the model. It is what a probabilistic system is. This is exactly why first-principles thinkers separate the engine from the guarantees around it; see our note on first principles vs heuristics.

So stop asking an AI tool "is it accurate?" as though accuracy were a yes-or-no property. Ask instead: how often is it good enough, and what does one bad answer cost me when it slips through? Those are the questions you put to a probabilistic system, and they are the right ones here.

Humans are probabilistic systems too, and that's the honest limitation

Here is the part most explainers skip. The deterministic ideal (same input, same output) is something humans almost never deliver either. In Noise: A Flaw in Human Judgment (2021), Daniel Kahneman, Olivier Sibony and Cass Sunstein document how experts asked to judge the same case reach wildly different conclusions: underwriters at one insurer set premiums for identical fictional clients that varied by a median of 55%, five times what their own executives expected. Two psychiatrists independently diagnosing the same patients agreed only about half the time. The authors call this unwanted variability noise, and it is everywhere human judgment operates.

That matters for two reasons. First, it punctures the lazy criticism that "AI is unreliable because it's inconsistent": so are we, often more so. Second, it sets the honest limitation on this whole framing: the deterministic/probabilistic line is a spectrum, not a wall. A weather model is probabilistic but disciplined; a tired manager at 5pm is probabilistic and undisciplined. The useful question is never "is this random?" but "how much spread is there, and can I live with it here?"

Putting the boundary in the right place

The practical art is drawing the line: letting the probabilistic part do what it's brilliant at (handling ambiguity, generating options, reading messy human input) while a deterministic layer enforces the rules that cannot bend. This is the pattern behind almost every well-built AI feature: a creative, fuzzy front end and a strict, checkable back end.

flowchart TD
    U(["Messy human request"]) --> A("Probabilistic layer: LLM drafts, suggests, interprets")
    A --> G{"Deterministic gate: rules, validation, calculation"}
    G -->|"Passes the rules"| Y(["Action taken / answer shipped"])
    G -->|"Fails the rules"| H(["Held for a human, or rejected"])

The reliable pattern: a fuzzy front end for ambiguity, a strict gate for anything that must be exact.

A worked example

Picture a mid-sized firm whose finance team drowns in supplier invoices. Someone proposes "let AI handle the invoices," and the project quietly heads for the rocks, because invoice processing is two jobs wearing one coat, and only one of them is probabilistic.

Reading the invoice is the fuzzy job: every supplier uses a different layout, fields move around, scans are crooked. That is exactly where a probabilistic model shines: it tolerates the mess and extracts "vendor: Acme, amount: 4,210, due: 30 days" from a chaotic PDF far better than any rigid rules engine. Letting the AI interpret here is the right call.

Paying the invoice is the exact job. The amount that leaves the bank must equal the amount owed, not "probably," not "94% of the time." So the well-designed version routes the AI's extracted figures into a deterministic layer: does the total match the purchase order? Is the supplier on the approved list? Is this a duplicate? Anything that fails those fixed checks stops and waits for a person.
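The three fixed checks named above can be sketched in a few lines of Python. This is a minimal illustration rather than a real payments integration: the `ExtractedInvoice` schema, the `gate` function, and the sample purchase-order data are all hypothetical names and values invented for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ExtractedInvoice:
    """Fields the probabilistic layer claims to have read (hypothetical schema)."""
    invoice_id: str
    vendor: str
    amount: Decimal
    po_number: str


def gate(invoice, po_amounts, approved_vendors, paid_invoice_ids):
    """Apply fixed checks; return reasons to hold (empty list means pass)."""
    reasons = []
    if invoice.vendor not in approved_vendors:
        reasons.append("supplier not on approved list")
    expected = po_amounts.get(invoice.po_number)
    if expected is None:
        reasons.append("no matching purchase order")
    elif expected != invoice.amount:
        reasons.append("total does not match purchase order")
    if invoice.invoice_id in paid_invoice_ids:
        reasons.append("duplicate invoice")
    return reasons


# Illustrative reference data; real values would come from finance systems.
PO_AMOUNTS = {"PO-77": Decimal("4210.00")}
APPROVED = {"Acme"}
ALREADY_PAID = {"INV-0009"}

good = ExtractedInvoice("INV-0010", "Acme", Decimal("4210.00"), "PO-77")
misread = ExtractedInvoice("INV-0011", "Acme", Decimal("4270.00"), "PO-77")

for inv in (good, misread):
    problems = gate(inv, PO_AMOUNTS, APPROVED, ALREADY_PAID)
    print(inv.invoice_id, "PAY" if not problems else f"HOLD: {problems}")
```

The gate never asks how confident the model was. It only compares the extracted figures against records that already exist, which is what makes it repeatable and auditable. Amounts use `Decimal` rather than floats so that equality checks on money are exact.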

Run that on, say, 10,000 invoices a month (an illustrative figure). If the model reads 97% correctly, the deterministic gate catches the other 3% (300 invoices) before any money moves: you have captured most of the speed and surrendered none of the control. Now imagine the naïve build, where the AI's number flows straight to payment: those same 300 errors become 300 wrong payments, and the project becomes a cautionary tale. Same model, same accuracy. The only difference is where the boundary sits. Choosing where that line falls is itself a reversible-vs-irreversible decision: money leaving the bank is hard to undo, so it earns the strict gate.

Frequently asked questions

Is deterministic always the "safe" choice, then?

No. Deterministic just means predictable, not correct or appropriate. A rigid rules engine will confidently apply a wrong rule the same wrong way a million times. And many real problems (reading handwriting, ranking search results, spotting fraud) have no clean rules to write down; forcing them into deterministic logic produces something brittle and worse than a good probabilistic model. Match the tool to the shape of the problem, not to your appetite for tidiness.

Can't I just make an LLM deterministic by fixing its settings?

You can reduce its variability (lowering the "temperature" setting makes outputs more repeatable), but you cannot turn a guessing machine into a guarantee. Lower variability is not the same as correctness: a model can be perfectly consistent and consistently wrong. If you need a hard guarantee, put a deterministic check downstream; don't try to squeeze it out of the model itself.
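For readers who want to see what the temperature dial does, here is a small Python sketch of one common formulation: the model's raw scores are divided by the temperature before being turned into probabilities. The three tokens and their scores are invented for illustration, and real tools may apply further sampling settings on top.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token candidates and raw scores (illustrative only).
TOKENS = ["paid", "pending", "overdue"]
SCORES = [2.0, 1.0, 0.5]

for temperature in (1.0, 0.2):
    probs = softmax_with_temperature(SCORES, temperature)
    summary = ", ".join(f"{tok}={p:.3f}" for tok, p in zip(TOKENS, probs))
    print(f"temperature {temperature}: {summary}")
```

At the lower temperature nearly all the probability lands on the top-scoring token, so repeated runs agree more often. Nothing in that calculation checks whether the top-scoring token is the right one, which is why the downstream check still matters.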

How do I test something that gives different answers each time?

You stop testing for one right answer and start testing the distribution. Run many examples, measure how often the output is acceptable, and watch that rate over time. This is closer to how a factory measures defect rates than how an accountant checks a sum: you are managing a quality level, not verifying a single fact.
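As a rough sketch of what that looks like in practice, the Python below runs a stand-in model many times and reports an acceptance rate against a target. Everything here is assumed for illustration: `fake_model` merely simulates a tool that gives a bad answer about 5% of the time, `is_acceptable` is a placeholder for whatever check or human review you would really use, and the 90% target is arbitrary.

```python
import random


def fake_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for a probabilistic tool (hypothetical; not a real API)."""
    outcomes = ["good answer", "acceptable answer", "bad answer"]
    return rng.choices(outcomes, weights=[0.70, 0.25, 0.05], k=1)[0]


def is_acceptable(output: str) -> bool:
    """Placeholder acceptance check; in practice a rubric, rule, or reviewer."""
    return output != "bad answer"


def acceptance_rate(prompts, runs_per_prompt, rng):
    """Fraction of all runs whose output passes the acceptance check."""
    total = 0
    passed = 0
    for prompt in prompts:
        for _ in range(runs_per_prompt):
            total += 1
            if is_acceptable(fake_model(prompt, rng)):
                passed += 1
    return passed / total


PROMPTS = ["summarise meeting A", "summarise meeting B", "summarise meeting C"]
TARGET = 0.90  # illustrative quality threshold

rate = acceptance_rate(PROMPTS, runs_per_prompt=200, rng=random.Random(42))
verdict = "meets target" if rate >= TARGET else "below target"
print(f"{rate:.1%} acceptable over {len(PROMPTS) * 200} runs ({verdict})")
```

The number to track is the rate, measured on the same prompts at regular intervals, so that a drop after a model or prompt change shows up as a trend rather than an anecdote.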

Where do AI "agents" fit?

Agents chain many probabilistic steps together, so small uncertainties compound: if each step is 95% reliable and the steps fail independently, ten of them in a row come out right only about 60% of the time. That is why the better agent designs lean heavily on deterministic checkpoints between steps, and reserve full autonomy for low-stakes, easily-reversed tasks. More on this in AI capabilities & limits.
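The compounding is simple arithmetic, sketched here in Python under one simplifying assumption: every step succeeds or fails independently of the others, with the same reliability. Real agent steps are rarely that tidy, so treat the figures as a rough guide.

```python
def end_to_end_reliability(step_reliability: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent, equal steps."""
    return step_reliability ** steps


for steps in (1, 5, 10, 20):
    chance = end_to_end_reliability(0.95, steps)
    print(f"{steps:>2} steps at 95% each: {chance:.1%} end to end")

# Output:
#  1 steps at 95% each: 95.0% end to end
#  5 steps at 95% each: 77.4% end to end
# 10 steps at 95% each: 59.9% end to end
# 20 steps at 95% each: 35.8% end to end
```

A checkpoint that catches and corrects errors between steps stops them from carrying forward into the next one, which is the case for placing gates inside the chain rather than only at the end.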

Is this just a technology question?

It's a leadership one. Deciding which decisions in your organisation deserve a deterministic guarantee, and which can run on a good-enough probabilistic estimate, is a judgement about risk, reversibility and cost. The tools are new; the decision is old.
