Set theory, probability & combinatorics: the basics, for leaders

A recruiter tells you 70% of your applicants have a degree and 60% have hands-on experience, so "most candidates have both." A board member is sure that a positive result on a test for a rare problem means the problem is almost certainly real. A planner promises to "try every combination" of a six-step process by Friday. Each statement is wrong, and the same three pieces of basic maths explain why.

The quick version

Sets are about overlap. Two groups that each cover most of a population can still share far less than the headline numbers suggest: "70% and 60%" does not mean "most have both."
Probability is about how often something happens in the long run, and it bends hard around base rates: how common the thing was before you got any evidence.
Combinatorics is about counting arrangements. The number of ways to combine choices explodes so fast that "test everything" is usually a fantasy, not a plan.
You don't need the formulas. You need the three habits of mind they install, and to spot when a confident claim has skipped one.

The idea in depth

None of this is new, and none of it is hard once you see it as three questions rather than three branches of mathematics. The questions are: what belongs together? (sets), how likely is it? (probability), and how many ways can it be arranged? (combinatorics). Below, each one comes with the theory, the leadership trap it explains, and the move it asks you to make.

Sets: overlap is not addition

Set theory was built by the German mathematician Georg Cantor in a series of papers from 1874 onward (MacTutor History of Mathematics; Britannica). It is the formal study of collections: what's in a group, what's shared between groups, what's left out. You met it as Venn diagrams. The one rule that matters at work is the inclusion–exclusion principle: to count how many things are in group A or group B, you add A and B and then subtract the overlap, because otherwise you count it twice.

That subtraction is where intuition fails. Two groups that are large relative to their population are forced to overlap, by at least the amount their combined shares exceed 100%, while two groups that are small relative to it need not overlap at all. In neither case can the overlap exceed the smaller group. "70% have a degree, 60% have experience" therefore tells you the overlap is somewhere between 30% (70 + 60 − 100) and 60% (the smaller group), and nothing narrower until you actually measure it. So the move is: when someone hands you two percentages and a conclusion about "both," ask for the intersection directly. Never add or assume it.

flowchart LR
  A(["Have a degree, 70%"]) --- O(("Overlap:
measure it,
don't assume"))
  B(["Have experience, 60%"]) --- O
  O --> R("'Both' could be
anywhere from 30% to 60%")

Two overlapping groups: the share with both traits is the intersection, which the two headline numbers alone can't pin down. Leaders Loop
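For readers who want to check the bounds themselves, here is a minimal Python sketch of the inclusion–exclusion arithmetic described above. The function names `overlap_bounds` and `share_in_either` are made up for illustration, and the sketch assumes both percentages describe the same population.

```python
def overlap_bounds(share_a, share_b):
    """Return the (lowest, highest) possible share, in percent, that is in BOTH groups.

    Assumes share_a and share_b are percentages of the same population.
    """
    # The groups must overlap by whatever their combined share exceeds 100%.
    lowest = max(0.0, share_a + share_b - 100.0)
    # The overlap can never be bigger than the smaller of the two groups.
    highest = min(share_a, share_b)
    return lowest, highest


def share_in_either(share_a, share_b, share_both):
    """Inclusion-exclusion: add the two groups, then subtract the double-counted overlap."""
    return share_a + share_b - share_both


low, high = overlap_bounds(70, 60)
print(f"Share with both a degree and experience: between {low:.0f}% and {high:.0f}%")
```

The headline numbers only fix the range. Once someone measures the real intersection, `share_in_either` gives the share with at least one of the two traits.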

Probability: the base rate does the heavy lifting

Probability is the long-run frequency of an outcome, and the most useful tool in it is Bayes' theorem, a rule for updating a belief when new evidence arrives. You rarely need to compute it. You need its core lesson: evidence is read against a starting point, the base rate, and people routinely ignore that starting point. Daniel Kahneman and Amos Tversky described this in "On the psychology of prediction" (Psychological Review, 1973). They called it "insensitivity to prior probability," and it has since become known as base-rate neglect: shown a vivid description and a known population split, people judge by how well the description fits a stereotype and discard how common each group actually is.

This is why a "highly accurate" test for a rare problem still produces mostly false alarms. If something occurs in 1 in 100 cases and your test is 95% accurate (right 95% of the time on real cases and on clean ones alike), a positive result is still more likely to be wrong than right, because there are simply far more of the 99 to misfire on than the 1 to catch. In counts: of 10,000 cases, the test catches 95 of the 100 real ones but also flags 495 of the 9,900 clean ones, so only about one positive in six is real. The fix is presentation. Gerd Gigerenzer and colleagues showed that doctors given the same facts as natural frequencies ("10 out of every 1,000…") reason far more accurately than when given the identical facts as percentages: in one study, 16 of 24 physicians found the right answer with frequencies versus just 1 of 24 with probabilities (Gigerenzer, "What are natural frequencies?", BMJ, 2011; Hoffrage et al., Frontiers in Psychology, 2015).

A 95%-accurate test for a 1-in-100 problem cries wolf more often than it spots the wolf.
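The 1-in-100 example can be worked through as plain counts. The sketch below is illustrative only: the function name `positive_test_breakdown` is made up, and it assumes "95% accurate" means the test is right 95% of the time both on real cases (sensitivity) and on clean ones (specificity), which a real vendor figure may not mean.

```python
def positive_test_breakdown(population, prevalence, sensitivity, specificity):
    """Split the expected positive results into true and false alarms, as counts."""
    affected = population * prevalence        # cases where the problem is real
    unaffected = population - affected        # cases where it is not
    true_positives = affected * sensitivity   # real cases the test catches
    false_positives = unaffected * (1 - specificity)  # clean cases it flags anyway
    # Share of positive results that are actually real.
    share_real = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_real


# Assumed figures: 1-in-100 base rate, 95% sensitivity and 95% specificity.
tp, fp, share_real = positive_test_breakdown(10_000, 0.01, 0.95, 0.95)
print(f"Out of 10,000 cases: about {tp:.0f} true positives, {fp:.0f} false positives")
print(f"Chance a positive result is real: about {share_real:.0%}")
```

Under those assumptions roughly 95 positives are real and 495 are false, so a positive result is real only about 16% of the time. Changing `prevalence` shows how strongly the answer depends on the base rate.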

Kahneman and Tversky also found a related trap, the conjunction fallacy: in the famous "Linda" problem, most people rate "Linda is a bank teller and a feminist" as more probable than "Linda is a bank teller", which is impossible, since every feminist bank teller is already a bank teller (Tversky & Kahneman, 1983; Kahneman, Thinking, Fast and Slow, 2011). A richer, more specific story feels likelier even as each added detail makes it less likely. So the move is: before you act on a striking signal, ask "how common was this to begin with?" and re-state the odds as counts out of a real population. Distrust any forecast that gets more confident as it adds detail.

flowchart TB
  S("Striking new signal
(a positive test, a vivid case)") --> Q(["Ask: how common was
this BEFORE the signal?"])
  Q --> F(["Restate as counts:
'X out of 1,000'"])
  F --> D(("Now judge
the real odds"))
  S -. "the trap" .-> T("Judge on the signal alone
→ base-rate neglect")

The base-rate habit: anchor on prior frequency, convert to plain counts, then weigh the new evidence.
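The conjunction point above, that each added detail can only lower the odds of the whole story, can be seen in a few lines of Python. This is a sketch under stated assumptions: the helper name `chance_all_hold` is hypothetical, the 80% figures are invented, and the details are treated as independent for simplicity. Even without independence, the combined probability can never exceed that of any single detail.

```python
import math


def chance_all_hold(detail_probabilities):
    """Probability that EVERY detail in a story is true, assuming the details are independent."""
    return math.prod(detail_probabilities)


# Hypothetical forecast: five details, each judged 80% likely on its own.
story = [0.8, 0.8, 0.8, 0.8, 0.8]
for count in range(1, len(story) + 1):
    chance = chance_all_hold(story[:count])
    print(f"{count} detail(s): about {chance:.0%} chance the whole story holds")
```

With these invented inputs, a story built from five individually plausible details holds together only about a third of the time.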

Combinatorics: counting before you commit

Combinatorics is the maths of counting arrangements, and its central fact for a leader is how fast the count grows. Two rules cover most cases. If order matters, you have permutations; if it doesn't, combinations. The number of ways to order n things is n factorial (n!), and factorials are violent. Five tasks can be sequenced 120 ways; ten tasks, 3.6 million; fifteen, over a trillion. The "test every combination" promise dies on contact with that growth.
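The growth figures above are easy to reproduce with Python's standard `math` module. The wrapper names below (`orderings`, `ordered_picks`, `unordered_picks`) are made up for readability, and the "pick 3 of 10" case is an invented example to show the permutation and combination rules side by side.

```python
import math


def orderings(n):
    """Number of ways to sequence n distinct tasks: n factorial."""
    return math.factorial(n)


def ordered_picks(n, k):
    """Permutations: ways to pick k of n items when order matters."""
    return math.perm(n, k)


def unordered_picks(n, k):
    """Combinations: ways to pick k of n items when order does not matter."""
    return math.comb(n, k)


for tasks in (5, 10, 15):
    print(f"{tasks} tasks can be sequenced {orderings(tasks):,} ways")

# Hypothetical example: choosing 3 items out of 10.
print(f"Ordered picks of 3 from 10:   {ordered_picks(10, 3):,}")
print(f"Unordered picks of 3 from 10: {unordered_picks(10, 3):,}")
```

The same ten items give 720 ordered picks but only 120 unordered ones, which is why it is worth asking whether order matters before counting.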

The same explosion is the honest case for structured exploration rather than brute force. You can't try everything, so you reason about the shape of the option space instead, which is exactly what MECE structuring and issue trees are for, and why hypothesis-driven problem solving beats exhaustive search. So the move is: before promising to evaluate "all the options," count them. If the count runs to thousands, the real plan is a way to prune the space (eliminate whole branches), not to walk it.
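A rough way to "count before you commit" is to multiply the number of choices at each decision, then see what pruning whole branches does to the total. The sketch below uses invented decisions and option counts, and the helper name `count_option_space` is hypothetical. It also assumes the decisions are independent of one another, so every combination is allowed.

```python
import math


def count_option_space(options_per_decision):
    """Total combinations when one option is chosen at each independent decision."""
    return math.prod(options_per_decision.values())


# Hypothetical launch plan: number of options at each decision.
full_space = {"price tier": 5, "channel": 6, "region": 8, "launch month": 12}

# Pruning eliminates whole branches: keep 2 regions and 3 launch months.
pruned_space = {"price tier": 5, "channel": 6, "region": 2, "launch month": 3}

print(f"All combinations:   {count_option_space(full_space):,}")
print(f"After pruning:      {count_option_space(pruned_space):,}")
```

In this made-up case, ruling out six regions and nine months takes the space from 2,880 combinations to 180 without evaluating a single one of them individually.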

An honest limitation. These tools assume you can define the set, name the outcomes, and know (or estimate) the base rate. Plenty of leadership lives outside that: genuine novelty, where no frequency exists yet, and politics, where "the options" aren't a fixed list. The maths sharpens the questions; it does not answer the ones where the inputs are unknowable. Treat it as a lens for catching errors, not a machine for producing decisions.

A worked example

You run a 50-person engineering org. A new automated screen flags code commits as "likely to contain a security flaw," and the vendor says it's 90% accurate. Last week it flagged 30 of your 600 commits. A nervous exec wants every flagged engineer pulled into review. Should you?

Run the three questions. (All figures below are illustrative, to show the method.)

Base rate first. Suppose genuinely flawed commits are rare, say 2% historically. Out of 600 commits, that's about 12 real problems and 588 clean ones.
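Apply the accuracy as counts. "90% accurate" catching the 12 real ones gives ~11 true flags. But 10% of the 588 clean ones get wrongly flagged too, about 59 false alarms. Of roughly 70 flags you'd expect, only ~11 are real: a flagged commit is clean far more often than not. (The screen actually raised 30, fewer than these assumptions predict, which suggests its false-alarm rate is lower than 10%. Even if all ~11 real flaws sit among those 30, about two in three flags are still clean.)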
Count the combinations before you scale. "Pull in every flagged engineer" sounds tidy until you note that 30 flags, each needing two reviewers chosen from a pool, means 60 review slots and a pairing problem that multiplies fast, and most of that effort lands on false alarms.
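The three steps above can be sketched in Python using the same illustrative figures. The function name `expected_flags` is made up, "90% accurate" is assumed to apply equally to flawed and clean commits, and the reviewer pool of 10 is a hypothetical number that does not appear in the scenario.

```python
import math


def expected_flags(commits, flaw_rate, accuracy):
    """Expected true and false flags, as counts, for a screen with the given accuracy."""
    flawed = commits * flaw_rate              # commits that really contain a flaw
    clean = commits - flawed                  # commits that do not
    true_flags = flawed * accuracy            # real flaws the screen catches
    false_flags = clean * (1 - accuracy)      # clean commits flagged by mistake
    return flawed, clean, true_flags, false_flags


# Illustrative inputs from the worked example: 600 commits, 2% base rate, 90% accuracy.
flawed, clean, true_flags, false_flags = expected_flags(600, 0.02, 0.90)
total_flags = true_flags + false_flags
print(f"Real problems: about {flawed:.0f}; clean commits: about {clean:.0f}")
print(f"Expected flags: about {total_flags:.0f} ({true_flags:.0f} real, {false_flags:.0f} false)")
print(f"Share of flags that are real: about {true_flags / total_flags:.0%}")

# Hypothetical reviewer pool of 10: distinct two-person review pairs per flag.
reviewer_pairs = math.comb(10, 2)
print(f"Possible reviewer pairs per flag: {reviewer_pairs}")
```

Under these assumptions only about one flag in six or seven is real, which is the arithmetic behind "spot-check, not stop the line."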

The set-and-probability reading reframes the call: don't review all 30 in a panic, and don't ignore the tool either. Use it to rank, route the highest-confidence flags to a quick triage, and tell the exec the plain-English truth, "most flags will be false, so we'll spot-check, not stop the line." That sentence is the whole argument, and it's one a busy room can act on.

Frequently asked questions

Do I actually need to do the maths in a meeting?

No. You need to ask three questions out loud: what's the overlap, what was the base rate, and how many options are there really? The arithmetic, when you need it, is addition and division, and it's far easier if you convert everything to counts out of 100 or 1,000 first.

What's the single most common error to watch for?

Base-rate neglect: reacting to a vivid signal (a positive test, a dramatic anecdote, one churned customer) without asking how common the underlying thing was to start with. It's the error behind most false alarms and most overreactions.

Why convert percentages into "X out of 1,000"?

Because people reason far more accurately with frequencies than with percentages, a finding Gigerenzer and colleagues have replicated many times over. "10 out of 1,000" keeps the size of the population in view, which is exactly the piece percentages quietly hide.

Isn't combinatorics just for puzzles?

It's the reason "we'll test every option" is usually undeliverable, and the reason structured problem-solving exists. Counting the option space before you commit to searching it is one of the best-spent thirty seconds in planning.

Where does this stop being reliable?

When the inputs aren't knowable: true novelty with no track record, or situations where "the options" aren't a fixed set. The tools catch errors of reasoning; they can't manufacture certainty where the world hasn't supplied any.

Related in the Toolkit

First principles vs heuristics vs analogical reasoning. When to compute the odds and when a good rule of thumb is the smarter call.
Deductive, inductive & abductive reasoning. Probability is how you reason honestly from incomplete evidence.
Formal logic, argument structure & fallacies. The conjunction fallacy and base-rate neglect are probability's most common reasoning traps.
MECE structuring, issue trees & driver trees. Set thinking made operational: clean, non-overlapping buckets.
Hypothesis-driven problem solving. The answer to combinatorial explosion: prune the option space, don't walk it.
Mental models & cross-disciplinary latticework. These three are core members of any thinker's latticework.
Macroeconomics: GDP, inflation, interest rates, the cycle. Where probabilistic thinking meets forecasts you can't fully trust.
Descriptive statistics (mean, median, mode, variance, SD). The natural next step once you can read a probability.

Where to go next

Daniel Kahneman, Thinking, Fast and Slow (2011): the definitive, readable account of base-rate neglect, the Linda problem and how intuition mishandles probability.
3Blue1Brown, "Bayes theorem, the geometry of changing beliefs" (YouTube, 2019): a 15-minute visual that makes updating-on-evidence genuinely intuitive.
Gerd Gigerenzer, "What are natural frequencies?" (BMJ, 2011): short, practical, and the reason you should restate every risk as counts out of a population.
MacTutor, "The beginnings of set theory": a clear history of Cantor's work if you want the foundations behind the Venn diagram.