When people talk about intelligence tests, they usually mean IQ: a single number, derived from a battery of tasks, meant to summarise cognitive capacity in the way a blood pressure reading summarises cardiovascular health.
The problem is that IQ is a composite — a blend of different cognitive abilities that can move largely independently. Treating it as a single, unified measure is like using one number to describe both your strength and your flexibility. The components matter, and for most practical purposes, they matter separately.
Fluid intelligence is one of those components, and arguably the most important one for understanding how someone handles genuinely new problems.
The Gf/Gc distinction
The foundational distinction in intelligence research comes from Raymond Cattell (1963), who proposed separating fluid intelligence (Gf) from crystallised intelligence (Gc).
Crystallised intelligence is accumulated knowledge — vocabulary, factual recall, learned procedures, domain expertise. It grows throughout life as you acquire more knowledge and experience. A legal expert’s command of case law is crystallised intelligence. So is a pianist’s muscle memory, a chef’s flavour intuition, and a mechanic’s ability to diagnose an engine by sound.
Fluid intelligence is the capacity to reason in novel situations, independent of prior knowledge. When you encounter a problem you’ve never seen before, with no established method for solving it, what you’re drawing on is Gf. It involves pattern recognition, working memory, and the ability to hold multiple pieces of information simultaneously while deriving relationships between them.
Crystallised intelligence increases with age and experience. Fluid intelligence peaks in early adulthood, typically in the mid-to-late twenties, and declines gradually thereafter. This divergence is what makes the distinction practically important: the two measures tell you different things, and after early adulthood they move in opposite directions.
What the matrix reasoning research established
The most rigorous work on what fluid intelligence tests actually measure came from Carpenter, Just, and Shell (1990), in a landmark cognitive analysis of Raven’s Progressive Matrices, the gold-standard fluid intelligence task.
In Raven’s matrices, you’re shown a grid of abstract visual patterns with one missing entry, and must identify which of several options correctly completes the pattern. It requires no language, no domain knowledge, and no prior training. The only variable is your ability to detect rules in novel configurations and apply them.
Carpenter and colleagues used think-aloud protocols and eye-tracking to model exactly what cognitive operations participants were performing. Their key finding: the difficulty of matrix problems is systematically determined by two factors.
- The number of rules that must be held simultaneously. Easy problems have one rule (shape changes in a single dimension); hard problems have five or more interacting rules (shape, size, orientation, number, and shading all change simultaneously).
- The ability to manage goal decomposition. Hard problems require decomposing the overall task into sub-goals, solving each one, and holding the partial solutions in working memory while solving the rest.
This meant that fluid intelligence, as measured by matrix tasks, was essentially a measure of working memory capacity under relational complexity. Not raw speed. Not knowledge. The ability to keep multiple relational constraints active simultaneously and manipulate them toward a solution.

The working memory connection
Kyllonen and Christal (1990) complemented Carpenter and colleagues’ work with a series of studies that measured working memory capacity alongside fluid intelligence assessments in large military samples. Their finding: working memory capacity and fluid intelligence were so highly correlated, often above r = .80, that the two appeared to be measuring largely the same construct.
This is the deep insight behind modern fluid intelligence assessment: what separates high-Gf from low-Gf individuals is not how much they know, not how fast they process, but how many relational units they can hold active in working memory at once and operate on without losing track.
This has specific practical implications. It means Gf is most predictive in tasks that require:
- Multi-constraint reasoning — problems with several interacting variables
- Novel problem structures — situations where no existing schema applies
- Transfer across domains — applying a principle learned in one context to a structurally similar situation in another
It is less predictive in tasks that are primarily about pattern matching to existing knowledge, procedural execution, or speed on well-practised skills. Those draw on crystallised intelligence and processing efficiency — related but distinct constructs.
Why a single IQ number is insufficient
Standard IQ batteries blend Gf and Gc into a single composite, along with measures of processing speed, spatial reasoning, and verbal ability. This composite predicts many outcomes reasonably well — educational attainment, job performance across a wide range of roles, health outcomes over a lifetime.
But it conceals variation that matters.
A person with very high Gc and modest Gf will perform well in established knowledge domains and struggle in genuinely novel ones. An expert in a fast-moving technical field may have high Gf but average Gc in domains outside their expertise. The IQ composite gives both of these people similar scores while describing quite different cognitive profiles.
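To make the masking effect concrete, here is a minimal sketch using made-up standard scores (mean 100, SD 15). The profile names, the numbers, and the unweighted average are all assumptions for illustration; real batteries combine and re-norm subtests in more involved ways.

```python
# Hypothetical standard scores (mean 100, SD 15). Values are illustrative only.
profiles = {
    "established expert": {"Gf": 100, "Gc": 130},
    "novel-problem solver": {"Gf": 130, "Gc": 100},
}


def composite(scores):
    """Unweighted mean of component scores (a deliberate simplification)."""
    return sum(scores.values()) / len(scores)


def gf_gc_gap(scores):
    """Positive when fluid reasoning outruns accumulated knowledge."""
    return scores["Gf"] - scores["Gc"]


for name, scores in profiles.items():
    print(f"{name}: composite={composite(scores):.0f}, Gf-Gc gap={gf_gc_gap(scores):+d}")
```

Both profiles produce the same composite, while the Gf-Gc gap points in opposite directions. That gap is the information the single number throws away.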
Primi (2001) extended Carpenter’s framework using the LLTM (Linear Logistic Test Model) to measure the specific difficulty parameters in matrix reasoning, identifying exactly how many cognitive rules each item requires. This enabled more precise measurement of Gf that was less confounded by general academic achievement. This research line is the basis for modern adaptive fluid intelligence assessments: rather than administering a fixed battery, the test adjusts the difficulty of the next item in real time based on performance, converging on a precise estimate with fewer items.
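The two ideas in this paragraph can be sketched together in a few lines. In an LLTM-style model, an item's difficulty is a weighted sum of its cognitive features; an adaptive test then picks each next item to sit near the current ability estimate. Everything specific below is an assumption for illustration: the feature names and weights are invented, the response model is a plain Rasch model, and ability is estimated as a posterior mean over a grid with a standard normal prior. It is not the procedure of any particular published test.

```python
import math

# Hypothetical feature weights in logits; real values would be estimated from data.
ETA = {"n_rules": 0.8, "n_elements": 0.3}
INTERCEPT = -2.0


def lltm_difficulty(features):
    """LLTM-style difficulty: intercept plus a weighted sum of item features."""
    return INTERCEPT + sum(ETA[name] * count for name, count in features.items())


def p_correct(theta, b):
    """Rasch model: chance of a correct answer at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


GRID = [g / 10 for g in range(-40, 41)]  # candidate ability values, -4 to 4


def estimate_ability(responses):
    """Posterior mean and SD of ability from (difficulty, correct) pairs."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised standard normal prior
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def next_item(theta, bank, used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - theta))


def run_adaptive_test(bank, answer, max_items=10, se_target=0.5):
    """Administer items until the item limit or the precision target is reached."""
    responses, used = [], set()
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    while len(used) < min(max_items, len(bank)) and se > se_target:
        i = next_item(theta, bank, used)
        used.add(i)
        responses.append((bank[i], answer(i)))
        theta, se = estimate_ability(responses)
    return theta, se, len(responses)


def interval(theta, se, z=1.96):
    """Approximate 95% interval around the estimate (normal approximation)."""
    return theta - z * se, theta + z * se


# Item bank: 1 to 5 rules crossed with 1 to 3 elements (hypothetical design).
bank = [lltm_difficulty({"n_rules": r, "n_elements": e})
        for r in range(1, 6) for e in (1, 2, 3)]

# Simulated respondent who solves every item easier than 0.6 logits.
theta, se, n_items = run_adaptive_test(bank, lambda i: bank[i] < 0.6)
low, high = interval(theta, se)
print(f"estimate {theta:.2f}, interval {low:.2f} to {high:.2f}, {n_items} items")
```

Selecting the item closest to the current estimate is the simplest choice under a Rasch model, where an item is most informative when its difficulty matches the test-taker's ability. Returning the standard error alongside the estimate also makes it straightforward to report an interval and not just a bare score.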

What Gf assessment looks like in practice
A well-designed fluid intelligence assessment:
- Uses novel abstract material. Items that rely on verbal or numerical knowledge introduce Gc contamination: you’re partly measuring what someone has learned, not their raw reasoning capacity.
- Varies relational complexity systematically. Items should span a range from single-rule to multi-rule problems, so the test can locate where an individual’s working memory capacity ceiling sits.
- Accounts for fast-guessing and effort variation. Wise (2017) established that test scores are meaningfully distorted by rapid, low-effort responses, particularly in low-stakes testing contexts. Valid Gf measurement requires distinguishing genuine difficulty from effort withdrawal.
- Reports with appropriate precision. Because Gf is estimated from behavioural data, there is inherent measurement uncertainty. Presenting a point estimate alongside confidence information is more honest and more useful than a single score presented as fact.
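The effort check in particular is easy to picture in code. The sketch below flags responses that arrive faster than a fixed time threshold and reports the share of responses that look effortful. The `Response` record, the 3-second cut-off, and the function names are all hypothetical; in practice thresholds are usually set per item and validated against data, so treat this as one simple assumed approach, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Response:
    item_id: str
    correct: bool
    seconds: float  # time from item display to answer


def split_by_effort(responses, threshold_seconds=3.0):
    """Separate effortful responses from likely rapid guesses.

    The fixed threshold is an assumption; it is not calibrated to any real test.
    """
    effortful = [r for r in responses if r.seconds >= threshold_seconds]
    rapid = [r for r in responses if r.seconds < threshold_seconds]
    return effortful, rapid


def effort_rate(responses, threshold_seconds=3.0):
    """Share of responses at or above the time threshold."""
    effortful, _ = split_by_effort(responses, threshold_seconds)
    return len(effortful) / len(responses)


# Illustrative session: one answer arrives implausibly fast.
session = [
    Response("m01", True, 12.4),
    Response("m02", False, 1.1),
    Response("m03", True, 8.0),
    Response("m04", False, 20.3),
]

effortful, rapid = split_by_effort(session)
print([r.item_id for r in rapid], effort_rate(session))
```

A scoring pipeline could then estimate ability from the effortful responses only, or mark the whole session as low-confidence when the effort rate falls below some agreed floor.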
The practical value of measuring Gf separately from Gc is that it gives you information that a general intelligence score or an academic record doesn’t provide: how well does someone reason from scratch when their prior knowledge doesn’t transfer? That’s the question that matters most in novel environments — and it’s the question that fluid intelligence assessment is specifically designed to answer.