It is frequently stated that there is no consensus definition of intelligence, particularly for artificial general intelligence. Legg and Hutter (2007) list some 70-odd definitions of intelligence; see also Agisis.org for more. One reason that there are so many definitions is that each approaches intelligence the way the “blind men” of legend approached describing an elephant. Each definition represents some aspect of intelligence. Each is a partial description of what a person or system with intelligence should be able to do, but none of them really defines what intelligence is.
Just what kind of thing is intelligence?
Let’s tentatively define machine (artificial) intelligence as the ability to perform the range of cognitive activities that a human can perform. These activities include using memory and knowledge, understanding, imagination, abstraction, planning, adapting, autonomous problem solving, problem identification and selection, identifying and resolving missing information, and creating original solutions to problems. Some of the suggested definitions are easily measured and could serve as operational definitions of intelligence.
An operational definition specifies a procedure by which a concept can be measured. It is a tentative definition, used as a pragmatic shortcut. For example, at the start of the 20th century the Paris school system asked Alfred Binet to find a way to identify students who would need help to receive an effective education (van Hoogdalem & Bosman, 2024). Binet and Theodore Simon looked for indicators of the constituents of intelligence, such as language skills, memory, reasoning, the ability to follow directions, and the ability to learn.
They devised tests of these abilities that they could use to operationally define intelligence by behaviors that they thought were correlated with intelligence and were relatively easy to score. To the extent that their chosen behaviors were valid predictors of which students would need help, their operational definitions were sufficient for the task. Similarly, Terman et al. (1918) found that vocabulary size was correlated with the kinds of measures that Binet, Simon, and others used in their tests, so vocabulary size can also serve as an operational definition of intelligence. These tests could not directly assess an individual’s intelligence; rather, the test developers needed proxies or surrogates for intelligence that they hoped would be independent of the amount of schooling the child had received.
Operational definitions can be useful, but they can also be misleading or even wrong. Operational definitions are not benchmarks; they are merely tentative indicators. They are subject to revision as we learn more about the concept they are intended to measure and about the relationship between the operational definition and that concept.
As more tasks were developed to assess parts of intelligence and more intelligence tests were devised, psychologists noted that scores on these various tests were correlated (Spearman, 1904). People who score high on one test are likely to score high on other tests, and those who score poorly on one are likely to score poorly on others. This correlation suggests that there is something shared by these tests and by the people who take them. Spearman called this shared statistical factor “general intelligence.” The tests were correlated because each engaged some test-specific abilities along with a general ability, and the shared general ability was responsible for the correlation. In Spearman’s view, this general ability was not just a statistical artifact; it represented a common ability that was assessed in each subtest. In other words, he suggested that general intelligence was not an “intervening variable” but a “hypothetical construct,” or more properly a “theoretical construct,” although those precise terms were not coined until later.
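As a rough illustration (a sketch with invented numbers, not Spearman’s actual procedure), simulated test scores that each tap one shared ability plus test-specific noise will correlate positively, and the shared factor can be recovered as the leading eigenvector of the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 4

# One shared ability per person, plus test-specific abilities (all invented).
g = rng.normal(size=(n_people, 1))
specific = rng.normal(size=(n_people, n_tests))
loadings = np.array([0.8, 0.7, 0.6, 0.5])  # how strongly each test taps g

scores = g * loadings + specific * np.sqrt(1 - loadings**2)

# Every pair of tests correlates positively because each taps g.
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))

# The leading eigenvector of the correlation matrix (up to sign) recovers
# the loadings; this shared statistical factor is what Spearman called g.
_, eigvecs = np.linalg.eigh(corr)  # eigh sorts eigenvalues ascending
print(np.round(eigvecs[:, -1], 2))
```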
An intervening variable is a summary of a statistical relationship. MacCorquodale and Meehl (1948) cite the notion of electrical resistance as an example of an intervening variable: it summarizes the relationships among volts, amps, and conductors. Hypothetical constructs, on the other hand, are theoretical entities (constructs) or processes that are so far unobserved but are thought to actually exist.
The intelligence definitions described earlier are ambiguous as to what kind of entity intelligence is, because they describe intelligence in terms of input contexts (for example, specified problems) and machine-produced outputs. Both intervening variables and theoretical constructs can map from inputs to outputs, but the distinction between them matters critically for predicting what a putatively intelligent system will do in novel situations.
MacCorquodale and Meehl used some basic physics as an example of their distinction. When an electrical current is run through a conductor, the voltage at the far end is lower than the voltage near the source (for example, a battery). Ohm’s law provides a simple equation that describes the observed relationship: the current through a conductor is proportional to the voltage across it, with the constant of proportionality set by the conductor’s resistance (I = V/R). Resistance is an intervening variable. It describes the relationship between current and voltage, but it does not explain it.
Each conductor has its own resistance value that is independent of voltage. We now have a catalog of materials and their resistance values, but the only way to know the resistance of a new material is to measure it (by passing a known voltage and measuring the current). Predicting the resistance of a substance requires a theory of just why substances differ in their conductivity. It requires a theoretical construct, in this case, electrons.
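A minimal sketch of the point, using invented measurements: the resistance of a conductor is just the slope that summarizes observed voltage-current pairs, and nothing in the fit explains why the slope has the value it does.

```python
import numpy as np

# Invented measurements of one conductor: applied voltage (V), observed current (A).
voltage = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
current = np.array([0.21, 0.39, 0.61, 0.80, 1.01])

# Ohm's law: V = I * R, so resistance is the least-squares slope.
R, *_ = np.linalg.lstsq(current.reshape(-1, 1), voltage, rcond=None)
print(f"R ≈ {R[0]:.2f} ohms")  # an intervening variable: a fitted summary

# R predicts this conductor at new voltages, but it cannot say what the
# resistance of a different, unmeasured material will be.
print(f"predicted current at 10 V: {10 / R[0]:.2f} A")
```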
Electrons are posited to be particles, and thus entities, whereas resistance is just an abstract property. Electrons do not have the property of resistance, but they explain it. Conductors have free electrons that can move through the crystal lattice of the conductor’s structure; the more easily these electrons can move, the lower the resistance. A metal conductor consists of a lattice of atoms. Each atom has shells of electrons, and the outermost electrons dissociate freely from their atoms and flow through the lattice. The more regular the lattice structure, the more easily the electrons flow, and thus the lower the resistance. If we know the lattice structure, for example if we know the impurities in the conductor that disrupt the lattice, then we can predict what the resistance of the conductor will be.
Intervening variables allow one to predict how known systems will behave; theoretical constructs allow one to predict how known and unknown systems will behave. Ohm’s law is sufficient for electricians to wire a building, but it would provide little help to a physicist analyzing a new material.
Intelligence appears to be a theoretical construct
Computer science and statistics make use of a concept closely related to an intervening variable, called a latent variable. A latent variable is not observable as either an input or an output, but it captures the statistical properties of the phenomena in which it is used. For example, a hidden Markov model is used to capture the regularity of a sequence of events, and latent Dirichlet allocation is used to identify the topics in a set of texts. Latent Dirichlet allocation does not explain the topics in a document; it summarizes them. Another latent variable is the gradient along which an AI model adjusts its parameters. No one credible posits that there is an actual hill inside a computer program that implements gradient descent. Latent variables are important in artificial intelligence because they represent statistical relations among the observed variables, and their importance suggests that AI models built on latent variables may amount to systems of intervening variables.
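The point about gradient descent can be made concrete with a minimal sketch (a toy loss, not any particular AI model): the gradient is a latent quantity computed inside the program, never an input or an output, and the “hill” it descends is only a metaphor.

```python
# Minimal gradient descent on the toy loss f(w) = (w - 3)^2.
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    # A latent variable: computed internally, never observed as input or output.
    return 2.0 * (w - 3.0)

w = 0.0
for _ in range(100):
    w -= 0.1 * gradient(w)  # "descending the hill" is only a metaphor

print(round(w, 4), round(loss(w), 8))  # w ≈ 3.0, loss ≈ 0
```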
John Searle (1980) described a thought experiment that implies that intelligence is a theoretical construct and not an intervening variable. He asks his readers to imagine a room with a small window through which a Chinese speaker passes tokens of written Chinese. Inside the room sits Searle, who speaks no Chinese. What he does have is a book that describes perfectly which Chinese tokens to pass back out through the window in response to the tokens passed in. The book indexes each token by number, for example, and describes where in the room to find it. To the person outside the room, it appears as if she or he is having an odd kind of conversation in Chinese, presumably with someone who understands and also speaks Chinese. But Searle does not speak or understand Chinese.
The room, if you will, passes the Turing test and appears to understand the conversation, but no one in the room actually does understand. The rule book is just a book. It cannot be said to understand anything. Searle knows that he does not understand. The only one who understands is the Chinese speaker on the outside of the room.
The book is a complex intervening variable. It contains the statistical relation between inputs and outputs for a very large number of inputs. The room (with Searle and the book inside) behaves as if it understands, but it does not. In Searle’s view, something that would be needed for understanding is missing.
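The room’s behavior can be caricatured in a few lines of code (the token strings below are invented placeholders, not a real conversation): a pure lookup that reproduces the input-output relation without anything that could be called understanding.

```python
# A toy "rule book": a pure lookup from input tokens to output tokens.
# The Chinese strings are invented placeholders standing in for Searle's book.
rule_book = {
    "你好吗？": "我很好，谢谢。",        # "How are you?" -> "I'm fine, thanks."
    "今天天气如何？": "今天天气很好。",  # "How is the weather?" -> "It's nice."
}

def room(tokens: str) -> str:
    # The mapping reproduces the statistical relation between inputs and
    # outputs; neither the dict nor this function understands anything.
    return rule_book.get(tokens, "")  # outside the book, the room falls silent

print(room("你好吗？"))
```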
Searle argues that only brains have this missing something and can understand. Computers are limited to just syntax, to just manipulating meaningless symbols. He does not, however, describe just what is missing from the room or what property of brains allows them and not machines to understand. I disagree with Searle that brains have some ineffable understanding property that is absent from machines, but I do see the point that reading from a codebook is not what we mean by intelligence.
The distinction between intervening variables and hypothetical constructs is mostly missing from artificial intelligence thinking. For example, reasoning is defined as “a cognitive process that involves using evidence, arguments, and logic to arrive at conclusions or make judgments,” and yet: “However, despite the strong performance of LLMs on certain reasoning tasks, it remains unclear whether LLMs are actually reasoning and to what extent they are capable of reasoning” (Huang & Chang, 2023). The definition treats reasoning as a hypothetical construct. But the second statement treats reasoning as an expression of the statistical relationship between inputs and outputs; if the model merely behaves “as if” it were reasoning, then reasoning is being used as an intervening variable.
The distinction between intervening variables and theoretical constructs is important because of the need to predict the future of artificial intelligence. Just as it is impossible to predict the resistance of an unknown substance from Ohm’s law, so it is impossible to predict the apparent intelligence of an AI model outside of its training distribution. The Chinese room could answer any question asked of it in Chinese, provided that the answer was in the book. Large language models can answer any question asked of them, provided that the answer is, or is similar to, a pattern represented in the parameters of the model. Outside that distribution of similarity, they can have no accurate answers. Given the physics knowledge up to 1905, the ideas about the photoelectric effect and relativity that Einstein wrote about would surely have been out of distribution. Earlier, the change from a geocentric to a heliocentric view of the solar system would also have been out of distribution. These are scientific examples, but the same principles apply to everyday intelligence. Every significant invention requires the inventor to think differently from her or his predecessors, not just summarize existing relations.
If intelligence is an intervening variable, generalization from known problems to future problems depends on the statistical similarity of the inputs and desired outputs. If it is a hypothetical construct, generalization depends on the application of the system’s intelligence. Electrical resistance is a statistical relation; electrons and their properties are hypothetical constructs that explain far more than voltage drops across a conductor. Narrow artificial intelligence can be described as an intervening variable, but artificial general intelligence necessarily requires that intelligence be a construct that does not depend solely on the statistical relations between inputs and outputs. Describing intelligence as an intervening variable would be equivalent to claiming that everything that can be invented already has been.
Most in the field of artificial intelligence rely on benchmarks to evaluate the intelligence of the available models, but benchmarks are all but useless for the task. Binet and Simon chose intelligence test tasks that they thought would be independent of education but correlated with the need for added help in education. Those relationships are broken among modern machine intelligence benchmarks. The models are all strongly dependent on their “education,” that is, their training content. Performance on benchmarks cannot serve as an independent proxy for intelligence or aptitude, because benchmarks can be hacked and because any inference drawn from them rests on the logical fallacy of affirming the consequent. One cannot infer intelligence from the finding that the computer gave the answer an intelligent system would give when there are other potential explanations for that answer, paraphrases of training patterns in particular. One cannot infer that Searle understands Chinese from the fact that he responds appropriately to Chinese questions.
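A toy contrast makes the fallacy concrete (both functions are invented for illustration): two systems return the same answer, so the answer alone cannot license an inference about the process that produced it.

```python
# Two systems that answer "2+2" identically; the output cannot tell them apart.
def memorizer(q: str) -> str:
    return {"2+2": "4"}.get(q, "")   # retrieves a stored pattern

def reasoner(q: str) -> str:
    a, b = q.split("+")
    return str(int(a) + int(b))      # actually computes the sum

# Identical behavior on a seen input; inferring "it computes" from the answer
# alone affirms the consequent.
print(memorizer("2+2"), reasoner("2+2"))
print(memorizer("17+25"), reasoner("17+25"))  # they diverge out of distribution
```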
Intelligence is both an engineering concept (something to build) and a natural science concept (something that we would like to understand). Benchmarks may be used as operational definitions of intelligence, but, as mentioned, they are weak definitions. Success on a benchmark means success on the selected operational definition, but it is unknown whether these operational definitions have any relationship to a theoretical concept of intelligence. At one time, it was widely believed that successful chess play would require general intelligence, but it turned out that excellent play could be achieved using a tree of potential moves and advanced algorithms for navigating that tree. Tree navigation is a good theory for how to build a chess player, but it is not sufficient for general intelligence.
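A sketch of the tree idea, stripped to its essentials (generic minimax over an invented game tree, not a chess engine):

```python
# Minimax over a toy game tree: inner nodes are lists of children, leaves are
# numeric position evaluations. A sketch of tree navigation, not of chess.
def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):  # leaf: an evaluated position
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# An invented tree of position values, three plies deep.
tree = [[3, [5, 1]], [[6, 2], 9]]
print(minimax(tree))  # the value achievable with optimal play: 6
```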
A proper definition of intelligence as a theoretical construct will require a theory of intelligence. The theory would suggest the building blocks of intelligence and how those blocks contribute to the explanation of the behaviors by which it is observed, just as electrons provide the building blocks for understanding conduction and resistance. A rigorous theory of anything might be expected to provide the necessary and sufficient conditions for measuring the thing, but such definitions are rare in the sciences, at least for the foundational concepts of a field. Biology does quite well without a rigorous definition of life, and physics does well without a rigorous definition of mass. Foundational natural concepts are typically difficult to define and even more difficult to define rigorously. Fortunately, science does not require rigorous definitions to make progress.