© December 2006 Paul Cooijmans
Hereafter I will explain which ability types I think make up the common factor in scores on high-range intelligence tests, and discuss some related facts. First some general remarks:
Contrary to what one might think, the ability types that contribute, and their respective weights, are not determined primarily by the construction and contents of the tests. They are rather a phenomenon of nature that expresses itself through the tests by means of answering behaviour. For instance, if a test has 5 verbal items and 95 numerical ones, it is tempting to assume the test will measure numerical ability more than verbal ability. But in reality this depends only on the variance of scores on the item types; if verbal scores go from 0 to 5 and numerical scores from 40 to 43, verbal ability still has the greater weight. It is answering behaviour, rather than test construction, that determines which ability types are measured and in what relative proportions they contribute to total score. And answering behaviour in turn reveals a state of affairs in the human brain which is, in my view, like a fossil record of evolution. The ability types that have provided the greatest evolutionary advantage account for the greatest variance in test scores, and have the highest loading on the general factor g. The exception is abilities that have been near-necessities for survival and as a result have gone to fixation, that is, are possessed by nearly all individuals; those abilities have near-zero variance, and therefore near-zero correlations with any other abilities, and therefore near-zero g loadings.
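The example of 5 verbal and 95 numerical items can be worked out numerically. What follows is only a minimal sketch under assumptions that are not in the text above: the verbal and numerical scores are taken to be independent, and every combination of a verbal score from 0 to 5 and a numerical score from 40 to 43 is taken to occur equally often. The helper `pearson` and all variable names are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Assumption: one hypothetical candidate for every combination of a verbal
# score (0..5) and a numerical score (40..43), so the two are uncorrelated.
pairs = list(product(range(0, 6), range(40, 44)))
verbal = [v for v, _ in pairs]
numerical = [n for _, n in pairs]
total = [v + n for v, n in pairs]

# Share of total-score variance contributed by each item type.
verbal_share = pvariance(verbal) / pvariance(total)
numerical_share = pvariance(numerical) / pvariance(total)

# Correlation of each subscore with the total score.
r_verbal = pearson(verbal, total)
r_numerical = pearson(numerical, total)

print(f"variance share: verbal {verbal_share:.2f}, numerical {numerical_share:.2f}")
print(f"correlation with total: verbal {r_verbal:.2f}, numerical {r_numerical:.2f}")
```

Under these assumptions the 5 verbal items account for the larger share of the variance in total score, although the numerical items supply almost all of the raw points.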
Next, I will deal here only with ability types that high-range tests are intended to measure, and not with other correlates of the tests. In other words, only with "construct validity", not other types of validity. Construct validity is a test's correlation with what it aims to measure. For instance, the negative correlation I have found between high-range I.Q. and the presence of psychiatric disorders is not included here, as these tests are not intended to measure the absence of disorders.
Also not discussed are ability types that require a supervised or laboratory situation to be measured, and are therefore absent in unsupervised high-range tests. Fortunately, the ability types that are covered by high-range tests are those with the highest loadings on general intelligence, so that the tests do not suffer much from the absence of the other types.
Finally I wish to make the observation that the extent to which an ability type can be improved or trained is inversely related to the extent to which that ability type contributes to total score, that is, to its loading on the general factor. Highly loaded types are less trainable than lowly loaded ones. And because lowly loaded types also contribute less to total score, any training effect is more or less compensated, so that total score, or intelligence itself, is improvable only to a small degree at most. Also, the possible improvement through training is the complement or inverse of that ability type's evolutionary (genetic) component. In other words, the ability types with the highest g factor loadings are those with the largest genetic component, those that have been subject to natural selection to the greatest extent and have offered the greatest evolutionary advantage. To understand this paragraph better, one may consider that ability types with a large variance and high g loading are bound to have low trainability, because otherwise the individual differences in those abilities would be evened out by education and would not have their large variance and high g loading to begin with. In other words, if g were trainable, any individual differences in it would long since have been removed by education, and all adults would have the same I.Q.
Pattern recognition is the ability to understand, comprehend, grasp, what one is perceiving, observing, noting, seeing, reading. This is virtually identical to intelligence itself and does not show up as a separate factor in statistical analysis. It is like reading comprehension, only it is not restricted to verbal matters but operates in the numerical, spatial and visual-spatial areas as well. The reason to name it separately even though it is so close to g itself is that one intuitively feels it as a separate process. Pattern recognition can be understood as the connecting of entities in the outer world to symbols in the brain.
Reasoning is the ability to manipulate the symbols in the brain that correspond to entities in the outer world, according to the rules of logic, which are intuitively known. This too operates in all the areas named under pattern recognition, and this too is so close to intelligence itself that it is not a factor on its own. But we feel it as a separate process.
Pattern recognition and reasoning are, as it were, two aspects or processes of the same thing, intelligence, that each cover all of the lower factors. To perform a mental task, one first has to recognize the pattern, and then one can reason. Reasoning is the menial helper of pattern recognition. It does the straightforward, linear processing, the labour.
From trying to create test problems I have some experiences with regard to pattern recognition and reasoning that I wish to report:
Items that focus on pattern recognition and have little or no reasoning requirement can span a vast range of difficulty, from the very easiest to the very hardest. But they fail to discriminate well in the low and average ranges. They do not always put those with low or average intelligence into their place. In the absence of a significant reasoning aspect, candidates with low or average intelligence will now and then penetrate into score ranges far above their true level. Even hard problems of pattern recognition are occasionally solved by people of very modest ability. The inclusion of the slightest reasoning requirement at once puts them into their place. Reasoning is an insurmountable barrier for those below a certain range (the important paradox that most individuals of high intelligence cannot believe or imagine this absence of reasoning ability in so many will be discussed elsewhere).
Items that require reasoning are therefore needed to obtain discrimination in the low and average ranges. It is hard, though, to create a high ceiling with pure reasoning problems. Reasoning seems to "max out" around I.Q. 135-140. Trying to extend reasoning tests beyond that, one typically ends up relying on certain complex formal systems such as those in mathematics or formal logic. One then enters the realm of learnt skill. And learnt skills lose their g loading by the sheer fact that they are learnt, so that one may no longer be measuring g in that range. What also happens is that a test creator believes he or she has created very difficult reasoning problems, but has really made pattern recognition problems without realizing the difference.
So for the top part of the range, pattern recognition becomes important. How well pattern recognition works to measure in that range, and if and to what extent "extended" reasoning problems that may require learnt skill must also be used or mixed in, is a matter of ongoing experimentation and analysis. A problem with pattern recognition items, other than the one already mentioned, is that the risk of ambiguity is greater than with straightforward reasoning problems.
The best problems may be those that combine pattern recognition and reasoning, whereby the reasoning aspect serves not only to measure reasoning ability but also to prevent the problem from occasionally being solved accidentally by people with poor reasoning skills who happen to get the pattern recognition aspect right.
The things just said on pattern recognition and reasoning apply to all three of the ability types discussed below: verbal, numerical and spatial.
Verbal ability can be seen as the application of pattern recognition and reasoning - so, of g - in the field of language. For at least thousands of generations, virtually every human has been exposed to language from the very first to the very last moment of life. As a result, verbal ability has become the broadest component of intelligence, with the greatest spread over the intelligence spectrum (even deeply retarded persons can learn language) and the highest loading on g. A wide spread in an ability allows much room to correlate with other abilities, and as g loadings are derived from correlations, a wide spread may after many generations result in a high loading, provided that ability offers an evolutionary advantage.
Aspects of verbal ability that occur in high-range tests include vocabulary, knowledge, reading comprehension and grammar. Typically, the use of reference aids is allowed, because otherwise it would be very easy to cheat by using them secretly. Experience with high-range tests that do not allow reference aids shows that many people indeed cheat shamelessly then, so that the test becomes a test of willingness to commit fraud, of dishonesty. Allowing reference aids also partly, but not entirely, removes the disadvantage a candidate has when taking a test in a non-native language. This disadvantage can be reduced further by avoiding idiomatic elements that do not transcend language barriers, and by avoiding anything dealing with pronunciation. For that reason, the best verbal test items tend to be those designed in a language that is not native to the test creator. In a non-native language one is less inclined to use idiomatic and pronunciation elements, if only because one does not know many idiomatic expressions in a foreign language, and does not know the correct or usual pronunciation as well as a native does.
Some disadvantage always remains for non-natives, so tests without verbal items have been tried. Such non-verbal tests have their value but are less good measures of general intelligence than tests that do require verbal ability - not because g was defined to include verbal ability (which it was not) but because they have lower g loadings when analysed among a variety of tests. Also, with non-verbal tests one will regularly select candidates with poor verbal communication skills (in any language, including their own), and purely pictorial tests especially tend to let through rude and uncivilized people. The verbal aspect is so important in g that one can hardly afford to leave it out. Tests that combine visual-spatial and numerical content do better, though, than tests with only visual-spatial or only numerical content.
Numerical ability is the application of g in the field of numbers or quantities. It lies just under verbal ability in the hierarchy of g, requiring a little bit more pure g, therefore being mastered by a smaller group and having less variance. A smaller variance results in lower correlations with any other variables and therefore in a lower g loading, given the same evolutionary advantage. Do note the paradox in this paragraph: the fact that numerical problems are harder, and are mastered by a more select group, tends to reduce their g loading within the general population.
There are a few serious problems when including numerical items in high-range tests. For instance, mental arithmetic cannot be used, as there is no way of checking whether candidates indeed do it mentally and not on paper or on a calculator or computer. This is a pity, because mental arithmetic is a sublime test of numerical ability, of working memory, and of retrieval of information from long-term memory, especially for the less than highest ranges. It would keep the more or less "innumerate" from scoring above their true level on these tests, from penetrating into the high range; it would act as a barrier, like reasoning does.
Also, number series have become a problem, as one can easily find solutions to many of them on the Internet due to the criminal activities of a few regrettable specimens. This is a pity, because number series are sublime problems for pattern recognition. And contrary to what some think, they have little or no mathematical bias in the sense of required learnt skill.
So one has to experiment with other forms of number items, the main challenge being how to make them hard enough without relying too much on learnt skill from mathematics. Existing, formalized, mathematical methods for solving certain complex problems have lost their g loading. Anything that significantly levers the mind after a learning process loses its g loading. These matters have not been solved yet, and the inclusion of numerical problems in unsupervised tests remains a serious problem. The best solution may be numerical pattern recognition problems of a non-series nature.
Spatial ability is the application of pattern recognition and reasoning to spatial and visual matters. The essence of spatial ability is the mental rotation of objects. It tends to have a somewhat lower g loading than verbal and numerical ability, possibly not because it is less important, but because it has a higher threshold in g and therefore is mastered by a more elite group with a smaller spread over the spectrum, leaving less room to correlate as a result of "restriction of range".
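The statistical effect called "restriction of range" can be illustrated with a small simulation. This is a sketch of the general phenomenon only, not a model of spatial ability as such. The population correlation of 0.7, the sample size, the selection threshold of one standard deviation above the mean, and all names in the code are assumptions chosen purely for illustration.

```python
import random
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so that the run is repeatable
RHO = 0.7               # assumed correlation in the unrestricted population
N = 20000               # assumed number of simulated candidates
THRESHOLD = 1.0         # assumed cut-off, in standard deviations

# Two standard-normal variables constructed to correlate at about RHO.
first = [rng.gauss(0, 1) for _ in range(N)]
second = [RHO * x + sqrt(1 - RHO ** 2) * rng.gauss(0, 1) for x in first]

r_full = pearson(first, second)

# Keep only the "elite" group: those above the threshold on the first variable.
selected = [(x, y) for x, y in zip(first, second) if x > THRESHOLD]
r_restricted = pearson([x for x, _ in selected], [y for _, y in selected])

print(f"full range:       n = {N}, r = {r_full:.2f}")
print(f"restricted range: n = {len(selected)}, r = {r_restricted:.2f}")
```

Within the selected group the spread of the first variable is much smaller, and the observed correlation comes out clearly lower than in the full range, even though the underlying relation between the two variables has not changed.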
Something to watch out for when creating spatial problems is that some types of them can be solved, or in certain cases can only be solved, through learnt mathematical skill rather than pure spatial insight, which is less desirable for the reason already given above: learnt skill has no g loading. This is again the phenomenon of trying to extend reasoning beyond the point where it maxes out, and then unfortunately entering the realm of learnt skill, of existing formalized method.
These three ability types intercorrelate positively because they are all applications, in different fields, of the same: pattern recognition and reasoning. The correlations are however imperfect, leaving room for individual profiles over these fields. To get insight into one's profile, one will usually need to take a number of tests of different natures. This is so because a single test, to be able to give a reliable profile, would have to be so large and comprehensive that almost no one would choose to take it. Candidates unfortunately have a preference for one-sided tests that allow them to focus on their strongest point. This popularity of one-sidedness is a problem in itself, and I am inclined to think all high-range tests should best consist of a mixture of item types, to force people out of this laziness. Note that a test containing a mixture of item types need not necessarily yield a profile, as the number of items per type is normally too low for sufficient reliability per type.
For most candidates, the differences between their levels in verbal, numerical and spatial ability are not very great (which is equivalent to saying the ability types intercorrelate positively). For a smaller group they do differ much, and such uneven profiles are often indicative of psychiatric or neurological disorders. A mistake sometimes made by individuals with uneven profiles is to say "I have an uneven aptitude profile so the concept of general intelligence does not apply to me, and I should only be judged by my score on my strongest ability type".
But of course the concept of g applies to everyone; there is no reason to excuse those with uneven profiles. After all, they are responsible for the intercorrelations between abilities just as well as those with even profiles. Without the uneven profiles, all abilities would intercorrelate perfectly, and all g factor loadings would be 1. Judging uneven profiles by their strongest side would mean an inflation of high intelligence and an abandoning of the concept of g. The best indicator of intelligence is an overall score over different ability types, even for those with uneven profiles. Their existence in no way reduces the meaning of g.
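The remark that without uneven profiles all abilities would intercorrelate perfectly, and all g loadings would be 1, can be checked on a toy data set. The sketch below makes several assumptions that are not in the text above: the five candidates and their scores are invented, and the loading on the first principal component of the correlation matrix is used as a simple stand-in for the g loading (actual factor-analytic methods differ). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Matrix of pairwise correlations between lists of scores."""
    return [[pearson(a, b) for b in columns] for a in columns]


def first_component_loadings(matrix, iterations=200):
    """Loadings on the first principal component, found by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return [v * sqrt(eigenvalue) for v in vec]


# Invented scores of five hypothetical candidates with perfectly even profiles.
even = [[100, 110, 120, 130, 140]] * 3  # verbal, numerical, spatial

# The same candidates with invented unevenness in two of the ability types.
uneven = [
    [100, 110, 120, 130, 140],  # verbal
    [105, 100, 125, 120, 145],  # numerical
    [95, 115, 110, 135, 130],   # spatial
]

even_loadings = first_component_loadings(correlation_matrix(even))
uneven_loadings = first_component_loadings(correlation_matrix(uneven))

print("even profiles:  ", [round(x, 3) for x in even_loadings])
print("uneven profiles:", [round(x, 3) for x in uneven_loadings])
```

With even profiles every loading equals 1 up to rounding; once some profiles are uneven, the intercorrelations become imperfect and the loadings fall below 1 while remaining positive.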
This is not to say that real-life achievement in one isolated field is not valuable; of course one achieves most in what is one's strongest side, and that is how humankind progresses. But we are speaking here of intelligence testing, not of life itself. And for performance in real life, a test score that reflects general intelligence has greater predictive validity than a score on just one ability type; in fact even greater than a score on a task-specific test.