The psychiatrist Carl Gustav Jung (1875-1961) remarked that scientists tend to see their own ideas, expectations, or prejudices reflected in the data they gather; and no doubt that is true for many bad and/or fraudulent scientists and pseudoscientists. No doubt it was true for Jung himself. But honest researchers who have an open mind, are unbiased and free from prejudice, serve no interest other than finding the truth, and follow proper scientific methods should arrive at objective results rather than project personal expectations. They should arrive at the truth.
I happen to belong to that last category; my only interest is learning the truth, I do my best to use and develop methods that reveal the truth, and I am biologically incapable of prejudice and deceit. Nevertheless, on rare occasions I have received criticism to the effect of "Your tests/statistics are biased/not stochastic" or "Your tests only reveal your own prejudices". Although such claims are never substantiated and, upon closer inspection, appear to stem from ulterior motives, they have baffled and appalled me every time, and have pushed me to ever greater rigour in striving for integrity in data and method.
One way to detect researcher bias is to compare one's findings to one's prior expectations, as is done below. Included are all of, and only, those findings that are based on enough data to be confident of them.
This was the first thing I learnt when I started administering tests: with problems that I intuitively thought were of appropriate difficulty for a high-range test, the scores were concentrated just above zero, and to obtain a score pattern with something resembling a "left tail", I had to include a fair number of items that were, in my eyes, childishly easy.
At first I assumed I was doing something wrong, and that my problems were simply bad or unsolvable. But over the years, after gaining experience with multiple tests, hundreds of candidates, and item analysis, I found that this was not the case: my problems had generally been of quite high quality from the start, but far more difficult than I had thought. I have accepted this phenomenon, adapted to it, and gone to great lengths to find and correct any errors on my part that might contribute to it.
These differences are treated in detail in Sex differences on high-range I.Q. tests analysed. Before finding this, I had never suspected the existence of such differences, and it had always been self-evident to me that while the sexes might differ in physical strength, they were at least equal in intelligence. Few things have ever amazed me more, and it took me years to accept it. Initially I was somewhat embarrassed, and felt guilty for putting unsuspecting female candidates through the ordeal of taking such shamelessly biased and unfair tests. I scrutinized every detail of the tests, hoping to uncover the hidden sex bias. I even constructed a test containing only the types of tasks on which women are known to outscore men. I still study the sex differences with statistical methods whenever the amount of female data allows it, to ensure that the tests measure the same thing in females as they do in males. But to date I can honestly say that, other than by requiring intelligence, the tests are not biased against females.
I do have a hunch that my particular personality, with its inborn incapability of prejudice, plays a role in letting me detect sex differences. I am naturally inclined to treat all people the same, regardless of sex, nationality, or whatever, and test scoring is no exception. I rigidly score each test submission by the same standards. This shortcoming may set me apart from the many others who, frankly, tend to treat girls and women just a little more leniently than boys and men. The fact that I lack that so important social heuristic and brutally treat everyone the same may be what enables me to reveal differences that others, knowingly or unknowingly, hide under the cloak of humaneness, courtesy, and empathy.
Although it should be superfluous, it is pointed out, to avoid misunderstanding, that the sex differences found in high-range testing do not imply an average sex difference in intelligence over the full range. This is because the differences in the high range can equally be explained by a possible sex difference in the spread (standard deviation) of I.Q. over the full range.
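The spread explanation can be made concrete with a little arithmetic. The Python sketch below assumes, purely for illustration, two normally distributed groups with the same mean (100) but slightly different standard deviations (15 and 14). These parameters, the group labels A and B, and the helper name upper_tail are hypothetical; they are not data from the tests discussed here.

```python
import math

def upper_tail(threshold, mean, sd):
    """Proportion of a normal distribution lying above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical parameters: equal means, group A assumed slightly more spread out.
MEAN = 100.0
SD_A = 15.0
SD_B = 14.0
THRESHOLDS = (100, 115, 130, 145, 160)

for t in THRESHOLDS:
    a = upper_tail(t, MEAN, SD_A)
    b = upper_tail(t, MEAN, SD_B)
    # The ratio shows how over-represented group A is above each cut-off.
    print(f"I.Q. > {t}: A = {a:.6f}, B = {b:.6f}, ratio A/B = {a / b:.2f}")
```

At the mean the two proportions are equal (ratio 1), but the ratio grows steadily as the threshold rises, so a group difference observed only in the high range does not by itself say anything about the group means.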
This result surprised me, and for years I thought it might be due to sampling error and would disappear as more data came in. It did not, and I am now fairly confident that there is indeed a significant, albeit small (about .3 to .4 in magnitude), negative correlation between high-range mental test scores and indicators of disorder and deviance, such as the actual presence of psychiatric disorders, the presence of such disorders in relatives (which reveals genetic disposition), and personality test scores related to deviance.
The main reason I had not expected this result is the persistent notion in "giftedness" circles that "gifted" individuals often experience psychosocial or psychiatric problems and may need special treatment and help. At events related to "giftedness" one can nowadays encounter all sorts of (often quack) therapists, eager to "help", and whenever they spot someone "diagnosed" with "giftedness", those vultures come down from the trees. I believed in this interpretation of "giftedness" until about the late 1990s, but gradually became sceptical as I saw the statistics build up and as I came into contact with many people with known I.Q. scores on many tests. My experience from such contact is that, within the high range of intelligence, those with higher I.Q.'s are more normal and less deviant, and undergo less psychosocial suffering, than those with somewhat lower I.Q.'s.
Note that the fairly small size of the negative correlation certainly allows some part of the intelligent population to be deviant, disordered, or suffering; but it is apparently not the intelligence that causes their problems. Also, this result by no means excludes the possibility of a positive (genetic) link between intelligence and certain disorders, such as schizophrenia and Asperger syndrome. The net correlation may result from a complex of mechanisms, such that a possible positive genetic relation is turned into a negative one by, for instance, (1) the fact that a high I.Q. suppresses the expression of the disorder, and (2) the fact that, in cases where the disorder does become fully expressed, the disorder depresses one's I.Q.
Wanting to know this was my original main motivation for creating difficult I.Q. tests. In the 1990s, several experts assured me that it was impossible to measure intelligence validly above about the 99th centile, and I was reluctant to believe that because I observed such huge differences in comprehension between people who had all scored at or above that level. It seemed absurd that those differences could not be reflected in test scores. In The differentiation hypothesis of g tested I have analysed this matter as far as is yet possible, and it seems that intelligence is measurable in the high range after all, albeit that the g loading of tests may decrease somewhat as I.Q. goes up. This result should not, however, be considered final.
While in this case my expectation is apparently on its way to being confirmed, I must stress that I have no "interest" in it being one way or the other. I will not be "glad" when the expectation is confirmed. I do not "care" whether g breaks up at high I.Q. levels or not. Both outcomes have their pros and cons. Is intelligence measurable up to extreme levels? Then that will offer prospects for practice and research. Is it not? Then everyone who scores at the 99th centile can be reassured: there is no need to feel intimidated by those braggers with much higher I.Q.'s than yours. It means nothing.
As an aside, it may be noted that many of the tests used in regular psychology indeed cannot measure intelligence at high levels. Over the years I have come to understand that one reason for this is that such tests are often purposely constructed to give sex-equal results, by leaving out or counterbalancing problems on which one sex does better than the other. Since males do better on difficult problems, difficult problems are left out, resulting in tests with low ceilings and no headroom for males to outscore females. In some cases this is covered up by imposing a short time limit to raise the ceiling. The speed factor thus introduced then reduces the test's g loading at high levels (and with it the possible male-female difference) even further. Note the prejudice that underlies this approach to test construction: the paradigm of sex equality is imposed upon reality by decision, instead of letting scientific curiosity about the actual state of affairs prevail.
The above findings, all based on large amounts of data gathered over the course of twenty years, do not on the whole reflect my expectations. In three cases they contradict my expectations, and in one case my expectation is apparently going to be confirmed. Each of these findings could be (or become) the opposite of what it is now without any emotion on my part. When I learn that something is a certain way, I accept it regardless of my prior expectation, as my only interest is finding out the truth. Based on this comparison of findings and expectations, there is no reason to suspect that any researcher bias or prejudice is operating.