Pitfalls for high-range psychometricians

Introduction

Beginning or amateur constructors of high-range tests for mental abilities, as well as a disturbing number of more experienced psychometricians, are characterized by a collection of persistent errors which result from giving in to otherwise understandable temptations and human weaknesses. A list of these aberrations, with a brief comment on each, follows. More detailed explanations of why these behaviours are bad can be found in, for instance, Recommendations for conducting high-range intelligence tests.

The pitfalls

Allowing retests

Why this is bad is explained excellently in the aforementioned article. The temptations toward allowing retests are generally the following:

Gathering more data (combining the retests with the first attempts gives the impression of much more data than there actually is);
Gaining more income through test fees;
Giving in to the obvious demand for retests from candidates ("empathy");
Wanting to be "fair" by allowing a second chance;
Obtaining "stricter" (lower) norms (following the assumption that retest scores are higher than first scores, which is more true for bad tests than for good tests).

Each one of these apparent benefits is blown away by the disadvantages of retesting.
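
The first temptation can be made concrete with a small sketch. It assumes a hypothetical record layout of (candidate, attempt number, raw score) and uses invented scores; it only illustrates that pooled submissions overstate the number of independent observations, and is not a norming procedure.

```python
def first_attempts(submissions):
    """Keep only each candidate's earliest attempt.

    `submissions` is an iterable of (candidate_id, attempt_number, raw_score)
    tuples. This layout is an assumption made for the example.
    """
    first = {}
    for candidate_id, attempt, score in submissions:
        if candidate_id not in first or attempt < first[candidate_id][0]:
            first[candidate_id] = (attempt, score)
    return {cid: score for cid, (_, score) in first.items()}


# Invented data: three candidates, two of whom took retests.
submissions = [
    ("A", 1, 12), ("A", 2, 17), ("A", 3, 19),
    ("B", 1, 20),
    ("C", 1, 9), ("C", 2, 14),
]

independent = first_attempts(submissions)
pooled_mean = sum(score for _, _, score in submissions) / len(submissions)
first_mean = sum(independent.values()) / len(independent)

print(len(submissions), "submissions from", len(independent), "candidates")
print("pooled mean:", round(pooled_mean, 2), "first-attempt mean:", round(first_mean, 2))
```

In this made-up data set, six submissions come from only three candidates, and the pooled mean lies above the first-attempt mean because the retest scores are higher.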

Offering homogeneous (one-sided) tests

Homogeneous tests tend to have less generality than heterogeneous tests, and for I.Q. tests that implies lower validity and more room for one's score to be off in either direction, compared to one's true g level. Homogeneous tests are also less robust and more vulnerable to score inflation. They are the prime target of cheaters, answer publishers, answer sellers, and other such scum. Unethical people tend to have uneven aptitude profiles (which does not logically imply that people with uneven aptitude profiles are necessarily unethical, as in "a cow is an animal but not all animals are cows"), and therefore choose one-sided tests to cheat with; they lack the talent for tackling a heterogeneous test. Temptations toward offering homogeneous tests are:

Giving in to the obvious demand for homogeneous tests from candidates ("empathy"); one-sided tests are far more popular than heterogeneous tests, as candidates know (or think they know) they will score highest on a test that measures only their strongest side;
Gathering more data;
Gaining more income through test fees;
Ease of construction: such tests are simply easier to create for a creator who has an uneven aptitude profile oneself (and in recent generations of test creators we see a worrying prevalence of such profiles);
Wanting to be "fair" to candidates with other native languages (but this problem is not actually solved by using homogeneous tests because of those tests' limitations and disadvantages; the best one can do is offering heterogeneous nonverbal tests, and encouraging potential test creators in foreign countries to construct heterogeneous tests in their own language).

Answering questions about the tests

Some candidates tend to ask questions about the tests, their contents, even about particular test items. Answering those to the satisfaction of the candidate will always put that candidate at an advantage compared to other candidates, and must therefore never be done. The temptation toward answering lies in "empathy", in wanting to "help", and in not wanting to be seen as a "Dutch uncle", as a stiff, dull, strict, unresponsive person. Almost no one is unempathic enough to be strict in these matters, as is betrayed so mercilessly by questions and remarks I myself often receive from candidates, whose words all too often reveal that with other test designers they are given the information they desire about particular test items, they are allowed retests, et cetera. The curse of "empathy", and the tremendous fear of being seen as unsympathetic or pedantic, render very few suitable as testers.

Giving credit to "alternative answers"

The reasons for not doing this, and the correct way to treat the phenomenon, are explained elsewhere. The temptation toward this habit has several sources. The first is wanting to respect other people's different ways of looking at a problem while lacking the reasoning ability to see that certain answers are logically wrong. The second is not recognizing that the alternative answer is a pareidolic or apophenic delusion, that is, that it is like the patterns one may see in random noise, clouds, and so on. Candidates may be very confident that such a pattern is the intended solution, and thus, with the best possible intentions, trick the scorer into accepting it as a valid answer; but in reality that pattern exists only in the paranoid-critical mind of the candidate. This is also why it is impossible to create tests that do not have pareidolic, apophenic "alternative answers": those are not a property of the test but only of the candidate's mind. The third is being afraid to judge what is right and wrong. The fourth is personal sympathy for particular persons, including oneself; for yes, some test scorers partly or wholly take the tests they themselves score, score the tests they themselves take, and decide which of their own alternative answers are counted right and even to which norms their scores correspond. In fact this behaviour is not atypical for claimants of "the world's highest I.Q.".

Revealing the intended answers

Especially beginning test constructors tend to be shocked or worried when they see candidates score disappointingly lower than they - the candidates - expect, and sometimes want to soothe the disbelief and frustration of the candidate by explaining what the intended answers are, so that the candidate will recognize them as better than the given answers and thus be able to accept that the latter were wrong. Why this is a mistake is explained in the aforementioned article Recommendations for conducting high-range intelligence tests; the key phrase is motivation for secrecy.

Norming a test with data from a different test environment

When a test item is moved to a different environment - another test with other, fewer or more problems - its statistical behaviour changes. For instance, items that were originally accompanied by similar ones (possibly of varying difficulty) will become harder when some of those accompanying items are no longer there; this is probably because those other items served as "examples" in some way, or formed a gentle slope.

A typical situation is one in which a new, shorter test is created by selecting items from an earlier, larger test. In the new test, the attention of the candidate will be less dispersed because of the lower number of items, and the items will on the whole tend to become easier, although a few may become harder. Using data from the earlier test to arrive at norms for the new test may therefore result in norms that are too high. This is ironic, as the new test was of course created to achieve greater psychometric soundness, for instance by selecting items by their individual statistical properties, or by leaving out items to which the answers have leaked out. Inherent to this is that the scores from new candidates taking the shorter test are not comparable with (hypothetical) scores from candidates who took the original test and whose scores were converted to scores on the new test (which was a subset of the longer test).
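
One way to check whether items behave differently in the new environment is to compare each retained item's proportion of correct answers between the two tests. The sketch below uses invented responses and hypothetical item names (q1 to q4); it also assumes the two candidate groups are of comparable ability, which real data would not guarantee.

```python
def proportion_correct(responses, item):
    """Share of candidates who solved `item` (1 = right, 0 = wrong)."""
    return sum(r[item] for r in responses) / len(responses)


def difficulty_shift(old_responses, new_responses, items):
    """Change in proportion correct per item; positive means the item got easier."""
    return {
        item: proportion_correct(new_responses, item)
        - proportion_correct(old_responses, item)
        for item in items
    }


# Invented data: four candidates on the longer test, four on the shorter one.
old_test = [
    {"q1": 1, "q2": 1, "q3": 1, "q4": 0},
    {"q1": 1, "q2": 0, "q3": 1, "q4": 0},
    {"q1": 0, "q2": 1, "q3": 0, "q4": 1},
    {"q1": 0, "q2": 0, "q3": 0, "q4": 0},
]
new_test = [
    {"q1": 1, "q3": 1},
    {"q1": 1, "q3": 0},
    {"q1": 1, "q3": 0},
    {"q1": 0, "q3": 0},
]

shift = difficulty_shift(old_test, new_test, ["q1", "q3"])
print(shift)
```

In this made-up example q1 becomes easier and q3 becomes harder once the accompanying items are gone. With real data, shifts like these would be a reason to norm the shorter test on its own submissions rather than on data from the longer test.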

Removing "compromised" items

Removing items, the answers to which have leaked out or been published, makes it all too easy for evil persons to destroy the work of a test creator. It is equivalent to a shop owner by default handing over the money to any robber without resistance. It is therefore an extremely bad approach that inevitably leads to the end of that test creator's career. It is giving in to terror.

It must on the other hand be stressed that in those cases it is not the test items that have been "compromised", as some horribly say, but the culprits, the answer publishers, the answer leakers, the cheaters, who have been compromised; by themselves, and for good. For good work can never be "compromised" by worthless piles of faeces that possess no ability of their own and therefore seek fulfilment in life through vandalizing the work of others.

The only right approach is to track them down, keep records of who they are and what they have done, and call them to account, no matter how long it takes. They must be thoroughly aware that never in their lives will they be safe again until they have redeemed their shameful debt, and that with every second the answers leaked by them remain published, their inescapable suffering when caught, no matter where or when, grows exponentially, and the whip will come down harder and oftener.
