Elsevier

Neuropsychologia

Volume 40, Issue 8, 2002, Pages 1196-1208

Investigation of the single case in neuropsychology: confidence limits on the abnormality of test scores and test score differences

https://doi.org/10.1016/S0028-3932(01)00224-X

Abstract

Neuropsychologists often need to estimate the abnormality of an individual patient’s test score, or test score discrepancies, when the normative or control sample against which the patient is compared is modest in size. Crawford and Howell [The Clinical Neuropsychologist 12 (1998) 482] and Crawford et al. [Journal of Clinical and Experimental Neuropsychology 20 (1998) 898] presented methods for obtaining point estimates of the abnormality of test scores and test score discrepancies in this situation. In the present study, we extend this work by developing methods of setting confidence limits on the estimates of abnormality. Although these limits can be used with data from normative or control samples of any size, they will be most useful when the sample sizes are modest. We also develop a method for obtaining point estimates and confidence limits on the abnormality of a discrepancy between a patient’s mean score on k-tests and a test entering into that mean. Computer programs that implement the formulae for the confidence limits (and point estimates) are described and made available.

Introduction

Estimating the rarity or abnormality of an individual’s test score is a fundamental part of the assessment process in neuropsychology. The procedure for statistical inference in this situation is well known. When it is reasonable to assume that scores from a normative sample are normally distributed, the individual’s score is converted to a z score and evaluated using tables of the area under the normal curve [20], [23]. Thus, if a neuropsychologist has formed a directional hypothesis concerning the individual’s score prior to testing (e.g. that the score will be below the normative mean), then a z score which fell below −1.64 would be considered statistically significant (using the conventional 0.05 level). More generally, and it could be argued more usefully (given that any significance level is an arbitrary convention that does not address the issue of severity), the probability for z provides the neuropsychologist with information on the rarity or abnormality of the individual’s score. Thus, for example, if a patient obtained a z score of −1.28 on a given test, then a table of the normal curve will tell us that approximately 10% of the population would be expected to obtain a score lower than this.

Many tests used in neuropsychology are expressed on a conventional metric such as an IQ scale (mean=100, S.D.=15), or T score (mean=50, S.D.=10). In such cases it is often not necessary to convert the score to z to arrive at the estimate of abnormality. For example, a patient obtaining a score of 85 on the Working Memory Index of the WAIS-III [41], [42] is exactly 1 S.D. below the mean. Most neuropsychologists will know that approximately 16% of the population would therefore be expected to obtain a score as low or lower than this. However, the principle in this latter example is identical, i.e. the score is referred to the normal curve.
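The standard procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original paper; the function name `percent_below` is hypothetical, and the normative mean and S.D. are treated as population parameters, exactly as the standard method assumes.

```python
from statistics import NormalDist


def percent_below(score, mean, sd):
    """Percentage of a normal population expected to score below `score`.

    The normative mean and S.D. are treated as known population parameters.
    """
    z = (score - mean) / sd
    return 100.0 * NormalDist().cdf(z)


# A z score of -1.28 (a score already on the z metric: mean 0, S.D. 1).
print(f"{percent_below(-1.28, 0.0, 1.0):.1f}%")  # 10.0%

# A score of 85 on an IQ-type metric (mean 100, S.D. 15).
print(f"{percent_below(85, 100, 15):.1f}%")  # 15.9%

# One-tailed critical value of z at the conventional 0.05 level.
print(f"{NormalDist().inv_cdf(0.05):.2f}")  # -1.64
```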

In the standard procedure just described the normative or control sample is treated as if it were a population; i.e. the mean and standard deviation are used as if they were parameters rather than sample statistics. When the normative sample is reasonably large this is justifiable. However, Crawford and Howell [10] point out that there are a number of reasons why neuropsychologists may wish to compare the test scores of an individual with norms derived from a small sample. First, although there has been a marked improvement in the quality of normative data in recent years, there are still many useful neuropsychological instruments that have modest normative data. Even when the overall N for a normative sample is reasonably large, the actual sample size (n) against which an individual’s score is compared can be small when the sample is broken down by demographic characteristics. Secondly, many clinical neuropsychologists have gathered local norms for neuropsychological instruments, but because of the time and expense involved, the normative samples are often modest in size.

Finally, in recent years there has been an enormous resurgence of interest within academic neuropsychology in single case studies [3], [4], [16], [21], [25], [33]. In many of these studies the theoretical questions posed cannot be addressed using existing instruments and therefore novel instruments are designed specifically for the study. The sample size of the control or normative group recruited for comparison purposes in such studies is typically <10 and often <5.

Crawford and Howell [10] have described and illustrated the use of a method that can be used to compare an individual with normative or control samples that have modest N. Their approach uses a formula given by Sokal and Rohlf [37] that treats the statistics of the normative or control sample as statistics rather than as population parameters and uses the t-distribution (with N−1 degrees of freedom (d.f.)), rather than the standard normal distribution, to evaluate the abnormality of the individual’s scores. Essentially, this method is a modified independent samples t-test in which the individual is treated as a sample of N=1, and therefore does not contribute to the estimate of the within group variance. The formula for this test is presented in Appendix A.1.

The disadvantage of the standard (z score) method is that, with small samples, it exaggerates the rarity/abnormality of an individual’s score. This is because the normal distribution has “thinner tails” than t-distributions. Intuitively, the less that is known, the less extreme should be statements about abnormality/rarity. The z score method treats the variance as being known, when it is not, and consequently makes statements that are too extreme. A fuller illustration of this will be provided in a worked example, but in the interim, suppose that an individual obtains a score of 20 on a test and that the mean and S.D. for this test in a control sample are 40 and 10, respectively. If the N of the control sample was 10, then the estimate provided by the modified t-test procedure is that approximately 4.4% of the population would obtain a score lower than the individual’s score. The z score method exaggerates the rarity of the individual’s score as the estimate it provides is that approximately 2.3% of the population would obtain a lower score.
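The comparison in this example can be reproduced with a short Python sketch. It assumes the usual form of the modified t-test, t = (X0 − X̄) / (S·√((N+1)/N)) with N−1 d.f.; the formula itself is given in Appendix A.1, which is not reproduced here, so this form is an assumption of the sketch. The helper names (`t_cdf`, `modified_t_percent`, `z_percent`) are hypothetical, and the t-distribution is evaluated by simple numerical integration so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def t_cdf(t, df, steps=2000):
    """CDF of the central t-distribution via Simpson's rule on its density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + math.copysign(area, t)


def modified_t_percent(score, mean, sd, n):
    """Estimated % of the population scoring below `score` (modified t-test).

    Assumed form: t = (score - mean) / (sd * sqrt((n + 1) / n)), d.f. = n - 1.
    """
    t = (score - mean) / (sd * math.sqrt((n + 1) / n))
    return 100.0 * t_cdf(t, n - 1)


def z_percent(score, mean, sd):
    """Same estimate from the standard z score method."""
    return 100.0 * NormalDist().cdf((score - mean) / sd)


# Worked example from the text: score 20, control mean 40, S.D. 10, N = 10.
print(f"modified t-test: {modified_t_percent(20, 40, 10, 10):.2f}%")
print(f"z score method:  {z_percent(20, 40, 10):.2f}%")
```

Run with the values from the text, the two estimates should come out close to the 4.4% and 2.3% quoted above. As N grows, the t-based estimate converges on the z-based one.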

Up to this point we have been concerned with the simple case of comparing a single test score obtained from an individual with a normative or control sample. However, in the assessment of acquired neuropsychological deficits, simple normative comparison standards have limitations because of the large individual differences in premorbid competencies. For example, an average score on a test of mental arithmetic would represent a marked decline from the premorbid level in a patient who was a qualified accountant. Conversely, a score that fell well below the normative mean does not necessarily represent an acquired deficit in an individual who had modest premorbid abilities [5], [15], [28].

Because of the foregoing, considerable emphasis is placed on intra-individual comparison standards when attempting to detect and quantify the severity of acquired deficits [6], [24], [39]. In the simplest case, the neuropsychologist may wish to compare an individual’s score on two tests; a fundamental consideration in assessing the importance of any discrepancy between scores on the two tests is the extent to which it is rare or abnormal. Payne and Jones [29] developed a formula for this purpose. The method requires the mean and S.D. of the two tests in a normative sample and the correlation between them. The two tests must be on the same metric, or they must be converted to a standard metric (z scores are normally used). The formula provides an estimate of the percentage of the population that would exhibit a discrepancy that equals or exceeds the discrepancy observed for a patient.
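As an illustration, the Payne and Jones approach can be sketched as follows, assuming its usual form: both scores are converted to z scores and their difference is divided by √(2 − 2r), where r is the correlation between the tests in the normative sample. The function name and the example values are hypothetical; the normative statistics are treated as population parameters, as the method assumes.

```python
import math
from statistics import NormalDist


def payne_jones_percent(x, y, mean_x, sd_x, mean_y, sd_y, r):
    """Estimated % of the population with a discrepancy at least as extreme
    as the one observed, in the same direction (assumed Payne and Jones form).
    """
    z_x = (x - mean_x) / sd_x
    z_y = (y - mean_y) / sd_y
    # Standardised difference; sqrt(2 - 2r) is the S.D. of a difference
    # between two z scores that correlate r.
    z_d = (z_x - z_y) / math.sqrt(2.0 - 2.0 * r)
    return 100.0 * NormalDist().cdf(-abs(z_d))


# Hypothetical example: scores of 80 and 100 on two tests that both have
# mean 100 and S.D. 15 and correlate 0.6 in the normative sample.
print(f"{payne_jones_percent(80, 100, 100, 15, 100, 15, 0.6):.1f}%")
```

Doubling the result gives the percentage expected to show a discrepancy of this size in either direction.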

A number of authors have commented on the usefulness of this formula in neuropsychology [7], [23], [26], [32], [34], and it has been applied to the analysis of differences on a variety of tests [1], [19], [27]. However, just as was the case for the standard method of comparing a single score with a normative sample, the Payne and Jones [29] formula treats the statistics of the normative or control sample as if they were population parameters. This limits the valid use of the method to comparisons of an individual with a large normative sample.

Crawford et al. [11] developed a method that treats the normative statistics as statistics. Like the Payne and Jones [29] method, it requires that scores on the two tests are converted to a common metric (z scores). The patient’s difference is divided by the standard error of the difference, yielding a quantity that is distributed as t with N−1 d.f. (where N is the size of the normative or control sample, i.e. it does not include the individual). Essentially then this is a modified paired samples t-test. The formula for this test is presented in Appendix A.2.
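A sketch of this modified paired samples t-test is given below. It assumes the form t = (zX − zY) / √((2 − 2r)(N+1)/N) with N−1 d.f.; the formula itself is in Appendix A.2, which is not reproduced here, so this form should be read as an assumption. Function names and example values are hypothetical, and the t-distribution is again evaluated numerically with the standard library.

```python
import math


def t_cdf(t, df, steps=2000):
    """CDF of the central t-distribution via Simpson's rule on its density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, t)


def modified_paired_percent(z_x, z_y, r, n):
    """Estimated % of the population with a difference at least as extreme
    as the patient's, in the same direction.

    Assumed form: t = (z_x - z_y) / sqrt((2 - 2r) * (n + 1) / n), d.f. = n - 1,
    where z_x and z_y are the patient's scores standardised against the
    control sample and r is the correlation between the tests in that sample.
    """
    t = (z_x - z_y) / math.sqrt((2.0 - 2.0 * r) * (n + 1) / n)
    return 100.0 * t_cdf(-abs(t), n - 1)


# Hypothetical patient: z scores of -1.333 and 0.0, r = 0.6.
for n in (5, 10, 50, 1000):
    print(n, f"{modified_paired_percent(-1.333, 0.0, 0.6, n):.1f}%")
```

With a small control sample the estimated percentage is noticeably larger (the difference is judged less abnormal) than with a large one, where the result approaches the value obtained by treating the sample statistics as parameters.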

Technically, this method is more appropriate than the Payne and Jones [29] method for comparison of an individual’s test score difference with differences from any size of normative or control sample (i.e. our test norms are always obtained from a sample rather than a population). However, its usefulness lies in its ability to deal with comparisons involving normative or control samples that are modest in size; the Payne and Jones [29] method systematically overestimates the abnormality of an individual’s test score difference in such comparisons.

Crawford et al. [11] suggest that their method is particularly useful in single case studies where, as noted, the control sample against which a patient is compared usually has a small N. A common aim in neuropsychological case studies is to fractionate the cognitive system into its constituent parts, and such studies proceed by attempting to establish the presence of dissociations of function. Typically, if a patient obtains a score in the impaired range on a test of a particular function and is within the normal range on a test of another function, this is regarded as evidence of a dissociation. However, a more stringent test for the presence of a dissociation is to also compare the difference between tests observed for the patient with the distribution of differences in the control sample. For example, a patient’s score on the “impaired” task could lie just below the cut point for defining impairment and the performance on the other test lie just above it.

The method of Crawford et al. [11] can be used in such studies to test if the difference observed in the patient is significantly different from the differences in the controls. Their method is also useful in the converse situation where a patient’s scores are within the impaired range on both tasks. When this pattern is observed, the researcher can still test whether the magnitude of the difference between the two tasks is abnormal; i.e. evidence can be sought for the presence of a differential deficit on the test of one of the functions.

The above methods are designed to yield point estimates of the rarity or abnormality of either an individual’s single test score, or the difference between an individual’s scores on two tests. In the present paper, we extend this work by providing methods for obtaining confidence limits on the abnormality of test scores and test score differences. This is in keeping with the contemporary emphasis in statistics, psychometrics, and biometrics on the use of confidence limits [14], [18], [44]. Gardner and Altman [18], for example, in discussing the general issue of the error associated with sample estimates, note that “these quantities will be imprecise estimates of the values in the overall population, but fortunately the imprecision itself can be estimated and incorporated into the findings” (p. 3).

Neuropsychologists are aware that estimates of the rarity/abnormality of a test score or score difference are subject to sampling error and will have an intuitive appreciation that less confidence should be placed in them when N for the normative sample is small. However, the advantage of the procedures to be outlined is that they quantify the degree of confidence that should be placed in these estimates.

In the following sections, we present the methods for obtaining confidence limits on the abnormality of a single test score and the difference between a pair of test scores. These methods and their applications are illustrated with examples relevant to academics who pursue single case research and to clinical neuropsychologists. We also include a method for obtaining confidence limits on the abnormality of the difference between an individual’s mean score on k-tests and a test score entering into that mean. The existing method of obtaining a point estimate of the abnormality of such a difference [35], [36] treats the normative sample against which the individual is compared as if it were a population. Therefore, we also develop a method for obtaining a point estimate of the abnormality of the difference that treats the normative sample statistics as statistics rather than as parameters. This is achieved by a straightforward extension of Crawford et al.’s [11] method for obtaining a point estimate of the abnormality of a pair of test scores.

The methods to be described for obtaining confidence limits require non-central t-distributions. As readers may not be familiar with such distributions a brief description is provided before formally presenting the methods. Both the t and non-central t-distributions are derived from a ratio of the distribution of sample means and that of sample variances drawn from a normal population. The sampling distribution of the mean is normal (and symmetrical), while the sampling distribution of the variance is skewed (and follows a χ² distribution). When the sampling distribution of the mean has a mean of 0 (i.e. when the population distribution has a mean of 0) sample variances are combined equally often with positive and negative sample means. Effectively the asymmetry of the sampling distributions of the variance occurs equally often facing in positive and negative directions and so the resulting central t-distribution is symmetrical.

When the sampling distribution of the mean has a non-zero mean (i.e. when the population distribution itself has a non-zero mean) then the asymmetry of the sampling distribution of the variance is not balanced equally between positive and negative sample means and so the resulting non-central t-distribution is asymmetrical. The extent of its skew depends upon the mean and variance of the population distribution. The upshot for calculating confidence intervals is that one cannot simply shift a t-distribution along an axis in order to find a confidence interval around a mean; one has to find and use non-central t-distributions with specified properties.
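The asymmetry described here can be seen in a small simulation. The sketch below is illustrative only: it draws values of (Z + δ)/√(V/ν), where Z is standard normal, V is χ² with ν d.f., and δ is the non-centrality parameter, and then compares the distance from the median to the 97.5th percentile with the distance from the median to the 2.5th percentile. The function names, the seed, and the choice of ν = 9 and δ = ±3 are all assumptions made for the example.

```python
import random
import statistics


def simulate_noncentral_t(df, delta, n_draws, rng):
    """Draw values of (Z + delta) / sqrt(V / df), with V chi-square on df d.f."""
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2, 2.0)  # chi-square(df) = Gamma(df/2, scale 2)
        draws.append((z + delta) / (v / df) ** 0.5)
    return draws


def tail_ratio(draws):
    """(97.5th percentile - median) / (median - 2.5th percentile).

    A ratio near 1 indicates symmetry; above 1, a longer upper tail.
    """
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    lower, median, upper = cuts[0], cuts[19], cuts[38]
    return (upper - median) / (median - lower)


rng = random.Random(1)  # fixed seed so that the run is repeatable
for delta in (-3.0, 0.0, 3.0):
    draws = simulate_noncentral_t(df=9, delta=delta, n_draws=20000, rng=rng)
    print(f"delta = {delta:+.0f}: tail ratio = {tail_ratio(draws):.2f}")
```

With δ = 0 the ratio should be close to 1 (the central t-distribution), whereas a positive δ stretches the upper tail and a negative δ the lower tail, which is why the distribution cannot simply be shifted along the axis.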

Section snippets

Obtaining confidence limits for the abnormality of a test score

Letting P1 denote the percentage of the population that will fall below a given individual’s score (X0), we suppose we require a 100(1−α)% confidence interval for P1. Let (X0 − X̄) represent the difference between the individual’s score and the mean score of the normative or control sample, let S be the standard deviation in the normative sample, and let N be the size of the normative sample. We assume scores for the control population are normally distributed. If we put c1 = (X0 − X̄)/S, then c1 is an
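Because this snippet is truncated, the following Python sketch should be read as one plausible reconstruction of the procedure and not as the authors’ published algorithm. It assumes that c1·√N, with c1 = (X0 − X̄)/S, is an observation from a non-central t-distribution on N−1 d.f.; that the limits on the non-centrality parameter δ are the values for which the observed quantity is the 100(1−α/2)th and 100(α/2)th percentile; and that the limits on P1 are 100 times the standard normal probability below δ/√N. All function names are hypothetical, and the non-central t CDF is obtained by numerical integration and bisection instead of the algorithm mentioned in the Acknowledgements.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def nct_cdf(t, df, delta, steps=1000):
    """P(T <= t) for T = (Z + delta) / s, where s = sqrt(chi-square / df).

    Integrates P(Z <= t*s - delta) over the density of s by Simpson's rule.
    """
    log_c = math.log(2.0) + (df / 2) * math.log(df / 2) - math.lgamma(df / 2)
    upper = 1.0 + 12.0 / math.sqrt(2.0 * df)  # density of s is negligible beyond

    def integrand(s):
        if s == 0.0:
            density = math.exp(log_c) if df == 1 else 0.0
        else:
            density = math.exp(log_c + (df - 1) * math.log(s) - df * s * s / 2)
        return norm_cdf(t * s - delta) * density

    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def solve_delta(t_obs, df, target):
    """Bisection for the delta at which nct_cdf(t_obs, df, delta) == target."""
    lo, hi = t_obs - 10.0, t_obs + 10.0
    while nct_cdf(t_obs, df, lo) < target:  # the CDF falls as delta rises
        lo -= 10.0
    while nct_cdf(t_obs, df, hi) > target:
        hi += 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if nct_cdf(t_obs, df, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def abnormality_limits(score, mean, sd, n, alpha=0.05):
    """Assumed 100(1 - alpha)% limits on the % of the population below `score`."""
    t_obs = (score - mean) / sd * math.sqrt(n)  # c1 * sqrt(N)
    delta_low = solve_delta(t_obs, n - 1, 1.0 - alpha / 2)
    delta_high = solve_delta(t_obs, n - 1, alpha / 2)
    root_n = math.sqrt(n)
    return 100.0 * norm_cdf(delta_low / root_n), 100.0 * norm_cdf(delta_high / root_n)


# Earlier example: score 20, control mean 40, S.D. 10, N = 10.
low, high = abnormality_limits(20, 40, 10, 10)
print(f"95% limits on the abnormality: {low:.2f}% to {high:.2f}%")
```

A quick check on the sketch is that both the z-based and t-based point estimates for this example should fall inside the interval, and that the interval narrows as N increases.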

Confidence limits for the abnormality of a difference between pairs of tests

The methods of the previous section may be used, with slight modification, to obtain lower and upper limits for the percentage of the population that will fall below a given difference score between two tests. In many situations, the means and S.D. of the two tests in the normative or control sample will differ, and the scores need to be converted to a common metric. We will use z scores and we assume that differences in scores are normally distributed in the normative population. We let X0 and

Point estimates of the abnormality of a difference between an individual’s mean score on k-tests and score on a test entering into that mean

Up to this point we have been concerned with point estimates and confidence limits on the abnormality of a single test score, or difference between scores on two tests. However, in neuropsychology there is an emphasis on examining an individual’s relative strengths and weaknesses across a wide range of cognitive domains [13], [24]. This necessitates using a large number of neuropsychological tests; as a result, there is a problem of how to reduce the number of potential comparisons between

Confidence limits on the abnormality of a difference between an individual’s mean score on k-tests and score on a test entering into that mean

Having presented the method for obtaining a point estimate of the abnormality of the difference between an individual’s mean test score and the score on a test entering into that mean, attention can now be turned to obtaining confidence limits on this estimate of abnormality. Let P3 denote the percentage of the population that will fall below the difference score observed for the individual. The formulae for the confidence limits for P3 are easily obtained from the results that gave formula (1).

Use of the confidence limits in single case studies and clinical practice

We believe the confidence limits presented in the present paper will be of benefit to both single case researchers and clinicians. Firstly, they serve the useful purpose of reminding us of the fallibility of our normative or control data. As such they are in keeping with the contemporary emphasis on using confidence limits in many areas of statistics and psychometrics.

They will also directly assist neuropsychologists in their attempts to achieve a valid assessment of a patient’s relative

Computer programs for confidence limits on the abnormality of test scores and test score differences

Although all of the calculations described in the present paper could be carried out by hand or calculator it would clearly be more convenient if the methods were automated. In addition, tables for the non-central t-distribution (or a computer package that contains an algorithm for non-central t-distributions) would be needed for the calculations and these may not be readily accessible. Because of these considerations the methods have been implemented in computer programs for PCs. Aside from

Conclusion

The single case approach in neuropsychology has made a significant contribution to our understanding of the architecture of human cognition [3], [4], [16], [21], [25], [33]. However, as Caramazza and McCloskey [3] note, if advances in theory are to be sustainable they “… must be based on unimpeachable methodological foundations” (p. 619). The statistical analysis of single case data is an aspect of methodology that has been relatively neglected. This is to be regretted. Other methodological

Acknowledgements

We are grateful to Dr. Sytse Knypstra of the Department of Econometrics, University of Groningen, The Netherlands, for providing an algorithm that finds the non-centrality parameter of a non-central t-distribution given a quantile, its associated probability, and the d.f. This algorithm is incorporated into the computer programs that implement the methods presented in this paper. We are also grateful to Professor David C. Howell (Department of Psychology, University of Vermont) for early

References (44)

  • L. Atkinson. Some tables for statistically based interpretation of WAIS-R factor scores. Psychological Assessment (1991)
  • A.J. Calder et al. Facial emotion recognition after bilateral amygdala damage: differentially severe impairment of fear. Cognitive Neuropsychology (1996)
  • A. Caramazza et al. The case for single-patient studies. Cognitive Neuropsychology (1988)
  • Code C, Wallesch C, Joanette Y, Lecours AR, editors. Classic cases in neuropsychology. Hove, UK: Psychology Press,...
  • Crawford JR. Estimation of premorbid intelligence: a review of recent developments. In: Crawford JR, Parker DM,...
  • Crawford JR. Current and premorbid intelligence measures in neuropsychological assessment. In: Crawford JR, Parker DM,...
  • Crawford JR. Assessment. In: Beaumont JG, Kenealy PM, Rogers MJ, editors. The Blackwell dictionary of neuropsychology....
  • J.R. Crawford et al. WAIS-R subtest scatter: base rate data from a healthy UK sample. British Journal of Clinical Psychology (1996)
  • J.R. Crawford et al. Base rate data on the abnormality of subtest scatter for WAIS-R short-forms. British Journal of Clinical Psychology (1997)
  • J.R. Crawford et al. Comparing an individual’s test score against norms derived from small samples. The Clinical Neuropsychologist (1998)
  • J.R. Crawford et al. Payne and Jones revisited: Estimating the abnormality of test score differences using a modified paired samples t-test. Journal of Clinical and Experimental Neuropsychology (1998)
  • J.R. Crawford et al. Assessing the reliability and abnormality of subtest differences on the Test of Everyday Attention. British Journal of Clinical Psychology (1997)
  • Crawford JR, Venneri A, O’Carroll RE. Neuropsychological assessment of the elderly. In: Bellack AS, Hersen M, editors....
  • Daly F, Hand DJ, Jones MC, Lunn AD, McConway KJ. Elements of statistics. Wokingham, England: Addison-Wesley,...
  • I.J. Deary. Age-associated memory impairment: a suitable case for treatment. Ageing and Society (1995)
  • Ellis AW, Young AW. Human cognitive neuropsychology: a textbook with readings. Hove, UK: Psychology Press,...
  • Feldt LS, Brennan RL. Reliability. In: Linn RL, editor. Educational measurement. 3rd ed. New York: Macmillan,...
  • Gardner MJ, Altman DG. Statistics with confidence- confidence intervals and statistical guidelines. London: British...
  • F.M. Grossman et al. Statistically inferred vs. empirically observed VIQ-PIQ differences in the WAIS-R. Journal of Clinical Psychology (1985)
  • Howell DC. Statistical methods for psychology. 4th ed. Belmont, CA: Duxbury Press,...
  • Humphreys GW, editor. Case studies in the neuropsychology of vision. Hove, UK: Psychology Press,...
  • Kaufman AS. Assessing adolescent and adult intelligence. Boston, MA: Allyn & Bacon,...