One of the most fundamental quantitative methods in research is correlation. Correlation is simple to implement, but is subject to common misunderstandings, whether it is used in communication sciences and disorders or in other fields. This article focuses on seven key issues that are critical in the interpretation of correlation.
1. Correlation Does Not Prove Causality

Let's start with whether correlation can prove causality. Having spent considerable time in Austria in recent years, I cannot resist invoking a favorite example there. The number of babies born in the province of Burgenland has been diminishing for many years. So has the number of storks native to Burgenland. This correlation is systematic, but it invites misinterpretation, if you get the drift intended by the Austrian sense of humor. It is important to note that not just in this case, but in every case, the conclusion listed below applies.
The stork case is an extreme example, chosen to show that sometimes correlation offers not even a hint about causality. Even so, we use correlation to develop evidence that may hint at causality. One reason correlation can't prove causality is that life is complex, and many factors influence most outcomes. For example, the correlation between socioeconomic status (SES) and vocabulary size in children (Hart & Risley, 1995) does not show that higher SES causes greater vocabulary learning, but it hints that higher SES is associated with factors that contribute to vocabulary learning (good nutrition, physical health, opportunity for education, higher levels of verbal interaction in the family, and so on). If one evaluates correlational evidence across a number of such factors, there is opportunity to build converging evidence about possible causes of outcomes such as vocabulary growth.
Conclusion: Correlation indicates a systematic relation, but does not itself indicate causality.
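To see the point concretely, here is a minimal simulation (invented numbers, not real Burgenland data) in which a hidden third variable drives two measures that have no causal connection to each other, yet correlate strongly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hidden confounder (think of something like rural-to-urban change):
z = rng.normal(size=n)

# Two variables that each depend on z, but not on each other:
storks = 2.0 * z + rng.normal(size=n)   # hypothetical stork counts
births = 2.0 * z + rng.normal(size=n)   # hypothetical birth counts

r = np.corrcoef(storks, births)[0, 1]
print(f"correlation without causation: r = {r:.2f}")
```

The two variables never influence one another in the simulation, yet the correlation comes out strongly positive, purely through the shared confounder.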
2. Importance of the Correlation Squared (R2)
The magnitude of a correlation indicates something about the strength of the relation between variables. But there is more to think about than magnitude alone. The point may be easier to grasp in the context of an example. Consider an imaginary study correlating undergraduate grade point average (UGPA, hereafter, just GPA) with success in the practice of speech-language pathology (hereafter, Success). While I am making this study up, its points are based on the real experience of trying to predict success (especially in graduate school) based on GPA and standardized test scores (see Burton & Wang, 2005; Powers, 2001).
Imagine that five experienced supervisory SLPs observe other SLPs in the process of making a diagnosis and conducting treatment. The supervisors judge the performance of each of the 100 observed SLPs on a four-point scale (4=excellent, 3=good, 2=fair, 1=poor) and then Success scores are assigned based on the averages of the scores given to each individual by the five supervisors. We then plot GPAs from the 100 SLPs against the Success scores as in Figure 1 (on page 25).
High GPAs are clearly associated with high Success scores, as indicated in the figure, suggesting (if these imaginary data correspond to reality) that we may be justified in using GPA as a criterion for admitting students to our training programs. But the example also provides a basis for illustrating a key pitfall in interpretation of correlation. The magnitude of this correlation is 0.71 (symbolically, R = 0.71). Since a correlation looks like a proportion (it can range from -1.0 to 1.0), one might be tempted to think that 0.71 means that GPA predicts 71% of Success. But this isn't true. One of the most important things to know about this most common kind of correlation (called "product moment correlation") is that it is the square of the correlation (0.71 times 0.71, R2 = 0.50), not the correlation itself, that tells how much variation in Success is predicted by GPA. So the data suggest that about 50% of variation is predictable based on a correlation of 0.71.
Conclusion: The square of the correlation (R2) should be the primary focus of interpretation.
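The R-versus-R2 distinction is easy to verify numerically. The sketch below generates hypothetical GPA and Success scores engineered (via the noise level, an assumption of mine, not a real dataset) to correlate at roughly 0.71:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100  # one imaginary score pair per observed SLP

# Hypothetical GPAs and Success scores; the noise scale is chosen
# so the correlation lands near 0.71:
gpa = rng.uniform(2.0, 4.0, size=n)
success = (gpa - 2.0) + rng.normal(scale=0.58, size=n)

r = np.corrcoef(gpa, success)[0, 1]
# It is r squared, not r, that gives the proportion of variation predicted:
print(f"r = {r:.2f}, r^2 = {r**2:.2f}")
```

A correlation near 0.71 thus accounts for only about half the variation, not 71% of it.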
3. Reduction in Correlation Produces a Rapid Drop in Accounted Variation
Based on the second conclusion, it is important to notice that as correlation values dip, the amount of variation that is predicted by them drops very rapidly, a necessary result of squaring any value between 1 and 0. Not uncommonly, correlations as low as 0.30 or even 0.20 are reported in our literature, even though the squares of these values are 0.09 and 0.04 respectively, meaning that such correlations account for less than 10% and 5% of variation, respectively.
Conclusion: Low correlations account for very little variation.
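The speed of the drop is plain from the arithmetic alone:

```python
# Variation accounted for (r squared) falls much faster than r itself:
for r in (0.9, 0.7, 0.5, 0.3, 0.2, 0.1):
    print(f"r = {r:.1f}  ->  r^2 = {r * r:.2f}  ({r * r:.0%} of variation)")
```

Halving a correlation quarters the variation it accounts for, which is why correlations of 0.20 or 0.30 account for so little.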
4. Statistical Significance of a Correlation
We should also maintain a distinction between the amount of variation predicted by a correlation and the statistical reliability (usually called the "significance") of the correlation. The matter is tricky because even a correlation that accounts for very little variation (even 1%) can be statistically significant. In the common parlance, "significance" is usually intended to refer to importance, and this is where the problem arises, because a variable that accounts reliably for a very small proportion of variation may be unimportant. Statistical significance only tells us that a relation is arithmetically reliable. It does not tell us that it is important in any practical sense, and so the notion of statistical significance often confuses the uninitiated (among them, the typical news reporter).
Conclusion: A statistically significant correlation may account for very little variation and consequently may be practically unimportant.
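A quick simulation makes the point vivid: with a large enough sample, even a correlation accounting for only about 1% of variation is overwhelmingly "significant." The sample size and true correlation below are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000  # a very large sample

x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)  # true correlation near 0.10, so r^2 near 1%

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.1e}")
# The p-value is astronomically small, yet only ~1% of variation is accounted for.
```

Statistically reliable, practically trivial: both things are true of this relation at once.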
5. Restricted Range of Scores Reduces Magnitude of Correlation
Back to our example: The overall correlation in Figure 1 is highly statistically significant, with a probability of chance occurrence less than one in 10,000 (symbolically, p < .0001). So we may be inclined to think it is very important to consider GPA in our admissions decisions. However, the issue again is tricky. Many programs have a cutoff for admission that would exclude any student with a GPA less than some criterion level, commonly 3.0. The correlation for data only on the SLPs who have GPAs of more than 3.0 in our imaginary study is displayed in Figure 2 (at left). The correlation is much lower (only 0.25) than for the whole range of SLPs, and only 6% of variation (R2 = 0.06) is accounted for by GPA above 3.0. Furthermore, the highly statistically significant results of Figure 1 are not present in Figure 2, where the low correlation could be expected to occur by chance more than eight times out of 100 (p = 0.084). Now let's think about this. Would we be enthusiastic about using GPA as a basis for admitting students if it accounted for only 6% of variation in the students we would be willing to admit, and if the correlation had limited reliability? And what if our particular program had a criterion for admission of 3.2 GPA (as some do)?
Take a look at Figure 3 (on page 25), where we see that the correlation with Success based on GPA above 3.2 drops to 0, and the squared correlation is similarly, of course, 0. In such a case there would be no evidence at all from the study to justify using GPA (above 3.2) as a criterion for admission or funding. Now one might be tempted to think that these fictitious data are somehow unrealistic with regard to effects of the range of scores on correlation. But this is not true. Correlation is systematically sensitive to this phenomenon, which is formally called "range variation" or "range restriction." When you reduce the range of scores, you are highly likely to reduce the correlation, just as in this imaginary case.
Conclusion: Correlations regularly fall dramatically if one restricts the range of scores considered.
6. Prediction Within a Particular Range of Scores
Practically speaking, if we restrict the range of students we admit based on GPA, then the whole range of students in Figure 1 is not relevant for discriminating among students for financial aid in our program. Another way of putting this is: The data in Figure 1 suggest that a student with a GPA of 2.0 is not nearly as good a prospect for Success in the field as a student with a 4.0. However, the data in and of themselves provide no reason to believe that the student with a 4.0 is a better prospect than the student with a 3.4. In general, the correlation across the whole range of students simply does not tell us whether there is good predictability within a restricted range.
Conclusion: If you want to know how much prediction a variable gives within a particular range of possible values, you have to test precisely the restricted range you are interested in (and you should expect a lower correlation than for the whole range).
7. Error of Measurement Also Reduces Magnitude of Correlation
Would these correlational data suggest we should use only a cutoff GPA criterion, and then ignore variations above that criterion in admission (and funding) decisions? Perhaps, but not necessarily. One reason the problem remains complex is that measurement unreliability also can reduce correlations obtained in real studies. Even when two variables are highly correlated in reality, any individual study might fail to verify the relation if the measurement tools utilized in the study are unreliable. In our example, remember that five judges observe the SLPs and the average of the scores they assign becomes the measure of Success. This measure itself is obviously sensitive to who the judges are, how they feel on the days they evaluate, how the SLPs feel on those days, who the clients are on those days, and so on. Obviously, the Success scores are subject to random variation in many ways. We can think of that variation as a kind of measurement error or unreliability of measurement. And that kind of variation in measurement of Success in our imaginary study would tend to weaken the correlation with GPA. There is much more to say about this problem, which has been recognized for a very long time (Spearman, 1904), but I'm going to draw just one last conclusion.
Conclusion: Even when a correlation is low and accounts for little variation in a particular study, it does not necessarily mean that the two variables are unrelated or only weakly related in reality. Any tool for quantification of reality is always sensitive to error of measurement.
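This attenuation effect, too, can be simulated. In the sketch below (invented quantities throughout), a true underlying skill is strongly related to an aptitude measure, but we only observe skill through the average of five noisy judges' ratings, and the observed correlation comes out weaker than the real one:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# True underlying skill, and an aptitude measure truly correlated ~0.8 with it:
skill = rng.normal(size=n)
aptitude = 0.8 * skill + 0.6 * rng.normal(size=n)

# Five judges each rate skill with independent error; averaging reduces,
# but does not eliminate, the measurement noise:
ratings = skill[None, :] + rng.normal(scale=2.0, size=(5, n))
success_measured = ratings.mean(axis=0)

r_true = np.corrcoef(aptitude, skill)[0, 1]
r_observed = np.corrcoef(aptitude, success_measured)[0, 1]
print(f"with skill itself: r = {r_true:.2f}; with noisy ratings: r = {r_observed:.2f}")
```

The gap between the two correlations is due entirely to measurement error in the ratings, which is exactly why a low correlation in one study does not prove the underlying variables are weakly related.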
Figures 1-3 [PDF]