
Practical Guidelines for Interpreting r
How to Interpret a Significant r
OVERVIEW:
Pearson Correlation
Sample Research Questions
* WHAT IS THE QUESTION: Is there a significant linear relationship between two variables?
* “Is processing speed related to intelligence?”
* “Is motivational level related to achievement?”
* “Are years of formal education related to income?”
* “Is the amount of alcohol intake related to reaction time?”
* ASSUMPTIONS
- Variables are measured on INTERVAL or RATIO scales (i.e., continuous
variables); ordinal scales are occasionally used
- Linearity: r is a measure of the LINEAR (not curvilinear)
relationship (the points on a scattergram cluster around a straight line rather than a curve)
A curvilinear relationship means the relationship STARTS OUT in one direction - as one
variable goes UP the other goes UP - then it reaches a peak and reverses: as variable
one goes UP the other goes DOWN. THIS type of curvilinear relationship would
appear as an inverted U (a frown) on a scattergram; the opposite pattern appears as a U.
- Random sampling
- Normality of distribution
NOTE: if these assumptions are NOT met, the SPEARMAN RANK
CORRELATION COEFFICIENT may be more appropriate.
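The linearity assumption can be illustrated with a small sketch. The data below are made up for illustration, and the use of NumPy and SciPy is an assumption (the original notes use SPSS). The first data set follows a perfect inverted U, yet Pearson r is essentially zero; the second always increases but not in a straight line, so the Spearman rank correlation picks up the relationship more fully than Pearson r does.

```python
import numpy as np
from scipy import stats

# Made-up inverted-U data: y rises until x = 5, then falls.
x = np.arange(0, 11, dtype=float)
y_curved = -(x - 5.0) ** 2

# Pearson r only measures the straight-line part of a relationship.
r_curved, _ = stats.pearsonr(x, y_curved)

# Made-up monotonic but non-linear data: y always increases with x.
y_monotonic = np.exp(x)

r_pearson, _ = stats.pearsonr(x, y_monotonic)
rho_spearman, _ = stats.spearmanr(x, y_monotonic)

print(f"inverted U:  Pearson r = {r_curved:.3f}")
print(f"exponential: Pearson r = {r_pearson:.3f}, Spearman rho = {rho_spearman:.3f}")
```

A scattergram remains the quickest way to spot either pattern before choosing a coefficient.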
* VARIANCE EXPLAINED (i.e., the proportion of the variability in one variable
that is shared with the other) is indicated by the Coefficient
of Determination (r squared).
The coefficient of determination is an indicator of effect size: effect size is an estimation of the
magnitude of the relationship between two variables in a population based on
a sample.
In other words, when r squared is multiplied by 100, it indicates the percentage
of the variance in Y that is shared with (explained by) X.
FIRST, find the Coefficient of Determination (r squared). As a rough guide:
.16 or less: generally considered too low to be meaningful
.20: low
.30: moderate
.50: moderate to strong
>.60: strong
NOTE: When reporting out the results, you must consider
TWO things:
1. Are the results meaningful? (r squared)
2. Are the results significant? (p value)
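The two questions can be checked side by side. The sketch below is only an illustration: the function name `summarize_correlation` is hypothetical, the .16 cut-off comes from the guideline above, and the p value passed in is a placeholder standing in for whatever the statistics package reports.

```python
def summarize_correlation(r, p, alpha=0.05, min_r_squared=0.16):
    """Answer both reporting questions for one correlation (hypothetical helper)."""
    r_squared = r ** 2
    return {
        "r_squared": r_squared,
        # r squared times 100 = percentage of variance shared by X and Y
        "percent_shared_variance": 100 * r_squared,
        # Question 1: is r squared above the "too low to be meaningful" guideline?
        "meaningful": r_squared > min_r_squared,
        # Question 2: is p below the alpha set by the researcher?
        "significant": p < alpha,
    }

# Placeholder inputs: r = .46 with an assumed p of .0005.
summary = summarize_correlation(r=0.46, p=0.0005)
print(summary)
```

A result can pass one check and fail the other: with a very large sample, a tiny r can be significant without being meaningful.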
Pearson Correlation
Statistical Significance
* A statistically significant
r means that a correlation this strong would be unlikely to appear in the sample
by chance alone if there were no correlation in the population
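*What is statistical significance? The significance level is the probability of obtaining a
result at least as strong as the one observed if chance alone were at work (i.e., if there were no relationship in the population).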
In statistics, we use the "p" value to indicate significance.
When SPSS calculates the Pearson "r" it also calculates the significance, or "p" value... "p" is compared to the alpha set by the researcher (the value below which the researcher will reject the null hypothesis that there is no relationship between the two variables in the population).
Alpha is often set at .05, meaning that the researcher accepts a five percent (.05) chance of rejecting the null hypothesis when it is actually true (a Type 1 error). If "p" is less than alpha, the correlation is declared statistically significant. So, a value of ".01" means that a correlation this strong would turn up only about 1% of the time by chance if no relationship existed; it does not mean that there is a 99% chance of the correlation being true.
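One way to see what "p" measures is a permutation sketch: shuffle one variable so that any real relationship is destroyed, recompute r many times, and count how often chance alone produces a correlation as strong as the observed one. This is an illustrative technique, not the calculation SPSS performs; the data are simulated and every name below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with a built-in positive relationship.
n = 50
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)

r_observed = np.corrcoef(x, y)[0, 1]

# Shuffling y breaks the pairing, so each shuffled r reflects chance alone.
n_permutations = 2000
count_as_extreme = 0
for _ in range(n_permutations):
    r_shuffled = np.corrcoef(x, rng.permutation(y))[0, 1]
    if abs(r_shuffled) >= abs(r_observed):
        count_as_extreme += 1

# Two-tailed permutation p value (the +1 avoids reporting exactly zero).
p_permutation = (count_as_extreme + 1) / (n_permutations + 1)

alpha = 0.05
print(f"r = {r_observed:.2f}, permutation p = {p_permutation:.4f}")
print("reject the null hypothesis" if p_permutation < alpha else "fail to reject")
```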
* Strategies for
getting significant relationships:
- have a logical rationale for choosing variables
- increase the sample size
- use reliable measures
- avoid restriction of range
* What does Significant Pearson r tell you?
- Correlation procedures indicate whether or not a significant relationship exists between two variables, but they do not indicate what kind of relationship it is.
- It can be: Causal, where the 2 variables do indeed affect each other (or one causes the other); Spurious, where the statistical relation is caused by a third variable; or Coincidence, where the relationship in the sample arose by chance.
* Scatter plots:
- Illustrate the relationship
- The larger the size of the correlation coefficient, the closer the points
are to the line of best fit. The sign of r gives the direction of that line, but r does
not determine how steep it is; the slope depends on the units of the two variables.
* Correlation Matrix
- A table including all possible correlation coefficients between a set of 3 or
more variables
NOTE: when comparing 3 or more variables, use the Pearson Correlation for exploratory
purposes only. To explore the linear relationships among all the variables,
you may consider using multiple regression. Running many individual significance tests increases the
probability of making at least one Type 1 error (rejecting the null hypothesis when it
is true).
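To see how quickly the risk grows, the sketch below counts the pairwise correlations in a matrix and estimates the chance of at least one Type 1 error. It assumes the tests are independent, which is only a rough approximation for correlations that share variables, and the function name is hypothetical.

```python
from math import comb

def familywise_error_rate(n_variables, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one Type 1 error).

    Assumes independent tests, so treat the result as a rough approximation.
    """
    n_tests = comb(n_variables, 2)  # every pair of variables is one test
    return n_tests, 1 - (1 - alpha) ** n_tests

for k in (3, 5, 10):
    n_tests, rate = familywise_error_rate(k)
    print(f"{k} variables -> {n_tests} correlations, familywise error about {rate:.2f}")
```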
* r = coefficient of correlation
* p = level of significance
* r squared = coefficient of determination
EXAMPLE: Is there a significant relationship between a person's using the web for browsing, and publishing a web page?
| | | use: web browsing | use: web publishing |
|---|---|---|---|
| use: web browsing | r | 1.00 | .46(**) |
| | p | . | < .001 |
| | n | 105 | 105 |
| use: web publishing | r | .46(**) | 1.00 |
| | p | < .001 | . |
| | n | 105 | 105 |

** Correlation is significant at the 0.01 level (2-tailed).
The results indicate that there is a significant (p < .001) and positive (r = .46) relationship between a person's using the web for browsing and publishing a web page. However, the relationship is moderate to low (r squared = .21, i.e., about 21% of the variance is shared).
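The reported values can be checked by hand. A minimal sketch, assuming the standard t test for a correlation (t = r * sqrt((n - 2) / (1 - r squared)) with n - 2 degrees of freedom) and using SciPy for the t distribution; only r = .46 and n = 105 come from the example.

```python
from math import sqrt
from scipy import stats

r = 0.46   # from the example
n = 105    # from the example

df = n - 2
t_stat = r * sqrt(df / (1 - r ** 2))

# Two-tailed p value: probability of a t at least this extreme in either tail.
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)

r_squared = r ** 2

print(f"t({df}) = {t_stat:.2f}, two-tailed p = {p_two_tailed:.2g}")
print(f"r squared = {r_squared:.2f}")
```

The p value falls well below .001, which agrees with the table.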
* Restriction of range results in reduction in the magnitude of a correlation (e.g., selecting only those who are above a cutting point on test scores). For example, if you are comparing how much money people spend based on age, but only measure people between 18 and 25 as opposed to 18 and 40, you are restricting the age range and are less likely to find a correlation between age and spending (if one exists). Note in the graphic below, if scores had only been collected from 0 - 1.8 on the USEINTST variable, and from .5 to 1.8 on the USE_IN variable, no correlation would have been apparent.
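Restriction of range is easy to demonstrate with simulated data. The ages and spending figures below are invented for illustration (they are not from any study), and the built-in relationship is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented population: spending rises with age, plus a lot of noise.
n = 5000
age = rng.uniform(18, 40, size=n)
spending = 10.0 * age + rng.normal(scale=60.0, size=n)

r_full = np.corrcoef(age, spending)[0, 1]

# Restrict the sample to ages 18 to 25 only.
restricted = age <= 25
r_restricted = np.corrcoef(age[restricted], spending[restricted])[0, 1]

print(f"ages 18-40: r = {r_full:.2f}")
print(f"ages 18-25: r = {r_restricted:.2f}")
```

The underlying relationship is identical in both samples; only the spread of ages differs.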
* Outliers: Subjects with extreme values on a variable; can
distort the interpretation of data (e.g., an outlier draws the mean further
in the direction of the outlier; Pearson r may increase or decrease,
depending on the location of the outlier). For example, in the age/spending
example above, you may have one teenager who spends hundreds of dollars, compared
to most peers who spend tens of dollars.
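A small made-up data set shows how much one extreme case can move Pearson r. The numbers are invented; here the outlier pulls r down, but an outlier lying along the trend could just as easily inflate it.

```python
import numpy as np

# Invented data: seven teenagers whose spending (in dollars) rises with age.
age = np.array([13, 14, 15, 16, 17, 18, 19], dtype=float)
spending = np.array([20, 25, 22, 30, 28, 35, 33], dtype=float)

r_without = np.corrcoef(age, spending)[0, 1]

# Add one 15-year-old who spends hundreds of dollars.
age_with = np.append(age, 15.0)
spending_with = np.append(spending, 400.0)

r_with = np.corrcoef(age_with, spending_with)[0, 1]

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```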
ONE TAILED OR TWO TAILED
A test of a correlation can
be "one-tailed" or "two-tailed".
A one-tailed hypothesis is one that specifies the direction of the correlation,
while a two-tailed hypothesis is one that does not. For example, if the hypothesis
states: "there is a significant positive correlation between smoking and poor
health", it is one-tailed. The same hypothesis as a two-tailed test would be
stated: "there is a significant correlation between smoking and poor health".
Why
does it matter? If you are projecting the direction, you will place the whole 5%
rejection region (if alpha is .05) in only one tail.

If you are NOT projecting the direction, you will SPLIT the 5% between the two
tails, meaning you can only reject the null if the test statistic falls in the
upper or lower 2.5% of the distribution.

What is the practical significance of one-tailed versus two-tailed?
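For the same alpha, a smaller correlation is enough to reach significance in the predicted direction, so the researcher can use a smaller sample to test a one-tailed hypothesis. Using a smaller sample often reduces costs and is less time consuming for the researcher. The trade-off is that a correlation in the opposite direction cannot be declared significant, so the direction must be specified before the data are collected.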
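The difference between the two kinds of test can be expressed as the smallest r that reaches significance. A minimal sketch, assuming the usual t test for a correlation with n - 2 degrees of freedom; `critical_r` is a hypothetical helper name and n = 30 is an arbitrary sample size.

```python
from math import sqrt
from scipy import stats

def critical_r(n, alpha=0.05, tails=2):
    """Smallest |r| that is significant for sample size n (hypothetical helper)."""
    df = n - 2
    # One tail keeps all of alpha in one tail; two tails split it in half.
    t_crit = stats.t.ppf(1 - alpha / tails, df)
    # Convert the critical t back to the correlation scale.
    return t_crit / sqrt(t_crit ** 2 + df)

n = 30
print(f"n = {n}: two-tailed critical r = {critical_r(n, tails=2):.3f}")
print(f"n = {n}: one-tailed critical r = {critical_r(n, tails=1):.3f}")
```

Because the one-tailed cut-off is lower, a given correlation in the predicted direction reaches significance with fewer subjects.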