
Pearson Product-Moment Correlation
Pearson r (Correlation Coefficient)

Sample Research Questions

Assumptions

Practical Guidelines for Interpreting r

Statistical Significance

How to Interpret a Significant r

Components of the Output

OVERVIEW:

The PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT helps the researcher determine whether there is a significant relationship or association between two variables.


      Two variables are related if knowing the value of one variable tells you something about the value of the other variable (e.g., information about the height of a child in elementary school would have implications for the probable age of the child).    

 
NOTE: CORRELATION DOES NOT IMPLY CAUSATION!
For example, the fact that the number of churches and the number of bars in a town are CORRELATED does not mean churches cause bars or bars cause churches! Rather, a third variable, POPULATION, causes both!  


* "r" is a measure of the degree of the linear relationship between two variables, usually labeled X and Y   

* "r" is called the CORRELATION COEFFICIENT, and may take on any value between +1 and -1

* The sign of the correlation coefficient (+, -) defines the DIRECTION of the relationship

* A SIGNIFICANT correlation coefficient (usually where p is .05 or less) indicates that the scores on the two variables TEND TO CHANGE TOGETHER

* A "meaningful" relationship indicates that the scores tend to change together more closely (e.g., the more "meaningful" the relationship, the more closely you can predict the change in one variable by knowing the change in the other)

*   Positive (direct) relationship (r > 0)
High scores on one variable are associated with high scores on the other

*   Negative (inverse) relationship (r < 0)
High scores on one variable are associated with low scores on the other 

* The ABSOLUTE VALUE of r indicates the STRENGTH OF THE RELATIONSHIP. The closer r is to "1", the stronger the relationship. The closer r is to "0", the weaker the relationship.
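
As a minimal sketch of how r is computed, the coefficient is the sum of the cross-products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the age/height numbers below are made up for illustration and do not come from this page.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of the deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age (years) and height (cm) of six children
age = [6, 7, 8, 9, 10, 11]
height = [115, 122, 127, 133, 138, 144]

r = pearson_r(age, height)
r_flipped = pearson_r(age, [-h for h in height])

print(f"r = {r:.3f}")          # positive: older children tend to be taller
print(f"r = {r_flipped:.3f}")  # same strength, opposite direction
```

Reversing the sign of one variable flips the sign of r but leaves its absolute value unchanged, which is why direction and strength are read separately.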
  

Pearson Correlation
Sample Research Questions

*   WHAT IS THE QUESTION: Is there a significant linear relationship between two variables?

*   “Is processing speed related to intelligence?”
*   “Is motivational level related to achievement?”
*   “Are years of formal education related to income?”
*   “Is the amount of alcohol intake related to reaction time?”

Pearson Correlation Assumptions

-   Variables are measured on INTERVAL or RATIO scales (i.e., continuous variables); ordinal scales are occasionally used

-   Linearity: r measures the LINEAR (not curvilinear) relationship (the points on a scattergram cluster around a straight line)

A curvilinear relationship means the relationship STARTS OUT as linear (as one variable goes UP the other goes UP), then it reaches a peak and reverses: as variable one goes UP the other goes DOWN. THIS type of curvilinear relationship would appear as an upside-down U (a frown) on a scattergram.

-   Random sampling

-   Normality of distribution

NOTE: if these assumptions are NOT met, the SPEARMAN RANK CORRELATION COEFFICIENT may be more appropriate.
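
The linearity assumption can be illustrated with a short sketch. It assumes made-up data and hypothetical helper names (pearson_r, ranks, spearman_rho), and the ranking helper assumes there are no tied values. The first data set is a perfect frown shape, the second rises steadily but not in a straight line.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Frown shape: y rises, peaks at x = 0, then falls
x = list(range(-5, 6))
y_frown = [25 - v * v for v in x]
r_frown = pearson_r(x, y_frown)

# Monotonic but not linear: y always increases with x, at a growing rate
x_pos = list(range(1, 11))
y_exp = [math.exp(v) for v in x_pos]
r_exp = pearson_r(x_pos, y_exp)
rho_exp = spearman_rho(x_pos, y_exp)

print(f"frown shape: r = {r_frown:.3f}")
print(f"exponential: r = {r_exp:.3f}, rho = {rho_exp:.3f}")
```

In the frown-shaped data Y is completely determined by X, yet r comes out at zero because the rising and falling halves cancel. In the steadily rising data, the rank-based coefficient reaches 1 while Pearson's r stays well below it.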

Practical Guidelines for Interpreting r

*   VARIANCE EXPLAINED (i.e., the proportion of the variability in one variable that is shared with, or can be predicted from, the other) is indicated by the Coefficient of Determination (r squared).

The coefficient of determination is an indicator of effect size: effect size is an estimate, based on a sample, of the magnitude of the relationship between two variables in a population.
In other words, when r squared is multiplied by 100, it indicates the percentage of the variance in Y that is shared with X.

FIRST, Find the Coefficient of Determination (r squared). If it is

.16 or less, it is generally considered too low to be meaningful
.20 low
.30 moderate
.50 moderate to strong
>.60 strong

 
 
NOTE: When reporting out the results, you must consider TWO things:
1. Are the results meaningful? (r squared)
2. Are the results significant? (p value)
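
To make the r squared guidelines above concrete, here is a small sketch that converts a few values of r into a percentage of shared variance. The helper name variance_explained and the example values of r are chosen only for illustration.

```python
def variance_explained(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return 100 * r ** 2

for r in (0.20, 0.40, 0.55, 0.71, 0.80):
    print(f"r = {r:.2f}  ->  r squared = {r ** 2:.2f}  "
          f"({variance_explained(r):.0f}% shared variance)")
```

Because r is squared, the percentage drops quickly as r shrinks: an r of .40 corresponds to an r squared of only .16, the level the guidelines above treat as too low to be meaningful.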

Pearson Correlation
Statistical Significance

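*   A statistically significant r indicates that the sample correlation is unlikely to be due to chance alone, i.e., it is taken as evidence of a true (nonzero) correlation in the population

*What is statistical significance? Significance levels indicate how likely it is that a result at least as strong as the one observed would occur by chance alone if there were no relationship in the population.

In statistics, we use the "p" value to indicate significance.

When SPSS calculates the Pearson "r" it also calculates the significance, or "p" value... "p" is compared to the alpha set by the researcher (the value at or below which the researcher will reject the null hypothesis that there is no relationship between the two variables).

Alpha is often set at .05, meaning that the researcher accepts a five percent (.05) chance of rejecting the null hypothesis when it is actually true (a Type 1 error). The "p" value is NOT the chance that the result is untrue, and 1 minus "p" is NOT the chance that the correlation is real. Rather, "p" is the probability of obtaining a correlation at least as strong as the one in the sample if the true correlation in the population were zero. So, a value of ".01" means that a correlation this strong would turn up in only about 1% of samples drawn from a population with no correlation, which is why a small "p" is taken as evidence of a true, significant correlation.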
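
One way to see what a "p" value measures is a permutation test, sketched below. This is a swapped-in technique chosen because it needs only the standard library; statistical packages typically obtain "p" for Pearson's r from a t distribution instead. The function names, the scores, and the number of shuffles are all assumptions made for illustration.

```python
import math
import random

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Two-tailed p: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical scores for 12 participants
motivation = [3, 5, 2, 8, 7, 4, 9, 6, 1, 7, 5, 8]
achievement = [55, 60, 48, 82, 70, 58, 88, 65, 50, 75, 62, 79]

alpha = 0.05
r = pearson_r(motivation, achievement)
p = permutation_p_value(motivation, achievement)

print(f"r = {r:.2f}, p = {p:.4f}")
print("reject the null hypothesis" if p <= alpha
      else "fail to reject the null hypothesis")
```

Shuffling one variable simulates a world in which the null hypothesis is true, so "p" here is simply how often chance alone produces a correlation as strong as the observed one.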

*   Strategies for getting significant relationships:
- have a logical rationale for choosing variables                                                       
- increase the sample size                                                                                    
- use reliable measures
- avoid restriction of range     

Pearson Correlation
How to Interpret a Significant r

*   What does Significant Pearson r tell you?

     -      Correlation procedures indicate whether or not a significant relationship exists between two variables, but they do not indicate what kind of relationship it is.

     -      It can be: Causal, meaning the 2 variables do indeed affect each other (or one causes the other), or it can be Spurious, meaning the statistical relationship is produced by a third variable that influences both (as in the churches and bars example above). It can also be Coincidence, i.e., a chance result (a Type 1 error).
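
The churches and bars example from the overview can be simulated to show how a third variable produces a correlation. This is a toy model under invented assumptions: the population range, the "one church per 1,000 people" and "one bar per 500 people" rates, and the noise levels are all hypothetical, and the values are not rounded or clipped to look like real counts.

```python
import math
import random

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
n_towns = 500

population = [rng.uniform(1_000, 100_000) for _ in range(n_towns)]
# Town-specific variation, generated independently for churches and bars
extra_churches = [rng.gauss(0, 5) for _ in range(n_towns)]
extra_bars = [rng.gauss(0, 10) for _ in range(n_towns)]

# Neither count is computed from the other; both depend only on population
churches = [p / 1_000 + e for p, e in zip(population, extra_churches)]
bars = [p / 500 + e for p, e in zip(population, extra_bars)]

r_observed = pearson_r(churches, bars)
r_without_population = pearson_r(extra_churches, extra_bars)

print(f"churches vs bars:                  r = {r_observed:.2f}")
print(f"same, with population's share removed: r = {r_without_population:.2f}")
```

The first correlation is strong even though churches and bars never influence each other in the simulation. Once the part driven by population is taken out, what remains is essentially uncorrelated.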

Components of the Output

*   Scatter plots:    
- Illustrate the relationship
- The larger the absolute size of the correlation coefficient, the closer the points are to the line of best fit. (The steepness of that line depends on the units in which the variables are measured; only when both variables are converted to standard scores does the slope equal r.)

[Figure: scattergram]

* Correlation Matrix

- A table including all possible correlation coefficients between a set of 3 or more variables*
NOTE: when comparing 3 or more variables, use the Pearson Correlation for exploratory purposes only. To explore the linear relationships among all the variables, you may consider using multiple regression. Using individual correlations increases the probability of making a Type 1 error (rejecting the null hypothesis when it is true).   
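
The growth in Type 1 error risk can be sketched with a short calculation. It assumes, as a simplification, that the individual tests are independent; correlations in one matrix share variables, so the real figure will differ somewhat. The function names are hypothetical.

```python
def n_pairs(n_variables):
    """Number of distinct correlations among n_variables variables."""
    return n_variables * (n_variables - 1) // 2

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (3, 5, 10):
    tests = n_pairs(k)
    print(f"{k} variables -> {tests} correlations -> "
          f"P(at least one Type 1 error) = {familywise_error(tests):.2f}")
```

With 10 variables there are 45 separate correlations, so at alpha = .05 finding at least one "significant" r by chance alone becomes more likely than not.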

* r = coefficient of correlation
* p = level of significance
* r squared = coefficient of determination 

EXAMPLE: Is there a significant relationship between a person's using the web for browsing, and publishing a web page?

Correlations

                            use: web browsing   use: web publishing
use: web browsing      r    1.00                .46(**)
                       p    .                   < .001
                       n    105                 105
use: web publishing    r    .46(**)             1.00
                       p    < .001              .
                       n    105                 105
** Correlation is significant at the 0.01 level (2-tailed).

The results indicate that there is a significant (p < .001) and positive (r = .46) relationship between a person's using the web for browsing and publishing a web page. However, the relationship is moderate to low (r squared = .21).
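
The figures in this example can be checked with a few lines of code. The sketch below takes r and n from the table above and uses the Fisher z approximation for the "p" value, which is an assumption of this sketch; SPSS output is typically based on a t test, so the exact "p" will differ slightly.

```python
import math
from statistics import NormalDist

r, n = 0.46, 105  # values from the correlation table above

r_squared = r ** 2

# Fisher z transformation: atanh(r) is approximately normal with
# standard error 1 / sqrt(n - 3) when the population correlation is zero
z = math.atanh(r) * math.sqrt(n - 3)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"r squared = {r_squared:.2f}")
print(f"z = {z:.2f}, approximate two-tailed p = {p_two_tailed:.1e}")
```

Both checks agree with the reported results: r squared rounds to .21, and the approximate "p" is far below .001.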

*   Restriction of range results in reduction in the magnitude of a correlation (e.g., selecting only those who are above a cutting point on test scores). For example, if you are comparing how much money people spend based on age, but only measure people between 18 and 25 as opposed to 18 and 40, you are restricting the age range and are less likely to find a correlation between age and spending (if one exists).  Note in the graphic below, if scores had only been collected from 0 - 1.8 on the USEINTST variable, and from .5 to 1.8 on the USE_IN variable, no correlation would have been apparent.
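
Restriction of range can be demonstrated with simulated data. The sketch below assumes an invented model in which spending rises by 10 for each year of age, plus a large amount of random variation; the sample size and noise level are likewise hypothetical.

```python
import math
import random

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)

# Full range: ages 18 to 40
ages = [rng.uniform(18, 40) for _ in range(2000)]
spending = [10 * a + rng.gauss(0, 100) for a in ages]
r_full = pearson_r(ages, spending)

# Restricted range: keep only the 18 to 25 year olds from the same sample
young = [(a, s) for a, s in zip(ages, spending) if a <= 25]
r_restricted = pearson_r([a for a, _ in young], [s for _, s in young])

print(f"ages 18-40: n = {len(ages)}, r = {r_full:.2f}")
print(f"ages 18-25: n = {len(young)}, r = {r_restricted:.2f}")
```

The underlying relationship is identical in both groups. Only the spread of ages has changed, and that alone is enough to shrink the correlation.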

 
          
*   Outliers: Subjects with extreme values on a variable; can distort the interpretation of data (e.g., an outlier draws the mean further in the direction of the outlier; Pearson r may increase or decrease, depending on the location of the outlier). For example, in the age/spending example above, you may have one teenager who spends hundreds of dollars, compared to most peers who spend tens of dollars.
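
The effect of a single outlier can be seen in a small made-up data set. The ages and spending amounts below are hypothetical: five teenagers who spend tens of dollars, plus one who spends hundreds.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Five teenagers with little connection between age and spending
age = [13, 14, 15, 16, 17]
spending = [30, 10, 40, 20, 30]
r_without = pearson_r(age, spending)

# Add one 18-year-old who spends 400
r_with = pearson_r(age + [18], spending + [400])

print(f"without the outlier: r = {r_without:.2f}")
print(f"with the outlier:    r = {r_with:.2f}")
```

Here one extreme case turns a weak correlation into a fairly strong one. An outlier in a different position (for example, a 13-year-old spending 400) would pull r in the other direction, which is why the scatter plot should always be inspected.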
 

ONE TAILED OR TWO TAILED

A correlation can be "one-tailed" or "two-tailed".
A one-tailed hypothesis is one that specifies the direction of the correlation, while a two-tailed hypothesis is one that does not. For example, if the hypothesis states: "there is a significant positive correlation between smoking and poor health", it is one-tailed. The same hypothesis as a two-tailed test would be stated: "there is a significant correlation between smoking and poor health".

Why does it matter? If you are predicting the direction, you will place the entire 5% rejection region (if alpha is .05) in only one tail.


If you are NOT predicting the direction, you will SPLIT the 5% between the two tails, meaning you can only reject the null if the test statistic falls in the upper or lower 2.5% of the distribution.


What is the practical significance of one-tailed versus two-tailed?

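Because the cutoff for significance is less extreme in a one-tailed test, the researcher can use a smaller sample to test a one-tailed hypothesis (provided the relationship is in the predicted direction). Using a smaller sample often reduces costs and is less time consuming for the researcher.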
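
The difference between the two kinds of test can be put into numbers with a rough sketch. It relies on the Fisher z approximation rather than exact t-based tables, so the results are approximate, and the function names and the example values (n = 30, r = .30) are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def critical_z(alpha=0.05, tails=2):
    """z cutoff leaving alpha in one tail, or alpha / 2 in each of two tails."""
    return NormalDist().inv_cdf(1 - alpha / tails)

def critical_r(n, alpha=0.05, tails=2):
    """Smallest |r| that reaches significance (Fisher z approximation)."""
    return math.tanh(critical_z(alpha, tails) / math.sqrt(n - 3))

def min_sample_size(r, alpha=0.05, tails=2):
    """Smallest n at which an observed r of this size would be significant."""
    return math.ceil((critical_z(alpha, tails) / math.atanh(abs(r))) ** 2 + 3)

print(f"n = 30: two-tailed critical r = {critical_r(30):.3f}, "
      f"one-tailed critical r = {critical_r(30, tails=1):.3f}")
print(f"r = .30: two-tailed needs n >= {min_sample_size(0.30)}, "
      f"one-tailed needs n >= {min_sample_size(0.30, tails=1)}")
```

With the same alpha, the one-tailed cutoff is lower, so a correlation of a given size reaches significance with fewer participants. The saving only applies when the observed correlation is in the predicted direction.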

 

 
