Correlation and Regression

Here, we introduce methods for comparing two continuous variables. Although the word "correlation" is often used loosely to describe any relationship, a formal correlation analysis yields useful statistics. Regression is used to predict values of one variable given values of another, and to test for a relationship between the two variables.

 
 

Correlation

We can measure the association between two numeric variables as a correlation. If larger-than-average values of one variable are associated with larger-than-average values of the other variable, the correlation is positive; if they are associated with smaller-than-average values of the other variable, the correlation is negative. Correlations range between -1 and 1, and we can test the significance of a correlation using a t test statistic.
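As a rough illustration of the paragraph above, the sketch below computes a Pearson correlation coefficient and the usual t statistic for testing it against zero, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function names `pearson_r` and `correlation_t` and the data values are made up for this example; they do not come from the course materials.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for the null hypothesis of zero correlation (df = n - 2).

    Undefined when r is exactly 1 or -1, because the denominator is zero.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical data: y tends to fall as x rises
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

r = pearson_r(x, y)
t = correlation_t(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(x) - 2}")
```

Here larger-than-average x values pair with smaller-than-average y values, so r comes out negative. Turning t into a p-value would require a t distribution with n - 2 degrees of freedom, which this sketch leaves out.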

Regression is a technique that allows us to predict what the effect of one numeric variable will be on some other numeric variable.
 

Regression

A regression is used when we want to predict how one numeric variable (response variable) will change in response to changes in some other numeric variable (predictor).
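One way to make this concrete is a least-squares fit of a straight line, response = intercept + slope * predictor. The sketch below is only an illustration under that assumption; the function name `fit_line` and the data are invented for the example.

```python
def fit_line(predictor, response):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    # Slope: co-variation of predictor and response over variation in predictor
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(predictor, response))
    sxx = sum((xi - mean_x) ** 2 for xi in predictor)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data
predictor = [100, 200, 300, 400, 500]
response = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(predictor, response)
print(f"slope = {slope:.4f}, intercept = {intercept:.2f}")
```

The slope estimates how much the response changes for each one-unit change in the predictor.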

 

Implementation and Interpretation of Regression

Regressions allow us to predict the value we expect to see in our response variable for a given value of our predictor variable. We can also measure how far each observed value falls from its predicted (fitted) value; these differences are the residuals. This video also explains how we can assess the assumptions of regression.
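The sketch below shows fitted values, residuals, and a prediction for a new predictor value, assuming a straight-line least-squares fit. The helper `fit_line`, the data, and the new value 350 are hypothetical and chosen only for illustration.

```python
def fit_line(predictor, response):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(predictor, response))
    sxx = sum((xi - mean_x) ** 2 for xi in predictor)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data
predictor = [100, 200, 300, 400, 500]
response = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(predictor, response)

# Fitted values: what the line predicts at each observed predictor value
fitted = [intercept + slope * xi for xi in predictor]

# Residuals: observed minus fitted
residuals = [yi - fi for yi, fi in zip(response, fitted)]

# Prediction for a predictor value that was not observed
prediction = intercept + slope * 350

for xi, yi, fi, ei in zip(predictor, response, fitted, residuals):
    print(f"x = {xi}: observed = {yi:.2f}, fitted = {fi:.2f}, residual = {ei:+.2f}")
print(f"predicted response at x = 350: {prediction:.3f}")
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding). A common way to check the assumptions is to plot the residuals against the fitted values and look for curvature or changing spread.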

Additional Resources


Whitlock & Schluter - The Analysis of Biological Data

Chapter 16: pages 507-524, and Chapter 17: pages 545-576 [Sapling Ch16, Sapling Ch17]

 

What are covariance and correlation?

Intro: Intuition behind covariance and correlation with examples.

 

Regression assumptions explained

Advanced: Explanation of six regression assumptions, including linearity, constant error variance, independent error terms, normal errors, multi-collinearity, exogeneity.

What is regression?

Intermediate: Looking at the foundations of regression.

 

Simple linear regression in R

Advanced: Code tutorial video for regression analysis in R.


Review Questions

 
  1. I plant 40 seedlings and give them varying exposure to light (measured in lux). After a month, I measure each seedling to check its growth. I want to know if there is a relationship between the amount of light each plant received and the amount it has grown. In this example, light measured in lux is the:

    (A) explanatory variable, or

    (B) response variable?

  2. Which of the following assumptions do we make while deriving linear regression parameters? (More than one may be true)

    (A) The true relationship between the response y and predictor x is linear.
    (B) The model errors are independent.
    (C) The errors are normally distributed with a 0 mean and constant standard deviation.
    (D) The predictor x is non-stochastic (constant) and is measured error-free.

  3. What does a negative correlation coefficient indicate?
