If there is a very strong correlation between two variables, what must the correlation coefficient be?

The sample correlation coefficient (r) is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points, as in the example above for accumulated saving over time. Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).

Table of Contents

Calculation of the Correlation Coefficient
Describing Correlation Coefficients
Beware of Non-Linear Relationships
Beware of Outliers
Frequently Asked Questions

A correlation coefficient close to 0 suggests little, if any, correlation. The scatter plot suggests that measurements of IQ do not change with increasing age, i.e., there is no evidence that IQ is associated with age.

Calculation of the Correlation Coefficient

The equations below show the calculations used to compute "r". However, you do not need to remember these equations. We will use R to do these calculations for us. Nevertheless, the equations give a sense of how "r" is computed.

$$ r = \frac{\mathrm{Cov}(X,Y)}{\sqrt{s_x^2 \, s_y^2}} $$

where Cov(X,Y) is the covariance, i.e., a measure of how far each observed (X,Y) pair is from the mean of X and the mean of Y, simultaneously, and $s_x^2$ and $s_y^2$ are the sample variances for X and Y.

Cov(X,Y) is computed as:

$$ \mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$

You don't have to memorize or use these equations for hand calculations. Instead, we will use R to calculate correlation coefficients. For example, we could use the following command to compute the correlation coefficient for AGE and TOTCHOL in a subset of the Framingham Heart Study:

> cor(AGE,TOTCHOL)
[1] 0.2917043
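To see what cor() is doing, the covariance and sample variances described above can be applied directly. This is a minimal sketch: the helper name `r_by_hand` and the eight age/cholesterol values are made up for illustration and are not taken from the Framingham data.

```r
# Hypothetical helper: Pearson r from the covariance and the sample variances
r_by_hand <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  n <- length(x)
  # Sample covariance: sum of cross-products of deviations, divided by n - 1
  cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
  # Sample variances, using the same n - 1 denominator
  var_x <- sum((x - mean(x))^2) / (n - 1)
  var_y <- sum((y - mean(y))^2) / (n - 1)
  cov_xy / sqrt(var_x * var_y)
}

# Made-up ages and total cholesterol values, for illustration only
age     <- c(39, 46, 48, 61, 46, 43, 63, 45)
totchol <- c(195, 250, 245, 225, 285, 228, 205, 313)

r_by_hand(age, totchol)
cor(age, totchol)  # built-in function; should agree with r_by_hand()
```

Because the same n - 1 denominator appears in the covariance and in both variances, it cancels out: r is unchanged if every sum is divided by n instead.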

Describing Correlation Coefficients

The table below provides some guidelines for how to describe the strength of correlation coefficients, but these are just guidelines for description. Also, keep in mind that even weak correlations can be statistically significant, as you will learn shortly.

Correlation Coefficient (r)	Description (Rough Guideline)
+1.0	Perfect positive + association
+0.8 to 1.0	Very strong + association
+0.6 to 0.8	Strong + association
+0.4 to 0.6	Moderate + association
+0.2 to 0.4	Weak + association
0.0 to +0.2	Very weak + or no association
0.0 to -0.2	Very weak - or no association
-0.2 to -0.4	Weak - association
-0.4 to -0.6	Moderate - association
-0.6 to -0.8	Strong - association
-0.8 to -1.0	Very strong - association
-1.0	Perfect negative association
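The guideline table can be written as a small lookup function. This is a hypothetical helper (`describe_r` is not part of R), and because the table's ranges share their endpoints, the sketch assumes that a value exactly on a boundary, such as 0.4, belongs to the stronger category.

```r
# Hypothetical helper mapping r to the rough verbal guideline in the table
describe_r <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  # Strength from |r|, using bins [0, 0.2), [0.2, 0.4), ..., [0.8, 1]
  strength <- as.character(cut(abs(r),
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("Very weak or no", "Weak", "Moderate", "Strong", "Very strong"),
    right = FALSE, include.lowest = TRUE))
  strength[abs(r) == 1] <- "Perfect"
  # Direction from the sign of r (none for exactly 0)
  direction <- ifelse(r > 0, "positive", ifelse(r < 0, "negative", ""))
  # Collapse the double space left when direction is empty
  gsub(" +", " ", paste(strength, direction, "association"))
}

describe_r(c(0.2917043, 0.5653241, -0.97))
# returns c("Weak positive association", "Moderate positive association",
#           "Very strong negative association")
```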

The four images below give an idea of how some correlation coefficients might look on a scatter plot.
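Plots like these can be reproduced by simulation. A minimal sketch, assuming normally distributed data: if x and e are independent standard normal variables, then y = rho * x + sqrt(1 - rho^2) * e has population correlation rho with x. The helper `simulate_pairs` and the four rho values are illustrative choices, not the ones used for the original images.

```r
# Hypothetical helper: simulate n (x, y) pairs with population correlation rho
simulate_pairs <- function(rho, n = 500) {
  stopifnot(abs(rho) <= 1)
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  data.frame(x = x, y = y)
}

set.seed(1)                        # for reproducibility
rhos <- c(-0.9, -0.4, 0.4, 0.9)    # illustrative values, chosen arbitrarily
op <- par(mfrow = c(2, 2))         # 2 x 2 grid of panels
for (rho in rhos) {
  d <- simulate_pairs(rho)
  plot(d$x, d$y, xlab = "x", ylab = "y",
       main = sprintf("rho = %.1f, sample r = %.2f", rho, cor(d$x, d$y)))
}
par(op)                            # restore the previous graphics settings
```

The sample r printed in each panel title will differ slightly from rho because of random sampling.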

The scatter plot below illustrates the relationship between systolic blood pressure and age in a large number of subjects. It suggests a weak (r=0.36), but statistically significant (p<0.0001) positive association between age and systolic blood pressure. There is quite a bit of scatter, but there are many observations, and there is a clear linear trend.

How can a correlation be weak, but still statistically significant? Consider that most outcomes have multiple determinants. For example, body mass index (BMI) is determined by multiple factors ("exposures"), such as age, height, sex, calorie consumption, exercise, genetic factors, etc. So, height is just one determinant and is a contributing factor, but not the only determinant of BMI. As a result, height might be a significant determinant, i.e., it might be significantly associated with BMI but only be a partial factor. If that is the case, even a weak correlation might be statistically significant if the sample size is sufficiently large. In essence, finding a weak correlation that is statistically significant suggests that that particular exposure has an impact on the outcome variable, but that there are other important determinants as well.
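The role of sample size can be made concrete. For Pearson's r, the test of zero correlation uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, which is what R's cor.test() reports by default. In the sketch below, the helper `r_pvalue` is hypothetical and the data are simulated, not the blood pressure data; r is held at 0.36 while n varies.

```r
# Hypothetical helper: two-sided p-value for H0: rho = 0, given r from n pairs
r_pvalue <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# The same weak correlation (r = 0.36) at increasing sample sizes
p <- sapply(c(20, 50, 200, 1000), function(n) r_pvalue(0.36, n))
p

# Check the formula against cor.test() on simulated data
set.seed(3)
x <- rnorm(300)
y <- 0.36 * x + sqrt(1 - 0.36^2) * rnorm(300)
ct <- cor.test(x, y)
c(formula = r_pvalue(cor(x, y), 300), cor.test = ct$p.value)
```

With r = 0.36, the p-value is above 0.05 for 20 subjects but far below 0.0001 for a few hundred, so the same modest correlation can be "weak but significant" in a large study.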

Beware of Non-Linear Relationships

Many relationships between measurement variables are reasonably linear, but others are not. For example, the image below indicates that the risk of death is not linearly correlated with body mass index. Instead, this type of relationship is often described as "U-shaped" or "J-shaped," because the value of the Y-variable initially decreases with increases in X, but with further increases in X, the Y-variable increases substantially. The relationship between alcohol consumption and mortality is also "J-shaped."

Source: Calle EE, et al.: N Engl J Med 1999; 341:1097-1105
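A simple made-up example shows how badly r can miss a curved relationship. Here y is completely determined by x through a U-shaped (quadratic) curve, yet the correlation coefficient is zero; the values are hypothetical.

```r
x <- -5:5
y <- x^2       # U-shaped: y depends entirely on x
cor(x, y)      # 0 (up to rounding): no *linear* association
plot(x, y)     # the scatter plot shows the U shape immediately
cor(x^2, y)    # 1: y is a perfect linear function of x^2
```

This is why a scatter plot should be examined before r is interpreted.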

A simple way to evaluate whether a relationship is reasonably linear is to examine a scatter plot. To illustrate, look at the scatter plot below of height (in inches) and body weight (in pounds) using data from the Weymouth Health Survey in 2004. R was used to create the scatter plot and compute the correlation coefficient.

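# Drop rows with any missing value from the Weymouth survey data frame
wey <- na.omit(Weymouth_Adult_Part)
# Scatter plot of body weight (pounds) against height (inches)
with(wey, plot(hgt_inch, weight))
# Pearson correlation coefficient
with(wey, cor(hgt_inch, weight))
# [1] 0.5653241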

There is quite a lot of scatter, and the large number of data points makes it difficult to fully evaluate the correlation, but the trend is reasonably linear. The correlation coefficient is +0.57.

Beware of Outliers

Note also in the plot above that there are two individuals with apparent heights of 88 and 99 inches. A height of 88 inches (7 feet 4 inches) is plausible, but unlikely, and a height of 99 inches is certainly a coding error. Obvious coding errors should be excluded from the analysis, since they can have an inordinate effect on the results. It's always a good idea to look at the raw data in order to identify any gross mistakes in coding.
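The effect of a single coding error is easy to demonstrate with made-up numbers. In this hypothetical data set, ten heights and weights follow a clear upward trend and one record has a height of 99 inches but an ordinary weight; the 90-inch plausibility cutoff is likewise an arbitrary choice for the example.

```r
# Hypothetical heights (inches) and weights (pounds);
# the last height, 99, mimics a coding error
height <- c(60:69, 99)
weight <- c(120 + 5 * (0:9) + rep(c(4, -4), 5), 120)

cor(height, weight)                        # all 11 points, including the error

# Look at the raw data: which records have implausible heights?
which(height >= 90)                        # 11

plausible <- height < 90
cor(height[plausible], weight[plausible])  # the 10 plausible points only
```

With the erroneous record included, r comes out negative; with it excluded, r is about 0.96. Listing the flagged rows with which() before dropping them is a quick way to check the raw data.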

After excluding the two outliers, the plot looks like this:


A correlation coefficient, often expressed as r, indicates a measure of the direction and strength of a relationship between two variables. When the r value is closer to +1 or -1, it indicates that there is a stronger linear relationship between the two variables.

Correlational studies are quite common in psychology, particularly because some things are impossible to recreate or research in a lab setting. Instead of performing an experiment, researchers may collect data to look at possible relationships between variables. From the data they collect and its analysis, researchers then make inferences and predictions about the nature of the relationships between variables.

A correlation is a statistical measurement of the relationship between two variables. Remember this handy rule: The closer the correlation is to 0, the weaker it is. The closer it is to +/-1, the stronger it is.

Correlation strength ranges from -1 to +1.

A correlation of +1 indicates a perfect positive correlation, meaning that both variables move in the same direction together.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down.

A zero correlation suggests that the correlation statistic does not indicate a relationship between the two variables. This does not mean that there is no relationship at all; it simply means that there is not a linear relationship. A zero correlation is often indicated using the abbreviation r = 0.

Scatter plots (also called scatter charts, scattergrams, and scatter diagrams) are used to plot variables on a chart to observe the associations or relationships between them. The horizontal axis represents one variable, and the vertical axis represents the other.


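Each point on the plot is a different measurement. From those measurements, a trend line (the least-squares regression line) can be calculated. The correlation coefficient describes how closely the points cluster around that line. It is not the slope of the line, although the two always have the same sign, and r equals the slope when both variables are expressed in standard units. When the correlation is weak (r is close to zero), the line is hard to distinguish. When the correlation is strong (r is close to +1 or -1), the line will be more apparent.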
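The difference between r and the slope can be checked with R's built-in `cars` data set (stopping distance versus speed). The least-squares slope equals r * s_y / s_x, so it matches r only after both variables are standardized to mean 0 and standard deviation 1.

```r
# Built-in 'cars' data: stopping distance (ft) versus speed (mph)
x <- cars$speed
y <- cars$dist

r <- cor(x, y)

# Slope of the least-squares line on the original scales
slope_raw <- unname(coef(lm(y ~ x))[2])

# Slope after standardizing both variables (mean 0, SD 1)
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
slope_std <- unname(coef(lm(zy ~ zx))[2])

c(r = r, slope_raw = slope_raw,
  r_times_sd_ratio = r * sd(y) / sd(x), slope_std = slope_std)
```

Here r is about 0.81 while the raw slope is close to 4 feet per mph; the raw slope equals r * sd(y) / sd(x), and the standardized slope equals r.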

Correlations can be confusing, and many people equate positive with strong and negative with weak. A relationship between two variables can be negative, but that doesn't mean that the relationship isn't strong.

A weak positive correlation indicates that, although both variables tend to increase together, the relationship is not very strong. A strong negative correlation, on the other hand, indicates a strong connection between the two variables, but one tends to go down as the other goes up.

For example, a correlation of -0.97 is a strong negative correlation, whereas a correlation of 0.10 indicates a weak positive correlation. A correlation of +0.10 is weaker than -0.74, and a correlation of -0.98 is stronger than +0.79.
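Since strength depends only on the absolute value of r, ranking correlations by strength means sorting on |r| and ignoring the sign. A one-line check in R, using the values from this paragraph:

```r
# Correlations mentioned in the examples
r <- c(-0.97, 0.10, -0.74, -0.98, 0.79)
# Strength depends only on |r|; the sign gives the direction
r[order(abs(r), decreasing = TRUE)]
# returns c(-0.98, -0.97, 0.79, -0.74, 0.10)
```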

Correlation does not equal causation. Just because two variables have a relationship does not mean that changes in one variable cause changes in the other. Correlations tell us that there is a relationship between variables, but this does not necessarily mean that one variable causes the other to change.

An oft-cited example is the correlation between ice cream consumption and homicide rates. Studies have found a correlation between increased ice cream sales and spikes in homicides. However, eating ice cream does not cause you to commit murder. Instead, there is a third variable: heat. Both variables increase during summertime.
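A confounder like heat can be illustrated with simulated data. This is a hypothetical sketch: temperature is generated at random and drives both made-up outcomes, which have no direct link to each other. The partial correlation, computed here by correlating the residuals of each variable after regressing it on temperature, removes the shared influence.

```r
set.seed(2024)
n <- 365                                  # one hypothetical year of daily data
temp <- rnorm(n, mean = 20, sd = 8)       # daily temperature (made up)
# Both outcomes depend on temperature, but not on each other
ice_cream <- 50 + 3 * temp + rnorm(n, sd = 10)
homicides <- 5 + 0.2 * temp + rnorm(n, sd = 1.5)

cor(ice_cream, homicides)                 # clearly positive

# Partial correlation controlling for temperature:
# correlate what is left of each variable after regressing it on temp
r_partial <- cor(resid(lm(ice_cream ~ temp)), resid(lm(homicides ~ temp)))
r_partial                                 # close to 0
```

The raw correlation between the two outcomes is strong even though neither affects the other; once temperature is accounted for, it largely disappears.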

An illusory correlation is the perception of a relationship between two variables when only a minor relationship—or none at all—actually exists. An illusory correlation does not always mean inferring causation; it can also mean inferring a relationship between two variables when one does not exist.

For example, people sometimes assume that, because two events occurred together at one point in the past, one event must be the cause of the other. These illusory correlations can occur both in scientific investigations and in real-world situations.

Stereotypes are a good example of illusory correlations. Research has shown that people tend to assume that certain groups and traits occur together and frequently overestimate the strength of the association between the two variables.

For example, suppose someone holds the mistaken belief that all people from small towns are extremely kind. When they meet a very kind person, their immediate assumption might be that the person is from a small town, despite the fact that kindness is not related to city population.

Psychology research makes frequent use of correlations, but it's important to understand that correlation is not the same as causation. This is a frequent assumption among those not familiar with statistics and assumes a cause-effect relationship that might not exist.

Frequently Asked Questions

How do you find the correlation coefficient?

You can calculate the correlation coefficient in a few different ways, with the same result. The general formula divides the covariance between the two variables by the product of their standard deviations:

$$ r_{XY} = \frac{\mathrm{Cov}(X,Y)}{s_x \, s_y} $$
How do you calculate a correlation coefficient in Excel?

In the cell in which you want the correlation coefficient to appear, enter =CORREL(A2:A7,B2:B7), where A2:A7 and B2:B7 are the variable lists to compare. Press Enter.
How do you find a linear correlation coefficient?

Computing the linear correlation coefficient by hand is long and tedious for anything beyond a handful of data points, so most people use a calculator or software such as Excel or a statistics program.
How do you interpret a correlation coefficient?

Correlations range from -1.00 to +1.00. The correlation coefficient (expressed as r ) shows the direction and strength of a relationship between two variables. The closer the r value is to +1 or -1, the stronger the linear relationship between the two variables is.
What is the difference between correlation and causation?

Correlations indicate a relationship between two variables, but one doesn't necessarily cause the other to change.