When there is strong relationship between two variables then the correlation coefficient will be?

Home>AP statistics>This page

Correlation coefficients measure the strength of association between two variables. The most common correlation coefficient, called the Pearson product-moment correlation coefficient, measures the strength of the linear association between variables measured on an interval or ratio scale.

Note: Your browser does not support HTML5 video. If you view this web page on a different browser (e.g., a recent version of Edge, Chrome, Firefox, or Opera), you can watch a video treatment of this lesson.

In this tutorial, when we speak simply of a correlation coefficient, we are referring to the Pearson product-moment correlation. Generally, the correlation coefficient of a sample is denoted by r, and the correlation coefficient of a population is denoted by ρ or R.

How to Interpret a Correlation Coefficient

The sign and the absolute value of a correlation coefficient describe the direction and the magnitude of the relationship between two variables.

  • A negative correlation means that if one variable gets bigger, the other variable tends to get smaller.

Keep in mind that the Pearson product-moment correlation coefficient only measures linear relationships. Therefore, a correlation of 0 does not mean zero relationship between two variables; rather, it means zero linear relationship. (It is possible for two variables to have zero linear relationship and a strong curvilinear relationship at the same time.)

Scatterplots and Correlation Coefficients

The scatterplots below show how different patterns of data produce different degrees of correlation.

Maximum positive correlation
(r = 1.0)

Strong positive correlation
(r = 0.80)

Zero correlation
(r = 0)

Maximum negative correlation
(r = -1.0)

Moderate negative correlation
(r = -0.43)

Strong correlation & outlier
(r = 0.71)

Several points are evident from the scatterplots.

Advertisement

How to Calculate a Correlation Coefficient

If you look in different statistics textbooks, you are likely to find different-looking (but equivalent) formulas for computing a correlation coefficient. In this section, we present several formulas that you may encounter.

The most common formula for computing a product-moment correlation coefficient (r) is given below.

Product-moment correlation coefficient. The correlation r between two variables is:

r = Σ (xy) / sqrt [ ( Σ x2 ) * ( Σ y2 ) ]

where Σ is the summation symbol, x = xi - x, xi is the x value for observation i, x is the mean x value, y = yi - y, yi is the y value for observation i, and y is the mean y value.

The formula below uses population means and population standard deviations to compute a population correlation coefficient (ρ) from population data.

Population correlation coefficient. The correlation ρ between two variables is:

ρ = [ 1 / N ] * Σ { [ (Xi - μX) / σx ]
* [ (Yi - μY) / σy ] }

where N is the number of observations in the population, Σ is the summation symbol, Xi is the X value for observation i, μX is the population mean for variable X, Yi is the Y value for observation i, μY is the population mean for variable Y, σx is the population standard deviation of X, and σy is the population standard deviation of Y.

The formula below uses sample means and sample standard deviations to compute a sample correlation coefficient (r) from sample data.

Sample correlation coefficient. The correlation r between two variables is:

r = [ 1 / (n - 1) ] * Σ { [ (xi - x) / sx ]
* [ (yi - y) / sy ] }

where n is the number of observations in the sample, Σ is the summation symbol, xi is the x value for observation i, x is the sample mean of x, yi is the y value for observation i, y is the sample mean of y, sx is the sample standard deviation of x, and sy is the sample standard deviation of y.

The interpretation of the sample correlation coefficient depends on how the sample data are collected. With a large simple random sample, the sample correlation coefficient is an unbiased estimate of the population correlation coefficient.

Each of the latter two formulas can be derived from the first formula. Use the first or second formula when you have data from the entire population. Use the third formula when you only have sample data, but want to estimate the correlation in the population. When in doubt, use the first formula.

Fortunately, you will rarely have to compute a correlation coefficient by hand. Many software packages (e.g., Excel) and most graphing calculators have a correlation function that will do the job for you.

Test Your Understanding

Problem 1

A national consumer magazine reported the following correlations.

  • The correlation between car weight and car reliability is -0.30.
  • The correlation between car weight and annual maintenance cost is 0.20.

Which of the following statements are true?

I. Heavier cars tend to be less reliable.II. Heavier cars tend to cost more to maintain.

III. Car weight is related more strongly to reliability than to maintenance cost.

(A) I only(B) II only(C) III only(D) I and II only

(E) I, II, and III

Solution

The correct answer is (E). The correlation between car weight and reliability is negative. This means that reliability tends to decrease as car weight increases. The correlation between car weight and maintenance cost is positive. This means that maintenance costs tend to increase as car weight increases.

The strength of a relationship between two variables is indicated by the absolute value of the correlation coefficient. The correlation between car weight and reliability has an absolute value of 0.30. The correlation between car weight and maintenance cost has an absolute value of 0.20. Therefore, the relationship between car weight and reliability is stronger than the relationship between car weight and maintenance cost.

If you would like to cite this web page, you can use the following text:

Berman H.B., "Correlation Coefficient", [online] Available at: //stattrek.com/statistics/correlation URL [Accessed Date: 7/7/2022].

The sample correlation coefficient (r) is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points, as in the example above for accumulated saving over time. Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).

A correlation coefficient close to 0 suggests little, if any, correlation. The scatter plot suggests that measurement of IQ do not change with increasing age, i.e., there is no evidence that IQ is associated with age.

Calculation of the Correlation Coefficient

The equations below show the calculations sed to compute "r". However, you do not need to remember these equations. We will use R to do these calculations for us. Nevertheless, the equations give a sense of how "r" is computed.

where Cov(X,Y) is the covariance, i.e., how far each observed (X,Y) pair is from the mean of X and the mean of Y, simultaneously, and and sx2 and sy2 are the sample variances for X and Y.

. Cov (X,Y) is computed as:

You don't have to memorize or use these equations for hand calculations. Instead, we will use R to calculate correlation coefficients. For example, we could use the following command to compute the correlation coefficient for AGE and TOTCHOL in a subset of the Framingham Heart Study as follows:

> cor(AGE,TOTCHOL)
[1] 0.2917043

Describing Correlation Coefficients

The table below provides some guidelines for how to describe the strength of correlation coefficients, but these are just guidelines for description. Also, keep in mind that even weak correlations can be statistically significant, as you will learn shortly.

Correlation Coefficient (r) Description
(Rough Guideline )
+1.0 Perfect positive + association
+0.8 to 1.0 Very strong + association
+0.6 to 0.8 Strong + association
+0.4 to 0.6 Moderate + association
+0.2 to 0.4 Weak + association
0.0 to +0.2 Very weak + or no association
0.0 to -0.2 Very weak - or no association
-0.2 to – 0.4 Weak - association
-0.4 to -0.6 Moderate - association
-0.6 to -0.8 Strong - association
-0.8 to -1.0 Very strong - association
-1.0 Perfect negative association

 The four images below give an idea of how some correlation coefficients might look on a scatter plot.

The scatter plot below illustrates the relationship between systolic blood pressure and age in a large number of subjects. It suggests a weak (r=0.36), but statistically significant (p<0.0001) positive association between age and systolic blood pressure. There is quite a bit of scatter, but there are many observations, and there is a clear linear trend.

How can a correlation be weak, but still statistically significant? Consider that most outcomes have multiple determinants. For example, body mass index (BMI) is determined by multiple factors ("exposures"), such as age, height, sex, calorie consumption, exercise, genetic factors, etc. So, height is just one determinant and is a contributing factor, but not the only determinant of BMI. As a result, height might be a significant determinant, i.e., it might be significantly associated with BMI but only be a partial factor. If that is the case, even a weak correlation might have be statistically significant if the sample size is sufficiently large. In essence, finding a weak correlation that is statistically significant suggests that that particular exposure has an impact on the outcome variable, but that there are other important determinants as well.

Beware of Non-Linear Relationships

Many relationships between measurement variables are reasonably linear, but others are not For example, the image below indicates that the risk of death is not linearly correlated with body mass index. Instead, this type of relationship is often described as "U-shaped" or "J-shaped," because the value of the Y-variable initially decreases with increases in X, but with further increases in X, the Y-variable increases substantially. The relationship between alcohol consumption and mortality is also "J-shaped."

Source: Calle EE, et al.: N Engl J Med 1999; 341:1097-1105

A simple way to evaluate whether a relationship is reasonably linear is to examine a scatter plot. To illustrate, look at the scatter plot below of height (in inches) and body weight (in pounds) using data from the Weymouth Health Survey in 2004. R was used to create the scatter plot and compute the correlation coefficient.

wey<-na.omit(Weymouth_Adult_Part)
attach(wey)
plot(hgt_inch,weight)
cor(hgt_inch,weight)
[1] 0.5653241

There is quite a lot of scatter, and the large number of data points makes it difficult to fully evaluate the correlation, but the trend is reasonably linear. The correlation coefficient is +0.56.

Beware of Outliers

Note also in the plot above that there are two individuals with apparent heights of 88 and 99 inches. A height of 88 inches (7 feet 3 inches) is plausible, but unlikely, and a height of 99 inches is certainly a coding error. Obvious coding errors should be excluded from the analysis, since they can have an inordinate effect on the results. It's always a good idea to look at the raw data in order to identify any gross mistakes in coding.

After excluding the two outliers, the plot looks like this:


Postingan terbaru

LIHAT SEMUA