What type of MANOVA would be used with more than one dependent variable and one independent variable?

MANOVA. Photo by Nick Fewings on Unsplash.

ANOVA, short for Analysis of Variance, and also called AOV, is a statistical method mainly used for hypothesis testing. The most common use case for ANOVA is when you do an experiment in which your outcome variable is numeric, and your explanatory variable is a categorical variable with three or more classes.

Because ANOVA is a hypothesis test, it helps to know the basics of hypothesis testing. If you are not yet familiar with it, I strongly recommend reading this article first.

An example study using ANOVA

An example is a trial for a new agricultural crop growth product in which you measure the performance of two new treatments and a control group. You measure a numerical outcome (for example kilograms of harvest) in the three groups (treatment 1, treatment 2, and control).

For statistical validity, you will need to apply each of the treatments multiple times. Imagine that you divide your agricultural land into 15 subplots and apply treatment 1 to 5 of them, treatment 2 to another 5, and nothing (control) to the remaining 5.

You then compute the average kilograms of harvest for each treatment, and you observe that there are differences in the averages. However, you need to determine whether the differences are large enough to state that the outcomes were significantly different, rather than just the result of random variation.

This is what ANOVA is made for. Of course, there are many studies in many domains that follow the exact same setup (three or more independent groups and one continuous outcome variable).
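
To make the idea concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand for the 15-subplot setup described above. The harvest values are made up for illustration, and the function name `one_way_f` is hypothetical; only the Python standard library is used.

```python
from statistics import mean

# Hypothetical harvest weights (kg) for 15 subplots, 5 per group.
harvest = {
    "control":     [20.1, 19.8, 21.0, 20.5, 19.6],
    "treatment_1": [22.3, 21.9, 23.1, 22.8, 22.0],
    "treatment_2": [24.0, 25.2, 24.6, 23.9, 25.1],
}

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a dict of samples."""
    all_values = [v for sample in groups.values() for v in sample]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: distance of group means from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Within-group sum of squares: spread of subplots around their own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in groups.values() for v in s)
    # Ratio of the two mean squares.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_f(harvest)
print(f"F = {f_stat:.2f}")
```

A large F means the group averages differ by more than the scatter within groups would explain. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom; `scipy.stats.f_oneway` does both steps in one call.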

You can check out this article for a more detailed read on ANOVA and the advanced options.

MANOVA is a multivariate version of the ANOVA model. Multivariate here indicates the fact that there are multiple dependent variables instead of just one.

The goal of a MANOVA analysis is still to detect whether there is a treatment effect vs the other groups. However, this effect is now measured across multiple continuous variables rather than just one.

One MANOVA vs multiple ANOVAs

You could run a separate ANOVA for each of the dependent variables, and the conclusions would often be similar to those of the MANOVA approach.

However, MANOVA takes the correlations between the dependent variables into account. It is therefore quite possible for MANOVA to find a significant treatment effect that none of the individual ANOVAs would have detected.

MANOVA: a part of Multivariate Statistics

Now, rather than seeing MANOVA as a multivariate alternative to ANOVA, one could also describe MANOVA as a tool in the domain of multivariate statistics.

Other methods in the multivariate statistics family include Structural Equation Modeling, Multidimensional Scaling, Principal Coordinates Analysis, Canonical Correlation Analysis, and Factor Analysis. What these methods have in common is that they are used to make sense of many variables at once and to summarize them into one or a few findings.

This is very different from hypothesis testing (often used in experimental studies), which focuses on reaching a yes-or-no decision, based on statistical significance, about a very precise hypothesis.

MANOVA belongs to both domains, but it is important to notice that “regular” hypothesis testing with one dependent variable generally has quite different applications than multivariate statistics. It is important to think about the goal of your study when choosing the method.

An example use case for MANOVA

Let’s start working on a MANOVA example. In this case, let’s do a study in which the goal is to test whether different plant growth products lead to significantly different plant growth.

Therefore, we will have three treatments:

  • Treatment 1 (control, no product)
  • Treatment 2 (product 1)
  • Treatment 3 (product 2)

We will use three measurements for defining plant growth:

  • height of the plant
  • width of the plant
  • weight of the plant

Three outcome variables is relatively few compared to what one may encounter in multivariate statistics. However, it makes this MANOVA example easy to follow.

MANOVA in R

Let’s start by doing a MANOVA analysis in R.

Getting the MANOVA data in R

I have uploaded the data in an S3 bucket. You can use the following code in R to obtain the data:

The data looks as follows:

The MANOVA data. Picture by author.

Univariate description of the data

To get a quick insight into the effect of treatment on the three dependent variables, you can use the following code to create box plots:

Create boxplots of the MANOVA plant growth data.

You will obtain the following plots:

Boxplots of the MANOVA plant growth data. Image by author.

What you can see in these plots is that the plants receiving treatment 1 have the lowest heights, widths, and weights. Plants receiving treatment 3 have the highest of everything. There is some overlap between the groups, but we could reasonably expect that treatment 3 is the best overall product for plant growth.

Multivariate description of the data

As we are doing a multivariate analysis, it is important to look at the relationships between the dependent variables as well.

Let’s start by looking at the correlations between the dependent variables using the following code:

Compute correlations between the dependent variables

You will obtain the following result:

Correlations between dependent variables

There are strong correlations between each of the three variables. The relation between Height and Weight is the strongest.

Including treatment in the multivariate analysis

By making scatter plots, you can see the individual data points. If you then encode the treatment as the marker shape, you can see the correlations and the treatments together in one plot.

You can use the following code to do so:

Create MANOVA scatter plots

You will obtain the following plots:

MANOVA scatter plots

Fitting the MANOVA in R

Now, rather than looking at plots, we want an objective answer to the question of whether treatment has a significant effect on plant growth.

Fitting the MANOVA in R

You will obtain the following result:

MANOVA results in R

Understanding the output of the MANOVA in R

If you are not familiar with hypothesis testing, I recommend reading this article first.

The first things to look at in hypothesis testing outputs are generally the test statistic and the p-value.

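The test statistic shown here is Pillai’s trace, which is the default in R’s summary of a MANOVA fit. Its value lies between 0 and the smaller of the number of dependent variables and the number of groups minus one (so between 0 and 2 in this example, with three dependent variables and three treatments). The p-value, as always, needs to be interpreted to conclude on significance. A p-value below the conventional threshold of 0.05 indicates that there is a significant effect of treatment on the outcomes.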

In the current case, we can conclude that treatment has a significant effect on plant growth.

MANOVA in Python

Let’s now see how to do the same analysis in Python using the same steps.

Getting the MANOVA data in Python

You can import the same data set in Python as follows:

Importing the MANOVA data in Python

It will look as follows:

The MANOVA data in Python
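
If the original file is not at hand, a stand-in data set with the same layout can be simulated to follow the remaining steps. The sketch below is an assumption throughout: the column names (`Treatment`, `Height`, `Width`, `Weight`), the group means, the covariance matrix, and the 20 plants per treatment are all invented for illustration and are not the article's data. It also prints the per-treatment averages and the correlations between the outcomes, mirroring the descriptive steps of the R section.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Assumed group means for (Height, Width, Weight); purely illustrative values.
group_means = {
    "Treatment 1": [10.0, 5.0, 2.0],  # control, no product
    "Treatment 2": [12.0, 6.0, 2.5],  # product 1
    "Treatment 3": [14.0, 7.0, 3.0],  # product 2
}
# Shared covariance matrix with positive correlations between the outcomes.
cov = np.array([[1.0, 0.4, 0.2],
                [0.4, 0.5, 0.1],
                [0.2, 0.1, 0.1]])

frames = []
for name, mu in group_means.items():
    values = rng.multivariate_normal(mu, cov, size=20)  # 20 plants per group
    frame = pd.DataFrame(values, columns=["Height", "Width", "Weight"])
    frame.insert(0, "Treatment", name)
    frames.append(frame)
df = pd.concat(frames, ignore_index=True)

print(df.head())
print(df.groupby("Treatment").mean())             # univariate view per treatment
print(df[["Height", "Width", "Weight"]].corr())   # correlations between outcomes
```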

Fitting the MANOVA in Python

You can fit a MANOVA in Python using statsmodels. You can use the following code to do so:

Fitting a MANOVA in Python

You will get the following output:

MANOVA output in Python
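
A minimal sketch of such a fit with statsmodels is given here, run on simulated stand-in data rather than the article's data set. The column names, group labels, means, and noise levels are assumptions made for illustration, so the printed numbers will not match the output shown above.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)

# Stand-in data: 3 treatments x 20 plants, with made-up means per outcome.
means = {"T1": (10.0, 5.0, 2.0), "T2": (12.0, 6.0, 2.5), "T3": (14.0, 7.0, 3.0)}
rows = [
    {
        "Treatment": label,
        "Height": rng.normal(height, 1.0),
        "Width": rng.normal(width, 0.7),
        "Weight": rng.normal(weight, 0.3),
    }
    for label, (height, width, weight) in means.items()
    for _ in range(20)
]
df = pd.DataFrame(rows)

# All three outcomes on the left-hand side, the grouping factor on the right.
model = MANOVA.from_formula("Height + Width + Weight ~ Treatment", data=df)
result = model.mv_test()
print(result)

# The per-term table holds the four multivariate statistics and their p-values.
stats_table = result.results["Treatment"]["stat"]
pillai_p = stats_table.loc["Pillai's trace", "Pr > F"]
print(f"Pillai's trace p-value: {float(pillai_p):.3g}")
```

The formula interface keeps the call close to the R syntax: dependent variables are joined with `+` on the left of `~`, and the categorical predictor goes on the right.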

Understanding the output of the MANOVA in Python

In Python, the output shows the analysis using several test statistics. The second one, Pillai’s trace, is the one that we saw in the R output as well. Pillai’s trace is known to be relatively conservative: it yields a significant result less readily (the differences have to be larger to reach significance).

Wilks’ lambda is another often-used test statistic. The Hotelling-Lawley trace and Roy’s greatest root are further alternatives. There is no absolute consensus in the statistical literature as to which test statistic should be preferred.

The p-values are shown in the rightmost column and are all below 0.05, which confirms that treatment has an impact on plant growth.
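
To see how the four statistics relate, here is a sketch that computes them by hand with NumPy. All four are functions of the eigenvalues of E⁻¹H, where H is the between-group and E the within-group sums-of-squares-and-cross-products matrix. The data are simulated with made-up means, and Roy’s greatest root is taken as the largest eigenvalue, which is one of two conventions in use (some texts report λ/(1+λ) instead).

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: three groups of 20 plants, three outcomes each (made-up means).
groups = [rng.normal(loc=mu, scale=1.0, size=(20, 3))
          for mu in ([10.0, 5.0, 2.0], [11.0, 6.0, 2.5], [12.0, 7.0, 3.0])]

all_obs = np.vstack(groups)
grand_mean = all_obs.mean(axis=0)

# H: between-group (hypothesis) sums of squares and cross-products.
H = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean, g.mean(axis=0) - grand_mean)
        for g in groups)
# E: within-group (error) sums of squares and cross-products.
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# Eigenvalues of E^-1 H drive all four statistics.
eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real
eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off

wilks = np.prod(1.0 / (1.0 + eigvals))       # equals det(E) / det(H + E)
pillai = np.sum(eigvals / (1.0 + eigvals))   # equals trace(H (H + E)^-1)
hotelling_lawley = np.sum(eigvals)           # equals trace(E^-1 H)
roy = eigvals.max()                          # largest eigenvalue convention

print(wilks, pillai, hotelling_lawley, roy)
```

With three groups, H has rank at most 2, so at most two eigenvalues are non-zero. That is why Pillai’s trace cannot exceed 2 in this setup.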

Assumptions of MANOVA

As in all statistical models, there are a few assumptions to take into account. In MANOVA, the assumptions are:

  • Independent and identically distributed random variables
  • Dependent variables follow a multivariate normal distribution within each group
  • Equal population covariance matrices between groups (the multivariate counterpart to homogeneity of variances in ANOVA). If this assumption is met, Wilks’ lambda is a common default, whereas Pillai’s trace is generally advised when the assumption is in doubt, because it is more robust to violations.

If you want to rely on your MANOVA conclusions, you’ll need to make sure that these assumptions are met.
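
One common check for the equal-covariance assumption is Box’s M test, which this article does not otherwise cover. The sketch below computes the statistic and its chi-square approximation with NumPy on simulated data; the function name `box_m` and the data are hypothetical, and the result should be read with care because Box’s M is itself sensitive to departures from normality.

```python
import numpy as np

def box_m(groups):
    """Return (chi-square statistic, degrees of freedom) for Box's M test."""
    k = len(groups)                      # number of groups
    p = groups[0].shape[1]               # number of dependent variables
    sizes = np.array([len(g) for g in groups])
    n_total = sizes.sum()
    covs = [np.cov(g, rowvar=False) for g in groups]  # unbiased estimates
    pooled = sum((n - 1) * s for n, s in zip(sizes, covs)) / (n_total - k)
    # slogdet returns (sign, log|det|); it is safer than log(det(...)).
    m = (n_total - k) * np.linalg.slogdet(pooled)[1] - sum(
        (n - 1) * np.linalg.slogdet(s)[1] for n, s in zip(sizes, covs))
    # Small-sample correction factor for the chi-square approximation.
    c = (np.sum(1.0 / (sizes - 1)) - 1.0 / (n_total - k)) * (
        (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1)))
    chi2_stat = m * (1.0 - c)
    dof = p * (p + 1) * (k - 1) // 2
    return chi2_stat, dof

rng = np.random.default_rng(2)
# Stand-in data: three groups of 20 plants drawn with the same covariance.
groups = [rng.normal(loc=mu, scale=1.0, size=(20, 3))
          for mu in ([10.0, 5.0, 2.0], [12.0, 6.0, 2.5], [14.0, 7.0, 3.0])]

chi2_stat, dof = box_m(groups)
print(f"chi-square = {chi2_stat:.2f} on {dof} degrees of freedom")
```

The statistic can be compared against a chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf(chi2_stat, dof)`. A small p-value would suggest that the covariance matrices differ between groups.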

Conclusion

In this article, you have learned what MANOVA is, when you should use it, and how to apply it in R and Python on an example use case about plant growth. I hope that this article was useful for you! Thanks for reading, and stay tuned for more stats, math, and data science content.