A z-test is a statistical test used to determine whether two population means are different (or whether a sample mean differs from a hypothesised value) when the variances are known and the sample size is large. The test statistic is assumed to follow a normal distribution, and nuisance parameters such as the population standard deviation should be known in order for an accurate z-test to be performed.
The z-test is also a hypothesis test in which the z-statistic follows a normal distribution. It is best used for samples larger than 30 because, under the central limit theorem, the distribution of sample means becomes approximately normal as the sample size grows. When conducting a z-test, the null and alternative hypotheses and the significance level (alpha) should be stated; the test statistic is then calculated and the results and conclusion reported. A z-statistic, or z-score, is a number representing how many standard deviations above or below the population mean a score derived from a z-test is. Examples of tests that can be conducted as z-tests include a one-sample location test, a two-sample location test, a paired difference test, and a maximum likelihood estimate.

As an example, assume an investor wishes to test whether the average daily return of a stock is greater than 1%. A simple random sample of 50 returns is calculated and has an average of 2%. Assume the standard deviation of the returns is known to be 2.5%. The null hypothesis is that the mean return equals 1%; the alternative hypothesis is that the mean return differs from 1%. Assume an alpha of 0.05 is selected with a two-tailed test. Consequently, 0.025 of the probability lies in each tail, and the critical values are 1.96 and -1.96. If the value of z is greater than 1.96 or less than -1.96, the null hypothesis is rejected. The value for z is calculated by subtracting the hypothesised average daily return, or 1% in this case, from the observed average of the samples.
Next, divide the resulting value by the standard deviation divided by the square root of the number of observed values. Therefore, the test statistic is: (0.02 - 0.01) ÷ (0.025 ÷ √50) ≈ 2.83
The investor rejects the null hypothesis since z is greater than 1.96 and concludes that the average daily return is greater than 1%.
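The arithmetic above can be reproduced directly (a quick sketch in Python, using the figures from the example):

```python
import math

# Figures from the example above
mu0 = 0.01      # hypothesised mean daily return (1%)
x_bar = 0.02    # observed sample mean (2%)
sigma = 0.025   # known standard deviation of returns (2.5%)
n = 50          # sample size

z = (x_bar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))   # 2.83

# Two-tailed test at alpha = 0.05: critical values are +/-1.96
reject_null = abs(z) > 1.96
print(reject_null)   # True
```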
Z-tests are closely related to t-tests, but t-tests are best performed when the data consists of a small sample size, i.e., less than 30. Also, t-tests assume the standard deviation is unknown, while z-tests assume it is known.
If the population standard deviation is unknown but the sample size is at least 30, the sample variance is commonly assumed to equal the population variance so that a z-test can still be used as an approximation. Strictly speaking, however, whenever the population standard deviation remains unknown, a t-test is the more appropriate choice, regardless of sample size.
A z-score, or z-statistic, is a number representing how many standard deviations above or below the population mean the score derived from a z-test is. Essentially, it is a numerical measurement that describes a value's relationship to the mean of a group of values. If a z-score is 0, it indicates that the data point's score is identical to the mean score. A z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.
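A z-score is just the signed distance from the mean measured in standard deviations; a minimal sketch (the exam-score numbers are hypothetical):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: exam scores with mean 70 and standard deviation 10
print(z_score(70, 70, 10))   # 0.0  -> identical to the mean
print(z_score(80, 70, 10))   # 1.0  -> one standard deviation above
print(z_score(55, 70, 10))   # -1.5 -> one and a half below
```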
In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution (also known as a “bell curve”) as the sample size becomes larger, assuming that all samples are identical in size, and regardless of the population distribution shape. Sample sizes equal to or greater than 30 are considered sufficient for the CLT to predict the characteristics of a population accurately. The validity of the z-test relies on the CLT holding. A z-test is used in hypothesis testing to evaluate whether a finding or association is statistically significant or not. In particular, it tests whether two means are the same (the null hypothesis). A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.
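The CLT can be seen empirically with a short simulation (a sketch; the exponential population and the sample size of 30 are illustrative choices):

```python
import random
import statistics

random.seed(0)

# A clearly non-normal population: exponential with mean 1
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Distribution of 10,000 sample means, each from a sample of size 30
means = [sample_mean(30) for _ in range(10_000)]

# CLT prediction: mean near 1 and standard deviation near 1/sqrt(30) ~ 0.183
print(round(statistics.mean(means), 2))   # close to 1.0
print(round(statistics.stdev(means), 2))  # close to 0.18
```

A histogram of `means` would show the familiar bell shape even though the underlying population is heavily skewed.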
$z$-tests are a statistical way of testing a hypothesis when either:

- the population variance $\sigma^2$ is known; or
- the population variance is unknown but the sample size is large ($n \geq 30$).
If we have a sample size of less than $30$ and do not know the population variance, then we must use a t-test. These are some further conditions for using this type of test:

- the data should come from a random sample, with independent observations;
- the underlying population should be (approximately) normally distributed.
One sample z-tests

We use a one sample $z$-test when we wish to compare the sample mean $\bar{x}$ to the population mean $\mu_0$.

Illustrative Example

A headmistress wants to compare the GCSE English results of her pupils against the national data to determine if there is a difference. The national data is normally distributed with known variance. A large number of pupils in her school have taken the exam, and in order to save time she decides to take a random sample of her pupils' results. She calculates the sample mean and then uses a $z$-test to assess whether there is a significant difference between the sample mean and the national mean at the 5% significance level. In this case, the null hypothesis would be that there is no significant difference, and the $z$-test is used to see whether or not this is the case. The $z$-test statistic is calculated using the following formula:
\begin{equation} z = \dfrac{\bar{x} - \mu_0}{\sqrt{\dfrac{\sigma^2}{n}}} \end{equation}

The Method:

1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
2. Choose a significance level, for example 5%.
3. Calculate the $z$-test statistic using the formula above.
4. Compare the statistic with the critical value from the $z$-table and state the conclusion.
For example, the average GCSE English result of the schoolchildren is higher than the national average (a one-tailed hypothesis), or the average GCSE English result of the schoolchildren is different from the national average (a two-tailed hypothesis).
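The one-sample statistic can be sketched as a small helper function (the function name and the numbers below are illustrative, not the school's real data):

```python
import math

def one_sample_z(x_bar, mu0, sigma2, n):
    """z-statistic comparing a sample mean x_bar to a population mean mu0,
    with known population variance sigma2 and sample size n."""
    return (x_bar - mu0) / math.sqrt(sigma2 / n)

# Illustrative numbers: national mean 5.0 with known variance 1.44,
# and a sample of 36 pupils with sample mean 5.4
z = one_sample_z(x_bar=5.4, mu0=5.0, sigma2=1.44, n=36)
print(round(z, 2))   # 2.0

# Two-tailed test at the 5% level: compare |z| with 1.96
print(abs(z) > 1.96)   # True -> a significant difference
```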
See the page of worked examples.

Two sample z-tests

Often, we need to compare the means from two samples, and we use the $z$-statistic when we know the population variances ($\sigma^2$) (see two sample t-tests for unknown variances). There are two types of two sample $z$-test:

- independent (unrelated) samples;
- paired (related) samples.
The main difference between these two tests is that the $z$-statistic is calculated differently. For the independent/unrelated $z$-test, the test statistic is:
\begin{equation} z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \end{equation} where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $n_1$ and $n_2$ are the sample sizes and $\sigma_1^2$ and $\sigma_2^2$ are the population variances. For paired/related $z$-tests the $z$-statistic is:
\begin{equation} z = \dfrac{\bar{d} - D}{\sqrt{\dfrac{\sigma_d^2}{n}}} \end{equation} where $\bar{d}$ is the mean of the differences between the samples, $D$ is the hypothesised mean of the differences (usually this is zero), $n$ is the sample size and $\sigma_d^2$ is the population variance of the differences.
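Both two-sample statistics follow directly from the formulas above; a sketch with illustrative numbers only:

```python
import math

def independent_z(x1, x2, var1, var2, n1, n2):
    """z-statistic for two independent samples with known population variances."""
    return (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)

def paired_z(d_bar, var_d, n, D=0.0):
    """z-statistic for paired samples: d_bar is the mean difference,
    var_d the known population variance of the differences."""
    return (d_bar - D) / math.sqrt(var_d / n)

# Illustrative numbers only
print(round(independent_z(10.0, 9.0, 4.0, 4.0, 50, 50), 2))   # 2.5
print(round(paired_z(0.5, 1.0, 64), 1))                       # 4.0
```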
z-Table

This is a $z$-table with an explanation of each section of the table and a guide for using it:

Worked Example 1
You work in the HR department at a large franchise, currently in the expenses team, and you want to test whether you have set your employees' monthly allowances correctly. In the past it was believed that the average claim was $£500$ with a standard deviation of $£150$; however, you believe this may have increased due to inflation, so you want to test whether the monthly allowances should be increased. A random sample of $40$ employees gives a mean monthly claim of $£640$. Perform a hypothesis test of whether you should increase your employees' monthly allowances.
This is a one sample $z$-test because you know the population standard deviation ($\sigma = 150$). It is also a one-tailed example because we are testing for an increase. The hypotheses are: $H_0: \mu = 500$, $H_1: \mu > 500$. Calculating the $z$-statistic using the formula above gives: $\dfrac{640 - 500}{\sqrt{\dfrac{150^2}{40}}} = 5.903$ (to 3 d.p.)
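The calculation above can be checked numerically:

```python
import math

# Figures from the worked example
mu0, sigma, n, x_bar = 500, 150, 40, 640

z = (x_bar - mu0) / (sigma / math.sqrt(n))
print(round(z, 3))   # 5.903
```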
Next, we look up this value in the $z$-table (see above). Since this is a one-tailed example, we compare the values at the $z_{1-\alpha}$ level. At the $1\%$ level, $5.903 > 2.33$, so this is a significant result. This means we have sufficient evidence to reject the null hypothesis that there has been no change in the average employee expenses claim. Thus, you should increase the monthly allowances.

t-tests - Population variance unknown

One Sample t-tests

This is where you are only testing one sample, for example the number of owls in an area over the past 10 years. Usually you would compare your data with a known value, typically a mean that has been derived from previous research. A one-sample t-test is used to compare a sample mean $\bar{x}$ (calculated using the data) to a known "population" mean $\mu$ (typically obtained in previous research); we want to test the null hypothesis that the population mean is equal to the sample mean. For example, we might want to test whether the proportion of red squirrels to grey squirrels in Newcastle is different from the known UK average.

The Method

As you progress through your university career you will be introduced to statistical packages such as R and Minitab that can perform these tests for you and present the final significance level. However, you may also be shown how to conduct and interpret hypothesis tests without using such software (this is good for demonstrating a thorough understanding of what is really happening with the data). This is done as follows:
\begin{equation} t = \dfrac{\bar{x} - \mu}{\sqrt{\dfrac{s^2}{n}}} \end{equation} where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $n$ is the sample size and $s$ is the sample standard deviation.
Worked Example 2
The number of owls in an area has been recorded for the past 50 years and the average number for these past 50 years (in a previous experiment) was found to be $106$. Over the last 9 years the counts have been recorded in the table below. Has there been a change in the number of owls in the area?
For the last 9 years, the mean number of owls has been $120.6$ (1 d.p.) with a standard deviation of $18.6$. The null hypothesis is that the number of owls has remained the same for the past 50 years; the alternative hypothesis is that the number of owls has changed (from $106$). (Note: this is a two-tailed t-test because we are testing for a change in either direction.) I.e. $H_0: \mu = 106$ and $H_1: \mu \neq 106$.
Using Minitab and R we find the t statistic is $2.342$ (3 d.p.). $n = 9$, so we will compare our t statistic to a t table on $9 - 1 = 8$ degrees of freedom.
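Using the rounded summary statistics quoted above reproduces the statistic approximately (the exact $2.342$ comes from the unrounded data):

```python
import math

# Rounded summary statistics from the example
x_bar, mu, s, n = 120.6, 106, 18.6, 9

t = (x_bar - mu) / (s / math.sqrt(n))
print(round(t, 2))   # 2.35, close to the 2.342 quoted from the unrounded data
```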
Looking at the table, we can see that the critical t-value at the $95$% level $(P=0.05)$ is $2.306$. We see that $2.342$ is greater than $2.306$. Therefore, our t-statistic is statistically significant at the $95$% level and its corresponding P-value will be less than $0.05$. We have evidence against the null hypothesis and thus can conclude that the number of owls has changed. Here are video tutorials on using R Studio and Minitab (ver. 16) for this example:
Two Sample t-tests

A two sample t-test compares two samples of normally distributed data where the population variance is unknown and the sample sizes are small ($n \lt 30$). We shall look at two types of two sample t-test:

- paired (related) samples;
- independent (unrelated) samples.
The main difference between these two tests is that the t statistic is calculated differently (using differences for paired tests); however, Minitab and R calculate this for you once you specify which type of two sample t-test you would like to perform. See the page of worked examples. We use F-tests (usually in Minitab or R) to check that our two samples have equal variances; see F-test for more information. Minitab and R can also be used to test for normality.

The t-table

Here is an example of a t-table with explanations of what each part means. (This is for a two-tailed or paired t-test; for a one-tailed t-test the probabilities are halved. See the worked example below.)

Worked Example 3
This example is very similar to examples in the lecture notes in the first year animal behaviour module (ACE1027). This is a paired t-test because there is one group being tested twice, rather than two independent groups. A class were conducting an experiment to assess interobserver reliability. They observed stabled horses performing stereotypies (repetitive behaviours indicative of poor welfare: weaving, wind sucking and crib biting). They watched the footage 3 times, assessing whether their observation and recording skills had improved with each viewing. They used data from a group of animal behaviourist students and performed intra-observer agreement calculations between the first and second viewings, then between the second and third. The group's results are displayed in the table below. The results were $44$% agreement between the first pair of observations and $61$% between the second. This seemed like a difference. They conducted a t-test and found the t statistic to be $1.731$ with $P = 0.134$. Is this a significant result?
Since the t statistic $1.731$ is less than the critical t value of $2.447$ on $6$ degrees of freedom at the $95$% level $(P = 0.05)$ (circled in the table above), we conclude that this is not statistically significant: there is no evidence of a change. The P value is $0.134$, which could be described as approaching a trend, suggesting more experiments are needed to be conclusive.

Worked Example 4
A behaviourist is interested in the time taken to complete a maze for two different strains of laboratory rat. The trial involves 20 animals: 10 rats from a strain selected according to the performance of their parents and 10 rats from an unselected line. The time in seconds to complete the maze is recorded in the table below. Is there a difference between the average times to complete the maze for the two strains?
The mean for the selected strain is $42.6$ and the standard deviation is $10.8$. The mean for the unselected strain is $48.4$ and the standard deviation is $15.0$.
Using Minitab we find the t-statistic is $0.99$. (R calculates it as $0.992$.) We compare this to the t-value on $n_1 + n_2 - 2 = 18$ degrees of freedom.
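The statistic can be reproduced from the summary statistics alone, assuming equal variances (pooled); with equal sample sizes this matches the unpooled value:

```python
import math

# Summary statistics from the example
x1, s1, n1 = 42.6, 10.8, 10   # selected strain
x2, s2, n2 = 48.4, 15.0, 10   # unselected strain

# Pooled variance, then the two-sample t-statistic
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t = (x2 - x1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 3))   # 0.992, on n1 + n2 - 2 = 18 degrees of freedom
```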
Looking at the table, we can see that the critical t-value at the $95$% level $(P=0.05)$ is $2.101$. We see that $0.992$ is less than $2.101$, so our t-statistic is not significant at the $95$% level. Minitab calculates $P = 0.336$, so there is no evidence against the null hypothesis: there is no difference in average time to complete the maze between the two strains. Here are video tutorials for Minitab (ver. 16) and R Studio for this example:
Test Yourself

Try our Numbas test on hypothesis testing: Practising confidence intervals and hypothesis tests

t or z-test?

Often it can be difficult to decide whether to use a $z$-test or a t-test as they are both very similar. Here are some tips to help you decide:

- If the population variance is known, or the sample size is large ($n \geq 30$), use a $z$-test.
- If the population variance is unknown and the sample size is small ($n < 30$), use a t-test.
The following diagram can also be used to help you decide which test is appropriate. (For more information about each test, click the boxes.)
Test Yourself

Try our Numbas test on hypothesis testing: Hypothesis testing and confidence intervals and also two-sample tests.

See Also

For more information about the topics covered here see the introduction to hypothesis testing.