Parametric hypothesis tests with examples in R

How To
Parametric Tests
T-test
Z-test
F-test
ANOVA
A tutorial on parametric hypothesis tests with examples in R.
Author

Rohit Farmer

Published

November 10, 2022

2022-12-22 Section for ANOVA.
2022-11-11 Added example code for one-tailed t-test; mentioned Welch’s t-test in a note callout.
2022-11-10 First draft with live code on Kaggle.

Introduction

A hypothesis test is a statistical test used to determine whether there is enough evidence to support a hypothesis, for example, that there is a difference between the average height of males and females. A parametric hypothesis test assumes that the data you want to test is (at least approximately) normally distributed; height, for example, is normally distributed in a population. In addition, your data needs to be symmetrical, since normally distributed data is symmetrical. If your data does not have these properties, you should use a non-parametric test instead.

There are different parametric tests, but the most common is the t-test. In this tutorial we will explore four parametric hypothesis tests: the z-test, the t-test, the F-test, and ANOVA.

Tip

If you are primarily interested in code examples, please follow the navigation on the right; alternatively, you can test live code on Kaggle.

Besides deciding which hypothesis test to use to answer the question at hand, we also need to decide on a few other design choices, for example, whether the test is one-sample or two-sample, paired or unpaired, and one-tailed or two-tailed. Below is a brief description of each choice.

Paired vs. unpaired tests

Paired tests are used to compare two related data groups, such as before and after measurements. Unpaired tests are used to compare two unrelated data groups, such as men and women.

One sample vs. two sample tests

There are key differences between one-sample and two-sample hypothesis testing. Firstly, in one-sample hypothesis testing, we test the mean of a single sample against a known or hypothesized population mean, whereas two-sample hypothesis testing compares the means of two independent samples against each other. Secondly, one-sample testing requires only a single sample, while two-sample testing requires two. Finally, one-sample testing needs a hypothesized value for the population parameter, while in two-sample testing the two samples are compared directly against each other.

One vs. two-tailed tests

Note

All the example code in this tutorial is for two-tailed tests, except for the one-tailed example in the t-test section.

In a hypothesis test, the null and alternate hypotheses are stated in terms of population parameters. These hypotheses are:

  • Null hypothesis (\(H_0\)): The value of the population parameter is equal to the hypothesized value.
  • Alternate hypothesis (\(H_1\)): The value of the population parameter is not equal to the hypothesized value.

The hypothesis test is based on a sample from the population. This sample is used to test the hypotheses by deriving a test statistic. Finally, the value of the test statistic is compared to a critical value. The critical value depends on the alpha level, which is the likelihood of rejecting the null hypothesis when it is true.
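For instance, at an alpha level of 0.05, the critical values can be looked up directly in R. A minimal sketch (the 30 degrees of freedom for the t distribution are purely illustrative):

# Two-tailed critical values at alpha = 0.05
qnorm(1 - 0.05 / 2)        # standard normal (z) critical value, ~1.96
qt(1 - 0.05 / 2, df = 30)  # t critical value at 30 degrees of freedom, ~2.04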

The null and alternate hypotheses determine the direction of the test. There are two types of hypothesis tests: one-tailed and two-tailed tests.

One-tailed Test

A one-tailed test is conducted when the null and alternate hypotheses are stated in terms of “greater than” or “less than”.

For example, let’s say that a company wants to test a new advertising campaign and will consider it a success only if it increases sales by more than 10%. The null hypothesis (\(H_0\)) is that the new campaign does not increase sales beyond this threshold. The alternate hypothesis (\(H_1\)) is that it does.

The null hypothesis is stated as:

\(H_0\) : The population mean is less than or equal to 10%.

The alternate hypothesis is stated as:

\(H_1\) : The population mean is greater than 10%.

The test is conducted by taking a sample of data, calculating the sample mean, and deriving the test statistic from it. The test statistic is then compared to the critical value, and the null hypothesis is rejected if the test statistic is greater than the critical value.

Two-tailed Test

A two-tailed test is conducted when the null and alternate hypotheses are stated in terms of “not equal to”.

Returning to our advertising campaign example, the null hypothesis (\(H_0\)) is that the new campaign will have no effect on sales. The alternate hypothesis (\(H_1\)) is that the new campaign will increase or decrease sales.

The null hypothesis is stated as:

\(H_0\) : The population mean is equal to 10%.

The alternate hypothesis is stated as:

\(H_1\) : The population mean is not equal to 10%.

The test is conducted by taking a sample of data, calculating the sample mean, and deriving the test statistic from it. The test statistic is then compared to the critical values, and the null hypothesis is rejected if the test statistic falls beyond the critical value in either tail.
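As a minimal sketch of both versions in R, using the one-sample t.test() function (covered later in this tutorial) on simulated sales-lift percentages; the data and the 10% threshold are purely illustrative:

# Simulated percentage sales lift for 25 stores after the campaign (illustrative)
set.seed(42)
lift <- rnorm(25, mean = 12, sd = 4)

# One-tailed test: is the mean lift greater than 10%?
t.test(lift, mu = 10, alternative = "greater")

# Two-tailed test: is the mean lift different from 10% in either direction?
t.test(lift, mu = 10)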

The method

No matter which hypothesis test is selected, the following steps are executed:

  1. Firstly, identify the null hypothesis \(H_0:\mu = \mu_0\).
  2. Then identify the alternative hypothesis \(H_1\) and decide if it is of the form \(H_1 : \mu \neq \mu_0\) (a two-tailed test) or if there is a specific direction for how the mean changes, \(H_1 : \mu > \mu_0\) or \(H_1 : \mu < \mu_0\) (a one-tailed test).
  3. Next, calculate the test statistic. Compare the test statistic to the critical values and obtain a range for the p-value, which is the probability of observing a difference at least as large as the one in the sample if the null hypothesis is true. The test is usually carried out at a significance level of 0.05, which means there is a 5% chance of rejecting the null hypothesis when it is in fact true. However, per recommendations from the American Statistical Association, we need to be careful when making statements such as "statistically significant."
  4. Form conclusions. If your test statistic is more extreme than the critical values in the table, the result is significant and you can reject the null hypothesis at that level; otherwise, you fail to reject it. (These steps are made concrete in the sketch below.)
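A minimal sketch of these steps in R, using a one-sample two-tailed test on simulated data (the numbers are purely illustrative):

# Step 1: null hypothesis H0: mu = mu_0
mu_0 <- 10

# Simulated sample (illustrative)
set.seed(1)
x <- rnorm(40, mean = 10.5, sd = 2)

# Step 2: two-tailed alternative H1: mu != mu_0
# Step 3: calculate the test statistic and the p-value
t_stat <- (mean(x) - mu_0) / (sd(x) / sqrt(length(x)))
p_value <- 2 * pt(-abs(t_stat), df = length(x) - 1)

# Step 4: compare the test statistic to the critical value at alpha = 0.05
t_crit <- qt(1 - 0.05 / 2, df = length(x) - 1)
c(t_stat = t_stat, t_crit = t_crit, p_value = p_value)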

Dataset

For our example exercises, we will use the dataset from Open Case Studies “exploring global patterns of obesity across rural and urban regions” (Wright et al. 2020). Body mass index (BMI) is often used as a proxy for adiposity (the condition of having excess body fat) and is measured as an individual’s weight in kilograms (\(kg\)) divided by the square of the individual’s height in meters (\(m^2\)). Recently, an article published in Nature evaluated and compared the BMI of populations in rural and urban communities around the world (NCD Risk Factor Collaboration (NCD-RisC) 2019). The article challenged the widely-held view that increased urbanization was one of the major reasons for increased global obesity rates. This view came about because many countries around the world have shown increased urbanization levels in parallel with increased obesity rates. In this article, however, the NCD-RisC argued that this might not be the case and that, in fact, for most regions around the world, BMI measurements are increasing in rural populations just as much, if not more, than in urban populations. Furthermore, this study suggested that obesity has particularly increased in female populations in rural regions:

“We noted a persistently higher rural BMI, especially for women.”

We will fetch the cleaned version of the data from the Open Case Studies GitHub repository, as data wrangling is out of the scope of this tutorial.

# Load libraries for data manipulation and table rendering
suppressMessages(library(tidyverse))
suppressMessages(library(DT))
suppressMessages(library(kableExtra))

# Read the wrangled long-format BMI data from the Open Case Studies repository
dat <- readr::read_csv("https://raw.githubusercontent.com/opencasestudies/ocs-bp-rural-and-urban-obesity/master/data/wrangled/BMI_long.csv", show_col_types = FALSE)

DT::datatable(dat)

Z-test

A z-test is used to compare a sample mean to a known population mean (one sample), or to compare two sample means (two samples), when the sample size is large and the population variance is known.

These are some conditions for using this type of test:

  • The data must be normally distributed.
  • All data points must be independent.
  • The population variance(s) must be known.

Equation 1 shows the formula for a one-sample z-test.

\[ z = \dfrac{\bar{x} - \mu_0}{\sqrt{\dfrac{\sigma^2}{n}}} \tag{1}\]

Where \(\bar{x}\) is the sample mean, \(\mu_0\) is the population mean, \(\sigma^2\) is the population variance, and \(n\) is the sample size.
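As a sketch, Equation 1 can be computed directly in R on simulated data (in practice the population variance \(\sigma^2\) must be known, which is rarely the case):

# Known population standard deviation and hypothesized mean (illustrative)
sigma <- 3
mu_0 <- 50

# Simulated sample
set.seed(7)
x <- rnorm(100, mean = 50.5, sd = sigma)

# One-sample z statistic (Equation 1) and two-tailed p-value
z <- (mean(x) - mu_0) / sqrt(sigma^2 / length(x))
p_value <- 2 * pnorm(-abs(z))
c(z = z, p_value = p_value)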

Two sample unpaired z-test

\[ \begin{equation} z = \dfrac{\bar{x_1} - \bar{x_2}}{\sqrt{\dfrac{\sigma_1^2}{n_1} +\dfrac{\sigma_2^2}{n_2}}} \end{equation} \tag{2}\]

Where \(\bar{x_1}\) and \(\bar{x_2}\) are the sample means, \(\sigma_1^2\) and \(\sigma_2^2\) are the population variances, and \(n_1\) and \(n_2\) are the sample sizes.

Example code in R

suppressMessages(library(PASWR2))

# Calculate population standard deviations
sig_x <- dplyr::filter(dat, Sex == "Women", Year == 1985) %>%
  dplyr::pull(BMI) %>% na.omit() %>% sd()
sig_y <- dplyr::filter(dat, Sex == "Women", Year == 2017) %>%
  dplyr::pull(BMI) %>% na.omit() %>% sd()

# Draw a random sample of 300 BMI values for women in the years 1985 and 2017
# (results will vary between runs unless a seed is set with set.seed())
x1 <- dplyr::filter(dat, Sex == "Women", Year == 1985) %>%
  dplyr::pull(BMI) %>% na.omit() %>%
  sample(., 300)
x2 <- dplyr::filter(dat, Sex == "Women", Year == 2017) %>%
  dplyr::pull(BMI) %>% na.omit() %>%
  sample(., 300)

# Perform a two sample (unpaired) z-test between x1 and x2
(z_res <- PASWR2::z.test(x1, x2, mu = 0, sigma.x = sig_x, sigma.y = sig_y))

    Two Sample z-test

data:  x1 and x2
z = -11.071, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.885719 -2.017614
sample estimates:
mean of x mean of y 
 24.08067  26.53233 
# Fetch z-test result metrics and present them in a tidy table
broom::tidy(z_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
| estimate1 | estimate2 | statistic | p.value | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|
| 24.08067 | 26.53233 | -11.0705 | 0 | -2.885719 | -2.017614 | Two Sample z-test | two.sided |

Two sample paired z-test

\[ \begin{equation} z= \dfrac{\bar{d}- D}{\sqrt{\dfrac{\sigma_d^2}{n}}} \end{equation} \tag{3}\]

Where \(\bar{d}\) is the mean of the differences between the samples, \(D\) is the hypothesised mean of the differences (usually this is zero), \(n\) is the sample size and \(\sigma_d^2\) is the population variance of the differences.

Example code in R

# Fetch the first 300 BMI values for women in the years 1985 and 2017
x1 <- dplyr::filter(dat, Sex == "Women", Year == 1985) %>%
  dplyr::pull(BMI) %>% na.omit() 
x1 <- x1[1:300]
x2 <- dplyr::filter(dat, Sex == "Women", Year == 2017) %>%
  dplyr::pull(BMI) %>% na.omit()
x2 <- x2[1:300]

# Perform a two sample (paired) z-test between x1 and x2
# (Note: abs(sig_y - sig_x) is only a rough stand-in for the population
# standard deviation of the paired differences, which is not available here.)
(z_res <- PASWR2::z.test(x1, x2, sigma.x = sig_x, sigma.y = sig_y, sigma.d = abs(sig_y - sig_x), paired = TRUE))

    Paired z-test

data:  x1 and x2
z = -393.33, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.36171 -2.33829
sample estimates:
mean of the differences 
                  -2.35 
# Fetch z-test result metrics and present them in a tidy table
broom::tidy(z_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
| estimate | statistic | p.value | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|
| -2.35 | -393.3333 | 0 | -2.36171 | -2.33829 | Paired z-test | two.sided |

T-test

The t-test is used to determine whether two groups, or one group and a known mean, are significantly different from each other. It can be applied to two independent groups or to two dependent (paired) groups. The t-test is based on the t-statistic, which, for one group/sample compared against a known mean, is calculated using the following formula:

\[ \begin{equation} t = \dfrac{\bar{x}-\mu}{\sqrt{\dfrac{s^2}{n}}} \end{equation} \tag{4}\]

Where \(\bar x\) is the sample mean, \(\mu\) is the known mean, \(s^2\) is the sample variance, and \(n\) is the number of observations in the sample.

Example code in R

# Fetch the BMI data for men from rural areas in the year 2017
x <- dplyr::filter(dat, Sex == "Men", Region == "Rural", Year == 2017) %>%
  dplyr::pull(BMI) 

# Perform one sample t-test between x and a known mean = 24.5
(t_res <- t.test(x, mu = 24.5))

    One Sample t-test

data:  x
t = 3.955, df = 195, p-value = 0.000107
alternative hypothesis: true mean is not equal to 24.5
95 percent confidence interval:
 24.87345 25.61635
sample estimates:
mean of x 
  25.2449 
# Fetch t-test result metrics and present them in a tidy table
broom::tidy(t_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
| estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|
| 25.2449 | 3.955031 | 0.000107 | 195 | 24.87345 | 25.61635 | One Sample t-test | two.sided |

Two sample unpaired (independent) t-test

\[ \begin{equation} t = \dfrac{\bar{x_1} - \bar{x_2}}{\sqrt{s^2\bigg(\dfrac{1}{n_1} + \dfrac{1}{n_2}\bigg)}} \end{equation} \tag{5}\]

Where the pooled standard deviation \(s\) is:

\[ \begin{equation} s =\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 +n_2 -2}} \end{equation} \tag{6}\]

Where \(\bar x_1\) and \(\bar x_2\) are the means of the two samples, \(n_1\) and \(n_2\) are the sample sizes, and \(s_1^2\) and \(s_2^2\) are the sample variances. You will compare this test statistic to t-tables on \((n_1 + n_2 - 2)\) degrees of freedom.

Example code for a two-tailed test in R

# Fetch the BMI data for women from rural and urban areas in the year 1985
x1 <- dplyr::filter(dat, Sex == "Women", Region == "Rural", Year == 1985) %>%
  dplyr::pull(BMI) 
x2 <- dplyr::filter(dat, Sex == "Women", Region == "Urban", Year == 1985) %>%
  dplyr::pull(BMI) 

# Perform a two sample (unpaired) t-test between x1 and x2
(t_res <- t.test(x1, x2, var.equal = TRUE))

    Two Sample t-test

data:  x1 and x2
t = -3.8952, df = 394, p-value = 0.0001152
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.5744694 -0.5182378
sample estimates:
mean of x mean of y 
 23.58782  24.63417 

We use the var.equal = TRUE option here to use the pooled standard deviation \(s\) from Equation 6.
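As a quick sanity check, Equation 5 and Equation 6 can also be computed by hand on the same data; the result should match the t.test() statistic above (the variable names here are ours):

# Pooled standard deviation (Equation 6)
a <- na.omit(x1); b <- na.omit(x2)
n1 <- length(a); n2 <- length(b)
s_pooled <- sqrt(((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2))

# Manual t statistic (Equation 5); compare with t.test(x1, x2, var.equal = TRUE)
(t_manual <- (mean(a) - mean(b)) / (s_pooled * sqrt(1 / n1 + 1 / n2)))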

Note

For the pooled-variance t-test to be appropriate, we rely on the assumption that the two samples come from populations with equal variance. A modification of the t-test known as Welch's t-test relaxes this assumption by estimating the two variances separately and adjusting the degrees of freedom used in the test. R's t.test() performs this correction by default, but it can be turned off with the var.equal = TRUE argument, as used in the example above.
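For comparison, a minimal sketch of Welch's t-test on the same data, simply omitting var.equal = TRUE (the result object name is ours):

# Welch's t-test is the default in R's t.test()
(t_res_welch <- t.test(x1, x2))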

# Fetch t-test result metrics and present them in a tidy table
broom::tidy(t_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| -1.046354 | 23.58782 | 24.63417 | -3.895234 | 0.0001152 | 394 | -1.574469 | -0.5182378 | Two Sample t-test | two.sided |

Example code for a one-tailed test in R

Here we will test if women in rural areas had a higher BMI than those in urban areas in 1985.

# Perform a one-tailed two sample (unpaired) t-test between x1 and x2
(t_res <- t.test(x1, x2, var.equal = TRUE, alternative = "greater"))

    Two Sample t-test

data:  x1 and x2
t = -3.8952, df = 394, p-value = 0.9999
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -1.489242       Inf
sample estimates:
mean of x mean of y 
 23.58782  24.63417 
Because the rural mean is in fact lower than the urban mean, the evidence points in the opposite direction to the alternative hypothesis and the p-value is close to 1; we cannot reject the null hypothesis.

# Fetch t-test result metrics and present them in a tidy table
broom::tidy(t_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| -1.046354 | 23.58782 | 24.63417 | -3.895234 | 0.9999424 | 394 | -1.489242 | Inf | Two Sample t-test | greater |

Two sample paired (dependent) t-test

\[ \begin{equation} t = \dfrac{\bar{d}}{\sqrt{\dfrac{s^2}{n}}} \end{equation} \tag{7}\]

Where \(\bar d\) is the mean of the differences between the samples. You will compare the t-statistic to the critical values in a t-table on \((n−1)\) degrees of freedom. Here \(s\) is the standard deviation of the differences.

Example code in R

# Perform a two sample (paired) t-test between x1 and x2
(t_res <- t.test(x1, x2, paired = TRUE))

    Paired t-test

data:  x1 and x2
t = -14.095, df = 195, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -1.1870263 -0.8956268
sample estimates:
mean difference 
      -1.041327 
# Fetch t-test result metrics and present them in a tidy table
broom::tidy(t_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
| estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|
| -1.041327 | -14.09549 | 0 | 195 | -1.187026 | -0.8956268 | Paired t-test | two.sided |

F-test

Conventionally, in a hypothesis test the sample means are compared. However, two samples/groups can have identical means while their variances differ drastically. The statistical test used to compare variances is called the \(F\)-ratio test (or the variance ratio test); it compares two variances in order to test whether the samples come from populations with the same variance.

Comparing two variances is useful in several cases, including:

  • When you want to perform a two-sample t-test and need to check the equality of the variances of the two samples

  • When you want to compare the variability of a new measurement method to an old one. Does the new method reduce the variability of the measure?

\[ F = \frac{S_A^2}{S_B^2} \tag{8}\]

Where \(S^2_A\) and \(S^2_B\) are the two variances to compare. There is a table of the \(F\)-distribution; as with the \(t\) and normal distributions, it is organised according to the degrees of freedom of the two variance estimates: \(n_A - 1\) for the numerator and \(n_B - 1\) for the denominator.

Example code in R

# Perform an F-test to compare the variances of x1 and x2
(f_res <- var.test(x1, x2))

    F test to compare two variances

data:  x1 and x2
F = 1.3238, num df = 196, denom df = 198, p-value = 0.04957
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.000525 1.751931
sample estimates:
ratio of variances 
          1.323819 
# Fetch f-test result metrics and present them in a tidy table
broom::tidy(f_res) %>%
  kbl() %>%
  kable_paper("hover", full_width = F)
Multiple parameters; naming those columns num.df, den.df
| estimate | num.df | den.df | statistic | p.value | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|
| 1.323819 | 196 | 198 | 1.323819 | 0.0495733 | 1.000525 | 1.751931 | F test to compare two variances | two.sided |
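Equation 8 can also be verified by hand on the same data (a sketch; the variable names are ours):

# Manual F ratio (Equation 8); compare with the var.test() estimate above
a <- na.omit(x1); b <- na.omit(x2)
(f_manual <- var(a) / var(b))

# Two-tailed p-value from the F distribution
df1 <- length(a) - 1; df2 <- length(b) - 1
2 * min(pf(f_manual, df1, df2), pf(f_manual, df1, df2, lower.tail = FALSE))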

ANOVA

ANOVA (Analysis of Variance) is a parametric statistical method used to compare the means of two or more groups. It is often used to determine whether the differences between the means of several groups are statistically significant, and it therefore generalizes the t-test beyond two means.

At its core, ANOVA is a hypothesis testing procedure that uses an F-test (see the F-test section above) to compare the means of two or more groups. In R, ANOVA can be performed using the aov() function. This function requires a formula, data, and an optional subset argument. The formula should include the response variable and the explanatory variable. The data argument should be a data frame containing the variables specified in the formula. Finally, the subset argument is optional and can specify a subset of cases to include in the analysis.

In the example below, we test the difference in the average BMI for men in the year 2017 among national, rural, and urban regions.

# Filter the data for men and the year 2017.
a_dat <- dat %>% dplyr::filter(Sex == "Men" & Year == 2017)
DT::datatable(a_dat)
# Carry out ANOVA to measure differences in the BMI across the three regions.
a_res <- aov(BMI ~ Region, data = a_dat)
summary(a_res)
             Df Sum Sq Mean Sq F value Pr(>F)  
Region        2     42  21.171   3.422 0.0333 *
Residuals   592   3663   6.187                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5 observations deleted due to missingness

The output of the summary() function will display the F-statistic and the corresponding p-value. If the p-value is less than the significance level, then we can conclude that there is a statistically significant difference between the means of the three groups.

Finally, we can use the TukeyHSD() function to perform a Tukey’s Honest Significant Difference test, which compares the means of all pairs of groups and tells us which groups are significantly different from each other.

# Tukey's HSD post-hoc comparisons between all pairs of regions
tshd_res <- TukeyHSD(a_res)
(tshd_res)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = BMI ~ Region, data = a_dat)

$Region
                     diff         lwr       upr     p adj
Rural-National -0.3701020 -0.95753700 0.2173329 0.3011038
Urban-National  0.2829899 -0.30220442 0.8681843 0.4921257
Urban-Rural     0.6530920  0.06492696 1.2412570 0.0251982
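As a quick check on the claim that ANOVA generalizes the t-test, restricting the comparison to two regions shows that the ANOVA F statistic equals the square of the pooled-variance t statistic (a sketch; the variable names are ours):

# With only two groups, ANOVA is equivalent to a pooled-variance t-test: F = t^2
two_dat <- dplyr::filter(a_dat, Region %in% c("Rural", "Urban"))
f_two <- summary(aov(BMI ~ Region, data = two_dat))[[1]][["F value"]][1]
t_two <- t.test(BMI ~ Region, data = two_dat, var.equal = TRUE)$statistic
c(F = f_two, t_squared = unname(t_two^2))  # the two values should agree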

References

NCD Risk Factor Collaboration (NCD-RisC). 2019. “Rising Rural Body-Mass Index Is the Main Driver of the Global Obesity Epidemic in Adults.” Nature 569 (7755): 260–64. https://doi.org/10.1038/s41586-019-1171-x.
Wright, Carrie, Qier Meng, Leah Jager, Margaret Taub, and Stephanie Hicks. 2020. “Exploring Global Patterns of Obesity Across Rural and Urban Regions (Version V1.0.0).” https://github.com/opencasestudies/ocs-bp-rural-and-urban-obesity.


Citation

BibTeX citation:
@online{farmer2022,
  author = {Farmer, Rohit},
  title = {Parametric Hypothesis Tests with Examples in {R}},
  date = {2022-11-10},
  url = {https://dataalltheway.com/posts/010-parametric-hypothesis-tests-r},
  langid = {en}
}
For attribution, please cite this work as:
Farmer, Rohit. 2022. “Parametric Hypothesis Tests with Examples in R.” November 10, 2022. https://dataalltheway.com/posts/010-parametric-hypothesis-tests-r.