Chapter 7 Parametric tests and relevant assumptions
This section further illustrates assumptions of parametric tests and the methods to assess them.
7.1 Parametric statistical tests
The t-test, analysis of variance, and linear regression are all parametric statistical tests. They are used when the dependent variable is measured on an interval/ratio scale, such as length, height, or weight.
Advantages:
Your audience will likely be familiar with the techniques and with the interpretation of the results.
These tests are also often more flexible and more powerful than their nonparametric analogues.
Drawback:
All parametric tests assume something about the distribution of the underlying data. If these assumptions are violated, the resultant test statistics will not be valid, and the tests will not be as powerful as when the assumptions are met.
Count/categorical data may not be appropriate for common parametric tests.
7.2 Assumptions
7.2.1 Random sampling
The data captured in the sample are randomly chosen from the population as a whole. Selection bias will obviously affect the validity of the outcome of the analysis.
7.2.2 Independent observations
Tests will also assume that observations are independent of one another, except when the analysis takes non-independence into account.
For example, in repeated measures experiments, the same subject is observed over time. Students with a high test score on one date tend to have a high test score on subsequent dates. In this case, the observation on one date would not be independent of observations on other dates. The independence of observations is often assured by good experimental design. In addition, data or residuals can be plotted, for example to see if observations from one date are correlated with those from another date, as in the sketch below.
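Below is a minimal sketch with simulated data (not part of the diamonds example used later in this chapter) showing one way to look for correlation between scores on two dates.
set.seed(1)
ability = rnorm(30, mean = 70, sd = 10)   # each student's underlying ability
date1   = ability + rnorm(30, sd = 3)     # score on the first date
date2   = ability + rnorm(30, sd = 3)     # score on the second date

plot(date1, date2)   # a clear trend suggests observations are not independent
cor(date1, date2)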
7.2.3 Normal distribution of data or residuals
Parametric tests assume that the data come from a population with a known distribution, such as the normal distribution. That is, the data are normally distributed once the effects of the variables in the model are taken into account.
Practically speaking, this means that the residuals from the analysis should be normally distributed. This is usually assessed with a histogram of the residuals, a density plot, or a quantile–quantile plot.
A select number of tests (the one-sample t-test, two-sample t-test, and paired t-test) require that the data themselves be normally distributed. For other tests, the distribution of the residuals is investigated.
Residuals, also commonly called errors, are the differences between the observations and the values predicted by the model. For example, if the calculated mean of a sample is 10 and one observation is 12, the residual for that observation is 2. Be careful not to get confused about this assumption. You may see discussion about how the “data” should be normally distributed for parametric tests. This is usually wrong-headed. The t-test assumes that the observations within each group are normally distributed, but if there is a difference between the groups, we might expect a bimodal distribution, not a simple normal distribution, for the combined data. This is why in most cases we look at the distribution of the residuals, not the raw data. A small numerical illustration follows.
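The small example below reproduces this calculation in R, using made-up numbers chosen so that the sample mean is 10.
observations = c(8, 10, 12, 10)
mean(observations)                # sample mean is 10
12 - mean(observations)           # the residual for the observation of 12 is 2
residuals(lm(observations ~ 1))   # residuals from an intercept-only model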
7.2.4 Homogeneity of variance
Parametric analyses also assume homogeneity of variance among groups. That is, for Student’s t-test comparing two groups, each group should have the same variance. Homogeneity of variance is also called homoscedasticity.
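As a quick informal check, the variances of the groups can be compared directly; the sketch below uses simulated two-group data. Formal tests for this assumption are covered in section 7.3.5.
set.seed(2)
group_a = rnorm(20, mean = 5, sd = 1)   # simulated data for group A
group_b = rnorm(20, mean = 6, sd = 1)   # simulated data for group B

var(group_a)   # the two sample variances should be of similar magnitude
var(group_b)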
7.2.5 Additivity of treatment effects
Models for two-way analysis of variance and similar analyses are constructed as linear models in which the dependent variable is predicted as a linear combination of the independent variables.
A violation of this assumption is sometimes indicated when a plot of residuals versus predicted values exhibits a curved pattern.
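The sketch below, using simulated data with hypothetical factors A and B, shows the additive form of such a model and the residuals-versus-fitted check for curvature.
set.seed(3)
dat = expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), rep = 1:10)
dat$y = 5 + (dat$A == "a2") * 2 + (dat$B == "b2") * 3 + rnorm(nrow(dat))

model_additive = lm(y ~ A + B, data = dat)   # effects of A and B enter additively

plot(fitted(model_additive),
     residuals(model_additive))   # a curved pattern would suggest non-additivity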
7.2.6 Outliers
Outliers are observations whose value is far outside what is expected. They can play havoc with parametric analyses since they affect the distribution of the data and strongly influence the mean.
There are a variety of formal tests for detecting outliers, but they will not be discussed here. The best approach is to examine the residuals after an analysis. Good tools include the “Residuals vs. leverage” plot and the other diagnostic plots that R produces for a fitted model.
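In base R, these plots can be produced with the plot method for a fitted lm object; the commented sketch below assumes a fitted model object such as the one defined in section 7.3.1.
# plot(model, which = 5)   # residuals vs. leverage
# plot(model, which = 4)   # Cook's distance, another measure of influence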
Some parametric tests are somewhat robust to violations of certain assumptions. For example, the t-test is reasonably robust to violations of normality for symmetric distributions, but not to samples having unequal variances (unless Welch’s t-test is used). A one-way analysis of variance is likewise reasonably robust to violations in normality.
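For reference, the sketch below shows both versions of the t-test on simulated data with unequal variances; R’s t.test uses the Welch version by default (var.equal = FALSE).
set.seed(4)
g1 = rnorm(20, mean = 10, sd = 1)
g2 = rnorm(20, mean = 11, sd = 3)

t.test(g1, g2)                     # Welch's t-test, does not assume equal variances
t.test(g1, g2, var.equal = TRUE)   # Student's t-test, assumes equal variances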
Model assumptions should always be checked, but you may be able to tolerate small violations in the distribution of the residuals or in homoscedasticity. Large violations will make the test invalid, though. It is important to be honest with your assessments when checking model assumptions. It is better to transform the data, change your model, use a robust method, or use a nonparametric test than to have no confidence in your analysis.
7.3 Assessing model assumptions
7.3.1 Normality of residuals
There are formal tests to assess the normality of residuals.
Common tests include
Shapiro–Wilk
Anderson–Darling
Kolmogorov–Smirnov
D’Agostino–Pearson
However, their results depend on sample size. When the sample size is large, the tests may indicate a statistically significant departure from normality even if that departure is small and unimportant. And when sample sizes are small, they may fail to detect real departures from normality.
In each case, the null hypothesis is that the data distribution is not different from normal. That is, a significant p-value (p < 0.05) suggests that the data are not normally distributed.
library(ggplot2)   # for the diamonds data set and ggplot()
library(dplyr)     # for the %>% pipe and filter()

# head(diamonds)
# summary(diamonds)

test_data = diamonds %>%
  dplyr::filter(cut %in% c("Fair", "Ideal"),
                carat == 0.7,
                color %in% c("G", "F"),
                clarity %in% c("SI1", "VS2"))

# table(test_data$cut)

ggplot(test_data,
       aes(price, fill = cut)) +
  geom_density(alpha = 0.6)
# Define a linear model
model = lm(price ~ cut + color,
           data = test_data)

# Shapiro–Wilk normality test
x = residuals(model)
shapiro.test(x)
##
## Shapiro-Wilk normality test
##
## data: x
## W = 0.96048, p-value = 0.0006383
# Anderson-Darling normality test
library(nortest)   # provides ad.test()

x = residuals(model)
ad.test(x)
##
## Anderson-Darling normality test
##
## data: x
## A = 1.7123, p-value = 0.0002071
# One-sample Kolmogorov-Smirnov test
x = residuals(model)
ks.test(x,
        "pnorm",
        mean = mean(x),
        sd = sd(x))
## Warning in ks.test(x, "pnorm", mean = mean(x), sd = sd(x)): ties should not
## be present for the Kolmogorov-Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.096286, p-value = 0.1666
## alternative hypothesis: two-sided
# D'Agostino Normality Test
library(fBasics)   # provides dagoTest()

x = residuals(model)
dagoTest(x)
##
## Title:
## D'Agostino Normality Test
##
## Test Results:
## STATISTIC:
## Chi2 | Omnibus: 10.2249
## Z3 | Skewness: 3.1412
## Z4 | Kurtosis: 0.5981
## P VALUE:
## Omnibus Test: 0.006021
## Skewness Test: 0.001683
## Kurtosis Test: 0.5498
##
## Description:
## Thu Dec 19 20:58:39 2019 by user: Juqiang Chen
7.3.2 Skew and kurtosis
There are no definitive guidelines as to what range of skew or kurtosis is acceptable for considering residuals to be normally distributed. We can rely on skew and kurtosis calculations, or on histograms and other plots.
A common rule of thumb for skew: if the absolute value is greater than 0.5, be cautious; if it is greater than 1.0, consider the distribution not normal. Some authors use 2.0 as the cutoff for normality, and others use a higher limit for kurtosis.
library(psych)   # provides describe()

x = residuals(model)
describe(x,
         type = 2)   # type selects the skew and kurtosis formulas used
## vars n mean sd median trimmed mad min max range skew
## X1 1 134 0 284.79 -40.84 -20.5 262.42 -543.69 839.01 1382.7 0.7
## kurtosis se
## X1 0.18 24.6
7.3.3 Visual inspection to assess the normality of residuals
Usually, the best way to see if model residuals meet the assumptions of normal distribution and homoscedasticity is to plot them and inspect the plots visually.
7.3.3.1 Histogram with normal curve
A histogram of the residuals should be approximately normal, without excessive skew or kurtosis.
Adding a normal curve with the same mean and standard deviation as the data helps to assess the histogram.
library(rcompanion)   # provides plotNormalHistogram()

x = residuals(model)
plotNormalHistogram(x)
7.3.3.2 Kernel density plot with normal curve
A kernel density plot is similar to a histogram, but is smoothed into a curve. Sometimes a density plot gives a better representation of the distribution of data, because the appearance of the histogram depends upon how many bins are used.
The plotNormalDensity function produces this plot. Options include those for the plot function, as well as adjust, bw, and kernel, which are passed to the density function. The col1, col2, and col3 options change the plot colors, and lwd changes the line thickness.
library(rcompanion)

x = residuals(model)
plotNormalDensity(x,
                  adjust = 1)   ### Decrease this number
                                ### to make the line less smooth
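7.3.3.3 Quantile–quantile plot
A quantile–quantile plot, mentioned in section 7.2.3, is another common visual check: if the residuals are approximately normal, the points should fall close to the reference line. A minimal sketch with base R functions:
x = residuals(model)
qqnorm(x)
qqline(x)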
7.3.4 Visual inspection for homogeneity of variance
Patterns in the plot of residuals versus fitted values can indicate a lack of homoscedasticity or that errors are not independent of fitted values.
# Plot residuals versus fitted values
plot(fitted(model),
     residuals(model))
7.3.5 Formal tests for homogeneity of variance
In each case, the null hypothesis is that the variance among groups is not different. That is, a significant p-value (p < 0.05) suggests that the variance among groups is different.
# Define a linear model
model = lm(price ~ cut + color,
           data = test_data)
7.3.5.1 Bartlett’s test for homogeneity of variance
Bartlett’s test is known to be sensitive to non-normality in samples. That is, non-normal samples can result in a significant test due to the non-normality.
x = residuals(model)
bartlett.test(x ~ interaction(cut, color),
              data = test_data)
##
## Bartlett test of homogeneity of variances
##
## data: x by interaction(cut, color)
## Bartlett's K-squared = 39.068, df = 3, p-value = 1.679e-08
7.3.5.2 Levene’s test for homogeneity of variance
Levene’s test is an alternative to Bartlett’s that is supposedly less sensitive to departures from normality in the data.
library(car)   # provides leveneTest()

x = residuals(model)
leveneTest(x ~ cut * color,
           data = test_data,
           center = mean)   ### original Levene’s test
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 3 8.76 2.468e-05 ***
## 130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
7.3.5.3 Brown–Forsythe or robust Levene’s test
The Brown–Forsythe modification of Levene’s test makes it more robust to departures in normality of the data.
x = residuals(model)
# library(car)
leveneTest(x ~ cut * color, data = test_data)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 8.5541 3.163e-05 ***
## 130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(lawstat)   # provides levene.test()

levene.test(x, interaction(test_data$cut, test_data$color))
##
## Modified robust Brown-Forsythe Levene-type test based on the
## absolute deviations from the median
##
## data: x
## Test Statistic = 0.51854, p-value = 0.6702
7.3.5.4 Fligner-Killeen test
The Fligner-Killeen test is another test for homogeneity of variances that is robust to departures in normality of the data.
x = residuals(model)
fligner.test(x ~ interaction(cut, color), data = test_data)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: x by interaction(cut, color)
## Fligner-Killeen:med chi-squared = 11.948, df = 3, p-value =
## 0.007562
References: https://rcompanion.org/handbook/I_01.html