Linear mixed-effects models are commonly used to analyze clustered data structures. There are numerous packages to fit these models in R and conduct likelihood-based inference, but implementations of resampling-based procedures for inference are more limited. In this paper, we introduce the lmeresampler package for bootstrapping nested linear mixed-effects models fit via lme4 or nlme. Bootstrap estimation allows for bias correction, adjusted standard errors, and confidence intervals for small sample sizes and when distributional assumptions break down. We also illustrate how bootstrap resampling can be used to diagnose this model class. In addition, lmeresampler makes it easy to construct interval estimates of functions of model parameters.
Clustered data structures occur in a wide range of studies. For example, students are organized within classrooms, schools, and districts, imposing a correlation structure that must be accounted for in the modeling process. Similarly, cholesterol measurements could be tracked across time for a number of subjects, resulting in measurements being grouped by subject. Other names for clustered data structures include grouped, nested, multilevel, hierarchical, longitudinal, repeated measurements, and blocked data. The covariance structure imposed by clustered data is commonly modeled using linear mixed-effects (LME) models, also referred to as hierarchical linear or multilevel linear models (Pinheiro and Bates 2000; Raudenbush and Bryk 2002; Goldstein 2011).
In this paper, we restrict attention to the Gaussian response LME model for clustered data structures. For cluster \(i=1, \ldots, g\), this model is expressed as \[\begin{equation} \underset{(n_i \times 1)}{\boldsymbol{y}_i} = \underset{(n_i \times p)}{\boldsymbol{X}_i} \ \underset{(p \times 1)}{\boldsymbol{\beta}} + \underset{(n_i \times q)}{\boldsymbol{Z}_i} \ \underset{(q \times 1)}{\boldsymbol{b}_i} + \underset{(n_i \times 1)}{\boldsymbol{\varepsilon}_i}, \tag{1} \end{equation}\] where \[\begin{equation} \boldsymbol{\varepsilon}_i \sim \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_{n_i}), \quad \boldsymbol{b}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{D}), \tag{2} \end{equation}\] and where \(\boldsymbol{\varepsilon}_i\) and \(\boldsymbol{b}_i\) are independent for all \(i\), \(\boldsymbol{\varepsilon}_i\) is independent of \(\boldsymbol{\varepsilon}_j\) for \(i\ne j\), and \(\boldsymbol{b}_i\) is independent of \(\boldsymbol{b}_j\) for \(i\ne j\). Here, \(\boldsymbol{0}\) denotes a vector of zeros of the appropriate length (\(n_i\), the number of observations in cluster \(i\), or \(q\)), \(\boldsymbol{I}_{n_i}\) denotes the \(n_i\)-dimensional identity matrix, and \(\boldsymbol{D}\) is a \(q \times q\) covariance matrix.
In R, the two most popular packages to fit LME models are nlme (Pinheiro et al. 2017) and lme4 (Bates et al. 2015). Both packages fit LME models using either maximum likelihood or restricted maximum likelihood methods. These methods rely on the distributional assumptions placed on the residual quantities, \(\boldsymbol{\varepsilon}_i\) and \(\boldsymbol{b}_i\), as well as large enough sample sizes. While some aspects of LME models are quite robust to model misspecification, others are more sensitive, leading to biased estimates and/or incorrect standard errors. Jacqmin-Gadda et al. (2007) found that inference for the fixed effects is robust if the distribution of the error terms, \(\boldsymbol{\varepsilon}_i\), is non-normal or heteroscedastic, but that variance components and random effects are biased if the covariance structure for the error terms is misspecified. The fixed intercept is not robust to misspecification of the random effects (Hui et al. 2021). In addition, misspecification of the random effects distribution can lead to biased estimates of the variance components and undercoverage of the confidence intervals (Hui et al. 2021). In cases where distributional assumptions are suspect, bootstrapping provides an alternative inferential approach that leads to consistent, bias-corrected parameter estimates, standard errors, and confidence intervals. Standard errors and confidence intervals for functions of model parameters are also easily calculated using a bootstrap procedure, and are available even in situations where closed-form solutions are not.
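As a point of reference, here is a minimal sketch of fitting the same two-level random intercept model in each package, using the `jsp728` data shipped with lmeresampler (this model and data set are revisited in the examples below):

library(lmeresampler)  # provides the jsp728 data

# nlme specifies the random intercept via the random argument
fit_nlme <- nlme::lme(mathAge11 ~ mathAge8, random = ~ 1 | school, data = jsp728)

# lme4 specifies the random intercept inside the model formula
fit_lme4 <- lme4::lmer(mathAge11 ~ mathAge8 + (1 | school), data = jsp728)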
A variety of bootstrap procedures for clustered data and the LME model have been proposed and investigated, including the cases (non-parametric) bootstrap, the residual bootstrap, the parametric bootstrap, the random effect block (REB) bootstrap, and the wild bootstrap (Morris 2002; Carpenter et al. 2003; Field and Welsh 2007; Chambers and Chandra 2013; Modugno and Giannerini 2015). Van der Leeden et al. (2008) provide a thorough overview of bootstrapping LME models. Sánchez-Espigares and Ocaña (2009) developed a full-featured bootstrapping framework for LME models in R; however, this package is not available and there appears to be no active development. Consequently, R users must pick and choose packages based on what bootstrap procedure they wish to implement. The parametric bootstrap is available in lme4 via the `bootMer()` function, as is the semi-parametric (residual) bootstrap proposed by Morris (2002). The `simulateY()` function in nlmeU (Galecki and Burzykowski 2013) makes it easy to simulate values of the response variable for `lme` model objects; however, the user is required to implement the remainder of the parametric bootstrap. Chambers and Chandra (2013) made R code available to implement their REB bootstrap as well as the parametric bootstrap and the residual bootstrap proposed by Carpenter et al. (2003) for LME models fit via `nlme::lme()`. Unfortunately, these functions were not published on CRAN or extended to models fit via `lme4::lmer()`. Bootstrap procedures for specific inferential tasks have also been implemented. The parametric bootstrap has been implemented to carry out likelihood ratio tests for the presence of variance components in RLRsim (Scheipl et al. 2008). pbkrtest (Halekoh and Højsgaard 2014) provides a Kenward-Roger approximation for performing F-tests using a parametric bootstrap approach for LME models and generalized LME models. rptR uses the parametric bootstrap to estimate repeatabilities for LME models (Stoffel et al. 2017). Finally, constrained inference for LMEs using the residual bootstrap is implemented in CLME (Farnan et al. 2014).
In this paper, we introduce the lmeresampler package, which implements a suite of bootstrap methods for LME models fit using either nlme or lme4 in a single package. lmeresampler provides a unified `bootstrap()` command to reduce the cognitive load on the user and also provides access to procedures that were not previously available on CRAN. In the next section, we clarify some notation and terminology for LME models. In Bootstrap procedures for clustered data, we provide an overview of the bootstrap methods implemented in lmeresampler. We then give an Overview of lmeresampler, discuss a variety of Example applications for LME models, and show how Bootstrapping in parallel works in lmeresampler. We conclude with a Summary and highlight areas for future development.
In lmeresampler, we implement five bootstrap procedures for clustered data: a cases (i.e., non-parametric) bootstrap, a residual (i.e., semi-parametric) bootstrap, a parametric bootstrap, a wild bootstrap, and a block bootstrap. In this section, we provide an overview of these bootstrap approaches. Our discussion focuses on two-level models, but the procedures generalize to higher-level models unless otherwise noted.
The cases bootstrap is a fully non-parametric bootstrap that resamples the clusters in the data set to generate bootstrap resamples. Depending on the nature of the data, this resampling can be done only for the top-level cluster, only at the observation-level within a cluster, or at both levels. The choice of the exact resampling scheme should be dictated by the way the data were generated, since the cases bootstrap generates new data by mimicking this process. Van der Leeden et al. (2008) provide a cogent explanation of how to select a resampling scheme. To help ground the idea of resampling, consider a two-level hierarchical data set where students are organized into schools.
One version of the cases bootstrap is implemented by only resampling the clusters. This version of the bootstrap is what Field and Welsh (2007) term the cluster bootstrap and Goldstein (2011) terms the non-parametric bootstrap. We would choose this resampling scheme, for example, if schools were chosen at random and then all students within each school were observed. In this case, the bootstrap proceeds as follows:

1. Draw a random sample, with replacement, of size \(g\) from the clusters (i.e., the schools).
2. For each sampled cluster, include all of its observations (i.e., all of the students in each sampled school) to form the bootstrap sample.
3. Refit the model to the bootstrap sample and extract the parameter estimates of interest.
4. Repeat steps 1–3 \(B\) times.
An alternative version of the cases bootstrap only resamples the observations within clusters, which makes sense in our example if the schools were fixed and students were randomly sampled within schools.
A third version of the cases bootstrap resamples both clusters and cases within clusters. This is what Field and Welsh (2007) term the two-stage bootstrap. We would choose this resampling scheme if both schools and students were sampled during the data collection process.
All three versions of the cases bootstrap are implemented in lmeresampler. We explain how the resampling scheme is specified in our Overview of lmeresampler. Regardless of which version you choose, the cases bootstrap requires the weakest conditions: it only requires that the hierarchical structure in the data set is correctly specified.
The parametric bootstrap simulates random effects and error terms from the fitted distributions to form bootstrap resamples. Consequently, it requires the strongest conditions—that is, the parametric bootstrap assumes that the model, as specified by Equations (1) and (2), is correctly specified.
Let \(\widehat{\boldsymbol \beta}\), \(\widehat{\boldsymbol D}\), and \(\widehat{\sigma}^2\) be maximum likelihood or restricted maximum likelihood estimates of the fixed effects and variance components from the fitted model. The parametric bootstrap is then implemented through the following steps:

1. Simulate \(g\) random effects vectors, \(\boldsymbol{b}_i^* \sim \mathcal{N}(\boldsymbol{0}, \widehat{\boldsymbol{D}})\), and \(g\) error vectors, \(\boldsymbol{\varepsilon}_i^* \sim \mathcal{N}(\boldsymbol{0}, \widehat{\sigma}^2 \boldsymbol{I}_{n_i})\).
2. Generate the bootstrap responses using the fitted model equation: \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \boldsymbol{Z}_i \boldsymbol{b}^*_i + \boldsymbol{\varepsilon}^*_i\).
3. Refit the model to the bootstrap responses and extract the parameter estimates of interest.
4. Repeat steps 1–3 \(B\) times.
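In lme4, a single replicate of this procedure can be sketched with `simulate()` and `refit()` (a minimal illustration, assuming `fit_lme4` is a fitted `merMod` object such as the one sketched earlier):

# draw one set of responses from the fitted (parametric) model
y_star <- simulate(fit_lme4)[[1]]

# refit the model to the simulated responses and extract the fixed effects
boot_fit <- lme4::refit(fit_lme4, y_star)
lme4::fixef(boot_fit)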
The residual bootstrap resamples the residual quantities from the fitted LME model in order to generate bootstrap resamples. There are three general types of residuals for LME models (Haslett and Haslett 2007; Singer et al. 2017). Marginal residuals correspond to the errors made using the marginal predictions, \(\widehat{\boldsymbol r}_i = \boldsymbol{y}_i - \widehat{\boldsymbol{y}}_i = \boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}}\). Conditional residuals correspond to the errors made using predictions conditioned on the clusters, \(\widehat{\boldsymbol{e}}_i = \boldsymbol{y}_i - \widehat{\boldsymbol{y}}_i | \boldsymbol{b}_i = \boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} - \boldsymbol{Z}_i \widehat{\boldsymbol{b}}_i\). The predicted random effects are the last type of residual quantity and are defined as \(\widehat{\boldsymbol{b}}_i = \widehat{\boldsymbol{D}} \boldsymbol{Z}_i^\prime \widehat{\boldsymbol{V}}_i^{-1}\left(\boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} \right)\), where \(\widehat{\boldsymbol{V}}_i = \boldsymbol{Z}_i \widehat{\boldsymbol{D}}\boldsymbol{Z}_i^\prime + \widehat{\sigma}^2 \boldsymbol{I}_{n_i}\).
A naive implementation of the residual bootstrap would draw random samples, with replacement, from the estimated conditional residuals and the best linear unbiased predictors (BLUPS); however, this will consistently underestimate the variability in the data because the residuals are shrunken toward zero (Morris 2002; Carpenter et al. 2003). Carpenter et al. (2003) solve this problem by “reflating” the residual quantities so that the empirical covariance matrices match the estimated covariance matrices prior to resampling:
1. Fit the model and calculate the empirical BLUPs, \(\widehat{\boldsymbol{b}}_i\), and the predicted conditional residuals, \(\widehat{\boldsymbol{e}}_i\).
2. Mean center each residual quantity and reflate the centered residuals so that their empirical covariance matrix matches the corresponding estimated covariance matrix. The process is the same for the predicted random effects and the conditional residuals.
3. Draw a random sample, with replacement, of size \(g\) from the set \(\lbrace \boldsymbol{u}_i^* \rbrace\), where \(\boldsymbol{u}_i^*\) is the \(i\)th row of the centered and reflated random effects matrix, \(\widehat{\boldsymbol{U}}^*\).
4. Draw \(g\) random samples, with replacement, of sizes \(n_i\) from the set of centered and reflated conditional residuals, \(\lbrace \boldsymbol{e}_i^* \rbrace\).
5. Generate the bootstrap responses, \(\boldsymbol{y}^*_i\), using the fitted model equation: \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \boldsymbol{Z}_i \widehat{\boldsymbol{u}}^*_i + \boldsymbol{e}_i^*\).
6. Refit the model to the bootstrap responses and extract the parameter estimates of interest.
7. Repeat steps 3–6 \(B\) times.
Notice that the residual bootstrap is a semiparametric bootstrap, since it depends on the model structure (both the mean function and the covariance structure) but not the distributional conditions (Morris 2002).
Another semiparametric bootstrap is the random effect block (REB) bootstrap proposed by Chambers and Chandra (2013). The REB bootstrap can be viewed as a version of the residual bootstrap where conditional residuals are resampled from within clusters (i.e., blocks) to allow for weaker assumptions on the covariance structure of the residuals. The residual bootstrap requires that the conditional residuals are independent and identically distributed, whereas the REB bootstrap relaxes this to only require that the covariance structure of the error terms is similar across clusters. In addition, the REB bootstrap utilizes the marginal residuals to calculate non-parametric predicted random effects rather than relying on the model-based empirical best linear unbiased predictors (EBLUPs). Chambers and Chandra (2013) developed three versions of the REB bootstrap, all of which have been implemented in lmeresampler. We refer the reader to Chambers and Chandra (2013) for a discussion of when each should be used. It is important to note that, at the time of this writing, the REB bootstrap has only been explored and implemented for use with two-level models.
The base algorithm for the REB bootstrap (also known as REB/0) proceeds roughly as follows (see Chambers and Chandra 2013 for details):

1. Calculate non-parametric residual quantities from the fitted model: the marginal residuals, \(\widehat{\boldsymbol{r}}_i = \boldsymbol{y}_i - \boldsymbol{X}_i \widehat{\boldsymbol{\beta}}\); the non-parametric predicted random effects, \(\tilde{\boldsymbol{b}}_i = \left(\boldsymbol{Z}_i^\prime \boldsymbol{Z}_i\right)^{-1} \boldsymbol{Z}_i^\prime \widehat{\boldsymbol{r}}_i\); and the non-parametric conditional residuals, \(\tilde{\boldsymbol{e}}_i = \widehat{\boldsymbol{r}}_i - \boldsymbol{Z}_i \tilde{\boldsymbol{b}}_i\).
2. Draw a random sample, with replacement, of size \(g\) from the set of non-parametric predicted random effects.
3. Resample the non-parametric conditional residuals from within the clusters (blocks) to form the bootstrap error terms.
4. Generate the bootstrap responses using the fitted model equation: \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \boldsymbol{Z}_i \tilde{\boldsymbol{b}}^*_i + \tilde{\boldsymbol{e}}^*_i\).
5. Refit the model to the bootstrap responses and extract the parameter estimates of interest.
6. Repeat steps 2–5 \(B\) times.
The first variation of the REB bootstrap (REB/1) zero-centers and reflates the residual quantities prior to resampling in order to satisfy the conditions for consistency (Shao and Tu 1995). This is the same process described in Step 2 of the residual bootstrap algorithm above.
The second variation of the REB bootstrap (REB/2, or postscaled REB) addresses two issues: potential non-zero covariances in the joint bootstrap distribution of the variance components and potential bias in the parameter estimates. After the REB/0 algorithm is run, the following post-processing is performed:
To uncorrelate the bootstrap estimates of the variance components produced by REB/0, Chambers and Chandra (2013) propose the following procedure:
1. Apply natural logarithms to the bootstrap distribution of each variance component and form the following matrices, where \(\nu\) denotes the total number of variance components: \(\boldsymbol{S}^*\), the \(B \times \nu\) matrix whose columns contain the log-transformed bootstrap estimates of each variance component; \(\boldsymbol{M}^*\), the \(B \times \nu\) matrix whose columns contain the corresponding column means of \(\boldsymbol{S}^*\); and \(\boldsymbol{D}^*\), the \(B \times \nu\) matrix whose columns contain the corresponding column standard deviations of \(\boldsymbol{S}^*\).
2. Calculate \(\boldsymbol{C}^*=\mathrm{cov} \left( \boldsymbol{S}^* \right)\).
3. Calculate \(\boldsymbol{L}^* = \boldsymbol{M}^* + \lbrace \left(\boldsymbol{S}^* - \boldsymbol{M}^*\right)\boldsymbol{C}^{*-1/2} \rbrace \circ \boldsymbol{D}^*\), where \(\circ\) denotes the Hadamard (elementwise) product.
4. Exponentiate the elements of \(\boldsymbol{L}^*\). The columns of the resulting matrix are then uncorrelated versions of the bootstrap variance components.
To correct bias in the estimation of the fixed effects, apply a mean correction: for each parameter estimate, \(\widehat{\beta}_k\), adjust the bootstrapped estimates, \(\widehat{\boldsymbol{\beta}}_k^*\), as follows: \(\widetilde{\boldsymbol{\beta}}_k^{*} = \widehat{\beta}_k \boldsymbol{1}_{B} + \widehat{\boldsymbol{\beta}}_k^* - {\rm avg} \left( \widehat{ \boldsymbol{\beta} }_k^* \right)\). To correct bias in the estimation of the variance components, apply a ratio correction: for each estimated variance component, \(\widehat{\sigma}^2_v\), adjust the uncorrelated bootstrapped estimates, \(\widehat{\boldsymbol{\sigma}}_v^{2*}\), as follows: \(\widetilde{\boldsymbol{\sigma}}_{v}^{2*} = \widehat{\boldsymbol{\sigma}}_v^{2*} \circ \lbrace \widehat{\sigma}^2_v / {\rm avg} \left( \widehat{\boldsymbol{\sigma}}_v^{2*} \right) \rbrace\). A sketch of these corrections in R is given below.
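A minimal R sketch of these two corrections, using hypothetical objects `beta_star` (a \(B \times p\) matrix of bootstrapped fixed effects), `beta_hat` (the vector of estimated fixed effects), `sigma2_star` (a \(B \times \nu\) matrix of uncorrelated bootstrapped variance components), and `sigma2_hat` (the vector of estimated variance components):

# mean correction: shift each column so its average equals the original estimate
beta_tilde <- sweep(beta_star, 2, beta_hat - colMeans(beta_star), `+`)

# ratio correction: rescale each column so its average equals the original estimate
sigma2_tilde <- sweep(sigma2_star, 2, sigma2_hat / colMeans(sigma2_star), `*`)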
The wild bootstrap also relaxes the assumptions made on the error terms of the model, allowing heteroscedasticity both within and across groups. The wild bootstrap is well developed for the ordinary regression model (Liu 1988; Flachaire 2005; Davidson and Flachaire 2008) and Modugno and Giannerini (2015) adapt it for the nested LME model.
To begin, we can re-express model (1) as
\[\begin{equation} \boldsymbol{y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{v}_i, \text{ where } \boldsymbol{v}_i = \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{\varepsilon}_i. \tag{3} \end{equation}\]
The wild bootstrap proceeds as follows:
1. Draw a random sample, \(w_1, w_2, \ldots, w_g\), from an auxiliary distribution with mean zero and unit variance.
2. Generate the bootstrap responses using the re-expressed model equation (3): \(\boldsymbol{y}^*_i = \boldsymbol{X}_i \widehat{\boldsymbol{\beta}} + \tilde{\boldsymbol{v}}_i w_i\), where \(\tilde{\boldsymbol{v}}_i\) is formed using a heteroscedasticity consistent covariance matrix estimator (HCCME). Modugno and Giannerini (2015) suggest using what Flachaire (2005) calls \({{\rm HC}_2}\) or \({{\rm HC}_3}\) in the regression context:
\[\begin{align} {{\rm HC}_2}&: \tilde{\boldsymbol{v}}_i = {\rm diag} \left( \boldsymbol{I}_{n_i} - \boldsymbol{H}_i \right)^{-1/2} \circ \boldsymbol{r}_i\\ {{\rm HC}_3}&: \tilde{\boldsymbol{v}}_i = {\rm diag} \left( \boldsymbol{I}_{n_i} - \boldsymbol{H}_i \right)^{-1} \circ \boldsymbol{r}_i, \end{align}\]
where \(\boldsymbol{H}_i = \boldsymbol{X}_i \left(\boldsymbol{X}^\prime \boldsymbol{X} \right)^{-1} \boldsymbol{X}_i^\prime\) is the \(i\)th diagonal block of the orthogonal projection matrix, and \(\boldsymbol{r}_i\) is the vector of marginal residuals for group \(i\).
3. Refit the model to the bootstrap sample and extract the parameter estimates of interest.
4. Repeat steps 1–3 \(B\) times.
Five options for the auxiliary distribution are implemented in lmeresampler:
The two-point distribution proposed by Mammen (1993) takes value \(w_i=-(\sqrt{5} - 1)/2\) with probability \(p=(\sqrt{5}+1) / (2 \sqrt{5})\) and \(w_i = (\sqrt{5} + 1)/2\) with probability \(1-p\).
The two-point Rademacher distribution places equal probability on \(w_i = \pm 1\).
The six-point distribution proposed by Webb (2013) places equal probability on \(w_i = \pm \sqrt{1/2}, \pm1, \pm \sqrt{3/2}\).
The standard normal distribution.
The gamma distribution with shape parameter 4 and scale parameter \(1/2\) (Liu 1988), centered to have mean zero: \(w_i^*\) is randomly sampled from the gamma distribution and then centered via \(w_i = w_i^* - 2\).
Modugno and Giannerini (2015) found the Mammen distribution to be preferred based on a Monte Carlo study, but only compared it to the Rademacher distribution.
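A minimal sketch of drawing \(g\) auxiliary weights from the Mammen distribution, using the support points and probabilities defined above:

a <- -(sqrt(5) - 1) / 2             # negative support point
b <- (sqrt(5) + 1) / 2              # positive support point
p <- (sqrt(5) + 1) / (2 * sqrt(5))  # probability of the negative point

g <- 48  # e.g., one weight per school in the JSP data used later
w <- sample(c(a, b), size = g, replace = TRUE, prob = c(p, 1 - p))
c(mean = mean(w), var = var(w))  # approximately 0 and 1 for large g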
The lmeresampler package implements the five bootstrap procedures outlined in the previous section for Gaussian response LME models for clustered data structures fit using either nlme (Pinheiro et al. 2017) or lme4. The workhorse function in lmeresampler is

bootstrap(model, .f, type, B, resample = NULL, reb_type = NULL, hccme, aux.dist,
          orig_data, .refit)
The four required parameters to `bootstrap()` are `model`, `.f`, `type`, and `B`; `.refit` has a default value:

- `model`, an `lme` or `merMod` fitted model object.
- `.f`, a function defining the parameter(s) of interest that should be extracted/calculated for each bootstrap iteration.
- `type`, a character string specifying the type of bootstrap to run. Possible values include: `"parametric"`, `"residual"`, `"reb"`, `"wild"`, and `"case"`.
- `B`, the number of bootstrap resamples to generate.
- `.refit`, whether the model should be refit to the bootstrap sample. This defaults to `TRUE`.
There are also five optional parameters: `resample`, `reb_type`, `hccme`, `aux.dist`, and `orig_data`.
If the user sets type = "case"
, then they must also set resample
to specify what cluster resampling scheme to use. resample
requires a logical vector of length equal to the number of levels in the model. A value of TRUE
in the \(i\)th position indicates that cases/clusters at that level should be resampled. For example, to only resample the clusters (i.e., level 2 units) in a two-level model, the user would specify resample = c(FALSE, TRUE)
.
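For example, a hypothetical call (assuming `model` is a fitted two-level `lme` or `merMod` object) might look like:

bootstrap(model, .f = fixef, type = "case", B = 2000, resample = c(FALSE, TRUE))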
If the user sets type = "reb"
, then they must also set reb_type
to indicate which version of the REB bootstrap to run. reb_type
accepts the integers 0
, 1
, and 2
to indicate REB/0, REB/1, and REB/2, respectively.
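For example, a hypothetical call running REB/2 might look like:

bootstrap(model, .f = fixef, type = "reb", B = 2000, reb_type = 2)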
If the user sets type = "wild"
, then they must specify both hccme
and aux.dist
. Currently, hccme
can be set to "hc2"
or "hc3"
and aux.dist
can be set to "mammen"
, "rademacher"
, "norm"
, "webb"
, or "gamma"
.
If variables are transformed within the model formula and `lmer()` is used to fit the LME model, then `orig_data` should be set to the original data frame, since this cannot be recovered from the fitted model object.
The `bootstrap()` function is a generic function that calls functions for each type of bootstrap. The user can call the specific bootstrap function directly if preferred. An overview of the specific bootstrap functions is given in Table 1.
| Bootstrap | Function name | Required arguments |
|---|---|---|
| Cases | `case_bootstrap` | `model, .f, type, B, resample, orig_data, .refit` |
| Residual | `resid_bootstrap` | `model, .f, type, B, orig_data, .refit` |
| REB | `reb_bootstrap` | `model, .f, type, B, reb_type, orig_data, .refit` |
| Wild | `wild_bootstrap` | `model, .f, type, B, hccme, aux.dist, orig_data, .refit` |
| Parametric | `parametric_bootstrap` | `model, .f, type, B, orig_data, .refit` |
Each of the specific bootstrap functions performs four general steps:

1. Setup. Key information (parameter estimates, design matrices, etc.) is extracted from the fitted model to eliminate repetitive actions during the resampling process.
2. Resampling. The setup information is passed to an internal resampler function to generate the `B` bootstrap samples.
3. Refitting. The model is refit for each of the bootstrap samples and the specified parameters are extracted/calculated.
4. Clean up. An internal completion function formats the original and bootstrapped quantities to return a list to the user.
Each function returns an object of class `lmeresamp`, which is a list with elements outlined in Table 2. `print()`, `summary()`, `plot()`, and `confint()` methods are available for `lmeresamp` objects.
| Element | Description |
|---|---|
| `observed` | values for the original model parameter estimates |
| `model` | the original fitted model object |
| `.f` | the function call defining the parameters of interest |
| `replicates` | a \(B \times p\) tibble containing the bootstrapped quantities; each column contains a single bootstrap distribution |
| `stats` | a tibble containing the `observed`, `rep.mean` (bootstrap mean), `se` (bootstrap standard error), and `bias` values for each parameter |
| `B` | the number of bootstrap resamples performed |
| `data` | the original data set |
| `seed` | a vector of randomly generated seeds that are used by the bootstrap |
| `type` | a character string specifying the type of bootstrap performed |
| `call` | the user's call to the bootstrap function |
| `message` | a list of length `B` giving any messages generated during refitting; an entry is `NULL` if no message was generated |
| `warning` | a list of length `B` giving any warnings generated during refitting; an entry is `NULL` if no warning was generated |
| `error` | a list of length `B` giving any errors encountered during refitting; an entry is `NULL` if no error was encountered |
lmeresampler also provides the `extract_parameters()` helper function to extract the fixed effects and variance components from `merMod` and `lme` objects as a named vector.
As a first application of the bootstrap for nested LME models, consider the junior school project (JSP) data (Mortimore et al. 1988; Goldstein 2011) that are stored as `jsp728` in lmeresampler. The data set comprises measurements taken on 728 elementary school students across 48 schools in London.
library(lmeresampler)
tibble::as_tibble(jsp728)
# A tibble: 728 × 9
mathA…¹ mathA…² gender class school normA…³ normA…⁴ schoo…⁵ mathA…⁶
<dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 39 36 M nonm… 1 1.80 1.55 22.4 13.6
2 11 19 F manu… 1 -2.29 -0.980 22.4 -3.42
3 32 31 F manu… 1 -0.0413 0.638 22.4 8.58
4 27 23 F nonm… 1 -0.750 -0.460 22.4 0.579
5 36 39 F nonm… 1 0.743 2.15 22.4 16.6
6 33 25 M manu… 1 0.163 -0.182 22.4 2.58
7 30 27 M manu… 1 -0.372 0.0724 22.4 4.58
8 17 14 M manu… 1 -1.63 -1.52 22.4 -8.42
9 33 30 M manu… 1 0.163 0.454 22.4 7.58
10 20 19 M manu… 1 -1.40 -0.980 22.4 -3.42
# … with 718 more rows, and abbreviated variable names ¹mathAge11,
# ²mathAge8, ³normAge11, ⁴normAge8, ⁵schoolMathAge8, ⁶mathAge8c
Suppose we wish to fit a model using the math score at age 8, gender, and the father's social class to describe math scores at age 11, including a random intercept for school (Goldstein 2011). This LME model can be fit using the `lmer()` function in lme4, as sketched below.
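A call consistent with the variables in `jsp728` and the output that follows (we assume the fitted model is stored as `jsp_mod`):

library(lme4)

# math at age 11 described by the age-8 score, gender, and father's social
# class, with a random intercept for school
jsp_mod <- lmer(mathAge11 ~ mathAge8 + gender + class + (1 | school),
                data = jsp728)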
To implement the residual bootstrap to estimate the fixed effects, we can use the `bootstrap()` function and set `type = "residual"`.
(jsp_boot <- bootstrap(jsp_mod, .f = fixef, type = "residual", B = 2000))
Bootstrap type: residual
Number of resamples: 2000
term observed rep.mean se bias
1 (Intercept) 14.1577509 14.1697871 0.6702835 0.0120361440
2 mathAge8 0.6388895 0.6391441 0.0248334 0.0002546297
3 genderM -0.3571922 -0.3654058 0.3391162 -0.0082135863
4 classnonmanual 0.7200815 0.7154916 0.3783149 -0.0045898872
There were 0 messages, 0 warnings, and 0 errors.
We can then calculate normal, percentile, and basic bootstrap confidence intervals via `confint()`.
confint(jsp_boot)
# A tibble: 12 × 6
term estimate lower upper type level
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 (Intercept) 14.2 12.8 15.5 norm 0.95
2 mathAge8 0.639 0.590 0.687 norm 0.95
3 genderM -0.357 -1.01 0.316 norm 0.95
4 classnonmanual 0.720 -0.0168 1.47 norm 0.95
5 (Intercept) 14.2 12.8 15.5 basic 0.95
6 mathAge8 0.639 0.588 0.687 basic 0.95
7 genderM -0.357 -1.00 0.323 basic 0.95
8 classnonmanual 0.720 -0.000162 1.46 basic 0.95
9 (Intercept) 14.2 12.9 15.5 perc 0.95
10 mathAge8 0.639 0.590 0.690 perc 0.95
11 genderM -0.357 -1.04 0.287 perc 0.95
12 classnonmanual 0.720 -0.0187 1.44 perc 0.95
The default setting is to calculate all three intervals, but this can be restricted by setting the `type` parameter to `"norm"`, `"basic"`, or `"perc"`.
The beauty of the bootstrap is its flexibility. Interval estimates can be constructed for functions of model parameters that would otherwise require more complex derivations. For example, the bootstrap can be used to estimate the intraclass correlation. The intraclass correlation measures the proportion of the total variance in the response accounted for by groups, and is an important measure of repeatability in ecology and evolutionary biology (Nakagawa and Schielzeth 2010). As a simple example, we'll consider the `BeetlesBody` data set in rptR (Stoffel et al. 2017). This simulated data set contains information on body length (`BodyL`) and the `Population` from which the beetles were sampled. A simple Gaussian-response LME model of the form
\[ y_{ij} = \beta_0 + b_i + \varepsilon_{ij}, \qquad b_i \sim \mathcal{N}(0, \sigma^2_b), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \]
can be used to describe the body length of beetle \(j\) from population \(i\). The repeatability is then calculated as \(R = \sigma^2_b / (\sigma^2_b + \sigma^2)\). Below we fit this model using `lmer()`:
data("BeetlesBody", package = "rptR")
(beetle_mod <- lmer(BodyL ~ (1 | Population), data = BeetlesBody))
Linear mixed model fit by REML ['lmerMod']
Formula: BodyL ~ (1 | Population)
Data: BeetlesBody
REML criterion at convergence: 3893.268
Random effects:
Groups Name Std.Dev.
Population (Intercept) 1.173
Residual 1.798
Number of obs: 960, groups: Population, 12
Fixed Effects:
(Intercept)
14.08
To construct a bootstrap confidence interval for the repeatability, we first must write a function to calculate it from the fitted model. Below we write a one-off function for this model to demonstrate a “typical” workflow rather than trying to be overly general.
repeatability <- function(object) {
  # extract the variance components from the fitted model as a data frame
  vc <- as.data.frame(VarCorr(object))
  # between-population variance as a proportion of the total variance
  vc$vcov[1] / (sum(vc$vcov))
}
The original estimate of repeatability can then be quickly calculated:
repeatability(beetle_mod)
[1] 0.2985548
To construct a bootstrap confidence interval for the repeatability, we run the desired bootstrap procedure, specifying `.f = repeatability`, and then pass the results to `confint()`.
(beetle_boot <- bootstrap(beetle_mod, .f = repeatability, type = "parametric", B = 2000))
Bootstrap type: parametric
Number of resamples: 2000
observed rep.mean se bias
1 0.2985548 0.2874126 0.08787679 -0.01114219
There were 0 messages, 0 warnings, and 0 errors.
(beetle_ci <- confint(beetle_boot, type = "basic"))
# A tibble: 1 × 6
term estimate lower upper type level
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 "" 0.299 0.139 0.471 basic 0.95
Notice that the `term` column of `beetle_ci` is an empty character string since we did not have `repeatability()` return a named vector.
Alternatively, we can `plot()` the results, as shown in Figure 1. The plot method for `lmeresamp` objects uses `stat_halfeye()` from ggdist (Kay 2021) to render a density plot with associated 66% and 95% percentile intervals.
plot(beetle_boot, .width = c(.5, .9)) +
ggplot2::labs(
title = "Bootstrap repeatabilities",
y = "density",
x = "repeatability"
)
While lmeresampler was designed with a focus on estimation, the bootstrap functions can be used to conduct bootstrap tests on individual parameters. For example, returning to the JSP example, we might be interested in generating approximate \(p\)-values for the fixed effects:
summary(jsp_mod)$coefficients
Estimate Std. Error t value
(Intercept) 14.1577509 0.73344165 19.303173
mathAge8 0.6388895 0.02544478 25.108865
genderM -0.3571922 0.34009284 -1.050279
classnonmanual 0.7200815 0.38696812 1.860829
To generate a bootstrap \(p\)-value for a fixed effect, we must generate \(B\) bootstrap resamples under the reduced model (i.e., the null hypothesis), refit the full model, and then calculate the \(t\)-statistic for each resample, denoted \(t^*_i\). The bootstrap \(p\)-value is then calculated as \((n_{\rm extreme} + 1) / (B + 1)\) (Davison and Hinkley 1997; Halekoh and Højsgaard 2014).
Using the `bootstrap()` function, you can implement this procedure for a single parameter. For example, if we wish to calculate the bootstrap \(p\)-value for the `class` fixed effect, we first generate \(B=1000\) bootstrap samples from the reduced model without `class`. In this example, we use the wild bootstrap to illustrate that we are not restricted to a parametric bootstrap. A sketch of this first step is given below.
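A minimal sketch of this step (the object names are illustrative; with `.refit = FALSE` the models are not refit, so the simulated responses themselves are returned):

# reduced model: drop the class fixed effect
jsp_reduced <- lmer(mathAge11 ~ mathAge8 + gender + (1 | school), data = jsp728)

# generate B = 1000 sets of bootstrap responses under the null model
reduced_boot <- bootstrap(jsp_reduced, .f = fixef, type = "wild", B = 1000,
                          hccme = "hc2", aux.dist = "mammen", .refit = FALSE)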
Next, we refit the full model, `jsp_mod`, to each set of simulated responses and extract the \(t\)-statistic for the `class` variable, as sketched below. The user-written `extract_t()` function extracts the specified \(t\)-statistic from the coefficient table of the model summary.
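A sketch of these steps, continuing with the illustrative objects from above:

# user-written helper: pull a t-statistic from the coefficient summary table
extract_t <- function(model, term) {
  summary(model)$coefficients[term, "t value"]
}

# refit the full model to each set of bootstrap responses and extract the
# t-statistic for the class fixed effect
tstats <- purrr::map_dbl(reduced_boot,
                         ~extract_t(lme4::refit(jsp_mod, .x), "classnonmanual"))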
With the bootstrap \(t\)-statistics in hand, we can approximate the \(p\)-value using basic logical and arithmetic operators:
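For example, treating large absolute \(t\)-statistics as extreme (a sketch based on the objects above):

observed_t <- summary(jsp_mod)$coefficients["classnonmanual", "t value"]
(sum(abs(tstats) >= abs(observed_t)) + 1) / (1000 + 1)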
While the above process is not particularly difficult to implement using the tools provided by lmeresampler, things get tedious if multiple parameters are of interest in this summary table. To help reduce this burden on the user, we have provided the `bootstrap_pvals()` function, which adds bootstrap \(p\)-values for each term in the coefficient summary table. For `jsp_mod`, this is achieved in the code chunk below:
bootstrap_pvals(jsp_mod, type = "wild", B = 1000, hccme = "hc2", aux.dist = "mammen")
Bootstrap type: wild
Number of resamples: 1000
# A tibble: 4 × 5
term Estimate `Std. Error` `t value` p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 14.2 0.733 19.3 0.000999
2 mathAge8 0.639 0.0254 25.1 0.000999
3 genderM -0.357 0.340 -1.05 0.273
4 classnonmanual 0.720 0.387 1.86 0.0659
It’s important to note that running bootstraps for each term in the model is computationally demanding. To speed up the computation, you can run the command in parallel, as we discuss below in Bootstrapping in parallel.
The bootstrap can be useful during model selection.
For example, if you are comparing a full and a reduced model where the reduced model has fewer random effects, a 50:50 mixture of \(\chi^2\) distributions is often used (Stram and Lee 1994); however, Pinheiro and Bates (2000) point out that this approximation is not always optimal. In this example, we explore the `Machines` data set discussed by Pinheiro and Bates (2000), which consists of productivity scores for six workers on three brands of machine. This data set can be loaded from nlme:
data("Machines", package = "nlme")
Pinheiro and Bates (2000) consider two LME models for these data. The first model has a fixed effect for the machine and a random intercept for the worker.
reduced_mod <- lmer(score ~ Machine + (1 | Worker), data = Machines, REML = FALSE)
The second model has the same fixed effects structure, but adds an additional random effect for the machine within the worker.
full_mod <- lmer(score ~ Machine + (1 | Worker/Machine), data = Machines, REML = FALSE)
Pinheiro and Bates (2000) note that the approximate null distribution given by \(0.5\chi^2_0 + 0.5 \chi^2_1\) is not successful when the models are fit via maximum likelihood, and that the mixture is closer to \(0.65\chi^2_0 + 0.35 \chi^2_1\). Instead of relying on the conventional approximation, a bootstrap test can be conducted using `bootstrap()` to simulate the responses from the reduced model.
To conduct this bootstrap test, we first extract the observed statistic obtained via `anova()` and then generate \(B=1000\) bootstrap responses from the reduced model, `reduced_mod`. Recall that specifying `.refit = FALSE` returns a data frame of the simulated responses. Here, we use a residual bootstrap for illustration, as sketched below.
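A minimal sketch of this step:

# observed likelihood ratio test statistic
observed <- anova(full_mod, reduced_mod)$Chisq[2]

# simulate B = 1000 sets of responses from the reduced model
reduced_boot <- bootstrap(reduced_mod, .f = fixef, type = "residual", B = 1000,
                          .refit = FALSE)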
Next, we must fit both the full and reduced models to the bootstrap responses and calculate the test statistic. The user-written `compare_models()` function performs this refit and calculation for the given models and bootstrap responses. The `control` argument for the full model is set to reduce the number of convergence warnings: since the null model has a variance component of 0 for machines within workers, we expect warnings as we fit an expanded model.
compare_models <- function(full, reduced, newdata) {
  # refit the full model, suppressing singular-fit and gradient checks
  full_mod <- refit(full, newdata,
                    control = lmerControl(check.conv.singular = "ignore",
                                          check.conv.grad = "ignore"))
  reduced_mod <- refit(reduced, newdata)
  # likelihood ratio test statistic comparing the two refits
  anova(full_mod, reduced_mod)$Chisq[2]
}
chisq_stats <- purrr::map_dbl(reduced_boot, ~compare_models(full_mod, reduced_mod, newdata = .x))
With the test statistics in hand, we can quickly calculate the \(p\)-value:
(sum(chisq_stats >= observed) + 1) / (1000 + 1)
[1] 0.000999001
Our final example illustrates how the `bootstrap()` function can be used for model diagnosis. Loy et al. (2017) propose using the lineup protocol to diagnose LME models, since artificial structures often appear in conventional residual plots for this model class that are not indicative of a model deficiency.
In this example, we consider the `Dialyzer` data set provided by nlme. The data arise from a study characterizing the water transportation characteristics of 20 high flux membrane dialyzers, which were introduced to reduce the time a patient spends on hemodialysis (Vonesh and Carter 1992). The dialyzers were studied in vitro using bovine blood at flow rates of either 200 or 300 ml/min, and the ultrafiltration rate (in ml/hr) was measured at seven transmembrane pressures (in dmHg). Pinheiro and Bates (2000) discuss modeling these data. Here, we explore how to create a lineup of residual plots to investigate the adequacy of the initial homoscedastic LME model fit by Pinheiro and Bates (2000).
Pinheiro and Bates (2000) construct a residual plot of the conditional residuals plotted against the transmembrane pressure to explore the adequacy of the fitted model (Figure 2). There appears to be increasing spread of the conditional residuals, which would indicate that the homoscedastic model is not sufficient.
To check if this pattern is actually indicative of a problem, we construct a lineup of residual plots. To do this, we must generate data from a number of properly specified models, say 19, to serve as decoy residual plots. Then, we create a faceted set of residual plots where the observed residual plot (Figure 2) is randomly assigned to a facet. To generate the residuals from properly specified models, we use the parametric bootstrap via `bootstrap()` and extract a data frame containing the residuals from each bootstrap sample using `hlm_resid()` from HLMdiag (Loy and Hofmann 2014), as sketched below.
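A sketch of this step is below. The model specification is our assumption, following the quartic-in-pressure model with random pressure curves per subject discussed by Pinheiro and Bates (2000), and we assume the fit is stored as `dialyzer_mod`:

library(HLMdiag)
data("Dialyzer", package = "nlme")

# homoscedastic LME model in the spirit of Pinheiro and Bates (2000)
dialyzer_mod <- lmer(
  rate ~ (pressure + I(pressure^2) + I(pressure^3) + I(pressure^4)) * QB +
    (pressure + I(pressure^2) | Subject),
  data = Dialyzer
)

# 19 parametric bootstrap samples, extracting a data frame of residuals
# from each refit via hlm_resid()
sim_resids <- bootstrap(dialyzer_mod, .f = hlm_resid, type = "parametric", B = 19)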
The simulated residuals are stored in the `replicates` element of the `sim_resids` list. `sim_resids$replicates` is a tibble containing the 19 bootstrap samples, with the replicate number stored in the `.n` column.
dplyr::glimpse(sim_resids$replicates)
Rows: 2,660
Columns: 15
$ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
$ rate <dbl> -0.8579226, 17.0489029, 34.3658738, 44.84955…
$ pressure <dbl> 0.240, 0.505, 0.995, 1.485, 2.020, 2.495, 2.…
$ `I(pressure^2)` <I<dbl>> 0.0576, 0.255025, 0.990025, 2.205225, …
$ `I(pressure^3)` <I<dbl>> 0.013824, 0.128787625, 0.985074875,…
$ `I(pressure^4)` <I<dbl>> 0.00331776, 0.065037...., 0.980149....,…
$ QB <fct> 200, 200, 200, 200, 200, 200, 200, 200, 200,…
$ Subject <ord> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3,…
$ .resid <dbl> -1.19758632, 0.65600425, -0.90104247, 1.3913…
$ .fitted <dbl> 0.3396637, 16.3928987, 35.2669162, 43.458244…
$ .ls.resid <dbl> -0.51757746, 1.23968783, -1.32231163, 0.5907…
$ .ls.fitted <dbl> -0.3403452, 15.8092151, 35.6881854, 44.25884…
$ .mar.resid <dbl> -2.1236410, -0.3100360, -2.1298323, -0.34531…
$ .mar.fitted <dbl> 1.265718, 17.358939, 36.495706, 45.194867, 4…
$ .n <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
Now, we use the `lineup()` function from nullabor (Buja et al. 2009) to generate the lineup data. `lineup()` will randomly insert the observed (`true`) data into the `samples` data, "encrypt" the position of the observed data, and print a message that you can later decrypt in the console.
library(nullabor)
lineup_data <- lineup(true = hlm_resid(dialyzer_mod), n = 19, samples = sim_resids$replicates)
dplyr::glimpse(lineup_data)
Rows: 2,800
Columns: 15
$ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
$ rate <dbl> -0.8579226, 17.0489029, 34.3658738, 44.84955…
$ pressure <dbl> 0.240, 0.505, 0.995, 1.485, 2.020, 2.495, 2.…
$ `I(pressure^2)` <I<dbl>> 0.0576, 0.255025, 0.990025, 2.205225, …
$ `I(pressure^3)` <I<dbl>> 0.013824, 0.128787625, 0.985074875,…
$ `I(pressure^4)` <I<dbl>> 0.00331776, 0.065037...., 0.980149....,…
$ QB <fct> 200, 200, 200, 200, 200, 200, 200, 200, 200,…
$ Subject <ord> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3,…
$ .resid <dbl> -1.19758632, 0.65600425, -0.90104247, 1.3913…
$ .fitted <dbl> 0.3396637, 16.3928987, 35.2669162, 43.458244…
$ .ls.resid <dbl> -0.51757746, 1.23968783, -1.32231163, 0.5907…
$ .ls.fitted <dbl> -0.3403452, 15.8092151, 35.6881854, 44.25884…
$ .mar.resid <dbl> -2.1236410, -0.3100360, -2.1298323, -0.34531…
$ .mar.fitted <dbl> 1.265718, 17.358939, 36.495706, 45.194867, 4…
$ .sample <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
With the lineup data in hand, we can create a lineup of residual plots using `facet_wrap()`:
library(ggplot2)
ggplot(lineup_data, aes(x = pressure, y = .resid)) +
geom_hline(yintercept = 0, color = "gray60") +
geom_point(shape = 1) +
facet_wrap(~.sample) +
theme_bw() +
labs(x = "Transmembrane pressure (dmHg)", y = "Residuals (ml/hr)")
In Figure 3, the observed residual plot is in position 13. If you can discern this plot from the field of decoys, then there is evidence that the fitted homogeneous LME model is deficient. In this case, we believe 13 is discernibly different, as expected based on the discussion in Pinheiro and Bates (2000).
Since we have discovered signs of within-group heteroscedasticity, we could reformulate our model via `nlme::lme()`, adding a `weights` argument using `varPower()`, or we could utilize the wild bootstrap, as illustrated below.
wild_dialyzer <- bootstrap(dialyzer_mod, .f = fixef, type = "wild", B = 1000,
hccme = "hc2", aux.dist = "webb")
confint(wild_dialyzer, type = "perc")
# A tibble: 10 × 6
term estimate lower upper type level
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 (Intercept) -16.0 -18.0 -13.9 perc 0.95
2 pressure 88.4 77.4 99.1 perc 0.95
3 I(pressure^2) -44.3 -57.1 -30.8 perc 0.95
4 I(pressure^3) 9.17 2.45 15.5 perc 0.95
5 I(pressure^4) -0.690 -1.70 0.425 perc 0.95
6 QB300 -1.26 -4.55 2.06 perc 0.95
7 pressure:QB300 0.621 -16.0 16.9 perc 0.95
8 I(pressure^2):QB300 3.15 -18.5 27.6 perc 0.95
9 I(pressure^3):QB300 0.102 -12.0 11.1 perc 0.95
10 I(pressure^4):QB300 -0.172 -1.98 1.84 perc 0.95
Bootstrapping is a computationally demanding task, but bootstrap iterations do not rely on each other, so they are easy to implement in parallel. Rather than building parallel processing into lmeresampler, we created a utility function, `combine_lmeresamp()`, that allows the user to easily implement parallel processing using doParallel (Microsoft and Weston 2020a) and foreach (Microsoft and Weston 2020b). The code is thus concise and simple enough for users without much experience with parallelization, while also providing flexibility to the user.
doParallel and foreach default to multicore (i.e., forking) functionality when using parallel computing on UNIX operating systems and snow-like (i.e., clustering) functionality on Windows systems. In this section, we will use clustering in our example. For more information on forking, we refer the reader to the vignette in Microsoft and Weston (2020a).
The basic idea behind clustering is to execute tasks as a "cluster" of computers. Each cluster needs to be fed information separately, and as a consequence clustering has more overhead than forking. Clusters also need to be made and stopped with each call to `foreach()` to explicitly tell the CPU when to begin and end the parallelization.
Below, we revisit the JSP example and distribute 2000 bootstrap iterations equally over two cores:
library(foreach)
library(doParallel)
set.seed(5678)
# Starting a cluster with 2 cores
no_cores <- 2
cl <- makeCluster(no_cores)
registerDoParallel(cores = no_cores)
# Run 1000 bootstrap iterations on each core
boot_parallel <- foreach(
B = rep(1000, 2),
.combine = combine_lmeresamp,
.packages = c("lmeresampler", "lme4")
) %dopar% {
bootstrap(jsp_mod, .f = fixef, type = "parametric", B = B)
}
# Stop the cluster
stopCluster(cl)
The `combine_lmeresamp()` function combines the two `lmeresamp` objects returned from the two `bootstrap()` calls into a single `lmeresamp` object. Consequently, working with the returned object proceeds as previously discussed.
It’s important to note that running a process on two cores does not yield a runtime that is twice as fast as running the same process on one core. This is because parallelization takes some overhead to split the processes, so while runtime will substantially improve, it will not correspond exactly to the number of cores being used. For example, the runtime for the JSP example run on a single core was
user system elapsed
23.084 1.212 24.376
and the runtime for the JSP run on two cores was
user system elapsed
23.550 1.800 12.747
These timings were generated using `system.time()` on a MacBook Pro with a 2.9 GHz Quad-Core Intel Core i7 processor. In this setup, running the 2000 bootstrap iterations over two cores reduced the runtime by a factor of about 1.91, but this will vary based on the hardware and settings used.
In this paper, we discussed our implementation of five bootstrap procedures for nested, Gaussian-response LME models fit via the nlme or lme4 packages. The `bootstrap()` function in lmeresampler provides a unified interface to these procedures, allowing users to easily bootstrap their fitted LME models. In our examples, we illustrated the basic usage of the bootstrap, how it can be used to estimate functions of parameters, how it can be used for testing, and how it can be used to create simulation-based visual diagnostics. The bootstrap approach to inference is computationally intensive, so we have also demonstrated how users can bootstrap in parallel.
While this paper focused solely on the nested, Gaussian-response LME model, lmeresampler implements bootstrap procedures for a wider class of models. Specifically, the cases, residual, and parametric bootstraps can be used to bootstrap generalized LME models fit via lme4. Additionally, the parametric bootstrap works with LME models with crossed random effects, though the results may not be optimal (McCullagh 2000). Future development of lmeresampler will focus on implementing additional extensions, especially for crossed data structures.
We thank Spenser Steele for his contributions to the original code base of lmeresampler. We also thank the reviewers and associate editor whose comments improved the quality of this paper.
Supplementary materials are available in addition to this article and can be downloaded at RJ-2023-015.zip.