bcmixed: A Package for Median Inference on Longitudinal Data with the Box–Cox Transformation

This article illustrates the use of the bcmixed package and focuses on the two main functions: bcmarg and bcmmrm. The bcmarg function provides inference results for a marginal model of a mixed effect model using the Box–Cox transformation. The bcmmrm function provides model median inferences based on the mixed effect models for repeated measures analysis using the Box–Cox transformation for longitudinal randomized clinical trials. Using the bcmmrm function, analysis results with high power and high interpretability for treatment effects can be obtained for longitudinal randomized clinical trials with skewed outcomes. Further, the bcmixed package provides summarizing and visualization tools, which would be helpful to write clinical trial reports.

Kazushi Maruo (University of Tsukuba Department of Biostatistisc, Faculty of Medicine) , Ryota Ishii (Keio University Hospital) , Yusuke Yamaguchi (Data Science, Development, Astellas Pharma Inc.) , Masahiko Gosho (University of Tsukuba)
2021-09-20

1 Introduction

Longitudinal data are often observed in medical or biological research. One of the most popular statistical models for studying longitudinal continuous outcomes is the linear mixed effect model. Several packages are available from CRAN that allow for the implementation of linear mixed effects models (e.g., nlme (Pinheiro et al. 2021), glme (Sam Weerahandi et al. 2021), lme4 (Bates et al. 2015), CLME (Jelsema and Peddada 2016), PLmixed (Rockwood and Jeon 2019), MCMCglmm (Hadfield 2010)).The linear mixed effect models assume that longitudinal outcomes follow a multivariate normal distribution. However, the distribution of the outcome is often right skewed in the medical and biological fields. Therefore, evaluating fixed effects based on the normal distribution theory may result in inefficient inferences such as power loss for some statistical tests. In addition, although a model-based mean for a certain level of the categorical exploratory variables is often estimated when applying the linear mixed effect model (e.g., the model-based mean for each treatment group of the last visit in a randomized clinical trial), the mean may be inadequate as a representative value for the skewed data.

The Box–Cox transformation (Box and Cox 1964) is often applied to skewed longitudinal data (Lipsitz et al. 2000) to reduce the skewness of a skewed outcome and apply existing statistical models based on a normal distribution. However, it is difficult to directly interpret the model mean estimated on the scale after applying some transformations to the outcome variable.

For the sake of the interpretability of the analysis results, (Maruo et al. 2015) propose a model median inference method on the original scale based on the Box–Cox transformation in the context of randomized clinical trials. (Maruo et al. 2017) extend this method to the framework of mixed effects models for repeated measures (MMRM) analysis (Mallinckrodt et al. 2001) in the context of longitudinal randomized clinical trials.

The bcmixed package (Maruo et al. 2020) contains functions to estimate model medians for longitudinal data proposed by (Maruo et al. 2017), as well as a sample data set that is used in (Maruo et al. 2017). In this package, the parameter estimation is conducted partially using the gls function in the nlme package. This paper illustrates the usage of the package with the sample data in several contexts.

The remainder of this manuscript is organized as follows: In section 2, we describe the methods proposed by (Maruo et al. 2017). Then, we illustrate the usage of the bcmixed package with the example data in section 3. Finally, our contributions are summarized in section 4.

2 Methods

We briefly introduce the method proposed by in (Maruo et al. 2017). The detailed expressions can be found in (Maruo et al. 2017).

Parameter inference for the Box–Cox model

Let the outcome be a continuous and positive value. The outcomes are measured over time for each subject \(i=1,\ldots,n\), and the number of planned measurement occasions is \(T\) (occasion index: \(t=1,\ldots,T\)). The outcome vector for the \(i\)th subject is denoted by \(\mathbf{y}_i=(y_{i1},\ldots,y_{in_i})^T\), where \(n_i\) is the number of measurements for the \(i\)th subject. We have \(n_i\le T\) because of missingness. Then, we consider applying the following model (Box and Cox 1964; Lipsitz et al. 2000): \[\label{model} \mathbf{z}_{i}=X_i\mathbf{\beta}+W_i\mathbf{b}_i+\mathbf{\epsilon}_i, \tag{1}\] where \(\mathbf{z}_i\) is the Box–Cox transformed outcome vector such that the \(j\)th element (\(j=1,\ldots,n_i\)) is denoted by \[z_{ij}=\left\{\begin{array}{ll} (y_{ij}^\lambda-1)/\lambda,&\lambda\neq 0,\\ \log y_{ij},&\lambda=0, \end{array} \right.\] and \(\lambda\) is the transformation parameter. The parameter \(\mathbf{\beta}\) is the \(p\)-dimensional vector of the fixed effect, \(\mathbf{b}_i\) is the \(q\)-dimensional vector of random effects distributed as \(MVN_q(\mathbf{0}_q, D)\), \(\mathbf{\epsilon}_i\) is the \(n_i\)-dimensional vector of random errors distributed independently as \(MVN_{n_i}(\mathbf{0}_{n_i}, \Sigma_i)\), and \(X_i\) and \(W_i\) are \(n_i \times p\) and \(n_i \times q\) design matrices relating the fixed and random effects, respectively. The random variables \(\mathbf{b}_i\) and \(\mathbf{\epsilon}_i\) are independent. From Formula ((1)), it is derived that \(\mathbf{z}_i|\lambda\sim MVN_{n_i}(X_i\mathbf{\beta}, V_i\)), where \(V_i = W_i D W_i^T + \Sigma_i\). In this paper, we consider situations where researchers have little interest in random effects and are interested in assessing fixed effects. In such cases, a simple formulation of the linear mixed effect model ((1)) can be implemented wherein the random effects are not explicitly modeled, but are rather included as part of the covariance matrix \(V_i\). We focus on such a “marginal” mean model. The covariance parameter vector in \(V=V_i\) for \(n_i=T\) (i.e., subjects with no missing values) is denoted as \(\mathbf{\alpha}=(\alpha_1,\ldots,\alpha_m)^T\). The dimension of \(\mathbf{\alpha}\), that is \(m\), depends on \(T\) and the specified covariance structure.

Let the model parameter vector be \(\mathbf{\theta}=(\lambda,\mathbf{\beta}^T,\mathbf{\alpha}^T)^T\). The maximum likelihood estimate of \(\mathbf{\theta}\) is obtained through maximizing the profile likelihood for \(\lambda\) (Lipsitz et al. 2000; Maruo et al. 2017). The model-based and robust variance estimators of the maximum likelihood estimator \(\hat{\mathbf{\theta}}\) are given by \(V_{\mathbf{\theta}}^{(M)}=\{-\hat{H}\}^{-1}\) and \(V_{\mathbf{\theta}}^{(R)}=\{-\hat{H}\}^{-1}\hat{J}\{-\hat{H}\}^{-1}\), respectively, where \(H=\frac{\partial^2}{\partial\mathbf{\theta}\partial\mathbf{\theta}^T}l(\mathbf{\theta})\), \(J=\sum_{i=1}^n\left\{\frac{\partial}{\partial\mathbf{\theta}}l_i(\mathbf{\theta})\right\}\left\{\frac{\partial}{\partial\mathbf{\theta}}l_i(\mathbf{\theta})\right\}^T\), \(l(\mathbf{\theta})\) is the likelihood function for \(n\) subjects, and \(l_i(\mathbf{\theta})\) is the likelihood function for the \(i\)th subject. The matrices \(\hat{H}\) and \(\hat{J}\) are obtained from the matrices \(H\) and \(J\) by replacing \(\mathbf{\theta}\) by \(\hat{\mathbf{\theta}}\), respectively. A robust variance estimator is an asymptotically valid estimator even when the model ((1)) is mis-specified (Cox 1961).

Model median inference

We now focus on the continuous and positive outcomes of a certain disease, and consider a situation in which the efficacy of some treatments (group index: \(g=1,...,G\)) is compared based on a randomized, parallel group clinical trial, where the total number of subjects is \(n\). The explanatory variable matrix, \(X_i\), and the fixed effect parameter, \(\mathbf{\beta}\) in model ((1)) are denoted in detail as follows. Now, \(X_i\) is given by the \(n_i\times(GT+C)\) matrix that contains the intercept, \(G-1\) group variables, \(T-1\) occasion variables, group-by-occasion interaction variables, and \(C\) covariates, where the categorical covariates are converted into dummy variables. The fixed effect parameter vector is given by \(\mathbf{\beta}=(\beta_I,\mathbf{\beta}_g^T,\mathbf{\beta}_t^T,\mathbf{\beta}_{gt}^T,\mathbf{\beta}_c^T)^T\), where \(\beta_I\), \(\mathbf{\beta}_g=(\beta_{g(1)},\ldots,\beta_{g(G-1)})^T\), \(\mathbf{\beta}_t=(\beta_{t(1)},\ldots,\beta_{t(T-1)})^T\), \(\mathbf{\beta}_{gt}=(\beta_{gt(1,1)},\beta_{gt(1,2)},\ldots,\beta_{gt(G-1,T-1)})^T\), and \(\mathbf{\beta}_c=(\beta_{c(1)},\ldots,\beta_{c(C)})^T\) are the intercept, group, occasion, group-by-occasion, and covariate parameter vectors, respectively. Although group \(G\) and occasion \(T\) is set to be at the reference level, we set \(\beta_{g(G)}=\beta_{t(T)}=\beta_{gt(G,t)}=\beta_{gt(g,T)}=0\) for the sake of description (\(g=1,\ldots,G,\) \(t=1,\ldots,T\)).

Under these settings, the model median, \(\xi_{(g,t)}\), for the treatment group \(g\) at the occasion \(t\) on the original scale is given by \[\xi_{(g,t)}=\left\{\lambda\left(\beta_I+\beta_{g(g)}+\beta_{t(t)}+\beta_{gt(g,t)}+\mathbf{x}_{\bar{c}}^T\mathbf{\beta}_c\right)+1\right\}^{1/\lambda},\] where \(\mathbf{x}_{\bar{c}}\) is the vector of the mean of each covariate for all subjects. The model median is the inverse Box–Cox transformation of the model means on the Box–Cox transformed scale. The model median can be easily interpreted because it is the estimator of the median on the original scale.

Using the delta method, the variance estimator of the maximum likelihood estimator for the model median, \(\hat{\xi}_{(g,t)}\), is given by \(V_{\xi(g,t)}^{(\cdot)}=\Delta_{\xi(g,t)}^TV_{\mathbf{\theta}}^{(\cdot)}\Delta_{\xi(g,t)},\) where \(\Delta_{\xi(g,t)}=\left.\frac{\partial}{\partial\mathbf{\theta}}\xi_{(g,t)}\right|_{\mathbf{\theta}=\hat{\mathbf{\theta}}}\). If we use \(V_{\mathbf{\theta}}^{(\cdot)}=V_{\mathbf{\theta}}^{(M)}\), we obtain the model-based variance estimator, \(V_{\xi(g,t)}^{(M)}\). On the other hand, the robust variance estimator, \(V_{xi(g,t)}^{(R)}\), is obtained if we use \(V_{\mathbf{\theta}}^{(\cdot)}=V_{\mathbf{\theta}}^{(R)}\).

Thus, the model median difference between groups \(g_1\) and \(g_2\) at occasion \(t\) is given by \(\delta_{(g_1;g_2,t)}=\xi_{(g_1,t)}-\xi_{(g_2,t)}\) and the variance estimators of the maximum likelihood estimator of the model median difference, \(\hat{\delta}_{(g_1;g_2,t)}\), is given by \(V_{\delta(g_1;g_2,t)}=(\Delta_{(g_1,t)}-\Delta_{(g_2,t)})^TV_{\mathbf{\theta}}^{(\cdot)}(\Delta_{(g_1,t)}-\Delta_{(g_2,t)})\). The model-based and robust variance estimators are obtained in the same way as the model median estimator. Using the asymptotic normality of the maximum likelihood estimator, the \(100(1-\alpha)\%\) confidence interval of \(\delta_{(g_1;g_2,t)}\) is obtained as \(\hat{\delta}_{(g_1;g_2,t)}\pm \Phi^{-1}(1-\alpha/2)\sqrt{V_{\delta(g_1;g_2,t)}^{(\cdot)}}\), where \(\Phi^{-1}(\cdot)\) is the quantile function of the standard normal distribution. The Wald-type test for the null hypothesis, \(\delta_{(g_1;g_2,t)}=0\), is performed with the test statistic \(t=\hat{\delta}_{(g_1;g_2,t)}/\sqrt{V_{\delta(g_1;g_2,t)}^{(\cdot)}}\).

The performances of the above-mentioned inference procedures may be low for a small sample because these are based on the asymptotic properties of the maximum likelihood estimation. Therefore, (Maruo et al. 2017) applied the following empirical small sample adjustment for the inferences of the model median differences by referring to the study in (Schluchter and Elashoff 1990).

They provide an adjusted standard error (SE) for the median difference as \(\sqrt{M/(M-p)V_{\delta(g_1;g_2,t)}^{(\cdot)}}\) for the compound symmetry (CS) or the first-order auto regression (AR(1)) structure and \(\sqrt{n^*/(n^*-T)}\)
\(\times\sqrt{V_{\delta(g_1;g_2,t)}^{(\cdot)}}\) for the unstructured (UN) structure. Further, we approximate the null distribution by the \(t\) distribution where the degrees of freedom for the CS or the AR(1) structure and the UN structure are \((n-G)(T-1)-m\) and \(n^*-T\), respectively.

Although (Maruo et al. 2017) and (Maruo et al. 2020) focus only on the three covariance structures, these three options are sufficient in the applied settings because the MMRM analysis is often applied in the following steps. The UN structure is used, and the CS or AR(1) structure with the robust variance estimation is used when the parameter estimation process is not properly converged (e.g., see (Gosho et al. 2017)).

Figure 1:  Longitudinal box-whisker plot of outcome variable (cd4) for each treatment group in the aidscd4 dataset.

3 Illustrations

In this section, we describe the bcmixed package that provides the analysis results based on the mixed effect models with the Box–Cox transformation. The package bcmixed is available from the Comprehensive R Archive Network (CRAN) at https:// CRAN.R-project.org/package=bcmixed. The package bcmixed can be installed and loaded using the following code.

R> install.packages("bcmixed")
R> library(bcmixed)

In particular, the following main functions are demonstrated in this article:

bcmarg

: Inference on the marginal model of the mixed effect model with the Box–Cox transformation.

bcmmrm

: Inference on the model median for longitudinal data in randomized clinical trials.

Example data

First, we illustrate a real example for the acquired immune deficiency syndrome (AIDS) clinical trial data, which is stored as the data frame aidscd4 in the bcmixed package. The data are from a randomized, double-blind study of AIDS patients with advanced immune suppression (cluster of differentiation 4 (CD4) cells count of less than or equal to 50 cells/mm3) (Henry et al. 1998; Fitzmaurice et al. 2011). Patients in the AIDS clinical trial group study 193A were randomized to dual or triple combinations of human immunodeficiency virus-1 reverse transcriptase inhibitors. In particular, the patients were randomized to one of four daily regimens. The original data can be downloaded from https://content.sph.harvard.edu/fitzmaur/ala/ (Datasets->AIDS Clinical Trial Group (ACTG) Study 193A). As for the more detailed data handling process, see (Maruo et al. 2017). The data frame aidscd4 contains seven variables (Table 1).

Table 1: Variable definition of AIDS clinical trial data
Variable Description
id A patient identifier; in total there are 1177 patients.
weekc A visit variable (weeks 8, 16, 24, 32).
treatment Allocated treatment regimens;
1 = zidovudine alternating monthly with 400mg didanosine,
2 = zidovudine plus 2.25mg of zalcitabine,
3 = zidovudine plus 400mg of didanosine, and
4 = zidovudine plus 400mg of didanosine plus 400mg of nevirapine.
age Patients’ age (years).
sex Patients’ sex (1 = male, 0 = female).
cd4.bl A baseline value of CD4 cells count + 1.
cd4 A CD4 cells count + 1.

Figure 1 shows the longitudinal box-whisker plot of the values of CD4 plus 1 for each group plotted using ggplot2 package (Wickham 2016). The distribution shapes of the measurements were heavily skewed.

bcmarg function

This function provides the inference results for the marginal model of the mixed effect model with the Box–Cox transformation described in Section 2.1.

Usage:

The bcmarg function is called using the following syntax.

bcmarg(formula, data, time = NULL, id = NULL, structure = "UN", 
  lmdint = c(-3, 3))

Arguments:

The bcmarg function takes arguments tabulated in Table 2.

Table 2: Auguments of bcmarg function
Argument Description
formula A two-sided linear formula object describing the model, with the response
on the left of a ~ operator and the terms, separated by + operators, on the right.
data A data frame containing the variables used in the model.
time A time variable name for repeated measurements. The default is NULL.
id A subject id variable name for repeated measurements. The default is NULL.
structure Specify the covariance structure from c("UN", "CS", "AR(1)"). The default is "UN".
lmdint A vector containing the end points of the interval to be searched for a
transformation parameter. The default is c(-3, 3).

If time and id are not specified, an ordinary linear model with the Box–Cox transformation is applied.

Value:

The bcmarg function returns an object of class "bcmarg". The objects of this class have methods for generic functions coef, logLik, vcov, fitted, print, and summary. The object includes the components for the marginal model parameter inference (Table 3).

Table 3: Values of bcmarg function
Component Description
lambda A numeric value of the estimate of the transformation parameter.
beta A vector with the estimates of the regression parameters.
alpha A vector with the estimates of the covariance parameters.
V A variance-covariance matrix for any subject with no missing values.
betainf A matrix containing the inference results for beta under the assumption
that lambda is known.
Vtheta.mod A model-based variance-covariance matrix for MLE of the vector of all
parameters: c(lambda, beta, alpha).
Vtheta.rob A robust variance-covariance matrix for MLE of the vector of all parameters.
logLik A numeric value of the maximized likelihood.
adj.prm A vector with parameters used for the empirical small sample adjustment in
bcmmrm: c(number of subjects, number of completed subjects, number of
outcome observations, number of missing observations).
glsObject An object of "gls" (or "lm" when time and id are not specified) containing
results of gls (or lm) function on the transformed scale.

In bcmarg function, lambda is estimated with the optimize function by maximizing the profile likelihood function for \(\lambda\). If an error occurs in the optimize function, problems may be solved by changing the search area for \(\lambda\), lmdint.

Example code:

We applied a marginal model to the aidscd4 data, where the fixed effects were the treatment, visit, treatment-visit interaction, and the Box–Cox transformed baseline, where the visit was treated as a nominal variable. The covariance structure was set as unstructured (default setting). This model is frequently used for MMRM analysis. A sample code is as follows. The bct.v function returns the Box–Cox transformed vector.

R> data("aidscd4")
R> aidscd4$cd4.bl.tr <- bct.v(aidscd4$cd4.bl)$transformed
R> res1 <- bcmarg(formula = cd4 ~ as.factor(treatment) * as.factor(weekc) + 
+                 cd4.bl.tr, data = aidscd4, time = weekc, id = id)

The summarized analysis results on the transformed scale are obtained by applying the summary function to the "bcmarg" object as follows.

R> summary(res1)        

Box-Cox transformed mixed model analysis
  Formula: cd4 ~ as.factor(treatment) * as.factor(weekc) + cd4.bl.tr 
  Time: weekc 
  ID: id 
  Covariance structure: "UN" 
  Data: aidscd4 
  Log-likelihood: -13322.96
  Estimated transformation parameter:  0.154

Coefficients on the transformed scale:
                                           Value Std.Error t-value p-value
(Intercept)                               1.0849    0.1249   8.684   0.000
as.factor(treatment)2                     0.2454    0.1214   2.022   0.043
as.factor(treatment)3                     0.4203    0.1212   3.468   0.001
as.factor(treatment)4                     0.7649    0.1199   6.377   0.000
as.factor(weekc)16                       -0.2043    0.0843  -2.424   0.015
as.factor(weekc)24                       -0.4498    0.0899  -5.004   0.000
as.factor(weekc)32                       -0.6750    0.0988  -6.835   0.000
cd4.bl.tr                                 0.5782    0.0200  28.922   0.000
as.factor(treatment)2:as.factor(weekc)16 -0.1183    0.1185  -0.998   0.318
as.factor(treatment)3:as.factor(weekc)16 -0.0560    0.1180  -0.475   0.635
as.factor(treatment)4:as.factor(weekc)16  0.0675    0.1175   0.574   0.566
as.factor(treatment)2:as.factor(weekc)24 -0.1863    0.1264  -1.474   0.141
as.factor(treatment)3:as.factor(weekc)24 -0.0243    0.1264  -0.193   0.847
as.factor(treatment)4:as.factor(weekc)24  0.0870    0.1263   0.689   0.491
as.factor(treatment)2:as.factor(weekc)32 -0.0852    0.1414  -0.603   0.547
as.factor(treatment)3:as.factor(weekc)32 -0.0893    0.1400  -0.638   0.524
as.factor(treatment)4:as.factor(weekc)32  0.1414    0.1381   1.024   0.306

Covariance parameters on the transformed scale:
UN(1,1) UN(1,2) UN(1,3) UN(1,4) UN(2,2) UN(2,3) UN(2,4) UN(3,3) UN(3,4) UN(4,4) 
  1.798   1.156   1.105   0.927   1.957   1.346   1.298   1.903   1.452   2.009 

Correlations on the transformed scale:
       8    16    24    32
8  1.000 0.616 0.598 0.488
16 0.616 1.000 0.697 0.655
24 0.598 0.697 1.000 0.743
32 0.488 0.655 0.743 1.000

The results of coefficients on the transformed scale are obtained with the gls function in the nlme package. The transformation parameter was estimated as 0.154, which suggested that the shape of the residual distribution was close to a multivariate log-normal distribution. All main effects were significant; however, treatment-by-week interaction was not significant at a level of 0.05. Note that inference results for beta under the assumption that the transformation parameter is known are provided. Although statistical tests would be asymptotically valid (e.g., see (Doksum and Wong 1983)), standard errors might be underestimated (e.g., see (Bickel and Doksum 1981)).

bcmmrm function

This function provides the results for the model median inferences for longitudinal randomized clinical trial data described in Section 2.2. The parameter inference is conducted by calling the bcmarg function in the bcmmrm function.

Usage:

The bcmmrm function is called using the following syntax.

bcmmrm(outcome, group, data, time = NULL, id = NULL, covv = NULL, 
  cfactor = NULL, structure = "UN", lmdint = c(-3, 3), glabel = NULL, 
  tlabel = NULL)

Argument:

The bcmmrm function takes arguments tablated in Table 4.

Table 4: Auguments of bcmmrm function
Argument Description
outcome A name of positive outcome (dependent) variable included in data.
group A name of treatment group variable included in data.
data A data frame that may include outcome, group, time, id, and specified covariate
variables.
time A name of time variable for repeated measurements included in data.
The default is NULL.
id A name of subject id variable for repeated measurements included in data.
The default is NULL.
covv A character vector with names of covariate variables included in data.
The default is NULL.
cfactor An integer vector including nominal variable indicators for covariate variables.
Nominal variable: 1, continuous variable: 0. The default is NULL.
structure Specify the covariance structure from c("UN", "CS", "AR(1)"). The default is "UN".
conf.level A numeric value of the confidence level for the confidence intervals.
The default is 0.95.
lmdint A vector containing end points of the interval to be searched for a transformation
parameter. The default is c(-3, 3).
glabel A vector of length number of treatment groups containing the labels of group
variable. The default is NULL and the levels of group variable in data are used.
tlabel A vector of length number of repeated measures containing the labels of time
variable. The default is NULL and the levels of time variable in data are used.

If time and id are not specified, inference results reduce to the results for the context of linear regression model provided by (Maruo et al. 2015).

Value:

The bcmmrm function returns an object of class "bcmmrm" representing the results of model median inference based on the Box–Cox transformed MMRM analysis. Generic functions boxplot, coef, logLik, vcov, fitted, plot, print, and summary have methods to demonstrate the results of the fit. Components tablated in Table 5 must be included in a legitimate "bcmmrm" object.

Table 5: Values of bcmmrm function
Component Description
call A list containing an image of the bcmmrm call that produced the object.
median.mod, Lists including inference results for the model medians for all groups.
median.rob, Levels of the list are time points, where the correspondence table is given
median.mod.adj, as time.tbl$code. mod: model-based inference, rob: robust inference,
median.rob.adj adj: with empirical small sample adjustment.
meddif.mod, Lists including inference results for the for the model median differences
meddif.rob, between all pairs of groups (group1 - group0). The levels of the list are
meddif.mod.adj, time points, where the correspondence table is given as time.tbl$code.
meddif.rob.adj mod: model-based inference, rob: robust inference,
adj: with empirical small sample adjustment.
lambda A numeric value of estimates of the transformation parameter.
R A correlation matrix for transformed outcomes.
betainf Inference results for beta under the assumption that lambda is known.
time.tbl A data frame of a correspondence table for the time points.
This object is used when applying the above generic functions.
group.tbl A data frame of a correspondence table for treatment groups.
This object is used when applying the above generic functions.
inf.marg A "bcmarg" object with results of the bcmarg function called in the
bcmmrm function.
outdata A data frame where the transformed outcome (ytr), the fitted values
on the transformed scale (ytr.fitted), and the residuals on
the transformed scale (res.tr) are added to input data.
conf.level A numeric value of the specified confidence level.

lambda, R, betainf, and inf.marg are obtained from the results of the bcmarg function that is called in the bcmmrm function.

Sample code:

We applied the MMRM analysis with the Box–Cox transformation described in Section 3.2 to the aidscd4 data, where the Box–Cox transformed baseline and sex were included as the covariates. The example code is as follows.

R> data("aidscd4")
R> aidscd4$cd4.bl.tr <- bct.v(aidscd4$cd4.bl)$transformed
R> res2 <- bcmmrm(outcome = cd4, group = treatment, data = aidscd4, time = weekc, 
+                id = id, covv = c("cd4.bl.tr", "sex"),  cfactor = c(0, 1), 
+                glabel = c("Zid/Did", "Zid+Zal", "Zid+Did", "Zid+Did+Nev")

The transformed baseline and sex are continuous and categorical variables, respectively, and therefore, cfactor was set as c(0,1).

The print function only provides information about model detail, the estimated transformation parameter, the maximized log-likelihood, and the model median estimate for each time point and group as follows.

R> print(res2)

Model median estimation based on MMRM with Box-Cox transformation
  Outcome: cd4 
  Group: treatment 
  Time: weekc 
  ID: id 
  Covariate(s): c("cd4.bl.tr", "sex") 
  Covariance structure: "UN" 
  Data: aidscd4 
  Estimated transformation parameter:  0.154 
  Log-likelihood: -13322.36

Model median estimates (row: group, col: time):
  treatment | weekc    8   16   24   32
1           Zid/Did 18.9 16.5 14.1 12.1
2           Zid+Zal 22.0 17.9 14.6 13.5
3           Zid+Did 24.5 20.9 18.2 15.1
4       Zid+Did+Nev 30.1 27.8 24.2 21.8

The summary function provides more detailed analysis results as follows.

R> summary(res2)

Model median inference based on MMRM with the Box-Cox transformation

Data and variable information:
  Outcome: cd4 
  Group: treatment 
  Time: weekc 
  ID: id 
  Covariate(s): c("cd4.bl.tr", "sex") 
  Data: aidscd4 

Analysis information:
  Covariance structure: "UN" 
  Robust inference: TRUE 
  Empirical small sample adjustment: TRUE 
  Confidence level: 0.95 

Analysis results:
  Estimated transformation parameter:  0.154 

 
Model median inferences for weekc = 8 
 
    treatment median    SE lower.CL upper.CL
1     Zid/Did   18.9 0.862     17.2     20.6
2     Zid+Zal   22.0 1.124     19.8     24.2
3     Zid+Did   24.5 1.465     21.6     27.4
4 Zid+Did+Nev   30.1 1.597     27.0     33.3

(...Omitted for weeks 16 and 24...)

Model median inferences for weekc = 32 
 
    treatment median    SE lower.CL upper.CL
1     Zid/Did   12.1 0.662     10.8     13.4
2     Zid+Zal   13.5 0.813     11.9     15.1
3     Zid+Did   15.1 1.019     13.1     17.1
4 Zid+Did+Nev   21.8 1.376     19.1     24.5

 
Inferences of model median difference between groups 
  ( treatment 1 - treatment 0 ) for weekc = 8 
 
  treatment 1 treatment 0 delta   SE lower.CL upper.CL t.value p.value
1     Zid+Zal     Zid/Did  3.12 1.40    0.363     5.87    2.22   0.027
2     Zid+Did     Zid/Did  5.64 1.69    2.325     8.96    3.34   0.001
3 Zid+Did+Nev     Zid/Did 11.25 1.80    7.711    14.80    6.24   0.000
4     Zid+Did     Zid+Zal  2.53 1.83   -1.059     6.12    1.39   0.167
5 Zid+Did+Nev     Zid+Zal  8.14 1.93    4.349    11.93    4.22   0.000
6 Zid+Did+Nev     Zid+Did  5.61 2.16    1.372     9.85    2.60   0.010

(...Omitted for weeks 16 and 24...)
 
Inferences of model median difference between groups 
  ( treatment 1 - treatment 0 ) for weekc = 32 
 
  treatment 1 treatment 0 delta   SE lower.CL upper.CL t.value p.value
1     Zid+Zal     Zid/Did  1.35 1.04   -0.692     3.40    1.30   0.194
2     Zid+Did     Zid/Did  3.00 1.20    0.633     5.37    2.49   0.013
3 Zid+Did+Nev     Zid/Did  9.70 1.52    6.705    12.69    6.37   0.000
4     Zid+Did     Zid+Zal  1.65 1.30   -0.907     4.20    1.27   0.206
5 Zid+Did+Nev     Zid+Zal  8.34 1.60    5.206    11.48    5.23   0.000
6 Zid+Did+Nev     Zid+Did  6.70 1.71    3.338    10.06    3.92   0.000

Significant differences were detected for the following pairs Zid+Did vs. Zid/Did, Zid+Did+Nev vs. Zid/Did, Zid+Did+Nev vs. Zid+Zal, and Zid+Did+Nev vs. Zid+Did at week 32.

The summary function provides the results using the robust variance and the small sample adjustment in the default settings. If users want to summarize results using the model variance and not using the small sample adjustment, specify summary(bcmmrmObject, robust = F, ssadjust = F). Further details of the summary function for the "bcmmrm" object can be obtained with ?summary.bcmmrm.

The inference results for the median differences at week 32 (fourth visit) can also be called as follows although the levels of the group variable in the data frame are used without formatting.

R> res2$meddif.rob.adj[[4]]

  group1 group0    delta       SE   lower.CL  upper.CL  t.value      p.value
1      2      1 1.354438 1.041338 -0.6922404  3.401117 1.300672 1.940595e-01
2      3      1 3.000942 1.204919  0.6327547  5.369129 2.490575 1.312570e-02
3      4      1 9.697631 1.522831  6.7046091 12.690653 6.368159 4.869820e-10
4      3      2 1.646503 1.299231 -0.9070469  4.200054 1.267291 2.057293e-01
5      4      2 8.343192 1.596236  5.2058987 11.480486 5.226792 2.682862e-07
6      4      3 6.696689 1.709027  3.3377114 10.055667 3.918421 1.034880e-04

The "bcmmrm" object can be plotted with the plot function as follows (Figure 2).

graphic without alt text
Figure 2: Longitudinal plot of model median for each group, created by applying plot function to bcmmrm object.
R> plot(res2, xlab = "Week", ylab = "CD4+1")

The plot function provides a longitudinal plot in the default settings. However, a plot at a specified time point can be drawn with the following code (Figure 3):

R> plot(res2, timepoint = 32, xlab = "Treatment", ylab = "CD4+1", col = 1:4)
graphic without alt text
Figure 3: Plot of model median for each group at week 32, created by applying plot function to bcmmrm object with timepoint option.

Further, the plot function provides the results using the robust variance and the small sample adjustment in the default setting. Many other options such as main and legend can be used in the plot function. Further details can be obtained with ?plot.bcmmrm.

A diagnosis of a model fitting can be conducted with the boxplot function, which provides a box-whisker plot of the Box–Cox transformed residuals for each group. A sample code is provided as follows (Figure 4):

R> boxplot(res2)
graphic without alt text
Figure 4: Box-whisker plot of transformed residuals for each group at week 32, created by applying boxplot function to bcmmrm object.

The shape of the transformed residual for each group is not skewed and the median and mean are close to each other, which suggests that the median would not be biased at week 32. The boxplot function provides the results at the last time points in the default settings. A box-whisker plot at another time point can be obtained by specifying boxplot(bcmmrmObject, timepoint = xx).

4 Summary and discussion

We demonstrated the use of the bcmixed package. The bcmarg function provides the analysis results for a marginal model of a mixed effect model with the Box–Cox transformation. The results for the statistical test of the bcmarg function are meaningful. However, it is difficult to interpret coefficients (\(\mathbf{\beta}\)) on the transformed scale.

The bcmmrm function provides the model median inferences based on the MMRM with the Box–Cox transformation for longitudinal randomized clinical trials. Using this function, treatment effects can be interpreted as median differences between treatment groups at specified time points. (Maruo et al. 2017) also show that this method controls the type I error of the statistical test for the model median difference, and it has moderate or high performance for power compared with the existing methods (ordinary MMRM, MMRM for the log-transformed outcome, etc.) from their simulation studies (the simulation program is provided in https://github.com/kzkzmr/Maruo_2017_StatMed_Simulation with penalized power results proposed by (Cavus et al. 2019)). Thus, bcmmrm function analysis results with high power and high interpretability for longitudinal randomized clinical trials with skewed outcomes. Further, the bcmmrm function provides summarizing and visualization tools, which would be helpful to write clinical trial reports.

Although the bcmixed package can be used for data other than that of randomized clinical trials, the performances of these methods have not been evaluated well yet. Therefore, users should use them carefully. Although a model fitting can be diagnosed with the boxplot function, more model diagnosis tools may be implemented in the future. Users might apply the MissMech package(Jamshidian et al. 2014, 2015), which diagnoses multivariate normality and heteroscedasticity, to the transformed residuals stored in "bcmarg" object. Note that the MissMech package assumes missing mechanisms are missing completely at random (MCAR), and statistical tests for model fittings may lead to significant results for a medium-to-large sample even when the models fit the data adequately.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP19K11849.

CRAN packages used

bcmixed, nlme, glme, lme4, CLME, PLmixed, MCMCglmm, ggplot2, MissMech

CRAN Task Views implied by cited packages

Bayesian, ChemPhys, Econometrics, Environmetrics, Finance, MixedModels, OfficialStatistics, Phylogenetics, Psychometrics, Spatial, SpatioTemporal, Survival, TeachingStatistics

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

D. Bates, M. Mächler, B. Bolker and S. Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1): 1–48, 2015. DOI 10.18637/jss.v067.i01.
P. J. Bickel and K. A. Doksum. An analysis of transformations revisited. Journal of the American Statistical Association, 76: 296–311, 1981. URL https://doi.org/10.1080/01621459.1981.10477649.
G. E. P. Box and D. R. Cox. An analysis of transformations. Journals of the Royal Statistical Society Series B, 26: 211–246, 1964. URL https://doi.org/10.1111/j.2517-6161.1964.tb00553.x.
M. Cavus, B. Yazici and A. Sezer. Penalized power approach to compare the power of the tests when type i error probabilities are different. Communications in Statistics - Simulation and Computation, 0(0): 1–15, 2019. DOI 10.1080/03610918.2019.1588310.
D. R. Cox. Tests of separate families of hypotheses. In Proceeding of 4th berkley symposium 1, pages. 105–123 1961. California: University of California Press.
K. A. Doksum and C. W. Wong. Statistical tests based on transformed data. Journal of the American Statistical Association, 78: 411–417, 1983. URL https://doi.org/10.1080/01621459.1983.10477986.
G. M. Fitzmaurice, N. M. Laird and J. H. Ware. Applied longitudinal analysis, 2nd edition. New York: Wiley, 2011. URL https://doi.org/10.1002/9781119513469.
M. Gosho, A. Hirakawa, H. Noma, K. Maruo and Y. Sato. Comparison of bias-corrected covariance estimators for MMRM analysis in longitudinal data with dropouts. Statistical Methods in Medical Research, 26: 2389–2406, 2017. URL https://doi.org/10.1177/0962280215597938.
J. D. Hadfield. MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software, 33(2): 1–22, 2010. URL https://www.jstatsoft.org/v33/i02/.
K. Henry, A. Erice, C. Tierney, H. H. Balfour Jr., M. A. Fischl, A. Kmack, S. H. Liou, A. Kenton, M. S. Hirsch, J. Phair, et al. A randomized, controlled, double-blind study comparing the survival benefit of four different reverse transcriptase inhibitor therapies (three-drug, two-drug, and alternating drug) for the treatment of advanced AIDS. Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology, 19: 339–349, 1998. URL https://doi.org/10.1097/00042560-199812010-00004.
M. Jamshidian, S. Jalal and C. Jansen. : An r package for testing homoscedasticity, multivariate normality, and missing completely at random (MCAR). Journal of Statistical Software, 56(6): 2014. URL https://doi.org/10.18637/jss.v056.i06.
M. Jamshidian, S. Jalal and C. Jansen. Package ’:’. 2015. URL https://CRAN.R-project.org/package=MissMech. R package version 1.0.2.
C. M. Jelsema and S. D. Peddada. CLME: An R package for linear mixed effects models under inequality constraints. Journal of Statistical Software, 75(1): 1–32, 2016. DOI 10.18637/jss.v075.i01.
S. R. Lipsitz, J. Ibrahim and G. Molenberghs. Using a BoxCox transformation in the analysis of longitudinal data with incomplete responses. Journal of the Royal Statistical Society, Series C, 49: 287–296, 2000. URL https://doi.org/10.1111/1467-9876.00192.
C. H. Mallinckrodt, W. S. Clark and S. R. David. Accounting for dropout bias using mixed-effects models. Journal of Biopharmaceutical Statistics, 11: 9–21, 2001. URL https://doi.org/10.1081/BIP-100104194.
K. Maruo, R. Ishii, Y. Yamaguchi and M. Gosho. Package ’’. 2020. URL https://CRAN.R-project.org/package=bcmixed. R package version 1.0.
K. Maruo, N. Isogawa and M. Gosho. Inference of median difference based on the BoxCox model in randomized clinical trials. Statistics in Medicine, 34: 1634–1644, 2015. URL https://doi.org/10.1002/sim.6408.
K. Maruo, N. Yamaguchi, H. Noma and M. Gosho. Interpretable inference on the mixed effect model with the BoxCox transformation. Statistics in Medicine, 36: 2420–2434, 2017. URL https://doi.org/10.1002/sim.7279.
J. Pinheiro, D. Bates, S. DebRoy, D. Sarkar and R Core Team. nlme: Linear and nonlinear mixed effects models. 2021. URL https://CRAN.R-project.org/package=nlme. R package version 3.1-152.
N. Rockwood and M. Jeon. Estimating complex measurement and growth models using the R package PLmixed. Multivariate Behavioral Research, 54(2): 288–306, 2019.
Sam Weerahandi, Berna Yazici, Ching-Ray Yu and Mustafa Cavus. Glme: Generalized linear mixed effects models. 2021. URL https://CRAN.R-project.org/package=glme. R package version 0.1.0.
M. D. Schluchter and J. D. Elashoff. Small-sample adjustments to tests with unbalanced repeated measures assuming several covariance structures. Journal of Statistical Computation and Simulation, 37: 69–87, 1990. URL https://doi.org/10.1080/00949659008811295.
H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Maruo, et al., "bcmixed: A Package for Median Inference on Longitudinal Data with the Box--Cox Transformation", The R Journal, 2021

BibTeX citation

@article{RJ-2021-083,
  author = {Maruo, Kazushi and Ishii, Ryota and Yamaguchi, Yusuke and Gosho, Masahiko},
  title = {bcmixed: A Package for Median Inference on Longitudinal Data with the Box--Cox Transformation},
  journal = {The R Journal},
  year = {2021},
  note = {https://rjournal.github.io/},
  volume = {13},
  issue = {2},
  issn = {2073-4859},
  pages = {253-265}
}