In survival analysis, regression modeling has traditionally focused on assessing covariate effects on survival times, defined as the elapsed time between a baseline and the event time. Focusing on residual life instead can provide a more dynamic assessment of covariate effects, as it offers updated information at specific time points between the baseline and the event occurrence. Statistical methods for fitting quantile regression models for residual life have recently been proposed, providing favorable alternatives to modeling the mean of residual lifetimes. Despite this progress, the lack of computer software implementing these methods remains an obstacle for researchers analyzing data in practice. In this paper, we introduce the R package qris (Kim et al. 2022), which implements methods for fitting semiparametric quantile regression models on residual life subject to right censoring. We demonstrate the effectiveness and versatility of the package through comprehensive simulation studies and a real-world data example, showcasing its contributions to survival analysis research.
In the analysis of time-to-event data, standard statistical inference procedures often focus on quantities based on the failure time and its relationship with covariates measured at baseline. However, throughout the follow-up process, inference procedures based on residual life become increasingly intuitive for assessing the survival of subjects and can offer insights into the effectiveness of treatments in prolonging the remaining lifetime. As covariates can change substantially over time and models based solely on baseline covariates have limited potential for long-term prognosis, there is growing interest in modeling the remaining lifetime of a surviving subject with updated patient information. Many efforts have been made to model the mean residual life, including proportional mean residual life models (Oakes and Dasu 1990, 2003; Maguluri and Zhang 1994; Chen et al. 2005), additive mean residual life models (Chen and Cheng 2006; Chen 2007; Zhang et al. 2010), and proportional scaled mean residual life models (Liu and Ghosh 2008). Because failure times are usually right-skewed and heavy-tailed, the mean of the residual life might not be identifiable if the follow-up time is not sufficiently long. For this reason, quantiles, which are robust under skewed distributions, have traditionally been used as alternative summary measures. For example, the semiparametric quantile regression approach for continuous responses (Koenker and Bassett Jr 1978) has been extended to uncensored failure time data (Jung 1996; Portnoy and Koenker 1997; Wei et al. 2006) and censored failure time data (Ying et al. 1995; Portnoy 2003; Peng and Huang 2008; Huang 2010).
When the outcome variable is the residual life, semiparametric quantile models that apply the inverse probability of censoring weighting (IPCW) principle to address right-censored observations have been explored (Jung et al. 2009; Kim et al. 2012; Li et al. 2016). These approaches are based on estimating functions that are non-smooth in the regression parameters, and the estimates are obtained either through zero-crossings of the non-smooth estimating functions located by grid search (Jung et al. 2009) or by optimizing non-smooth objective functions with \(L_1\)-minimization algorithms (Kim et al. 2012; Li et al. 2016). While these methods are relatively straightforward to implement, an additional challenge lies in standard error estimation, which necessitates the computationally intensive multiplier bootstrap method (Li et al. 2016). Alternatively, Jung et al. (2009) and Kim et al. (2012) utilized the minimum dispersion statistic and the empirical likelihood method, respectively, to bypass direct estimation of the variance of the regression parameter estimator for hypothesis testing and the construction of confidence intervals. The non-smooth nature of the estimating functions in these approaches precludes variance estimation via the robust sandwich-type estimator typically employed in equation-based estimation methods. To lessen the associated computational burden, an induced smoothing approach was proposed (Brown and Wang 2005), which modifies the non-smooth estimating equations into smooth ones. Leveraging the asymptotic normality of the non-smooth estimator, the smooth estimating functions are constructed by averaging out random perturbations added to the non-smooth estimating functions. The resulting estimating functions are smooth in the regression parameters, allowing for the straightforward application of standard numerical algorithms, such as the Newton-Raphson method.
Furthermore, these smoothed estimating functions facilitate the straightforward computation of variances using the robust sandwich-type estimator. The induced smoothing approach has been employed in fitting semiparametric accelerated failure time (AFT) models via the rank-based approach (Johnson and Strawderman 2009; Chiou et al. 2015a, 2021; Kang 2017). Regarding quantile regression, Choi et al. (2018) considered the induced smoothing approach under a competing-risks setting. All of these methods are based on modeling event times. Recently, Kim et al. (2023) proposed an induced smoothing estimator for fitting a semiparametric quantile regression model for residual life.
The availability of published R packages for fitting quantile regression models is somewhat limited. The rq(), nlrq(), rqss(), and crq() functions in the package quantreg (Koenker 2022) are predominantly used and provide various features for fitting linear, nonlinear, non-parametric, and censored quantile regression models, respectively. The rq() function minimizes non-smooth objective functions to obtain point estimates of regression coefficients and can accommodate right-censored survival times by incorporating weights. By redefining survival times as the remaining lifetime at time \(t_0\), one can also obtain a non-smoothed estimator for quantile regression models for residual life (Kim et al. 2012). On the other hand, the nlrq() function is designed to fit a nonlinear quantile regression model, while the rqss() function fits additive quantile regression models with nonparametric terms, including univariate and bivariate components, using smoothing splines and total variation regularization techniques (Koenker et al. 1994; Koenker and Mizera 2004). Furthermore, the crq() function fits a quantile regression model for censored data on the \(\tau\)-th conditional quantile function of the response variable. Overall, the quantreg package implements three methods for handling right-censored survival times: the estimators of Powell (1986), Portnoy (2003), and Peng and Huang (2008). However, none of the methods implemented in the nlrq(), rqss(), or crq() functions is applicable to handling censored residual life using the induced smoothing methods. The only function that implements the induced smoothing method is the aftsrr() function in the package aftgee (Chiou et al. 2021), but it is specifically designed for fitting semiparametric AFT models, which are not directly applicable to fitting quantile regression models.
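As a concrete illustration of the weighting workaround mentioned above, the sketch below redefines survival times as remaining lifetimes at \(t_0\) and fits a weighted rq(). The simulated data and helper names are our own, and the snippet is illustrative rather than the package's implementation; the censoring survival function is estimated by applying the Kaplan-Meier estimator with the censoring indicator reversed.

```r
# Illustrative sketch: IPCW-weighted rq() fit for residual life at t0.
# Data are simulated here purely for demonstration.
set.seed(2023)
n     <- 200
ftime <- rexp(n)                  # failure times
ctime <- rexp(n, rate = 0.3)      # censoring times
Z     <- pmin(ftime, ctime)       # observed times
d     <- as.numeric(ftime <= ctime)  # event indicator
x     <- runif(n)                 # a single covariate
t0    <- 0.2                      # base time for residual life

if (requireNamespace("survival", quietly = TRUE) &&
    requireNamespace("quantreg", quietly = TRUE)) {
  # Kaplan-Meier estimate of the censoring survival function G(t):
  # censoring plays the role of the event (indicator 1 - d)
  Gfit <- survival::survfit(survival::Surv(Z, 1 - d) ~ 1)
  Ghat <- stats::stepfun(Gfit$time, c(1, Gfit$surv))
  keep <- Z > t0 & d == 1                          # at-risk, uncensored subjects
  w    <- Ghat(t0) / pmax(Ghat(Z[keep]), 1e-10)    # IPCW weights G-hat(t0)/G-hat(Z_i)
  fit  <- quantreg::rq(log(Z[keep] - t0) ~ x[keep], tau = 0.5, weights = w)
  print(coef(fit))
}
```

The weights downweight nothing and upweight uncensored subjects with small censoring survival, mirroring the IPCW principle discussed in the next section.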
Other R packages that can be used to fit quantile regression models for survival data include the packages ctqr (Frumento 2021), Brq (Alhamzawi 2020), brms (Bürkner 2018), and cmprskQR (Dlugosz et al. 2019). The ctqr() function in the package ctqr implements the methods proposed in Frumento (2021) for right- or interval-censored failure times with left truncation. The Brq() function in the package Brq implements Bayesian methods based on the asymmetric Laplace distribution. In the package brms, the brm() function with the family = asym_laplace() option enables full Bayesian inference. The crrQR() function in the package cmprskQR allows fitting quantile regression models with competing risks. All of these R packages are designed for fitting quantile regression models for failure times defined from a baseline and are not applicable to the residual life setting.
The recently developed R package qris (Kim et al. 2022) provides an efficient tool for fitting semiparametric quantile regression models for residual life subject to right censoring. The qris package offers three methods for estimating the regression parameters: \(L_1\)-minimization of non-smooth objective functions, induced smoothing with a non-iterative approach, and an iterative procedure. For standard error estimation, the qris package provides two resampling-based approaches: the partial multiplier bootstrap and the full multiplier bootstrap methods. The partial multiplier bootstrap method utilizes the robust sandwich-type estimator by incorporating the sample variance of perturbed estimating functions, while the full multiplier bootstrap method is obtained by considering the sample variance from the solutions of perturbed estimating functions. To enhance the interpretability of results, the qris package incorporates graphical visualizations of covariate effects at different quantiles and base times, utilizing the plotting environment similar to that in the ggplot2 package (Wickham et al. 2022), thereby allowing for extensive flexibility and customization. The ultimate goal of creating the qris package is to facilitate the easy incorporation of quantile regression for residual life into daily routines. The package qris is available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=qris.
The rest of the article is organized as follows: Section 2 introduces a semiparametric regression model for quantiles of residual life and the estimation methods implemented in the package. Section 3 provides details about computing algorithms. Illustrations of the package using a simulated dataset and the real data from the North Central Cancer Treatment Group are presented in Section 4. Finally, in Section 5, concluding remarks are provided along with some discussions.
Define \(T\) as the potential failure time that is subject to right censoring by \(C\) and \(\mathbf{X}\) as a \(p \times 1\) vector of covariates, where \(p\) is the number of covariates, including an intercept. The observed data consists of \(n\) independent copies of \((Z, \delta, \mathbf{X})\), where \(Z = \min(T, C)\), \(\delta = I(T \leq C)\), and \(I(\cdot)\) is an indicator function. We also assume \(T\) and \(C\) are marginally independent. Define the \(\tau\)-th quantile of the residual life at \(t_0 > 0\) as \(\theta_{\tau}(t_0)\) that satisfies \(P(T_i - t_0 \geq \theta_{\tau}(t_0) \ | \ T_i > t_0) = 1 - \tau\). We consider the semiparametric quantile regression model for the residual life (Kim et al. 2012; Kim et al. 2023). Given \(T_i > t_0\), \[\label{qr:mod1} \log(T_i - t_0) = \mathbf{X}_{i}^{\top}\boldsymbol{\mathbf{\beta}}_0(\tau, t_0) + \epsilon_i, i = 1, \ldots, n, %\label{qr:mod2} \tag{1}\] where \(\boldsymbol{\mathbf{\beta}}_0(\tau, t_0)\) is a \(p \times 1\) vector of regression coefficients, and \(\epsilon_i\) is a random error having zero \(\tau\)-th quantile. The quantile regression model for a continuous response (Koenker and Bassett Jr 1978) is a special case of Equation (1) when \(t_0 = 0\). For ease of notation, we omit \(\tau\) and \(t_0\) in \(\boldsymbol{\mathbf{\beta}}_0(\tau, t_0)\) and \(\theta_{\tau}(t_0)\) and write \(\boldsymbol{\mathbf{\beta}}_0\) and \(\theta\). We present different estimation procedures to estimate \(\boldsymbol{\mathbf{\beta}}_0\) given \(\tau\) and \(t_0\) in the following.
When there is no censoring, an estimator for \(\beta_0\) in
Equation (1) can be obtained by solving the estimating
equation (Kim et al. 2012), where
\[\label{eq:ns:obj1}
\frac{1}{n}\sum_{i=1}^{n}I[T_i \ge t_0] \mathbf{X}_i \left\{I\left[\log(T_i - t_0) \leq \mathbf{X}_i^{\top}\boldsymbol{\mathbf{\beta}} \right] - \tau \right\} = 0. \tag{2}\]
However, Equation (2) cannot be directly used when
\(T_i - t_0\) is subject to right censoring. The IPCW technique can be
incorporated into Equation (2) to account for the right
censoring (Li et al. 2016). Specifically, in the presence of right
censoring, the estimator for \(\boldsymbol{\mathbf{\beta}}_0\) in
Equation (1) can be obtained as the root of the following
weighted estimating equations:
\[\label{eq:nsm:ipw}
U_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau) = \frac{1}{n}\sum_{i=1}^{n}I[Z_i \ge t_0] \mathbf{X}_i \left\{I \left[\log(Z_i - t_0) \leq \mathbf{X}_i^{\top} \boldsymbol{\mathbf{\beta}} \right]\frac{\delta_i}{\widehat{G}(Z_i)/\widehat{G}(t_0)} -\tau \right\}, \tag{3}\]
where \(\widehat{G}(\cdot)\) is the Kaplan-Meier estimate of the survival
function \(G(\cdot)\) of the censoring time \(C\) and
\(\widehat{G}(t) = \prod_{i: t_i \leq t} (1 - \sum_{j=1}^n (1 - \delta_j)I(Z_j \leq t_i) / \sum_{j=1}^n I(Z_j \geq t_i))\).
A computational challenge arises because the exact solution to
Equation (3) might not exist due to the non-smoothness in
\(\beta\) caused by the involvement of indicator functions. When the exact
solutions do not exist, the root of Equation (3) can be
approximated by minimizing the \(L_1\)-objective function
\(L_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau)\) (Li et al. 2016),
\[\begin{aligned}
\label{l1:nsm}
\nonumber
L_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau) = & \frac{1}{n}\sum_{i=1}^n \frac{\delta_i I[Z_i > t_0]}{\widehat{G}(Z_i)/\widehat{G}(t_0)} \left| \log(Z_i - t_0) - \mathbf{X}_i^{\top}\boldsymbol{\mathbf{\beta}} \right| + \\
& \left| M - \boldsymbol{\mathbf{\beta}}^{\top}\sum_{l=1}^n (-\mathbf{X}_l) \frac{\delta_l I[Z_l > t_0]}{\widehat{G}(Z_l)/\widehat{G}(t_0)}\right|
+ \ \left| M - \boldsymbol{\mathbf{\beta}}^{\top}\sum_{l=1}^n 2\tau \mathbf{X}_l I[Z_l > t_0]\right|,
\end{aligned} \tag{4}\]
where \(M > 0\) bounds
\(\left| \boldsymbol{\mathbf{\beta}}^{\top}\sum_{i=1}^n (-\mathbf{X}_i) \frac{\delta_i I[Z_i > t_0]}{\widehat{G}(Z_i)/ \widehat{G}(t_0)}\right|\)
and
\(\left| \boldsymbol{\mathbf{\beta}}^{\top}\sum_{i=1}^n 2\tau \mathbf{X}_i I[Z_i > t_0]\right|\)
from above. Numerically, the bound \(M\) is set to an extremely large
number, and the qris()
function uses \(M = 10^6\). Denote the resulting
estimator by \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny NS}\). It
has been shown that \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny NS}\)
is consistent for \(\boldsymbol{\mathbf{\beta}}_0\) and asymptotically
normally distributed (Li et al. 2016).
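To make the estimating function in Equation (3) concrete, the following base-R sketch evaluates it at a given \(\boldsymbol{\mathbf{\beta}}\); the helper names (km_censor, U_t0) are ours and are not part of the qris package's internals.

```r
# Illustrative base-R evaluation of the IPCW estimating function (3).
km_censor <- function(Z, delta) {
  # Kaplan-Meier estimator of the censoring survival function G(t):
  # censoring (delta == 0) plays the role of the event
  ts <- sort(unique(Z))
  surv <- cumprod(1 - vapply(ts, function(u) {
    sum(Z == u & delta == 0) / sum(Z >= u)
  }, numeric(1)))
  function(t) if (t < ts[1]) 1 else surv[max(which(ts <= t))]
}

U_t0 <- function(beta, tau, t0, Z, delta, X) {
  Ghat   <- km_censor(Z, delta)
  GZ     <- vapply(Z, Ghat, numeric(1))              # G-hat(Z_i)
  atrisk <- as.numeric(Z >= t0)                      # I[Z_i >= t0]
  resid  <- log(pmax(Z - t0, .Machine$double.eps))   # guarded for Z_i <= t0
  ind    <- as.numeric(resid <= drop(X %*% beta))    # I[log(Z_i - t0) <= x_i' beta]
  wt     <- delta * Ghat(t0) / pmax(GZ, 1e-10)       # IPCW weight
  colMeans(atrisk * X * (ind * wt - tau))            # average over i
}
```

A grid search or \(L_1\)-minimization, as described above, can then be used to locate an (approximate) zero-crossing of this function.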
Despite the well-established asymptotic properties, directly estimating the variance of \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny NS}\) is impractical because it involves the derivative of non-smooth functions. A multiplier bootstrap method has typically been employed (Li et al. 2016) to address this difficulty. The multiplier bootstrap method considers the perturbed version of \(U_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau)\), defined as \[\label{eq:nsm:rev} U_{t_0}^{\ast}(\boldsymbol{\mathbf{\beta}}, \tau) = \frac{1}{n}\sum_{i=1}^{n} \eta_i I[Z_i \ge t_0] \mathbf{X}_i \left\{I \left[\log(Z_i - t_0) \leq \mathbf{X}_i^{\top} \boldsymbol{\mathbf{\beta}} \right]\frac{\delta_i}{\widehat{G}^{\ast}(Z_i)/\widehat{G}^{\ast}(t_0)} -\tau \right\}, \tag{5}\] where \(\eta_i, i = 1, \ldots, n,\) are independent and identically distributed (iid) draws from a positive random variable with unit mean and variance, and \(\widehat{G}^\ast(\cdot)\) is a perturbed version of \(\widehat{G}(\cdot)\), constructed as \(\widehat{G}^\ast(t) = \prod_{i: t_i \leq t} (1 - \sum_{j=1}^n \eta_j(1 - \delta_j)I(Z_j \leq t_i) / \sum_{j=1}^n \eta_jI(Z_j \geq t_i))\) for a given realization of \(\eta_i\). A perturbed \(L_1\)-objective function, denoted as \(L_{t_0}^{\ast}(\boldsymbol{\mathbf{\beta}}, \tau)\), can be constructed analogously, where \[\begin{aligned} L_{t_0}^{\ast}(\boldsymbol{\mathbf{\beta}}, \tau) = & \frac{1}{n}\sum_{i=1}^n \frac{\eta_i \delta_i I[Z_i > t_0]}{\widehat{G}^{\ast}(Z_i)/\widehat{G}^{\ast}(t_0)} \left| \log(Z_i - t_0) - \mathbf{X}_i^{\top}\boldsymbol{\mathbf{\beta}} \right| + \nonumber \\ & \left| M - \boldsymbol{\mathbf{\beta}}^{\top}\sum_{l=1}^n (-\mathbf{X}_l) \frac{\eta_l \delta_l I[Z_l > t_0]}{\widehat{G}^{\ast}(Z_l)/\widehat{G}^{\ast}(t_0)}\right| + \ \left| M - \boldsymbol{\mathbf{\beta}}^{\top}\sum_{l=1}^n 2\tau \mathbf{X}_l \eta_l I[Z_l > t_0]\right|. 
\end{aligned}\] Solving for \(U_{t_0}^{\ast}(\boldsymbol{\mathbf{\beta}}, \tau) = 0\), or equivalently, minimizing \(L_{t_0}^{\ast}(\boldsymbol{\mathbf{\beta}}, \tau)\), yields one realization of \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny NS}\). The multiplier bootstrap variance is computed as the sample variance of a large number of realizations of \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny NS}\).
The regression coefficient in Equation (1) can be more efficiently obtained through the induced smoothed version of Equation (3). The induced smoothed estimating functions are constructed by taking the expectation with respect to a mean-zero random noise added to the regression parameters in Equation (3). Specifically, \[\begin{aligned} \label{eq:is} \widetilde{U}_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau, H) & = E_w \{U_{t_0}(\boldsymbol{\mathbf{\beta}}+\mathbf{H}^{1/2}\mathbf{W}, \tau)\}\nonumber\\ & = \frac{1}{n} \sum_{i=1}^{n} I[Z_i > t_0] \mathbf{X}_i \left\{ \Phi\left(\frac{\mathbf{X}_i^\top\boldsymbol{\mathbf{\beta}}-\log(Z_i-t_0)}{\sqrt{\mathbf{X}_i^{\top} \mathbf{H} \mathbf{X}_{i}}}\right)\frac{\delta_i}{\widehat{G}(Z_i)/\widehat{G}(t_0) } -\tau \right\}, \end{aligned} \tag{6}\] where \(\mathbf{H} = O(n^{-1})\), \(\mathbf{W} \sim N(0, \mathbf{I}_p)\) is a standard normal random vector, \(\mathbf{I}_p\) is the \(p \times p\) identity matrix, and \(\Phi(\cdot)\) is the cumulative distribution function of a standard normal random variable. A typical choice for \(\mathbf{H}\) is to fix it at \(n^{-1}\mathbf{I}_p\), while some alternative choices are explored in Chiou et al. (2015b). Let \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) be the solution to \(\widetilde{U}_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau, \mathbf{H}) = 0\). Since Equation (6) is a smooth function in \(\boldsymbol{\mathbf{\beta}}\), the estimator can be obtained using standard numerical algorithms such as the Newton-Raphson method. Moreover, the induced smoothed estimator for \(\boldsymbol{\mathbf{\beta}}_0\) has been shown to be asymptotically equivalent to its non-smooth counterpart (Kim et al. 2023).
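For reference, the smoothed estimating function in Equation (6) can be written in a few lines of base R. In this sketch, `w` stands for the precomputed IPCW weight \(\delta_i \widehat{G}(t_0)/\widehat{G}(Z_i)\), and all names are our own rather than the package's internals.

```r
# Base-R sketch of the induced-smoothed estimating function (6).
# `w` is the precomputed IPCW weight delta_i * G-hat(t0) / G-hat(Z_i);
# H defaults to the typical choice I_p / n.
U_smooth <- function(beta, tau, t0, Z, w, X, H = diag(ncol(X)) / nrow(X)) {
  atrisk <- as.numeric(Z > t0)                   # I[Z_i > t0]
  hi <- sqrt(rowSums((X %*% H) * X))             # sqrt(x_i' H x_i)
  r  <- log(pmax(Z - t0, .Machine$double.eps))   # guarded log residual life
  # Phi replaces the indicator, giving a smooth function of beta
  colMeans(atrisk * X * (pnorm((drop(X %*% beta) - r) / hi) * w - tau))
}
```

Because this expression is smooth in \(\boldsymbol{\mathbf{\beta}}\), it can be handed directly to a standard root finder or a Newton-Raphson iteration.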
Following the idea in Section 2.1, the multiplier bootstrap procedure can be similarly employed to estimate the variance of \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\). The perturbed version of Equation (6) takes the form of \[\label{eq:7} \widetilde{U}^{\ast}_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau, \mathbf{H}) = \frac{1}{n} \sum_{i=1}^{n} \eta_i I[Z_i > t_0] \mathbf{X}_i \left\{ \Phi\left(\frac{\mathbf{X}_i^\top\boldsymbol{\mathbf{\beta}} - \log(Z_i-t_0)}{\sqrt{\mathbf{X}_i^{\top} \mathbf{H} \mathbf{X}_{i}}}\right)\frac{\widehat{G}^{\ast}(t_0) \delta_i}{\widehat{G}^{\ast}(Z_i)} -\tau \right\}. \tag{7}\] The multiplier bootstrap procedure estimates the variance of \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) by calculating the sample variance of a large number of realizations of \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) obtained by repeatedly solving Equation (7).
It has been shown that the asymptotic variance \(\mathop{\rm Var}\nolimits(\boldsymbol{\mathbf{\beta}}, \tau)\) can be decomposed into \(\mathbf{A}(\boldsymbol{\mathbf{\beta}})^{-1} \mathbf{V}(\boldsymbol{\mathbf{\beta}}) \mathbf{A}(\boldsymbol{\mathbf{\beta}})^{-1}\) (Kim et al. 2023), where the two components, \(\mathbf{A}(\boldsymbol{\mathbf{\beta}})\) and \(\mathbf{V}(\boldsymbol{\mathbf{\beta}})\), can be estimated separately. Since Equation (6) is a smooth function in \(\boldsymbol{\mathbf{\beta}}\), the slope matrix, \(\mathbf{A}(\boldsymbol{\mathbf{\beta}})\), can be conveniently estimated by differentiating \(\widetilde{U}_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau, \mathbf{H})\) with respect to \(\boldsymbol{\mathbf{\beta}}\). The explicit form of \(\mathbf{A}(\boldsymbol{\mathbf{\beta}})\) is as follows: \[\begin{aligned} \label{eq:cov:slp} \mathbf{A}(\boldsymbol{\mathbf{\beta}}) & = \frac{\partial \widetilde{U}_{t_0}(\boldsymbol{\mathbf{\beta}}, \tau, \mathbf{H})}{\partial \boldsymbol{\mathbf{\beta}}} \nonumber \\ & = \frac{1}{n}\sum_{i=1}^{n} I[Z_i > t_0] \mathbf{X}_i \frac{G(t_0) \delta_i}{G(Z_i)} \phi\left(\frac{{\mathbf{X}_i}^{\top}\boldsymbol{\mathbf{\beta}} - \log(Z_i-t_0)}{\sqrt{{\mathbf{X}_i}^{\top}\mathbf{H} \mathbf{X}_i}}\right)\left(\frac{-{\mathbf{X}_i}}{\sqrt{{\mathbf{X}_i}^{\top} \mathbf{H} {\mathbf{X}_i}}}\right), \end{aligned} \tag{8}\] where \(\phi (\cdot)\) is the density function of the standard normal random variable.
The slope matrix, \(\widehat{\mathbf{A}}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS})\), can be evaluated directly by plugging in \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) and \(\widehat{G}(\cdot)\). On the other hand, the variance of the estimating function, \(\widehat{\mathbf{V}}(\boldsymbol{\mathbf{\beta}})\), can be obtained by a computationally efficient resampling method motivated by the multiplier bootstrap procedure in Section 2.1. Specifically, we propose estimating \(\widehat{\mathbf{V}}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS})\) as the simple variance of a large set of realizations of the perturbed version of \(\widetilde{U}_{t_0}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}, \tau, \mathbf{H})\) presented in Equation (7). We refer to this procedure as the partial multiplier bootstrapping approach because it utilizes the perturbed estimating function, similar to the full multiplier bootstrapping approach, but the computation of \(\widehat{\mathbf{A}}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS})\) and \(\widehat{\mathbf{V}}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS})\) does not involve the repeated solving of the perturbed estimating equations. Thus, the partial multiplier bootstrapping approach is expected to be computationally more efficient than the multiplier bootstrap method. A similar procedure and its performance have been studied in modeling failure times with semiparametric AFT models (Chiou et al. 2014, 2021).
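Schematically, the partial multiplier bootstrap re-evaluates, but never re-solves, the perturbed estimating function at the point estimate. In the sketch below, `Ustar_fn` and `Ahat` are hypothetical placeholders (assumptions on our part) for the perturbed estimating function evaluated at \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) and the estimated slope matrix.

```r
# Schematic partial multiplier bootstrap: V-hat is the sample covariance
# of perturbed estimating functions at the point estimate, combined with
# the slope matrix in a sandwich. `Ustar_fn(eta)` and `Ahat` are
# placeholders supplied from elsewhere.
pmb_variance <- function(Ustar_fn, Ahat, n, nB = 100) {
  # eta_i ~ Exp(1): positive with unit mean and variance
  draws <- t(replicate(nB, Ustar_fn(rexp(n))))   # nB x p matrix of draws
  Vhat  <- stats::cov(draws)                     # sample variance of perturbed U's
  Ainv  <- solve(Ahat)
  Ainv %*% Vhat %*% t(Ainv)                      # sandwich combination
}
```

The contrast with the full multiplier bootstrap is that each draw here costs only one function evaluation, not one optimization.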
The induced estimator \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) is obtained with a fixed \(\mathbf{H}\), as described in Section 2.2, and its variance is estimated separately. This estimation procedure can be viewed as a special case of the following iterative procedure, which updates \(\mathbf{H}\) and \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) iteratively. Specifically, the iterative algorithm utilizes the Newton-Raphson method while sequentially updating \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS}\) and \(\widehat{\mathop{\rm Var}\nolimits}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IS})\) until convergence. Similar iterative algorithms have also been considered previously in the induced smoothing approach for semiparametric AFT models (Johnson and Strawderman 2009; Chiou et al. 2014, 2015a; Choi et al. 2018). The iterative procedure is summarized as follows:
Step 1. Set the initial values \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(0)}\), \(\widehat{\mathbf{\Sigma}}^{(0)} = \mathbf{I}_{p}\), and \(\mathbf{H}^{(0)} = n^{-1}\widehat{\mathbf{\Sigma}}^{(0)}\).
Step 2. Given \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)}\) and \(\mathbf{H}^{(k)}\) at the \(k\)-th step, update \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)}\) by \[\widehat{\boldsymbol{\mathbf{\beta}}}^{(k+1)}=\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)} - \widehat{\mathbf{A}}(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)})^{-1}\widetilde{U}_{t_0}(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)}, \tau, \mathbf{H}^{(k)}).\]
Step 3. Given \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k+1)}\) and \(\widehat{\mathbf{\Sigma}}^{(k)}\), update \(\widehat{\mathbf{\Sigma}}^{(k)}\) by \[\widehat{\mathbf{\Sigma}}^{(k+1)} = \widehat{\mathbf{A}}(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k+1)})^{-1} \widehat{\mathbf{V}}(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k+1)}, \tau) \widehat{\mathbf{A}}(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k+1)})^{-1}.\]
Step 4. Set \(\mathbf{H}^{(k+1)} = n^{-1}\widehat{\mathbf{\Sigma}}^{(k+1)}\). Repeat Steps 2, 3, and 4 until \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)}\) and \(\widehat{\mathbf{\Sigma}}^{(k)}\) converge.
The initial value, \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(0)}\), could be chosen as \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny NS}\). We define \(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IT}\) and \(\widehat{\boldsymbol{\mathbf{\Sigma}}}_{\tiny IT}\) as the values of \(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k)}\) and \(\widehat{\mathbf{\Sigma}}^{(k)}\) at convergence, and \(\widehat{\mathop{\rm Var}\nolimits}(\widehat{\boldsymbol{\mathbf{\beta}}}_{\tiny IT}) = n^{-1}\widehat{\mathbf{\Sigma}}_{\tiny IT}\). In Step 3, \(\widehat{\mathbf{V}}(\widehat{\boldsymbol{\mathbf{\beta}}}^{(k+1)}, \tau)\) is obtained using the partial multiplier bootstrap approach. However, the full multiplier bootstrap approach can also be employed but would require longer computation times.
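The iterative procedure above can be sketched as the following skeleton, where `U_fn`, `A_fn`, and `V_fn` are placeholders (assumptions on our part) for the smoothed estimating function, its slope matrix, and the partial-multiplier-bootstrap variance of the estimating function.

```r
# Skeleton of the iterative procedure; the three function arguments are
# placeholders supplied by the user, not qris internals.
iterate_qris <- function(beta0, U_fn, A_fn, V_fn, n,
                         maxiter = 10, tol = 1e-3) {
  beta  <- beta0
  Sigma <- diag(length(beta))                               # initialization
  for (k in seq_len(maxiter)) {
    H <- Sigma / n                                          # H = Sigma / n
    beta_new <- beta - solve(A_fn(beta, H), U_fn(beta, H))  # Newton-Raphson update
    A1 <- solve(A_fn(beta_new, H))
    Sigma_new <- A1 %*% V_fn(beta_new, H) %*% A1            # sandwich update
    converged <- max(abs(beta_new - beta)) < tol
    beta <- drop(beta_new); Sigma <- Sigma_new
    if (converged) break
  }
  list(beta = beta, Var = Sigma / n)   # Var-hat(beta-hat) = Sigma / n
}
```

The default iteration cap and tolerance mirror the values used by qris.control() described later; both are adjustable.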
The main function in the
qris package for
estimating the regression parameters in the quantile regression model
for residual life is the qris()
function. The qris()
function is
written in C++ and incorporated into R using the
Rcpp (Eddelbuettel et al. 2022a) and
RcppArmadillo
(Eddelbuettel et al. 2022b) packages. The synopsis of qris
is:
args(qris)
function (formula, data, t0 = 0, Q = 0.5, nB = 100, method = c("smooth",
"iterative", "nonsmooth"), se = c("fmb", "pmb"), init = c("rq",
"noeffect"), verbose = FALSE, control = qris.control())
NULL
The required argument is formula
, which specifies the quantile
regression model to be fitted using the variables in data
. The
formula
assumes that the response variable is a ‘Surv
’ object
created by the Surv()
function in the
survival package
(Therneau 2021). This formula structure is commonly adopted for
handling survival data in R, as seen in functions like survreg()
and
coxph()
in the
survival package. The
argument t0
specifies the base time used in defining residual life.
The default value of t0
is set to zero, in which case residual life
reduces to a failure time. The Q
argument is used to specify the
target quantile of residual life to estimate, with the default value
being set to 0.5 (median). The nB
argument specifies the bootstrapping
size used in standard error estimation, with the default value set to
100. The method
argument specifies one of the three estimation
methods: "nonsmooth"
, "smooth"
, and "iterative"
, corresponding to
the estimating procedures outlined in Sections 2.1,
2.2, and 2.3, respectively. Given the point
estimates of the regression parameters, their standard errors can be
estimated using one of two implemented methods: se = "fmb"
and
se = "pmb"
. The se = "fmb"
method employs a full-multiplier
bootstrapping approach, estimating the variance by the sample variance
of a large number of realizations of \(\widehat\beta\). The se = "pmb"
method
estimates the variance using a robust sandwich variance estimator and
employs the computationally efficient partial multiplier bootstrapping
approach described in Section 2.2. The "fmb"
option is
available for all three point estimation methods, whereas the "pmb"
option is not available for the "nonsmooth"
point estimation method
due to the lack of a closed-form sandwich variance estimator. The init
argument allows users to specify the initial value for estimating
regression parameters by either a \(p\)-dimensional numerical vector or a
character string. In the latter case, the options init = "rq"
and
init = "noeffect"
correspond to the point estimate obtained from the
rq()
function in the
quantreg package and
a \(p\)-dimensional vector of zeros, respectively. The default value for
init
is init = "rq"
. Among the three methods implemented for point
estimation, method = "smooth"
and method = "nonsmooth"
are
non-iterative, in the sense that point estimation is performed
separately from the estimation of standard errors. On the other hand,
method = "iterative"
calculates point estimates and the corresponding
standard error estimates simultaneously through iterative updates. When
method = "iterative"
, users can define specific convergence criteria
using qris.control()
. The available options in qris.control()
are as
follows.
args(qris.control)
function (maxiter = 10, tol = 0.001, trace = FALSE)
NULL
The maxiter
argument specifies the maximum number of iterations. The
default value for maxiter
is ten, as the proposed algorithm typically
converges within ten steps based on our exploration. The convergence
tolerance is controlled using the tol
argument, which has a default
value of 1e-3
. The trace
argument takes a logical value and is used
to determine whether to print the result for each iteration. The default
setting is trace = FALSE
. The ‘qris
’ object is fully compatible with
many of R’s generic functions, including coef()
, confint()
,
plot()
, predict()
, print()
, residuals()
, summary()
, and
vcov()
.
Among the available S3
methods, a unique feature of the
qris package’s S3 plot
method, when applied to a ‘qris
’ object, is its ability to
automatically update the original object by extending the range of
\(\tau\) or \(t_0\) values. This extension enables the generation of a
covariate effect plot over the newly specified values of \(\tau\) or
\(t_0\), providing a comprehensive visualization of the covariate effects
across the extended range. The S3
method for plotting a ‘qris
’
object is shown below.
argsAnywhere(plot.qris)
function (x, t0s = NULL, Qs = NULL, nB = NULL, vari = NULL, byQs = FALSE,
ggextra = NULL, ...)
NULL
The argument x
is a ‘qris
’ object created using the qris()
function. The t0s
and Qs
arguments are numeric vectors that enable
users to specify the values of \(t_0\) or \(\tau\) for plotting the
covariate effect. If t0s
and Qs
are not specified, the covariate
effects are plotted against \(\tau = 0.1, 0.2, \ldots, 0.9\) at the base
time (\(t_0\)) inherited from the ‘qris
’ object specified in x
. The
nB
argument is a numerical variable that controls the sample size for
bootstrapping, used to compute standard error estimations based on the
variance estimation specified in the original ‘qris
’ object. When nB
is specified, the function calculates standard errors for all
combinations of \(t_0\) and \(\tau\) specified in t0s
and Qs
, computes
95% confidence intervals accordingly, and includes them in the covariate
effect plot. The vari
argument is a character string that allows users
to specify the names of the covariates they want to display in the
effect plots. When the vari
argument is not specified, all covariates
will be included in the plots by default. The covariate effect plot can
be plotted against the specified quantiles by setting byQs = TRUE
or
against the specified base times by setting byQs = FALSE
. Finally, the
ggextra
argument allows users to pass additional graphical parameters
to the ggplot2
package, offering further customization options for the plots. When the
plot()
function is called, it internally invokes the qris.extend()
function to compute the covariate effects at additional values. The
syntax for the qris.extend()
function is provided below:
args(qris.extend)
function (x, t0s = NULL, Qs = NULL, nB = NULL, vari = NULL)
NULL
The arguments in qris.extend()
are inherited from the arguments
specified in the plot()
function. To reduce runtime when repeatedly
calling plot(), one can calculate the desired covariate effects by
applying qris.extend()
outside of plot()
first and then supply the
results to plot()
. This approach allows for pre-computation of the
covariate effects, making it more efficient when generating multiple
plots. Overall, the unique plotting feature in
qris provides users with
a seamless and effortless approach to conducting a comprehensive
assessment of the covariate effects across different quantiles or base
times.
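For example, the precompute-then-plot workflow described above might look as follows; this is an illustrative sketch, where fit stands for an existing fitted ‘qris’ object and the chosen grids of base times and quantiles are arbitrary.

```r
## Illustrative workflow (assumes 'fit' is a fitted 'qris' object):
## precompute the covariate effects once, then reuse them in several plots
ext <- qris.extend(fit, t0s = 1:3, Qs = 2:8 / 10, nB = 100)
plot(ext, byQs = TRUE)    # effects plotted against quantiles
plot(ext, byQs = FALSE)   # effects plotted against base times
```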
In this subsection, we present a simple simulation example to validate the implementations in the qris package. The simulation involves five covariates, denoted as \(X_1, \ldots, X_5\). Among these covariates, \(X_1\) and \(X_4\) follow a standard uniform distribution, \(X_2\) follows a Bernoulli distribution with a success probability of 0.5, \(X_3\) follows a standard normal distribution, and \(X_5\) follows a standard exponential distribution. We assume that \(X_2, X_3, X_4\), and \(X_5\) do not affect the residual life, so their corresponding coefficients \(\beta_2\), \(\beta_3\), \(\beta_4\), and \(\beta_5\) are zero. The survival time \(T\) is generated from a Weibull distribution with survival function \(S(t) = \exp\{-(\rho t)^\kappa\}\) for \(t > 0\), where \(\kappa = 2\) and \(\rho\) is obtained by solving
\[\label{eq:sim:weibull}
\rho^{-1}\{ (\rho t_0)^\kappa - \log (1-\tau) \}^{1/\kappa}- t_0 = \exp\{\beta_0 + \beta_1 X_1\}, \tag{9}\]
for a specified \(t_0\) and \(\tau\). We set the intercept \(\beta_0 = \log(5)\) and the slope \(\beta_1 = \log(2)\) at \(t_0 = 0\). Given \(\rho\), \(\tau\), and \(X_1\), the true values of \(\beta_0\) and \(\beta_1\) can be obtained sequentially from Equation (9) for different \(t_0 > 0\). In our case, the true values of \(\beta_0\) are approximately 1.411 and 1.219 for \(t_0 = 1\) and 2, respectively; similarly, the true values of \(\beta_1\) are approximately 0.797 and 0.907 for \(t_0 = 1\) and 2, respectively. The closed-form expression for generating \(T\) is then \(\{-\log(1 - u)\}^{1/\kappa} / \rho\), where \(u\) is a uniform random variable over \((0, 1)\). Given these specifications, we implemented the data.gen() function to generate simulation data. The data.gen() function takes four arguments: n, t0, cen, and Q, representing the sample size, \(t_0\), the censoring proportion, and \(\tau\), respectively. We generate censoring times \(C\) from an independent uniform distribution over \((0, c)\), where \(c\) is chosen to achieve the desired censoring proportions of 10% and 30%. Using the generated dataset, we fit the model with three different estimation methods: induced smoothing, non-smooth, and iterative induced smoothing. All analyses were conducted on a 4.2 GHz Intel(R) quad Core(TM) i7-7700K central processing unit (CPU) using R 4.3.0 (R Core Team 2021). The following code demonstrates the use of data.gen() to generate a simulation dataset.
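The listing of data.gen() is not reproduced here; the following is a minimal base-R sketch of such a generator, assuming the five-covariate design above and Equation (9) anchored at \(t_0 = 0\). The handling of the t0 argument and the calibration of the censoring bound are our assumptions, not the package's actual implementation.

```r
## A base-R sketch of a data.gen()-style generator (our reconstruction,
## not the package source).
data.gen <- function(n, t0, cen, Q) {
  kappa <- 2
  X1 <- runif(n); X2 <- rbinom(n, 1, .5); X3 <- rnorm(n)
  X4 <- runif(n); X5 <- rexp(n)
  ## Equation (9) at t0 = 0 with beta0 = log(5) and beta1 = log(2) gives rho
  ## in closed form; the data distribution itself does not depend on t0,
  ## which is accepted here only for interface parity (an assumption).
  rho <- (-log(1 - Q))^(1 / kappa) / exp(log(5) + log(2) * X1)
  Time0 <- (-log(1 - runif(n)))^(1 / kappa) / rho   # inverse transform
  ## Pick the censoring bound b so that P(C < T) is about 'cen':
  ## with C ~ U(0, b), P(C < T) = E[min(T, b)] / b, estimated empirically.
  b <- uniroot(function(b) mean(pmin(Time0, b)) / b - cen, c(1e-2, 1e3))$root
  Cens <- runif(n, 0, b)
  data.frame(Time = pmin(Time0, Cens), status = as.integer(Time0 <= Cens),
             X1, X2, X3, X4, X5)
}
```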
The data.gen() function generates a data.frame containing seven variables. The Time variable represents the observed survival time, while the status variable serves as the event indicator, taking the value 1 for observed events and 0 for censored observations. The variables X1, \(\ldots\), X5 are the covariates. The data.gen() function generates the Weibull survival times using the inverse probability integral transform. Alternatively, users can call the rweibull() function with the parameters shape = 2 and scale = 1 / rho to generate these Weibull survival times directly.
We assess the performance of the proposed implementation across various scenarios, including three sample sizes (\(n = 200, 400, 1000\)), three levels of \(t_0\) (\(0, 1, 2\)), two censoring proportions (10% and 30%), and two values of \(\tau\) (0.25 and 0.50). For a given dataset, we apply the full multiplier bootstrapping approach with 200 bootstrap samples to all three available estimating procedures: method = "nonsmooth", method = "smooth", and method = "iterative". To facilitate the evaluation, we create the do_fmb() function to record the coefficient estimates, standard errors, and computing times for fitting a single simulated dataset generated from data.gen(). The following is the implementation of the do_fmb() function and the corresponding code to run the simulation with 200 replications. We present the code and results of the simulation experiments conducted at the three sample sizes, with \(t_0\) set to 0 and 1, while holding the censoring proportion at 30% and \(\tau\) at 0.5. The results for the other simulation scenarios are provided in the Supplementary Materials.
do_fmb <- function(n, t0, cen, Q, nB) {
  dat <- data.gen(n, t0, cen, Q)
  fm <- Surv(Time, status) ~ X1 + X2 + X3 + X4 + X5
  stamp <- NULL
  stamp[1] <- Sys.time()
  f1 <- qris(fm, data = dat, t0 = t0, Q = Q, nB = nB, method = "smooth", se = "fmb")
  stamp[2] <- Sys.time()
  f2 <- qris(fm, data = dat, t0 = t0, Q = Q, nB = nB, method = "nonsmooth", se = "fmb")
  stamp[3] <- Sys.time()
  f3 <- qris(fm, data = dat, t0 = t0, Q = Q, nB = nB, method = "iterative", se = "fmb")
  stamp[4] <- Sys.time()
  list(smooth = c(f1$coef, f1$std),
       nonsmooth = c(f2$coef, f2$std),
       iter = c(f3$coef, f3$std),
       times = diff(stamp))
}
B <- 200
set.seed(2)
sims0_fmb <- mapply(function(n, t0)
  replicate(B, do_fmb(n, t0 = t0, cen = .3, Q = .5, nB = 200)),
  n = c(200, 400, 1000), t0 = c(0, 0, 0), SIMPLIFY = FALSE)
sims1_fmb <- mapply(function(n, t0)
  replicate(B, do_fmb(n, t0 = t0, cen = .3, Q = .5, nB = 200)),
  n = c(200, 400, 1000), t0 = c(1, 1, 1), SIMPLIFY = FALSE)
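Each simulation object above is a 4-by-B list matrix, one row per component returned by do_fmb(). A summary such as the empirical bias, ESE, and ASE for, say, the smooth estimator at \(n = 200\) and \(t_0 = 0\) could then be computed along these lines; this sketch assumes the layout of do_fmb()'s return value (six estimates followed by six standard errors).

```r
## Bias / ESE / ASE for the smooth estimator at n = 200, t0 = 0
## (assumes sims0_fmb from the code above)
truth <- c(log(5), log(2), 0, 0, 0, 0)            # (beta0, beta1, ..., beta5) at t0 = 0
sm <- do.call(rbind, sims0_fmb[[1]]["smooth", ])  # B x 12 matrix
bias <- colMeans(sm[, 1:6]) - truth
ese  <- apply(sm[, 1:6], 2, sd)                   # empirical standard errors
ase  <- colMeans(sm[, 7:12])                      # averaged estimated standard errors
round(cbind(bias, ese, ase), 3)
```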
Figure 1 displays violin plots visualizing the empirical distributions of the coefficient estimates. As expected, all three estimators exhibit small biases, calculated as the difference between the point estimates (PE) and the true regression coefficients. Furthermore, the empirical distributions of the PEs are approximately normal in shape, in line with the asymptotic properties of the proposed methods (Li et al. 2016; Kim et al. 2023). When the sample size is smaller (e.g., \(n = 200\) and 400), the nonsmooth approach appears to yield slightly larger empirical standard errors (ESE) than the smooth or iterative approaches; when \(n = 1000\), the ESEs are similar across all approaches. The comprehensive simulation results presented in Table 1 of the Supplementary Materials confirm that all coefficient estimates closely approximate the true regression coefficients. Moreover, the ESEs and the averaged estimated standard errors (ASE) are in close agreement in all scenarios, supporting the validity of the variance estimation. The computation times, presented separately in the upper panel of Table 1, indicate that when employing the full multiplier bootstrapping approach, the nonsmooth approach has a slight edge in computational efficiency over the smooth approach, while the iterative approach takes 5.1 to 9.5 times longer than the smooth approach. In summary, the timing results show that the proposed methods can yield valid inference results within seconds, even with datasets of up to 1000 observations or when using the computationally demanding full multiplier bootstrapping approach for variance estimation.
When \(t_0 = 0\), the targeted semiparametric quantile regression model for residual life simplifies to the standard quantile regression model for survival time. In such cases, existing functions like crq() from the quantreg package (Koenker 2022) can be employed. A comparison between the performance of crq() and our implementation when \(t_0 = 0\) is presented in the Supplementary Materials, where the standard errors of crq() are obtained from the bootstrap method with 200 bootstrap samples. Overall, the performance of crq() is comparable to that of the proposed methods in terms of bias and standard errors. However, we occasionally encountered situations where the crq() function fails to converge, particularly when the sample size is large, as in the case of \(n = 1000\). In the other extended simulation scenarios outlined in the Supplementary Materials, which encompass various levels of \(t_0\), censoring proportions, and \(\tau\), the proposed methods consistently exhibit satisfactory performance across all settings.
The true potential of the proposed smooth approach lies in its capacity for efficient variance estimation through the partial multiplier bootstrapping approach, which eliminates the need to repeatedly solve the estimating equations and thereby improves computational efficiency. To demonstrate its usefulness, we conducted a simulation using both the smooth approach and the iterative approach with the partial multiplier bootstrapping approach (se = "pmb"). This simulation was conducted under the settings \(\tau = 0.5\), \(t_0 = 0\) and \(1\), and a 30% censoring rate. The do_fmb() function was modified accordingly into do_pmb() as follows.
do_pmb <- function(n, t0, cen, Q, nB) {
  dat <- data.gen(n, t0, cen, Q)
  fm <- Surv(Time, status) ~ X1 + X2 + X3 + X4 + X5
  stamp <- NULL
  stamp[1] <- Sys.time()
  f1 <- qris(fm, data = dat, t0 = t0, Q = Q, nB = nB, method = "smooth", se = "pmb")
  stamp[2] <- Sys.time()
  f2 <- qris(fm, data = dat, t0 = t0, Q = Q, nB = nB, method = "iterative", se = "pmb")
  stamp[3] <- Sys.time()
  list(smooth = c(f1$coef, f1$std),
       iter = c(f2$coef, f2$std),
       times = diff(stamp))
}
set.seed(2)
sims0_pmb <- mapply(function(n, t0)
  replicate(B, do_pmb(n, t0 = t0, cen = .3, Q = .5, nB = 200)),
  n = c(200, 400, 1000), t0 = c(0, 0, 0), SIMPLIFY = FALSE)
sims1_pmb <- mapply(function(n, t0)
  replicate(B, do_pmb(n, t0 = t0, cen = .3, Q = .5, nB = 200)),
  n = c(200, 400, 1000), t0 = c(1, 1, 1), SIMPLIFY = FALSE)
The simulation results obtained using the partial multiplier bootstrapping approach are presented in Figure 3 and Tables 7–12 of the Supplementary Materials, while the computing times are displayed in the lower panel of Table 1. Overall, the estimation results obtained using se = "pmb" in Figure 3 closely resemble those in Figure 1 with se = "fmb". As seen in Tables 7 and 8, the ESEs from the non-iterative and iterative methods are comparable, while the ASEs slightly overestimate the ESEs when the sample size is small. The gaps are slightly smaller for the iterative method, as has been observed in related settings (Johnson and Strawderman 2009; Kim et al. 2021). The magnitudes of these differences are not large, and they shrink further as the sample size reaches \(n = 1000\). More importantly, the computing times with se = "pmb" show substantial speed improvements over se = "fmb" in every case; we observed up to 79% reductions in computing time.
Table 1: Computing times (in seconds) by variance estimation approach (se), estimation method, sample size \(n\), and base time \(t_0\).

| se | method | \(t_0 = 0\), n = 200 | n = 400 | n = 1000 | \(t_0 = 1\), n = 200 | n = 400 | n = 1000 |
|----|--------|------|------|------|------|------|------|
| fmb | Smooth | 0.103 | 0.174 | 0.471 | 0.106 | 0.178 | 0.480 |
| fmb | Nonsmooth | 0.080 | 0.142 | 0.472 | 0.080 | 0.141 | 0.468 |
| fmb | Iterative | 0.981 | 1.500 | 2.410 | 0.985 | 1.567 | 2.882 |
| pmb | Smooth | 0.022 | 0.052 | 0.223 | 0.022 | 0.053 | 0.224 |
| pmb | Iterative | 0.296 | 0.580 | 1.407 | 0.296 | 0.581 | 1.435 |
After confirming the satisfactory performance of the proposed methodologies, we now proceed to illustrate the init argument, which controls the initial values supplied to the root-finding algorithm, and the plotting capacity of the qris package. For this illustrative example, we consider a simpler simulation scenario that involves a single binary covariate. This simplified dataset can be generated using the revised version of the data.gen() function provided below.
## Global parameters
rho0 <- .2 * sqrt(log(2))
rho1 <- .1 * sqrt(log(2))
data.gen <- function(n) {
  dat <- data.frame(censoring = runif(n, 0, 23.41),
                    Time0 = sqrt(-log(1 - runif(n))),
                    X = rbinom(n, 1, .5))
  dat$Time0 <- ifelse(dat$X > 0, dat$Time0 / rho1, dat$Time0 / rho0)
  dat$Time <- pmin(dat$Time0, dat$censoring)
  dat$status <- 1 * (dat$Time0 < dat$censoring)
  subset(dat, select = c(Time, status, X))
}
set.seed(10)
head(dat <- data.gen(200))
Time status X
1 6.034713 1 1
2 7.181451 0 1
3 9.993908 0 1
4 16.225520 0 1
5 1.993033 0 1
6 5.277471 0 0
The updated data.gen() function returns a data.frame comprising three variables: Time, status, and X, representing the observed survival time, the event indicator, and the binary covariate, respectively. We first illustrate the usage of the init argument by considering three different initial values: init = "rq", init = c(1, 1), and a random vector init = rnorm(2), all used in conjunction with the smooth estimator (method = "smooth"). The following code provides an example with these different initial values.
(random <- rnorm(2))
[1] 1.5025446 0.5904095
f1 <- qris(Surv(Time, status) ~ X, data = dat, t0 = 1, init = "rq", nB = 0)
f2 <- update(f1, init = c(1, 1))
f3 <- update(f1, init = random)
all.equal(f1$coef, f2$coef)
[1] TRUE
all.equal(f2$coef, f3$coef)
[1] TRUE
The ‘qris’ object, with its call component, is compatible with the update() function, a built-in function commonly used to modify attributes of an existing object without redundant and repetitive code. In the example above, we used the update() function to modify the initial value specification in f1. The different initial values yield identical point estimates, affirming the robustness of the proposed method to the choice of initial values.
The covariate effects, along with their associated 95% point-wise confidence intervals across various quantiles or base times, can be visually assessed by applying the generic function plot() to a ‘qris’ object. We demonstrate this feature using the following qris fit, where the standard errors are obtained using se = "pmb", \(t_0 = 1\), and all other parameters are set to their default values. We then extend the qris fit over the quantiles \(\{0.4, 0.5, 0.6, 0.7\}\) and plot the covariate effects against these quantiles using the plot() function.
fit <- qris(Surv(Time, status) ~ X, data = dat, t0 = 1, se = "pmb")
fit2 <- qris.extend(fit, Qs = 4:7 / 10)
The extended ‘qris’ fit generated by the qris.extend() function inherits all the attributes from the original ‘qris’ object and includes an additional ggdat component. The following code compares the components of the extended ‘qris’ fit with those of the original ‘qris’ fit.
class(fit2)
[1] "qris"
names(fit)
[1] "call" "coefficient" "data" "formula"
[5] "para" "stderr" "varNames" "vcov"
setdiff(names(fit2), names(fit))
[1] "ggdat"
Specifically, the extended ‘qris’ fit inherits call, coefficient, para, stderr, varNames, and vcov from the original ‘qris’ object. The call component is the function call from the original qris() fit, while coefficient, stderr, and vcov store the point estimates, standard error estimates, and covariance matrix, respectively. The para component is a list containing the parameters specified during the fitting of the quantile regression model, and varNames is a character string representing the variable names in the function call. The newly added values are ggdat and gg; the ggdat component is a data frame containing the covariate information generated under the different quantiles and base times specified in qris.extend(). Finally, the corresponding covariate effect plot can be generated by plotting the extended ‘qris’ fit as follows.
The true values of the \(\beta\)'s at different quantiles and base times can be computed from Equation (9) with the following commands.
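The commands themselves did not survive conversion here; the following base-R sketch is our reconstruction from Equation (9) and the binary-covariate design above. The function name trueBeta() is ours, not the article's.

```r
## Our reconstruction (not the article's original commands): true beta0 and
## beta1 at base time t0 and quantile Q for the binary-covariate Weibull
## design with kappa = 2 and group-specific rates rho0 (X = 0), rho1 (X = 1).
rho0 <- .2 * sqrt(log(2)); rho1 <- .1 * sqrt(log(2))  # as defined earlier
trueBeta <- function(t0, Q, kappa = 2) {
  res.q <- function(rho)  # tau-th quantile of residual life at t0, Equation (9)
    ((rho * t0)^kappa - log(1 - Q))^(1 / kappa) / rho - t0
  b0 <- log(res.q(rho0))            # intercept: X = 0 group
  b1 <- log(res.q(rho1)) - b0       # slope: contrast of X = 1 vs X = 0
  c(beta0 = b0, beta1 = b1)
}
trueBeta(t0 = 1, Q = .5)  # approximately beta0 = 1.411, beta1 = 0.792
```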
The following code extends the ‘ggplot’ objects generated by plot.qris() by adding layers of true value curves and incorporating various ggplot options. The resulting figures, Figure 5 and Figure 6, present the output according to whether the covariate effects are plotted against quantiles or base times, respectively. The observed trends align with the specification in Equation (9), where increasing \(\tau\) corresponds to an increasing \(\beta_0\) while keeping \(\rho\) and \(X\) fixed. On the other hand, the covariate effect does not change with quantiles but slightly increases with base times, echoing the model specification in which \(\beta_0\) is inversely related to \(t_0\) and \(\beta_1\) increases with \(t_0\).
The North Central Cancer Treatment Group Lung Cancer Data records the survival of patients with advanced lung cancer, along with assessments of the patients’ performance status measured by both physicians and the patients themselves (Loprinzi et al. 1994). The original objective of the study was to ascertain whether descriptive information from a patient-completed questionnaire could offer prognostic insights. For this illustration, however, we focus on how gender and weight loss affect the quantiles of residual life for patients diagnosed with advanced lung cancer at different time points. The lung cancer data are publicly available from the survival package (Therneau 2021) as lung. The following code displays the structure of the lung dataset with the variables of interest.
str(lung[c("time", "status", "sex", "wt.loss")])
'data.frame': 228 obs. of 4 variables:
$ time : num 306 455 1010 210 883 ...
$ status : num 2 2 1 2 2 1 2 2 2 2 ...
$ sex : num 1 1 1 1 1 1 2 2 1 1 ...
$ wt.loss: num NA 15 15 11 0 0 10 1 16 34 ...
The lung data contain 228 patients whose observed survival times in days and censoring status (1 = censored, 2 = dead) are recorded in the time and status columns, respectively. Although the censoring status in this dataset is not recorded in the typical 0-1 fashion, the Surv() function is still applicable for creating the corresponding ‘Surv’ object. The lung data yield a censoring rate of \(27.6\%\) and a median survival time of 310 days. The covariates of interest are gender (sex = 1 if male, sex = 2 if female) and weight loss (wt.loss). In the following, we use the proposed semiparametric quantile regression models to assess the effects of gender and standardized weight loss on different quantiles of residual life at different base times.
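These summary figures can be checked directly with the survival package:

```r
library(survival)
## Censoring rate: status == 1 marks censored observations in this dataset
mean(lung$status == 1)                        # about 0.276
## Kaplan-Meier estimate of the median survival time
survfit(Surv(time, status) ~ 1, data = lung)  # median about 310 days
```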
We first model the median residual life (Q = 0.5) at a base time of one month (t0 = 30). Since the estimated median survival times for combined lung cancers are typically less than one year, ranging from 8 to 13 months (Siegel et al. 2021), setting the base time at one month provides insight into how gender and weight loss affect the residual time early in follow-up. In the following, we obtain the regression coefficient estimates with the induced smoothing estimating functions and the corresponding variance estimates with the partial multiplier bootstrap approach.
lung$male <- factor(lung$sex, 1:2, c("Male", "Female"))
lung$std.wt.loss <- scale(lung$wt.loss)
fit1 <- qris(Surv(time, status) ~ male + std.wt.loss,
             data = lung, t0 = 30, Q = .5, nB = 100,
             method = "smooth", se = "pmb")
summary(fit1)
Call:
qris(formula = Surv(time, status) ~ male + std.wt.loss, data = lung,
t0 = 30, Q = 0.5, nB = 100, method = "smooth", se = "pmb")
qris Estimator
estimate std.Error z.value p.value
(Intercept) 5.5611 0.0950 58.539 <2e-16 ***
maleFemale 0.4804 0.1616 2.972 0.0030 **
std.wt.loss -0.0731 0.0807 -0.905 0.3652
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Subjects with missing values in any of the variables relevant to the model are automatically removed when qris() is called. The estimated intercept implies that the median residual life for patients who have survived up to 30 days is \(\exp(5.5611) = 260.1\) days for a male with average weight loss. More interestingly, the summary shows that the gender effect is statistically significant at the 0.05 level, indicating that a female patient is expected to have a median residual life at 30 days that is \(\exp(0.4804) = 1.617\) times that of a male patient with the same weight loss. The effect of weight loss is not statistically significant at the 0.05 level. In addition to summary(), important statistics such as the coefficient and variance estimates can be extracted with the S3 methods coef() and vcov(), respectively.
coef(fit)
(Intercept) X
1.4302204 0.9322628
Moreover, the corresponding 95% Wald-type confidence intervals can be obtained by applying the confint() function to a ‘qris’ object.
confint(fit1)
2.5 % 97.5 %
(Intercept) 5.3749261 5.74731362
maleFemale 0.1636200 0.79726457
std.wt.loss -0.2312535 0.08510084
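The same intervals can be assembled by hand from coef() and vcov(); this short sketch, using fit1 from above, simply applies the Wald construction.

```r
## Wald-type 95% CI assembled from the point estimates and the estimated
## covariance matrix (equivalent to confint(fit1))
est <- coef(fit1)
se  <- sqrt(diag(vcov(fit1)))
cbind(lower = est - qnorm(.975) * se,
      upper = est + qnorm(.975) * se)
```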
The update() function can be conveniently applied to modify existing ‘qris’ objects. The following examples update the method and se arguments from fit1. The updated fits yield similar coefficient estimates, but the non-smooth procedure (method = "nonsmooth") gives slightly larger standard error estimates.
fit2 <- update(fit1, method = "nonsmooth", se = "fmb")
summary(fit2)
Call:
qris(formula = Surv(time, status) ~ male + std.wt.loss, data = lung,
t0 = 30, Q = 0.5, nB = 100, method = "nonsmooth", se = "fmb")
qris Estimator
estimate std.Error z.value p.value
(Intercept) 5.5585 0.1025 54.249 <2e-16 ***
maleFemale 0.4695 0.1872 2.508 0.0122 *
std.wt.loss -0.0668 0.0944 -0.708 0.4789
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(update(fit1, method = "iterative"))
Call:
qris(formula = Surv(time, status) ~ male + std.wt.loss, data = lung,
t0 = 30, Q = 0.5, nB = 100, method = "iterative", se = "pmb")
qris Estimator
estimate std.Error z.value p.value
(Intercept) 5.5599 0.0904 61.518 <2e-16 ***
maleFemale 0.4818 0.1786 2.698 0.0070 **
std.wt.loss -0.0716 0.0942 -0.761 0.4467
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
At a lower (Q = 0.25) and a higher (Q = 0.75) quantile, the gender effect remains significant at the 0.05 level, indicating that female patients are associated with longer lower- and higher-quantile residual life than male patients with the same weight loss. Among these models, we observe that female patients tend to have larger coefficient estimates when fitting the higher-quantile residual life. While the sign of the estimated regression coefficient for weight loss changes to a positive value at the lower quantile, the effect remains statistically insignificant at both the lower and higher quantiles.
summary(update(fit1, Q = 0.25))
Call:
qris(formula = Surv(time, status) ~ male + std.wt.loss, data = lung,
t0 = 30, Q = 0.25, nB = 100, method = "smooth", se = "pmb")
qris Estimator
estimate std.Error z.value p.value
(Intercept) 4.9111 0.1000 49.115 <2e-16 ***
maleFemale 0.4651 0.1851 2.513 0.0120 *
std.wt.loss 0.0543 0.0548 0.991 0.3218
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(update(fit1, Q = 0.75))
Call:
qris(formula = Surv(time, status) ~ male + std.wt.loss, data = lung,
t0 = 30, Q = 0.75, nB = 100, method = "smooth", se = "pmb")
qris Estimator
estimate std.Error z.value p.value
(Intercept) 6.0748 0.1085 55.971 <2e-16 ***
maleFemale 0.5237 0.1621 3.231 0.0012 **
std.wt.loss -0.0171 0.1083 -0.158 0.8746
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We also consider a base time of six months (t0 = 180), which enables us to assess the gender and weight loss effects on the median residual time at a moderate length of follow-up. The estimated effects of gender and weight loss increase in magnitude as \(t_0\) increases from \(30\) days to \(180\) days, and the weight loss effect becomes significant at the 0.05 level. In particular, greater weight loss is associated with a shorter median residual life beyond \(180\) days, with a \(p\)-value of \(0.0048\).
summary(update(fit1, t0 = 180))
Call:
qris(formula = Surv(time, status) ~ male + std.wt.loss, data = lung,
t0 = 180, Q = 0.5, nB = 100, method = "smooth", se = "pmb")
qris Estimator
estimate std.Error z.value p.value
(Intercept) 5.2243 0.0923 56.629 <2e-16 ***
maleFemale 0.5821 0.1975 2.948 0.0032 **
std.wt.loss -0.2515 0.0891 -2.821 0.0048 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ‘qris’ object is also compatible with the S3 methods predict() and residuals(). The following presents the fitted survival times for two hypothetical male and female patients with no weight loss, as well as the first five residuals from the fitted model.
lung.new <- data.frame(male = c("Male", "Female"), std.wt.loss = 0)
predict(fit2, newdata = lung.new)
1 2
444.9026 289.4422
head(residuals(fit2), 5)
1 2 3 4 5
-20.86127 -575.86127 232.44474 -416.82295 -555.82295
To better understand the covariate effects on different quantiles of residual time and across different base times, we plot the estimated regression coefficients of the intercept, sex, and weight loss in fit1 and fit2. Figures 7 and 8 display the estimated regression coefficients when method = "smooth" and method = "nonsmooth", respectively, at quantiles ranging from 0.2 to 0.5 at \(t_0 = 30\) days. The plot.qris() function is currently not available for the iterative estimator, mainly because of the extended computation time involved, as indicated by our simulation results, and because plotting necessitates computations across various quantiles or base times. As expected, the two plots show very similar patterns. We also plot the estimated regression coefficients of the intercept, sex, and weight loss for quantiles in the range of 0.2 to 0.5 at \(t_0 = 50\), 60, 70, and 80 days (Figure 9), as well as for base times in the range of 50 to 80 days at \(\tau = 0.2\), 0.3, 0.4, and 0.5 (Figure 10). The estimation method used is the non-iterative induced smoothing estimation (method = "smooth"). In Figure 9, the estimated intercept increases as the quantile increases (for a given base time). The estimated slopes for sex remain largely the same, but those for weight loss tend to decrease slightly across quantiles (for a given base time). These patterns remain consistent across base times. In Figure 10, the estimated intercepts increase as the quantiles increase but, for a given quantile, remain flat across the base times considered. The estimated regression coefficients for the two covariates do not appear to change appreciably across base times.
plot(fit2, Qs = 2:5 / 10, byQs = TRUE, ggextra = hide)
plot(fit1, Qs = 2:5 / 10, t0s = 5:8 * 10, byQs = TRUE, ggextra = hide)
plot(fit1, Qs = 2:5 / 10, t0s = 5:8 * 10, byQs = FALSE, ggextra = hide)
The purpose of the qris package is to provide a comprehensive tool for fitting quantile regression models on residual life for right-censored survival data, with the aim of promoting widespread dissemination and utilization. The package implements one estimation method based on non-smooth estimating functions and two estimation methods based on their induced smoothed versions. The non-smooth estimator is calculated through \(L_{1}\)-type minimization incorporating the IPCW technique, and its variance is calculated using full multiplier bootstrapping. The first type of induced smoothed estimator, a non-iterative version, directly solves the smoothed estimating functions, and its variance can be calculated using either full multiplier bootstrapping or the robust sandwich form with partial multiplier bootstrapping. As evidenced by the simulation results, this substantially reduces computing times without sacrificing estimation accuracy or stability relative to the original non-smooth method. The iterative smoothed estimator can produce more precise estimates than its non-iterative version, although it requires longer computing times. For all these methods, estimates of the regression coefficients and their variances can be calculated at user-specified quantiles and base times, as long as they are identifiable. Additionally, the package provides features for plotting estimates with associated 95% confidence intervals against quantiles and base times using the generic plot() function. These plots visualize patterns of estimates across quantiles and base times, helping users easily grasp the overall picture. The qris package and its functions are verified through illustrations using simulated data, and interpretation of the results is demonstrated through a real data application.
Some possible directions for extending our package are as follows. Efforts can be made to reduce the computational burden associated with variance estimation, which currently accounts for a significant portion of the computing time. In particular, the iterative induced smoothing method employs the partial multiplier bootstrap to calculate variance estimates in each iteration. Since the method requires multiple iterations, exploring more computationally efficient variance estimation procedures for each iteration is crucial for reducing the currently longer computation time. One approach is to utilize a closed-form estimate of the middle part of the sandwich-type variance, as discussed in Chiou et al. (2014) and Choi et al. (2018). Implementing this direct variance estimation in each iteration is expected to further enhance computational efficiency. Another direction is to generalize the approaches to allow for sampling weights, which is useful for bias correction when failure time data are generated from non-random sampling designs, such as case-cohort designs (Prentice 1986; Chiou et al. 2015a). The current estimating functions implemented in the qris package assume that the data are randomly sampled, with sampling weights set to 1. To the best of our knowledge, there is a lack of model-checking procedures and model-comparison methods specifically designed for the non-smooth estimator, and a logical next step would be to develop such procedures for subsequent integration into the package.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kim, et al., "Fitting a Quantile Regression Model for Residual Life with the R Package qris", The R Journal, 2025
BibTeX citation
@article{RJ-2024-007,
  author  = {Kim, Kyu Hyun and Kang, Sangwook and Chiou, Sy Han},
  title   = {Fitting a Quantile Regression Model for Residual Life with the R Package qris},
  journal = {The R Journal},
  year    = {2025},
  note    = {https://doi.org/10.32614/RJ-2024-007},
  doi     = {10.32614/RJ-2024-007},
  volume  = {16},
  issue   = {1},
  issn    = {2073-4859},
  pages   = {114-134}
}