Semi-competing risks refer to the setting where primary scientific interest lies in estimation and inference with respect to a non-terminal event, the occurrence of which is subject to a terminal event. In this paper, we present the R package **SemiCompRisks** that provides functions to perform the analysis of independent/clustered semi-competing risks data under the illness-death multi-state model. The package allows the user to choose the specification for model components from a range of options giving users substantial flexibility, including: accelerated failure time or proportional hazards regression models; parametric or non-parametric specifications for baseline survival functions; parametric or non-parametric specifications for random effects distributions when the data are cluster-correlated; and, a Markov or semi-Markov specification for terminal event following non-terminal event. While estimation is mainly performed within the Bayesian paradigm, the package also provides the maximum likelihood estimation for select parametric models. The package also includes functions for univariate survival analysis as complementary analysis tools.

Semi-competing risks refer to the general setting where primary scientific interest lies in estimation and inference with respect to a non-terminal event (e.g., disease diagnosis), the occurrence of which is subject to a terminal event (e.g., death) (Fine et al. 2001; Jazić et al. 2016). When there is a strong association between two event times, naïve application of a univariate survival model for non-terminal event time will result in overestimation of outcome rates as the analysis treats the terminal event as an independent censoring mechanism (Haneuse and Lee 2016). The semi-competing risks analysis framework appropriately treats the terminal event as a competing event and considers the dependence between non-terminal and terminal events as part of the model specification.

Toward formally describing the structure of semi-competing risks data, let \(T_{1}\) and \(T_{2}\) denote the times to the non-terminal and terminal events, respectively. From the modeling perspective, the focus in the semi-competing risks setting is to characterize the distribution \(T_{1}\) and its potential relationship with the distribution of \(T_{2}\), i.e. the joint distribution of (\(T_1\), \(T_2\)). For example, from an initial state (e.g., transplantation), as time progresses, a subject could make a transition into the non-terminal or terminal state (see Figure 1.a). In the case of a transition into the non-terminal state, the subject could subsequently transition into the terminal state even if these transitions cannot occur in the reverse order. The main disadvantage of the competing risks framework (see Figure 1.b) to the study of non-terminal event is that it does not utilize the information on the occurrence and timing of terminal event following the non-terminal event, which could be used to understand the dependence between the two events.

The current literature for the analysis of semi-competing risks data is
composed of three approaches: methods that specify the dependence
between non-terminal and terminal events via a copula
(Fine et al. 2001; Wang 2003; Jiang et al. 2005; Ghosh 2006; Peng and Fine 2007; Hsieh et al. 2008; Lakhal et al. 2008; Fu et al. 2013);
methods based on multi-state models, specifically the so-called
*illness-death* model
(Liu et al. 2004; Putter et al. 2007; Ye et al. 2007; Kneib and Hennerfeind 2008; Zeng and Lin 2009; Xu et al. 2010; Zeng et al. 2012; Han et al. 2014; Zhang et al. 2014; Lee et al. 2015, 2016);
and methods built upon the principles of causal inference
(Zhang and Rubin 2003; Egleston et al. 2007; Tchetgen Tchetgen 2014; Varadhan et al. 2014).

The *SemiCompRisks* package is designed to provide a comprehensive suite
of functions for the analysis of semi-competing risks data based on the
illness-death model, together with, as a complementary suite of tools,
functions for the analysis of univariate time-to-event data. While
Bayesian methods are used for estimation and inference for all available
models, maximum likelihood estimation is also provided for select
parametric models. Furthermore, *SemiCompRisks* offers flexible
parametric and non-parametric specifications for baseline survival
functions and cluster-specific random effects distributions under
accelerated failure time and proportional hazards models. The
functionality of the package covers methods proposed in a series of
recent papers on the analysis of semi-competing risks
data (Lee et al. 2015, 2016, 2017c).

The remainder of the paper is organized as follows.
Section 2 summarizes existing R packages that provide
methods for multi-state modeling, and explains the key contributions of
the *SemiCompRisks* package. Section 3 introduces an
on-going study of stem cell transplantation and provides a description
of the data available in the package. Section 4 presents
different specifications of models and estimation methods implemented in
our package. Section 5 summarizes the core components of
the *SemiCompRisks* package, including datasets, functions for fitting
models, functions, the structure of output provided to analysts.
Section 6 illustrates the usage of the main functions in
the package through three semi-competing risks analyses of the stem cell
transplantation data. Finally, Section 7 concludes with
discussion and an overview of the extensions we are working on.

As we elaborate upon below, the illness-death model for semi-competing
risks, that is the focus on the *SemiCompRisks* package, is a special
case of the broader class of multi-state models. Currently, there are
numerous R packages that permit estimation and inference for a
multi-state model and that could conceivably be used to analyze
semi-competing risks data.

The *mvna* package computes the Nelson-Aalen estimator of the cumulative
transition hazard for arbitrary Markov multi-state models with
right-censored and left-truncated data, but it does not compute
transition probability matrices (Allignol et al. 2008). The *TPmsm* implements
non-parametric and semi-parametric estimators for the transition
probabilities in 3-state models, including the Aalen-Johansen estimator
and estimators that are consistent even without Markov assumption or in
case of dependent censoring (Araújo et al. 2014). The *p3state.msm* package
performs inference in an illness-death model (Meira-Machado and Roca-Pardiñas 2011). Its main
feature is the ability for obtaining non-Markov estimates for the
transition probabilities. The *etm* package calculates the empirical
transition probability matrices and corresponding variance estimates for
any time-inhomogeneous multi-state model with finite state space and
data subject to right-censoring and left-truncation, but it does not
account for the influence of covariates (Allignol et al. 2011). The *msm*
package is able to fit time-homogeneous Markov models to panel count
data and hidden Markov models in continuous time (Jackson 2011). The
time-homogeneous Markov approach could be a particular case of the
illness-death model, where interval-censored data can be considered. The
*tdc.msm* package may be used to fit the time-dependent proportional
hazards model and multi-state regression models in continuous time, such
as Cox Markov model, Cox semi-Markov model, homogeneous Markov model,
non-homogeneous piecewise model, and non-parametric Markov model
(Meira-Machado et al. 2007). The *SemiMarkov* package performs parametric (Weibull or
exponentiated Weibull specification) estimation in a homogeneous
semi-Markov model (Król and Saint-Pierre 2015). Moreover, the effects of covariates on
the process evolution can be studied using a semi-parametric Cox model
for the distributions of sojourn times. The *flexsurv* package provides
functions for fitting and predicting from fully-parametric multi-state
models with Markov or semi-Markov specification (Jackson 2016). In
addition, the multi-state models implemented in *flexsurv* give the
possibility to include interval-censoring and some of them also
left-truncation. The *msSurv* calculates non-parametric estimation of
general multi-state models subject to independent right-censoring and
possibly left-truncation (Ferguson et al. 2012). This package also computes the
marginal state occupation probabilities along with the corresponding
variance estimates, and lower and upper confidence intervals. The
*mstate* package can be applied to right-censored and left-truncated
data in semi-parametric or non-parametric multi-state models with or
without covariates and it may also be used to competing risk models
(Wreede et al. 2011). Specifically for Cox-type illness-death models to
interval-censored data, we highlight the packages *coxinterval*
(Boruvka and Cook 2015) and *SmoothHazard* (Touraine et al. 2017), where the latter also
allows that the event times to be left-truncated. Finally, *frailtypack*
package permits the analysis of correlated data under select
clusterings, as well as the analysis of left-truncated data, through a
focus on frailty models using penalized likelihood estimation or
parametric estimation (Rondeau et al. 2012).

While these packages collectively provide broad functionality, each of
them is either non-specific to semi-competing risks or only permits
consideration of a narrow model specifications. In developing the
*SemiCompRisks* package, the goal was to provide a single package within
which a broad range of models and model specifications could be
entertained. The *frailtypack* package, for example, can also be used to
analyze cluster-correlated semi-competing risks data but it is
restricted to the proportional hazards model with either
patient-specific or cluster-specific random effects but not
both (Liquet et al. 2012). Furthermore, estimation/inference is within the
frequentist framework so that estimation of hospital-specific random
effects, of particular interest in health policy
applications (Lee et al. 2016), together with the quantification of
uncertainty is incredibly challenging. This, however, is (relatively)
easily achieved through the functionality of *SemiCompRisks* package.
Given the breadth of the functionality of the package, in addition to
the usual help files, we have developed a series of model-specific
vignettes which can be accessed through the CRAN (Lee et al. 2017b) or R
command `vignette("SemiCompRisks")`

, covering a total of 12 distinct
model specifications.

The example dataset used throughout this paper was obtained from the Center for International Blood and Marrow Transplant Research (CIBMTR), a collaboration between the National Marrow Donor Program and the Medical College of Wisconsin representing a worldwide network of transplant centers (Lee et al. 2017a). For illustrative purposes, we consider a hypothetical study in which the goal is to investigate risk factors for grade III or IV acute graft-versus-host disease (GVHD) among \(9,651\) patients who underwent the first allogeneic hematopoietic cell transplant (HCT) between January \(1999\) and December \(2011\).

As summarized in Table 1, after administratively censoring follow-up at \(365\) days post-transplant, each patient can be categorized according to their observed outcome information into four groups: (i) acute GVHD and death; (ii) acute GVHD and censored for death; (iii) death without acute GVHD; and (iv) censored for both. Furthermore, for each patient, the following covariates are available:gender (Male, Female); age (\(<\)10, 10-19, 20-29, 30-39, 40-49, 50-59, 60+); disease type (AML, ALL, CML, MDS); disease stage (Early, Intermediate, Advanced); and HLA compatibility (Identical sibling, 8/8, 7/8).

We note that due to confidentiality considerations the original study
outcomes (`time1`

, `time2`

, `event1`

, `event2`

: times and censoring
indicators to the non-terminal and terminal events) are not available in
*SemiCompRisks* package. As such we provide the five original covariates
together with estimates of parameters from the analysis of CIBMTR data,
so that one could simulate semi-competing risks outcomes (see the
simulation procedure in Appendix 9.2). Based on this, the
data shown in Table 1 reflects simulated outcome
data using \(1405\) as the seed.

Outcome category (%) | ||||||

Both acute GVHD & death | Acute GVHD & censored for death | Death without acute GVHD | Censored for both | |||

\(N\) | % | |||||

Total subjects | 9,651 | 100.0 | 9.5 | 8.9 | 28.8 | 52.8 |

`Gender` |
||||||

Male | 5,366 | 55.6 | 9.7 | 9.5 | 28.1 | 52.7 |

Female | 4,285 | 44.4 | 9.1 | 8.3 | 29.7 | 52.9 |

`Age` , years |
||||||

\(<\)10 | 653 | 6.8 | 5.0 | 11.9 | 23.4 | 59.7 |

10-19 | 1,162 | 12.0 | 8.0 | 11.4 | 24.0 | 56.6 |

20-29 | 1,572 | 16.3 | 9.7 | 9.9 | 27.4 | 53.0 |

30-39 | 1,581 | 16.4 | 9.8 | 10.7 | 28.5 | 51.0 |

40-49 | 2,095 | 21.7 | 11.0 | 9.6 | 29.7 | 49.7 |

50-59 | 2,008 | 20.8 | 9.8 | 5.1 | 32.3 | 52.8 |

60+ | 580 | 6.0 | 9.9 | 4.8 | 33.1 | 52.2 |

`Disease type` |
||||||

AML | 4,919 | 51.0 | 8.2 | 8.0 | 30.3 | 53.5 |

ALL | 2,071 | 21.5 | 9.9 | 9.0 | 29.3 | 51.8 |

CML | 1,525 | 15.8 | 12.1 | 11.3 | 22.2 | 54.4 |

MDS | 1,136 | 11.8 | 11.0 | 10.0 | 30.0 | 49.0 |

`Disease status` |
||||||

Early | 4,873 | 50.5 | 8.4 | 11.0 | 23.6 | 57.0 |

Intermediate | 2,316 | 24.0 | 9.7 | 8.5 | 30.1 | 51.7 |

Advanced | 2,462 | 25.5 | 11.5 | 5.4 | 37.7 | 45.4 |

`HLA compatibility` |
||||||

Identical sibling | 3,941 | 40.8 | 7.4 | 8.5 | 26.3 | 57.8 |

8/8 | 4,100 | 42.5 | 10.5 | 9.7 | 30.3 | 49.5 |

7/8 | 1,610 | 16.7 | 12.2 | 8.1 | 30.9 | 48.8 |

We offer three flexible multi-state illness-death models for the analysis of semi-competing risks data: accelerated failure time (AFT) models for independent data; proportional hazards regression (PHR) models for independent data; and PHR models for cluster-correlated data. These models accommodate parametric or non-parametric specifications for baseline survival functions as well as a Markov or semi-Markov assumptions for terminal event following non-terminal event.

In the AFT model specification, we directly model the connection between event times and covariates (Wei 1992). For the analysis of semi-competing risks data, we consider the following AFT model specifications under the illness-death modeling framework (Lee et al. 2017c): \[\begin{aligned} \label{eq:AFTind1} \log(T_{i1}) &=& \mathbf{x}_{i1}^{\top}\mathbf{\beta}_{1} + \gamma_{i} + \epsilon_{i1}, ~~ T_{i1} > 0, \end{aligned} \tag{1}\]

\[\begin{aligned} \label{eq:AFTind2} \log(T_{i2}) &=& \mathbf{x}_{i2}^{\top}\mathbf{\beta}_{2} + \gamma_{i} + \epsilon_{i2}, ~~ T_{i2} > 0, \end{aligned} \tag{2}\]

\[\begin{aligned} \label{eq:AFTind3} \log(T_{i2}-T_{i1}) &=& \mathbf{x}_{i3}^{\top}\mathbf{\beta}_{3} + \gamma_{i} + \epsilon_{i3}, ~~ T_{i2} > T_{i1}, \end{aligned} \tag{3}\] where \(T_{i1}\) and \(T_{i2}\) denote the times to the non-terminal and terminal events, respectively, from subject \(i=1,\ldots,n\), \(\mathbf{x}_{ig}\) is a vector of transition-specific covariates, \(\mathbf{\beta}_{g}\) is a corresponding vector of transition-specific regression parameters, and \(\epsilon_{ig}\) is a transition-specific random variable whose distribution determines that of the corresponding transition time, \(g \in \left\{1, 2, 3\right\}\). Finally, in each of ((1))-((3)), \(\gamma_{i}\) is a study subject-specific random effect that induces positive dependence between the two event times. We assume that \(\gamma_{i}\) follows a Normal(0, \(\theta\)) distribution and adopt a conjugate inverse Gamma distribution, denoted by IG\((a^{(\theta)}, b^{(\theta)})\) for the variance component \(\theta\). For regression parameters \(\mathbf{\beta}_{g}\), we adopt non-informative flat prior on the real line.

From models ((1))-((3)), we can adopt either
a fully parametric or a semi-parametric approach depending on the
specification of the distributions for \(\epsilon_{i1}\), \(\epsilon_{i2}\),
\(\epsilon_{i3}\). We build a parametric modeling based on the log-Normal
formulation, where \(\epsilon_{ig}\) follows a
Normal\((\mu_{g}, \sigma^{2}_{g})\) distribution. We adopt non-informative
flat priors on the real line for \(\mu_{g}\) and independent
IG\((a_{g}^{(\sigma)},b_{g}^{(\sigma)})\) for \(\sigma_{g}^{2}\). As an
alternative, a semi-parametric framework can be considered by adopting
independent non-parametric Dirichlet process mixtures (DPM) of \(M_{g}\)
Normal\((\mu_{gr}, \sigma^{2}_{gr})\) distributions,
\(r \in \left\{1,\ldots,M_{g}\right\}\), for each \(\epsilon_{ig}\).
Following convention in the literature, we refer to each component
Normal distribution as being specific to some "class" (Neal 2000).
Since the class-specific \((\mu_{gr}, \sigma^{2}_{gr})\) are unknown, they
are assumed to be draws from a so-called the *centering distribution*.
Specifically, we take a Normal distribution centered at \(\mu_{g0}\) with
a variance \(\sigma_{g0}^{2}\) for \(\mu_{gr}\) and an
IG\((a_{g}^{(\sigma_{gr})}, b_{g}^{(\sigma_{gr})})\) for
\(\sigma_{gr}^{2}\). Furthermore, since the "true" class membership for
any given study subject is unknown, we let \(p_{gr}\) denote the
probability of belonging to the \(r\)th class for transition \(g\) and
\(\mathbf{p}_{g}=(p_{g1},\ldots,p_{gM_{g}})^{\top}\) the collection of
such probabilities. In the absence of prior knowledge regarding the
distribution of class memberships for the \(n\) subjects across the
\(M_{g}\) classes, \(\mathbf{p}_{g}\) is assumed to follow a conjugate
symmetric Dirichlet\((\tau_{g}/M_{g},\ldots,\tau_{g}/M_{g})\)
distribution, where \(\tau_{g}\) is referred to as the *precision
parameter* (for more details, see Lee et al. 2017c).

Our AFT modeling framework can also handle interval-censored and/or left-truncated semi-competing risks data. Suppose that subject \(i\) was observed at follow-up times \(\left\{c_{i1},\ldots, c_{im_{i}}\right\}\) and let \(c_{i}^{*}\) and \(L_{i}\) denote the time to the end of study (or administrative right-censoring) and the time at study entry (i.e., the left-truncation time), respectively. Considering interval-censoring for both events, \(T_{i1}\) and \(T_{i2}\), for \(i=1,\ldots,n\), satisfy \(c_{ij} \leq T_{i1} < c_{ij+1}\) for some \(j\) and \(c_{ik} \leq T_{i2} < c_{ik+1}\) for some \(k\), respectively. Therefore, the observed outcome information for interval-censored and left-truncated semi-competing risks data for the subject \(i\) can be represented by \(\{L_i, c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}\}\).

We consider an illness-death multi-state model with proportional hazards assumptions characterized by three hazard functions (see Figure 1.a) that govern the rates at which subjects transition between the states: a cause-specific hazard for non-terminal event, \(h_{1}(t_{i1})\); a cause-specific hazard for terminal event, \(h_{2}(t_{i2})\); and a hazard for terminal event conditional on a time for non-terminal event, \(h_{3}(t_{i2} \mid t_{i1})\). We consider the following specification for hazard functions (Xu et al. 2010; Lee et al. 2015): \[\begin{aligned} \label{eq:PHRind1} h_{1}(t_{i1} \mid \gamma_{i}, \mathbf{x}_{i1}) &=& \gamma_{i}\,h_{01}(t_{i1})\exp(\mathbf{x}_{i1}^{\top}\mathbf{\beta}_{1}), ~~ t_{i1} > 0, \end{aligned} \tag{4}\]

\[\begin{aligned} \label{eq:PHRind2} h_{2}(t_{i2} \mid \gamma_{i}, \mathbf{x}_{i2}) &=& \gamma_{i}\,h_{02}(t_{i2})\exp(\mathbf{x}_{i2}^{\top}\mathbf{\beta}_{2}), ~~ t_{i2} > 0, \end{aligned} \tag{5}\]

\[\begin{aligned} \label{eq:PHRind3} h_{3}(t_{i2} \mid t_{i1}, \gamma_{i}, \mathbf{x}_{i3}) &=& \gamma_{i}\,h_{03}(z(t_{i1},t_{i2}))\exp(\mathbf{x}_{i3}^{\top}\mathbf{\beta}_{3}), ~~ t_{i2} > t_{i1}, \end{aligned} \tag{6}\] where \(h_{0g}\) is an unspecified baseline hazard function and \(\mathbf{\beta}_{g}\) is a vector of log-hazard ratio regression parameters associated with the covariates \(\mathbf{x}_{ig}\). Finally, in each of ((4))-((6)), \(\gamma_{i}\) is a study subject-specific shared frailty following a Gamma(\(\theta^{-1}\), \(\theta^{-1}\)) distribution, parametrized so that \(E\left[\gamma_{i}\right]=1\) and \(V\left[\gamma_{i}\right]=\theta\). The model ((6)) is referred to as being Markov or semi-Markov depending on whether we assume \(z(t_{i1},t_{i2})=t_{i2}\) or \(z(t_{i1},t_{i2})=t_{i2}-t_{i1}\), respectively.

The Bayesian approach for models ((4))-((6))
requires the specification of prior distributions for unknown
parameters. For the regression parameters \(\mathbf{\beta}_{g}\), we adopt
a non-informative flat prior distribution on the real line. For the
variance in the subject-specific frailties, \(\theta\), we adopt a
Gamma\((a^{(\theta)}, b^{(\theta)})\) for the precision \(\theta^{-1}\). For
the parametric specification for baseline hazard functions, we consider
a Weibull model:
\(h_{0g}(t)=\alpha_{g} \, \kappa_{g} \, t^{\alpha_{g}-1}\). We assign a
Gamma\((a_{g}^{(\alpha)},b_{g}^{(\alpha)})\) for \(\alpha_{g}\) and a
Gamma\((c_{g}^{(\kappa)},d_{g}^{(\kappa)})\) for \(\kappa_{g}\). As an
alternative, a non-parametric piecewise exponential model (PEM) is
considered for baseline hazard functions based on taking each of the
log-baseline hazard functions to be a flexible mixture of piecewise
constant function. Let \(s_{g,max}\) denote the largest observed event
time for each transition and construct a finite partition of the time
axis,
\(0=s_{g,0} < s_{g,1} < s_{g,2} < \ldots < s_{g,K_{g}+1} = s_{g,max}\).
Letting
\(\mathbf{\lambda}_{g}=(\lambda_{g,1},\ldots,\lambda_{g,K_{g}},\lambda_{g,K_{g}+1})^{\top}\)
denote the heights of the log-baseline hazard function on the disjoint
intervals based on the time splits
\(\mathbf{s}_{g}=(s_{g,1}, \ldots, s_{g,K_{g}+1})^{\top}\), we assume that
\(\mathbf{\lambda}_{g}\) follows a multivariate Normal distribution (MVN),
MVN\((\mu_{\lambda_{g}}\mathbf{1},\sigma_{\lambda_{g}}^{2}\mathbf{\Sigma}_{\lambda_{g}})\),
where \(\mu_{\lambda_{g}}\) is the overall mean,
\(\sigma_{\lambda_{g}}^{2}\) represents a common variance component for
the \(K_{g}+1\) elements, and \(\mathbf{\Sigma}_{\lambda_{g}}\) specifies
the covariance structure these elements. We adopt a flat prior on the
real line for \(\mu_{\lambda_{g}}\) and a conjugate
Gamma\((a_{g}^{(\sigma)},b_{g}^{(\sigma)})\) distribution for the
precision \(\sigma_{\lambda_{g}}^{-2}\). In order to relax the assumption
of fixed partition of the time scales, we adopt a
Poisson\((\alpha_{g}^{(K)})\) prior for the number of splits, \(K_{g}\), and
conditioned on the number of splits, we consider locations,
\(\mathbf{s}_{g}\), to be *a priori* distributed as the even-numbered
order statistics:
\[\label{eq:locpartition}
\pi(\mathbf{s}_{g} \mid K_{g}) \propto \frac{(2K_{g}+1)!\prod_{k=1}^{K_{g}+1}(s_{g,k}-s_{g,k-1})}{(s_{g,K_{g}+1})^{2K_{g}+1}}. \tag{7}\]

Note that the prior distributions of \(K_{g}\) and \(\mathbf{s}_{g}\) jointly form a time-homogeneous Poisson process prior for the partition \((K_{g},\mathbf{s}_{g})\). For more details, see Lee et al. (2015).

Lee et al. (2016) proposed hierarchical models that accommodate correlation in the joint distribution of the non-terminal and terminal events across patients for the setting where patients are clustered within hospitals. The hierarchical models for cluster-correlated semi-competing risks data build upon the illness-death model given in ((4))-((6)). Let \(T_{ji1}\) and \(T_{ji2}\) denote the times to the non-terminal and terminal events for the \(i\)th subject in the \(j\)th cluster, respectively, for \(i=1,\ldots,n_{j}\) and \(j=1,\ldots,J\). The general modeling specification is given by: \[\begin{aligned} \label{eq:PHRclus1} h_{1}(t_{ji1} \mid \gamma_{ji}, \mathbf{x}_{ji1}, V_{j1}) &=& \gamma_{ji}\,h_{01}(t_{ji1})\exp(\mathbf{x}_{ji1}^{\top}\mathbf{\beta}_{1} + V_{j1}), ~~ t_{ji1} > 0, \end{aligned} \tag{8}\]

\[\begin{aligned} \label{eq:PHRclus2} h_{2}(t_{ji2} \mid \gamma_{ji}, \mathbf{x}_{ji2}, V_{j2}) &=& \gamma_{ji}\,h_{02}(t_{ji2})\exp(\mathbf{x}_{ji2}^{\top}\mathbf{\beta}_{2} + V_{j2}), ~~ t_{ji2} > 0, \end{aligned} \tag{9}\]

\[\begin{aligned} \label{eq:PHRclus3} h_{3}(t_{ji2} \mid t_{ji1}, \gamma_{ji}, \mathbf{x}_{ji3}, V_{j3}) &=& \gamma_{ji}\,h_{03}(z(t_{ji1},t_{ji2}))\exp(\mathbf{x}_{ji3}^{\top}\mathbf{\beta}_{3} + V_{j3}), ~~ t_{ji2} > t_{ji1}, \end{aligned} \tag{10}\] where \(h_{0g}\) is an unspecified baseline hazard function and \(\mathbf{\beta}_{g}\) is a vector of log-hazard ratio regression parameters associated with the covariates \(\mathbf{x}_{jig}\). A study subject-specific shared frailty \(\gamma_{ji}\) is assumed to follow a Gamma(\(\theta^{-1}\), \(\theta^{-1}\)) distribution and \(\mathbf{V}_{j}=(V_{j1}, V_{j2}, V_{j3})^{\top}\) is a vector of cluster-specific random effects, each specific to one of the three possible transitions.

From a Bayesian perspective for models ((8))-((10)), we can adopt either a parametric Weibull or non-parametric PEM specification for baseline hazard functions \(h_{0g}\) with their respective configurations of prior distributions analogous to those outlined in Section 4.2. For the parametric specification of cluster-specific random effects, we assume that \(\mathbf{V}_{j}\) follows MVN\(_{3}(\mathbf{0},\mathbf{\Sigma}_{V})\) distribution. We adopt a conjugate inverse-Wishart\((\mathbf{\Psi}_{v},\rho_{v})\) prior for the variance-covariance matrix \(\mathbf{\Sigma}_{V}\). For the non-parametric specification, we adopt a DPM of MVN distributions with a centering distribution, \(G_{0}\), and a precision parameter, \(\tau\). Here we take \(G_{0}\) to be a multivariate Normal/inverse-Wishart (NIW) distribution for which the probability density function can be expressed as the product: \[\label{eq:NIW} f_{NIW}(\mathbf{\mu},\mathbf{\Sigma} \mid \mathbf{\Psi}_{0},\rho_{0}) = f_{MVN}(\mathbf{\mu} \mid \mathbf{0}, \mathbf{\Sigma}) \times f_{inverse-Wishart}(\mathbf{\Sigma} \mid \mathbf{\Psi}_{0}, \rho_{0}), \tag{11}\] where \(\mathbf{\Psi}_{0}\) and \(\rho_{0}\) are the hyperparameters of \(f_{NIW}(\cdot)\). We assign a Gamma\((a_{\tau},b_{\tau})\) prior distribution for \(\tau\). Finally, for \(\mathbf{\beta}_{g}\) and \(\theta\), we adopt the same priors as those adopted for the model in Section 4.2. For more details, see Lee et al. (2016).

Bayesian estimation and inference is available for all models in the
*SemiCompRisks*. Additionally, one may also choose to use maximum
likelihood estimation for the parametric Weibull PHR model described in
Section 4.2.

To perform Bayesian estimation and inference, we use a random scan Gibbs sampling algorithm to generate samples from the full posterior distribution. Depending on the complexity of the model adopted, the Markov chain Monte Carlo (MCMC) scheme may also include additional strategies, such as Metropolis-Hastings and reversible jump MCMC (Metropolis-Hastings-Green) steps. Specific details of each implementation can be seen in the online supplemental materials of Lee et al. (2015, 2016, 2017c).

The *SemiCompRisks* package contains three key functions, `FreqID_HReg`

,
`BayesID_HReg`

and

`BayesID_AFT`

, focused on models for semi-competing risks data as well
as the analogous univariate survival models, `FreqSurv_HReg`

,
`BayesSurv_HReg`

and `BayesSurv_AFT`

. It also provides two auxiliary
functions, `initiate.startValues_HReg`

and `initiate.startValues_AFT`

,
that can be used to generate initial values for Bayesian estimation;
`simID`

and `simSurv`

functions for simulating semi-competing risks and
univariate survival data, respectively; five covariates and parameter
estimates from `CIBMTR`

data; and the `BMT`

dataset referring to 137
bone marrow transplant patients.

Table 2 shows the modeling options implemented
in the *SemiCompRisks* package for both semi-competing risks and
univariate analysis. Specifically, we categorize the approaches based on
the analysis type (semi-competing risks or univariate), the survival
model (AFT or PHR), data type (independent or clustered), accommodation
to left-truncation and/or interval-censoring in addition to
right-censoring, and also statistical paradigms (frequentist or
Bayesian).

Analysis | Model | Data type | L-T and/or I-C | Statistical paradigm |
---|---|---|---|---|

AFT | Independent | No | B | |

Yes | B | |||

Clustered | No | x | ||

Semi-competing | Yes | x | ||

risks | PHR | Independent | No | B & F |

Yes | x | |||

Clustered | No | B | ||

Yes | x | |||

Univariate | AFT | Independent | No | B |

Yes | B | |||

Clustered | No | x | ||

Yes | x | |||

PHR | Independent | No | B & F | |

Yes | x | |||

Clustered | No | B | ||

Yes | x |

L-T: left-truncation; I-C: interval-censoring; B: Bayesian; F: frequentist; x: not available

The full description of functionality of the *SemiCompRisks* package can
be accessed through the R command `help("SemiCompRisks")`

or
`vignette("SemiCompRisks")`

which provides in detail the specification
of all models implemented in the package. Below we describe the input
data format and some crucial arguments for defining and fitting a model
for semi-competing risks data using the *SemiCompRisks* package.

From a semi-competing risks dataset, we jointly define the outcomes and
covariates in a `Formula`

object. Here we use the `simCIBMTR`

dataset,
obtained from the simulation procedure presented in
Appendix 9.2:

```
> form <- Formula(time1 + event1 | time2 + event2 ~ dTypeALL + dTypeCML +
R+ dTypeMDS + sexP | dTypeALL + dTypeCML + dTypeMDS | dTypeALL +
+ dTypeCML + dTypeMDS)
```

The outcomes `time1`

, `time2`

, `event1`

and `event2`

denote the times
and censoring indicators to the non-terminal and terminal events,
respectively, and the covariates of each hazard function are separated
by \(|\) (vertical bar).

The specification of the `Formula`

object varies slightly if the
semi-competing risks model accommodates left-truncated and/or
interval-censored data (see vignette documentation Lee et al. (2017b)).

Most functions for semi-competing risks analysis in the *SemiCompRisks*
package take common arguments. These arguments and their descriptions
are shown as follows:

`id`

: a vector of cluster information for \(n\) subjects, where cluster membership corresponds to one of the positive integers \(1,\ldots,J\).`model`

: a character vector that specifies the type of components in a model. It can have up to three elements depending on the model specification. The first element is for the assumption on \(h_{3}\): "semi-Markov" or "Markov". The second element is for the specification of baseline hazard functions for PHR models - "Weibull" or "PEM" - or baseline survival distribution for AFT models - "LN" (log-Normal) or "DPM". The third element needs to be set only for clustered semi-competing risks data and is for the specification of cluster-specific random effects distribution: "MVN" or "DPM".`hyperParams`

: a list containing vectors for hyperparameter values in hierarchical models.`startValues`

: a list containing vectors of starting values for model parameters.`mcmcParams`

: a list containing variables required for MCMC sampling.

Hyperparameter values, starting values for model parameters, and MCMC arguments depend on the specified Bayesian model and the assigned prior distributions. For a list of illustrations, see vignette documentation Lee et al. (2017b).

`FreqID_HReg`

The function `FreqID_HReg`

fits Weibull PHR models for independent
semi-competing risks data, as in
((4))-((6)), based on maximum likelihood
estimation. Its default structure is given by:

`FreqID_HReg(Formula, data, model="semi-Markov", frailty=TRUE),`

where `Formula`

represents the outcomes and the linear predictors
jointly, as presented in Section 5.1; `data`

is a data
frame containing the variables named in `Formula`

; `model`

is one of the
critical arguments of the *SemiCompRisks* package (see
Section 5.1), in which it specifies the type of model
based on the assumption on \(h_{3}(t_{i2} \mid t_{i1}, \cdot)\) in
((6)). Here, `model`

can be "`Markov`

" or
"`semi-Markov`

". Finally, `frailty`

is a logical value (`TRUE`

or
`FALSE`

) to determine whether to include the subject-specific shared
frailty term \(\gamma\) into the illness-death model.

`BayesID_HReg`

The function `BayesID_HReg`

fits parametric and semi-parametric PHR
models for independent or cluster-correlated semi-competing risks data,
as in ((4))-((6)) or
((8))-((10)), based on Bayesian inference.
Its default structure is given by:

```
BayesID_HReg(Formula, data, id=NULL, model=c("semi-Markov","Weibull"), hyperParams,
path=NULL). startValues, mcmcParams,
```

`Formula`

and `data`

are analogous to the previous case; `id`

, `model`

,
`hyperParams`

, `startValues`

, and `mcmcParams`

are all critical
arguments of the *SemiCompRisks* package (see
Section 5.1), where `id`

indicates the cluster that each
subject belongs to (for independent data, `id=NULL`

); `model`

allows us
to specify either "`Markov`

" or "`semi-Markov`

" assumption, whether
the priors for baseline hazard functions are parametric ("`Weibull`

")
or non-parametric ("`PEM`

"), and whether the cluster-specific random
effects distribution is parametric ("`MVN`

") or non-parametric
("`DPM`

"). The third element of `model`

is only required for models
for clustered-correlated data given in
((8))-((10)).

The `hyperParams`

argument defines all model hyperparameters: `theta`

(a
numeric vector for hyperparameters, \(a^{(\theta)}\) and \(b^{(\theta)}\),
in the prior of subject-specific frailty variance component), `WB`

(a
list containing numeric vectors for Weibull hyperparameters
(\(a_{g}^{(\alpha)}\), \(b_{g}^{(\alpha)}\)) and (\(c_{g}^{(\kappa)}\),
\(d_{g}^{(\kappa)}\)) for \(g \in \left\{1, 2, 3\right\}\): `WB.ab1`

,
`WB.ab2`

, `WB.ab3`

, `WB.cd1`

, `WB.cd2`

, `WB.cd3`

), `PEM`

(a list
containing numeric vectors for PEM hyperparameters (\(a_{g}^{(\sigma)}\),
\(b_{g}^{(\sigma)}\)), and \(\alpha_{g}^{(K)}\) for
\(g \in \left\{1, 2, 3\right\}\): `PEM.ab1`

, `PEM.ab2`

, `PEM.ab3`

,
`PEM.alpha1`

, `PEM.alpha2`

, `PEM.alpha3`

); and for the analysis of
clustered semi-competing risks data, additional components are required:
`MVN`

(a list containing numeric vectors for MVN hyperparameters
\(\mathbf{\Psi}_{v}\) and \(\rho_{v}\): `Psi_v`

, `rho_v`

), `DPM`

(a list
containing numeric vectors for DPM hyperparameters \(\mathbf{\Psi}_{0}\),
\(\rho_{0}\), \(a_{\tau}\), and \(b_{\tau}\): `Psi0`

, `rho0`

, `aTau`

, `bTau`

).

The `startValues`

argument specifies initial values for model
parameters. This specification can be done manually or through the
auxiliary function `initiate.startValues_HReg`

. The `mcmcParams`

argument sets the information for MCMC sampling: `run`

(a list
containing numeric values for setting for the overall run: `numReps`

,
total number of scans; `thin`

, extent of thinning; `burninPerc`

, the
proportion of burn-in), `storage`

(a list containing numeric values for
storing posterior samples for subject- and cluster-specific random
effects: `nGam_save`

, the number of \(\gamma\) to be stored; `storeV`

, a
vector of three logical values to determine whether all the posterior
samples of \(\mathbf{V}_{j}\), for \(j=1,\ldots,J\) are to be stored),
`tuning`

(a list containing numeric values relevant to tuning parameters
for specific updates in Metropolis-Hastings-Green (MHG) algorithm:
`mhProp_theta_var`

, the variance of proposal density for \(\theta\);
`mhProp_Vg_var`

, the variance of proposal density for \(\mathbf{V}_{j}\)
in DPM models; `mhProp_alphag_var`

, the variance of proposal density for
\(\alpha_{g}\) in Weibull models; `Cg`

, a vector of three proportions that
determine the sum of probabilities of choosing the birth and the death
moves in PEM models (the sum of the three elements should not exceed
\(0.6\)); `delPertg`

, the perturbation parameters in the birth update in
PEM models (the values must be between \(0\) and \(0.5\)); `rj.scheme`

: if
`rj.scheme`

=1, the birth update will draw the proposal time split from
1:`sg_max`

and if `rj.scheme`

=2, the birth update will draw the proposal
time split from uniquely ordered failure times in the data. For PEM
models, additional components are required: `Kg_max`

, the maximum number
of splits allowed at each iteration in MHG algorithm for PEM models;
`time_lambda1`

, `time_lambda2`

, `time_lambda3`

, time points at which the
posterior distribution of log-hazard functions are calculated. Finally,
`path`

indicates the name of directory where the results are saved. For
more details and examples, see Lee et al. (2017b).

`BayesID_AFT`

The function `BayesID_AFT`

fits parametric and semi-parametric AFT
models for independent semi-competing risks data, given in
((1))-((3)), based on Bayesian inference.
Its default structure is given by:

`BayesID_AFT(Formula, data, model="LN", hyperParams, startValues, mcmcParams, path=NULL),`

where `data`

, `startValues`

(auxiliary function
`initiate.startValues_AFT`

), and `path`

are analogous to functions
described in previous sections. Here, `Formula`

has a different
structure of outcomes, since the AFT model accommodates more complex
censoring, such as interval-censoring and/or left-truncation (see
Section 4.1). It takes the generic form
`Formula(LT | y1L + y1U | y2L + y2U cov1 | cov2 | cov3)`

, where `LT`

represents the left-truncation time, (`y1L`

, `y1U`

) and (`y2L`

, `y2U`

)
are the interval-censored times to the non-terminal and terminal events,
respectively, and `cov1`

, `cov2`

and `cov3`

are covariates of each
linear regression. The `model`

argument specifies whether the baseline
survival distribution is parametric ("`LN`

") or non-parametric
("`DPM`

"). The `hyperParams`

argument defines all model
hyperparameters: `theta`

is for hyperparameters (\(a^{(\theta)}\) and
\(b^{(\theta)})\)); `LN`

is a list containing numeric vectors, `LN.ab1`

,
`LN.ab2`

, `LN.ab3`

, for log-Normal hyperparameters (\(a_{g}^{(\sigma)}\),
\(b_{g}^{(\sigma)}\)) with \(g \in \left\{1, 2, 3\right\}\); `DPM`

is a list
containing numeric vectors, `DPM.mu1`

, `DPM.mu2`

, `DPM.mu3`

,
`DPM.sigSq1`

, `DPM.sigSq2`

, `DPM.sigSq3`

, `DPM.ab1`

, `DPM.ab2`

,
`DPM.ab3`

, `Tau.ab1`

, `Tau.ab2`

, `Tau.ab3`

for DPM hyperparameters
(\(\mu_{g0}\), \(\sigma_{g0}^{2}\)), (\(a_{g}^{(\sigma_{gr})}\),
\(b_{g}^{(\sigma_{gr})}\)), and \(\tau_{g}\) with
\(g \in \left\{1, 2, 3\right\}\). The `mcmcParams`

argument sets the
information for MCMC sampling: `run`

(see Section 5.3),
`storage`

(`nGam_save`

; `nY1_save`

, the number of `y1`

to be stored;
`nY2_save`

, the number of `y2`

to be stored; `nY1.NA_save`

, the number
of `y1==NA`

to be stored), `tuning`

(`betag.prop.var`

, the variance of
proposal density for \(\mathbf{\beta}_{g}\); `mug.prop.var`

, the variance
of proposal density for \(\mu_{g}\); `zetag.prop.var`

, the variance of
proposal density for \(1/\sigma_{g}^{2}\); `gamma.prop.var`

, the variance
of proposal density for \(\gamma\)).

The functions `FreqSurv_HReg`

, `BayesSurv_HReg`

and `BayesSurv_AFT`

provide the same flexibility as functions `FreqID_HReg`

, `BayesID_HReg`

and `BayesID_AFT`

, respectively, but in a univariate context (i.e., a
single outcome).

The function `FreqSurv_HReg`

fits a Weibull PHR model based on maximum
likelihood estimation. This model is described by:
\[\label{eq:FuniPHR}
h(t_{i} \mid \mathbf{x}_{i}) = \alpha \, \kappa \, t_{i}^{\alpha-1}\exp(\mathbf{x}_{i}^{\top}\mathbf{\beta}), ~~ t_{i} > 0. \tag{12}\]

The function `BayesSurv_HReg`

implements Bayesian PHR models given by:
\[\label{eq:BuniPHR}
h(t_{ji} \mid \mathbf{x}_{ji}) = h_{0}(t_{ji})\exp(\mathbf{x}_{ji}^{\top}\mathbf{\beta} + V_{j}), ~~ t_{i} > 0. \tag{13}\]

We can adopt either a parametric Weibull or a non-parametric PEM specification for \(h_{0}\). Cluster-specific random effects \(V_{j}\), \(j=1,\ldots,J\), can be assumed to follow a parametric Normal distribution or a non-parametric DPM of Normal distributions.

Finally, the function `BayesSurv_AFT`

implements Bayesian AFT models
expressed by:
\[\label{eq:BuniAFT}
\log(T_{i}) = \mathbf{x}_{i}^{\top}\mathbf{\beta} + \epsilon_{i}, ~~ T_{i} > 0, \tag{14}\]
where we can adopt either a fully parametric log-Normal or a
non-parametric DPM specification for \(\epsilon_{i}\).

The functions presented in Sections 5.2,
5.3 and 5.4 return objects of classes
`Freq_HReg`

, `Bayes_HReg`

and `Bayes_AFT`

, respectively. Each of these
objects represents results from its respective semi-competing risks
analysis. These results can be visualized using several R methods, such
as `print`

, `summary`

, `predict`

, `plot`

, `coef`

, and `vcov`

.

The function `print`

shows the estimated parameters and, in the Bayesian
case, also the MCMC description (number of chains, scans, thinning, and
burn-in) and the potential scale reduction factor (PSRF) convergence
diagnostic for each model parameter (Gelman and Rubin 1992; Brooks and Gelman 1998). If the
PSRF is close to 1, a group of chains have mixed well and have converged
to a stable distribution. The function `summary`

presents the regression
parameters in exponential format (hazard ratios) and the estimated
baseline hazard function components. Along with a summary of analysis
results, the output from `summary`

includes two model diagnostics and
performance metrics, log-pseudo marginal likelihood (LPML)
(Geisser and Eddy 1979; Gelfand and Mallick 1995) and deviance information
criterion (DIC) (Spiegelhalter et al. 2002; Celeux et al. 2006), for
Bayesian illness-death models.

Functions `predict`

and `plot`

complement each other. The former uses
the fitted model to predict an output of interest (survival or hazard)
at a given time interval from new covariates. From the object created by
`predict`

, `plot`

displays survival (`plot.est="Surv"`

) or hazard
(`plot.est="Haz"`

) functions with their respective
credibility/confidence intervals. In order to predict the joint
probability involving two event times for a new covariate profile, one
can use the function `PPD`

, which is calculated from the joint posterior
predictive distribution of \((T_1, T_2)\) (Lee et al. 2015).

*SemiCompRisks* also provides the standard functions `coef`

(model
coefficients) and `vcov`

(variance-covariance matrix for a fitted
frequentist model). For examples with more details, see Lee et al. (2017b).

The function `simID`

simulates semi-competing risks outcomes from
independent or cluster-correlated data (for more details of the
simulation algorithm, see Appendix 9.1). The simulation is
based on a semi-Markov Weibull PHR modeling and, in the case of the
cluster-correlated approach, the cluster-specific random effects follow
a MVN distribution. We provide a simulation example of independent
semi-competing risks data in Appendix 9.2.

Analogously, the function `simSurv`

simulates univariate
independent/cluster-correlated survival data under a Weibull PHR model
with cluster-specific random effects following a Normal distribution.

It is composed of \(5\) covariates that come from a study of acute GVHD with \(9,651\) patients who underwent the first allogeneic hematopoietic cell transplant between January \(1999\) and December \(2011\) (see Section 3).

It refers to a well-known study of bone marrow transplantation for acute
leukemia (Klein and Moeschberger 2003). This data frame contains \(137\) patients with \(22\)
variables and its description can be viewed from the R command
`help(BMT)`

.

To illustrate the usage of the *SemiCompRisks* package, we present two
PHR models (one parametric model with maximum likelihood estimation and
another semi-parametric model based on Bayesian inference) and one
Bayesian AFT model using stem cell transplantation data, described in
Section 3.

In our first example we employ the modeling
((4))-((6)) for independent data,
semi-Markov assumption and Weibull baseline hazards. Here, `Formula`

(`form`

) is defined as in Section 5.1. We fit the model
using the function `FreqID_HReg`

, described in
Section 5.2, and visualize the results through the
function `summary`

:

```
> fitFreqPHR <- FreqID_HReg(form, data=simCIBMTR, model="semi-Markov")
R> summary(fitFreqPHR)
R
-competing risks data
Analysis of independent semi-Markov assumption for h3
semi: 0.05
Confidence level
:
Hazard ratios
beta1 LL UL beta2 LL UL beta3 LL UL1.49 1.20 1.8 1.37 1.09 1.7 0.99 0.78 1.3
dTypeALL 1.78 1.41 2.3 0.83 0.64 1.1 1.30 0.99 1.7
dTypeCML 1.64 1.26 2.1 1.39 1.04 1.9 1.49 1.09 2.0
dTypeMDS 0.89 0.79 1.0 NA NA NA NA NA NA
sexP
:
Variance of frailties
Estimate LL UL7.8 7.3 8.4
theta
function components:
Baseline hazard -PM LL UL h2-PM LL UL h3-PM LL UL
h1: log-kappa -6.14 -6.4 -5.90 -11.33 -11.74 -10.93 -6.873 -7.189 -6.557
Weibull: log-alpha 0.15 0.1 0.21 0.86 0.82 0.91 0.022 -0.033 0.077 Weibull
```

As shown in Section 5.6, `summary`

provides estimates of
all model parameters. Using the auxiliary functions `predict`

(default
option `x1new=x2new=x3new=NULL`

which corresponds to the baseline
specification) and `plot`

, we can graphically visualize the results:

```
> pred <- predict(fitFreqPHR, time=seq(0,365,1), tseq=seq(from=0,to=365,by=30))
R> plot(pred, plot.est="Surv")
R> plot(pred, plot.est="Haz") R
```

Figure 2 displays estimated baseline survival and hazard functions (solid line) with their corresponding \(95\%\) confidence intervals (dotted line).

Our second example is also based on the models
((4))-((6)) adopting a semi-Markov
assumption for \(h_{3}\), but now we use the non-parametric PEM
specification for baseline hazard functions. Again, `Formula`

is defined
as in Section 5.1. Here we employ the Bayesian
estimation by means of the function `BayesID_HReg`

, described in
Section 5.3. The first step is to specify initial values
for model parameters through the `startValues`

argument using the
auxiliary function `initiate.startValues_HReg`

:

```
> startValues <- initiate.startValues_HReg(form, data=simCIBMTR,
R+ model=c("semi-Markov","PEM"), nChain=3)
```

The `nChain`

argument indicates the number of Markov chains that will be
used in the MCMC algorithm. Next step is to define all model
hyperparameters using the `hyperParams`

argument:

```
> hyperParams <- list(theta=c(0.5,0.05), PEM=list(PEM.ab1=c(0.5,0.05),
R+ PEM.ab2=c(0.5,0.05), PEM.ab3=c(0.5,0.05), PEM.alpha1=10,
+ PEM.alpha2=10, PEM.alpha3=10))
```

To recall what prior distributions are related to these hyperparameters,
see Section 4.3. Now we set the MCMC configuration
for the `mcmcParams`

argument, more specifically defining the overall
run, storage, and tuning parameters for specific updates:

```
> sg_max <- c(max(simCIBMTR$time1[simCIBMTR$event1==1]),
R+ max(simCIBMTR$time2[simCIBMTR$event1==0 & simCIBMTR$event2==1]),
+ max(simCIBMTR$time2[simCIBMTR$event1==1 & simCIBMTR$event2==1]))
> mcmcParams <- list(run=list(numReps=5e6, thin=1e3, burninPerc=0.5),
R+ storage=list(nGam_save=0, storeV=rep(FALSE,3)),
+ tuning=list(mhProp_theta_var=0.05, Cg=rep(0.2,3), delPertg=rep(0.5,3),
+ rj.scheme=1, Kg_max=rep(50,3), sg_max=sg_max, time_lambda1=seq(1,sg_max[1],1),
+ time_lambda2=seq(1,sg_max[2],1), time_lambda3=seq(1,sg_max[3],1)))
```

As shown above, we set `sg_max`

to the largest observed failure times
for \(g \in \left\{1, 2, 3\right\}\). For more details of each item of
`mcmcParams`

, see Section 5.3.

Given this setup, we fit the PHR model using the function
`BayesID_HReg`

:

```
> fitBayesPHR <- BayesID_HReg(form, data=simCIBMTR, model=c("semi-Markov","PEM"),
R+ startValues=startValues, hyperParams=hyperParams, mcmcParams=mcmcParams)
```

We note that, depending on the complexity of the model specification
(e.g. if PEM baseline hazards are adopted) and the size of the dataset,
despite the functions having been written in C and compiled for R, the
MCMC scheme may require a large number of MCMC scans to ensure
convergence. As such, some models may take a relatively long time to
converge. The example we present below, for example, took 45 hours on a
Windows laptop with an Intel(R) Core(TM) i5-3337U 1.80GHz processor, 2
cores, 4 logical processors, 4GB of RAM and 3MB of cache memory to cycle
through the 6 millions scans for 3 chains. In lieu of attempting to
reproduce the exact results we present here, while readers are of course
free to do, Appendix 9.3 provides the code for this same
semi-competing risks model and its respective posterior summary, but
based on a reduced number of scans of the MCMC scheme (specifically
50,000 scans for 3 chains). Based on the full set of scans, the `print`

method for object returned by `BayesID_HReg`

, yields:

```
> print(fitBayesPHR, digits=2)
R
-competing risks data
Analysis of independent semi-Markov assumption for h3
semi
: 3
Number of chains: 5e+06
Number of scans: 1000
Thinning: 50%
Percentage of burnin
######
Potential Scale Reduction Factor
:
Variance of frailties, theta1
:
Regression coefficients
beta1 beta2 beta31 1 1
dTypeALL 1 1 1
dTypeCML 1 1 1
dTypeMDS 1 NA NA
sexP
function components:
Baseline hazard
: summary statistics
lambda11st Qu. Median Mean 3rd Qu. Max.
Min. 1.00 1.01 1.01 1.01 1.02 1.02
: summary statistics
lambda21st Qu. Median Mean 3rd Qu. Max.
Min. 1.00 1.00 1.00 1.00 1.00 1.02
: summary statistics
lambda31st Qu. Median Mean 3rd Qu. Max.
Min. 1.00 1.00 1.00 1.00 1.00 1.01
h1 h2 h31 1 1
mu 1 1 1
sigmaSq 1 1 1
K
...
```

Note that all parameters obtained PSRF close to \(1\), indicating that the chains have converged well (see Section 5.6). Convergence can also be assessed graphically through a trace plot:

```
> plot(fitBayesPHR$chain1$theta.p, type="l", col="red",
R+ xlab="iteration", ylab=expression(theta))
> lines(fitBayesPHR$chain2$theta.p, type="l", col="green")
R> lines(fitBayesPHR$chain3$theta.p, type="l", col="blue") R
```

Figure 3 shows convergence diagnostic for \(\theta\)
(subject-specific frailty variance component), where the three chains
have mixed and converged to a stable distribution. Any other model
parameter could be similarly evaluated. Analogous to the frequentist
example, we can also visualize the results through the function
`summary`

:

```
> summary(fitBayesPHR)
R
-competing risks data
Analysis of independent semi-Markov assumption for h3
semi
#####
: 85722
DIC: -42827
LPML: 0.05
Credibility level
#####
:
Hazard ratiosexp(beta1) LL UL exp(beta2) LL UL exp(beta3) LL UL
1.44 1.2 1.8 1.3 1.06 1.6 0.98 0.77 1.2
dTypeALL 1.71 1.4 2.1 0.8 0.63 1.0 1.25 0.96 1.6
dTypeCML 1.61 1.3 2.1 1.4 1.04 1.8 1.44 1.07 2.0
dTypeMDS 0.89 0.8 1.0 NA NA NA NA NA NA
sexP
:
Variance of frailties
theta LL UL6.7 6.1 7.4
function components:
Baseline hazard -PM LL UL h2-PM LL UL h3-PM LL UL
h1-5.60 -6.006 -5.0 -5.0 -9.5 -2.3 -6.74 -7.030 -6.5
mu 0.22 0.027 2.3 7.6 2.7 24.5 0.13 0.018 2.7
sigmaSq 10.00 5.000 17.0 15.0 11.0 20.0 10.00 4.000 17.0 K
```

Here we provide two model assessment measures (DIC and LPML) and estimates of all model parameters with their respective \(95\%\) credible intervals.

Our last example is based on AFT models
((1))-((3)) adopting a semi-Markov
assumption for \(h_{3}\) and the parametric log-Normal specification for
baseline survival distributions. Here we apply the Bayesian framework
via function `BayesID_AFT`

. As pointed out in
Section 5.4, `Formula`

argument for AFT models takes a
specific form:

```
> simCIBMTR$LT <- rep(0,dim(simCIBMTR)[1])
R> simCIBMTR$y1L <- simCIBMTR$y1U <- simCIBMTR[,1]
R> simCIBMTR$y1U[which(simCIBMTR[,2]==0)] <- Inf
R> simCIBMTR$y2L <- simCIBMTR$y2U <- simCIBMTR[,3]
R> simCIBMTR$y2U[which(simCIBMTR[,4]==0)] <- Inf
R
> formAFT <- Formula(LT | y1L + y1U | y2L + y2U ~ dTypeALL + dTypeCML + dTypeMDS +
R+ sexP | dTypeALL + dTypeCML + dTypeMDS | dTypeALL + dTypeCML + dTypeMDS)
```

Recall that `LT`

represents the left-truncation time, and (`y1L`

, `y1U`

)
and (`y2L`

, `y2U`

) are the interval-censored times to the non-terminal
and terminal events, respectively. Next step is to set the initial
values for model parameters through the `startValues`

argument, but now
using the auxiliary function `initiate.startValues_AFT`

:

```
> startValues <- initiate.startValues_AFT(formAFT, data=simCIBMTR,
R+ model="LN", nChain=3)
```

Again, we considered three Markov chains (`nChain`

=3). Using the
`hyperParams`

argument we specify all model hyperparameters:

```
> hyperParams <- list(theta=c(0.5,0.05), LN=list(LN.ab1=c(0.5,0.05),
R+ LN.ab2=c(0.5,0.05), LN.ab3=c(0.5,0.05)))
```

Each pair of hyperparameters defines shape and scale of an inverse Gamma
prior distribution (see Section 4.1). Similar to the
previous example, we must specify overall run, storage, and tuning
parameters for specific updates through the `mcmcParams`

argument:

```
> mcmcParams <- list(run=list(numReps=5e6, thin=1e3, burninPerc=0.5),
R+ storage=list(nGam_save=0, nY1_save=0, nY2_save=0, nY1.NA_save=0),
+ tuning=list(betag.prop.var=rep(0.01,3), mug.prop.var=rep(0.01,3),
+ zetag.prop.var=rep(0.01,3), gamma.prop.var=0.01))
```

Analogous to the previous Bayesian model, a large number of scans are
also required here to achieve the convergence of the Markov chains.
Again, for a quickly reproducible example, the code for the AFT model
with simplified MCMC setting is provided in Appendix 9.3.
For more details of each item of `mcmcParams`

, see
Section 5.4. Finally, we fit the AFT model using the
function `BayesID_AFT`

and analyze the convergence of each parameter
through the function `print`

:

```
> fitBayesAFT <- BayesID_AFT(formAFT, data=simCIBMTR, model="LN",
R+ startValues=startValues, hyperParams=hyperParams, mcmcParams=mcmcParams)
> print(fitBayesAFT, digits=2)
R
-competing risks data
Analysis of independent semi
: 3
Number of chains: 5e+06
Number of scans: 1000
Thinning: 50%
Percentage of burnin
######
Potential Scale Reduction Factor
: 1
Variance of frailties, theta
:
Regression coefficients
beta1 beta2 beta31 1 1
dTypeALL 1 1 1
dTypeCML 1 1 1
dTypeMDS 1 NA NA
sexP
function components:
Baseline survival =1 g=2 g=3
g1 1.2 1
mu 1 1.1 1
sigmaSq
...
```

Again, the PSRF for each parameter indicates the convergence. As a last
step, we visualize the estimate of each parameter and their respective
\(95\%\) credible intervals through the function `summary`

:

```
> summary(fitBayesAFT)
R
-competing risks data
Analysis of independent semi
#####
: 21400
DIC: -12597
LPML: 0.05
Credibility level
#####
:
Acceleration factorsexp(beta1) LL UL exp(beta2) LL UL exp(beta3) LL UL
0.68 0.54 0.84 0.94 0.86 1.0 1.08 0.85 1.4
dTypeALL 0.53 0.42 0.67 1.27 1.12 1.4 0.92 0.71 1.2
dTypeCML 0.58 0.44 0.75 0.88 0.78 1.0 0.78 0.58 1.0
dTypeMDS 1.16 0.99 1.36 NA NA NA NA NA NA
sexP
:
Variance of frailties
theta LL UL2.6 2.5 2.8
function components:
Baseline survival =1: PM LL UL g=2: PM LL UL g=3: PM LL UL
g-Normal: mu 8.2 8.0 8.4 6.293 6.244 6.335 6.5 6.4 6.7
log-Normal: sigmaSq 7.2 6.4 8.0 0.013 0.005 0.033 1.7 1.5 2.0 log
```

This paper discusses the implementation of a comprehensive R package
*SemiCompRisks* for the analyses of independent/cluster-correlated
semi-competing risks data. The package allows to fit parametric or
semi-parametric models based on either accelerated failure time or
proportional hazards regression approach. It is also flexible in that
one can adopt either a Markov or semi-Markov specification for terminal
event following non-terminal event. The estimation and inference are
mostly based on the Bayesian paradigm, but parametric PHR models can
also be fitted using the maximum likelihood estimation. Users can easily
obtain numerical and graphical presentation of model fits using R
methods, as illustrated in the stem cell transplantation example in
Section 6. In addition, the package provides functions
for performing univariate survival analysis. We would also like to
emphasize that the vignette documentation (Lee et al. 2017b) provides a list
of detailed examples applying each of the implemented models in the
package.

Given the complexity of some Bayesian models in the package, it may take relatively long time to implement the models for large datasets. We are currently looking into possibility to parallelize parts of the algorithm and to add support for OpenMP to the package, which can bring significant gains in computational time.

*SemiCompRisks* provides researchers with valid and practical analysis
tools for semi-competing risks data. The application examples in this
paper were run using version v3.30 of the package, available from the
CRAN at https://cran.r-project.org/package=SemiCompRisks. We plan to
constantly update the package to incorporate more functionality and
flexibility to the models for semi-competing risks analysis.

Funding for this work was provided by National Institutes of Health grants R01 CA181360-01. The authors also gratefully acknowledge the CIBMTR (grant U24-CA076518) for providing the covariates of the illustrative example.

The *SemiCompRisks* package contains a function, `simID`

, for simulating
independent or cluster-correlated semi-competing risks data. In this
section, we provide the details on the simulation algorithm used in
`simID`

for generating cluster-correlated semi-competing risks data
based on a parametric Weibull-MVN semi-Markov illness-death model, as
presented in Section 4.3, where the baseline hazard
functions are defined as
\(h_{0g}(t)=\alpha_{g} \, \kappa_{g} \, t^{\alpha_{g}-1}\), for
\(g \in \left\{1, 2, 3\right\}\). The step by step algorithm is given as
follows:

Generate \(\mathbf{V}_{j}=(V_{j1}, V_{j2}, V_{j3})^{\top}\) from a MVN(\(\mathbf{0}\), \(\Sigma_{V}\)), for \(j=1,\ldots,J\).

For each \(j\), repeat the following steps for \(i=1,\ldots,n_{j}\).

Generate \(\gamma_{ji}\) from a Gamma(\(\theta^{-1}\), \(\theta^{-1}\)).

Calculate \(\eta_{jig}=\log(\gamma_{ji})+\mathbf{x}_{jig}^{\top}\mathbf{\beta}_{g} + V_{jg}\), for \(g \in \left\{1, 2, 3\right\}\).

Generate \(t_{1}^{\ast}\) from a Weibull(\(\alpha_{1}\), \(\kappa_{1} \, e^{\eta_{ji1}})\) and \(t_{2}^{\ast}\) from a Weibull(\(\alpha_{2}\), \(\kappa_{2} \, e^{\eta_{ji2}})\).

- If \(t_{1}^{\ast} \leq t_{2}^{\ast}\), generate \(t^{\ast}\) from a Weibull(\(\alpha_{3}\), \(\kappa_{3} \, e^{\eta_{ji3}})\) and set \(t_{ji1}=t_{1}^{\ast}\), \(t_{ji2}=t_{1}^{\ast}+t^{\ast}\).
- Otherwise, set \(t_{ji1}=\infty\), \(t_{ji2}=t_{2}^{\ast}\).

Generate a censoring time \(c_{ji}\) from Uniform(\(c_{L}\), \(c_{U}\)).

Set the observed outcome information (

`time1`

,`time2`

,`event1`

,`event2`

) as follows:- (\(t_{ji1}\), \(t_{ji2}\), 1, 1), if \(t_{ji1}<t_{ji2}<c_{ji}\).
- (\(t_{ji1}\), \(c_{ji}\), 1, 0), if \(t_{ji1}<c_{ji}<t_{ji2}\).
- (\(t_{ji2}\), \(t_{ji2}\), 0, 1), if \(t_{ji1}=\infty\) and \(t_{ji2}<c_{ji}\).
- (\(c_{ji}\), \(c_{ji}\), 0, 0), if \(t_{ji1}>c_{ji}\) and \(t_{ji2}>c_{ji}\).

We note that the function `simID`

is flexible in that one can set the
\(\theta\) argument as zero (`theta.true`

=0) to simulate the data under
the model without the subject-specific shared frailty term
(\(\gamma_{ji}\)), which is analogous to the model proposed by
(Liquet et al. 2012). One can generate independent semi-competing risks data
outlined in Section 4.2 by setting the `id`

and
\(\Sigma_{V}\) arguments as nulls (`id`

=`NULL`

and `SimgaV.true`

=`NULL`

).

The true values of model parameters are set to estimates obtained by fitting a semi-Markov Weibull PHR model to the original CIBMTR data.

```
> data(CIBMTR_Params)
R> beta1.true <- CIBMTR_Params$beta1.true
R> beta2.true <- CIBMTR_Params$beta2.true
R> beta3.true <- CIBMTR_Params$beta3.true
R> alpha1.true <- CIBMTR_Params$alpha1.true
R> alpha2.true <- CIBMTR_Params$alpha2.true
R> alpha3.true <- CIBMTR_Params$alpha3.true
R> kappa1.true <- CIBMTR_Params$kappa1.true
R> kappa2.true <- CIBMTR_Params$kappa2.true
R> kappa3.true <- CIBMTR_Params$kappa3.true
R> theta.true <- CIBMTR_Params$theta.true
R> cens <- c(365, 365) R
```

The next step is to define the covariates matrices and then simulate
outcomes using the `simID`

function, available in the *SemiCompRisks*
package.

```
> data(CIBMTR)
R# Sex (M: reference category)
> CIBMTR$sexP <- as.numeric(CIBMTR$sexP)-1
R
# Age (LessThan10: reference category)
> CIBMTR$ageP20to29 <- as.numeric(CIBMTR$ageP=="20to29")
R> CIBMTR$ageP30to39 <- as.numeric(CIBMTR$ageP=="30to39")
R> CIBMTR$ageP40to49 <- as.numeric(CIBMTR$ageP=="40to49")
R> CIBMTR$ageP50to59 <- as.numeric(CIBMTR$ageP=="50to59")
R> CIBMTR$ageP60plus <- as.numeric(CIBMTR$ageP=="60plus")
R
# Disease type (AML: reference category)
> CIBMTR$dTypeALL <- as.numeric(CIBMTR$dType=="ALL")
R> CIBMTR$dTypeCML <- as.numeric(CIBMTR$dType=="CML")
R> CIBMTR$dTypeMDS <- as.numeric(CIBMTR$dType=="MDS")
R
# Disease status (Early: reference category)
> CIBMTR$dStatusInt <- as.numeric(CIBMTR$dStatus=="Int")
R> CIBMTR$dStatusAdv <- as.numeric(CIBMTR$dStatus=="Adv")
R
# HLA compatibility (HLA_Id_Sib: reference category)
> CIBMTR$donorGrp8_8 <- as.numeric(CIBMTR$donorGrp=="8_8")
R> CIBMTR$donorGrp7_8 <- as.numeric(CIBMTR$donorGrp=="7_8")
R
# Covariate matrix
> x1 <- CIBMTR[,c("sexP", "ageP20to29", "ageP30to39", "ageP40to49",
R+ "ageP50to59", "ageP60plus", "dTypeALL", "dTypeCML", "dTypeMDS",
+ "dStatusInt", "dStatusAdv", "donorGrp8_8", "donorGrp7_8")]
> x2 <- CIBMTR[,c("sexP", "ageP20to29", "ageP30to39", "ageP40to49",
R+ "ageP50to59", "ageP60plus", "dTypeALL", "dTypeCML", "dTypeMDS",
+ "dStatusInt", "dStatusAdv", "donorGrp8_8", "donorGrp7_8")]
> x3 <- CIBMTR[,c("sexP", "ageP20to29", "ageP30to39", "ageP40to49",
R+ "ageP50to59", "ageP60plus", "dTypeALL", "dTypeCML", "dTypeMDS",
+ "dStatusInt", "dStatusAdv", "donorGrp8_8", "donorGrp7_8")]
> set.seed(1405)
R> simOutcomes <- simID(id=NULL, x1=x1, x2=x2, x3=x3,
R+ beta1.true, beta2.true, beta3.true, alpha1.true, alpha2.true, alpha3.true,
+ kappa1.true, kappa2.true, kappa3.true, theta.true, SigmaV.true=NULL, cens)
> names(simOutcomes) <- c("time1", "event1", "time2", "event2")
R> simCIBMTR <- cbind(simOutcomes, CIBMTR[,c("sexP", "ageP20to29", "ageP30to39",
R+ "ageP40to49", "ageP50to59", "ageP60plus", "dTypeALL", "dTypeCML", "dTypeMDS",
+ "dStatusInt", "dStatusAdv", "donorGrp8_8", "donorGrp7_8")])
```

In order to encourage the reproducibility of the results obtained
through our R package in a reasonable computational time, Bayesian
analyses contained in Section 6.2 are illustrated
below using a reduced number of scans (`numReps`

), extent of thinning
(`thin`

), and simplifying the design matrix. Given the complexity of
these Bayesian models, the reduction of scans/thinning results in
non-convergence of the Markov chains, but at least it is possible to
reproduce the results quickly.

```
> form <- Formula(time1 + event1 | time2 + event2 ~ sexP | sexP | sexP)
R
> startValues <- initiate.startValues_HReg(form, data=simCIBMTR,
R+ model=c("semi-Markov","PEM"), nChain=3)
> hyperParams <- list(theta=c(0.5,0.05), PEM=list(PEM.ab1=c(0.5,0.05),
R+ PEM.ab2=c(0.5,0.05), PEM.ab3=c(0.5,0.05), PEM.alpha1=10,
+ PEM.alpha2=10, PEM.alpha3=10))
> sg_max <- c(max(simCIBMTR$time1[simCIBMTR$event1==1]),
R+ max(simCIBMTR$time2[simCIBMTR$event1==0 & simCIBMTR$event2==1]),
+ max(simCIBMTR$time2[simCIBMTR$event1==1 & simCIBMTR$event2==1]))
> mcmcParams <- list(run=list(numReps=5e4, thin=5e1, burninPerc=0.5),
R+ storage=list(nGam_save=0, storeV=rep(FALSE,3)),
+ tuning=list(mhProp_theta_var=0.05, Cg=rep(0.2,3), delPertg=rep(0.5,3),
+ rj.scheme=1, Kg_max=rep(50,3), sg_max=sg_max, time_lambda1=seq(1,sg_max[1],1),
+ time_lambda2=seq(1,sg_max[2],1), time_lambda3=seq(1,sg_max[3],1)))
> fitBayesPHR <- BayesID_HReg(form, data=simCIBMTR, model=c("semi-Markov","PEM"),
R+ startValues=startValues, hyperParams=hyperParams, mcmcParams=mcmcParams)
> print(fitBayesPHR, digits=2)
R
-competing risks data
Analysis of independent semi-Markov assumption for h3
semi
: 3
Number of chains: 50000
Number of scans: 50
Thinning: 50%
Percentage of burnin
######
Potential Scale Reduction Factor
:
Variance of frailties, theta5.4
:
Regression coefficients
beta1 beta2 beta31.3 1.4 1.3
sexP
function components:
Baseline hazard
: summary statistics
lambda11st Qu. Median Mean 3rd Qu. Max.
Min. 1.1 2.7 3.0 3.0 3.3 4.0
: summary statistics
lambda21st Qu. Median Mean 3rd Qu. Max.
Min. 1.0 2.5 3.6 3.3 4.1 5.2
: summary statistics
lambda31st Qu. Median Mean 3rd Qu. Max.
Min. 1.12 1.42 1.60 1.59 1.70 2.17
h1 h2 h31.2 1.0 1.1
mu 1.2 1.1 1.0
sigmaSq 1.0 1.4 1.0
K
######
Estimates: 0.05
Credibility level
:
Variance of frailties, theta
Estimate SD LL UL9.4 0.71 8.9 11
:
Regression coefficients
Estimate SD LL UL-0.19 0.09 0.68 0.99
sexP -0.04 0.10 0.78 1.16
sexP -0.08 0.11 0.74 1.14
sexP
: Covariates are arranged in order of transition number, 1->3. Note
```

The joint posterior predictive probability involving two event times can
be obtained with the `PPD`

function:

```
# Prediction for a female patient (x1=x2=x3=1)
> predF <- PPD(fitBayesPHR, x1=1, x2=1, x3=1, t1=120, t2=300)
R> predF$F_u
R0.076
> predF$F_l
R0.26
```

`predF$F_u`

represents the joint posterior predictive probability of
dying within 300 days and being diagnosed with acute GVHD within 120
days for a female patient (the joint probability from the upper wedge
support, 0\(<\)`t1`

\(<\)`t2`

). On the other hand, `predF$F_l`

is the joint
posterior predictive probability of dying within 300 days without acute
GVHD for a female patient (the joint probability from the domain,
`t1`

=\(\infty\), `t2`

\(>\)0).

```
> simCIBMTR$LT <- rep(0,dim(simCIBMTR)[1])
R> simCIBMTR$y1L <- simCIBMTR$y1U <- simCIBMTR[,1]
R> simCIBMTR$y1U[which(simCIBMTR[,2]==0)] <- Inf
R> simCIBMTR$y2L <- simCIBMTR$y2U <- simCIBMTR[,3]
R> simCIBMTR$y2U[which(simCIBMTR[,4]==0)] <- Inf
R
> formAFT <- Formula(LT | y1L + y1U | y2L + y2U ~ sexP | sexP | sexP)
R
> startValues <- initiate.startValues_AFT(formAFT, data=simCIBMTR,
R+ model="LN", nChain=3)
> hyperParams <- list(theta=c(0.5,0.05), LN=list(LN.ab1=c(0.5,0.05),
R+ LN.ab2=c(0.5,0.05), LN.ab3=c(0.5,0.05)))
> mcmcParams <- list(run=list(numReps=5e4, thin=5e1, burninPerc=0.5),
R+ storage=list(nGam_save=0, nY1_save=0, nY2_save=0, nY1.NA_save=0),
+ tuning=list(betag.prop.var=rep(0.01,3), mug.prop.var=rep(0.01,3),
+ zetag.prop.var=rep(0.01,3), gamma.prop.var=0.01))
> fitBayesAFT <- BayesID_AFT(formAFT, data=simCIBMTR, model="LN",
R+ startValues=startValues, hyperParams=hyperParams, mcmcParams=mcmcParams)
> summary(fitBayesAFT, digits=2)
R
-competing risks data
Analysis of independent semi
#####
: 55244
DIC: -25839
LPML: 0.05
Credibility level
#####
:
Acceleration factorsexp(beta1) LL UL exp(beta2) LL UL exp(beta3) LL UL
1.2 0.95 1.4 0.92 0.86 0.99 0.93 0.8 1.1
sexP
:
Variance of frailties
theta LL UL1.5 0.96 1.8
function components:
Baseline survival =1: PM LL UL g=2: PM LL UL g=3: PM LL UL
g-Normal: mu 8.3 8.2 8.6 6.4 6.38 6.5 6.1 5.9 6.2
log-Normal: sigmaSq 10.1 9.2 11.8 1.1 0.82 1.7 1.9 1.6 2.5 log
```

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

A. Allignol, J. Beyersmann and M. Schumacher. Mvna: An R package for the Nelson-Aalen estimator in multistate models. *R News*, 8(2): 48–50, 2008. URL http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf.

A. Allignol, M. Schumacher and J. Beyersmann. Empirical transition matrix of multi-state models: The etm package. *Journal of Statistical Software*, 38(4): 1–15, 2011. URL https://doi.org/10.18637/jss.v038.i04.

A. Araújo, L. Meira-Machado and J. Roca-Pardiñas. TPmsm: Estimation of the transition probabilities in 3-state models. *Journal of Statistical Software*, 62(4): 1–29, 2014. URL https://doi.org/10.18637/jss.v062.i04.

A. Boruvka and R. J. Cook. *Coxinterval: Cox-type models for interval-censored data.* 2015. URL https://cran.r-project.org/package=coxinterval. R package version 1.2.

S. P. Brooks and A. Gelman. General methods for monitoring convergence of iterative simulations. *Journal of Computational and Graphical Statistics*, 7(4): 434–455, 1998. URL https://doi.org/10.1080/10618600.1998.10474787.

G. Celeux, F. Forbes, C. P. Robert and D. M. Titterington. Deviance information criteria for missing data models. *Bayesian Analysis*, 1(4): 651–673, 2006. URL https://doi.org/10.1214/06-BA122.

B. L. Egleston, D. O. Scharfstein, E. E. Freeman and S. K. West. Causal inference for non-mortality outcomes in the presence of death. *Biostatistics*, 8(3): 526–545, 2007. URL https://doi.org/10.1093/biostatistics/kxl027.

N. Ferguson, S. Datta and G. Brock. msSurv: An R package for nonparametric estimation of multistate models. *Journal of Statistical Software*, 50(14): 1–24, 2012. URL https://doi.org/10.18637/jss.v050.i14.

J. P. Fine, H. Jiang and R. Chappell. On semi-competing risks data. *Biometrika*, 88(4): 907–919, 2001. URL https://doi.org/10.1093/biomet/88.4.907.

H. Fu, Y. Wang, J. Liu, P. M. Kulkarni and A. S. Melemed. Joint modeling of progression-free survival and overall survival by a Bayesian normal induced copula estimation model. *Statistics in Medicine*, 32(2): 240–254, 2013. URL https://doi.org/10.1002/sim.5487.

S. Geisser and W. F. Eddy. A predictive approach to model selection. *Journal of the American Statistical Association*, 74(365): 153–160, 1979. URL https://doi.org/10.2307/2286745.

A. E. Gelfand and B. K. Mallick. Bayesian analysis of proportional hazards models built from monotone functions. *Biometrics*, 51(3): 843–852, 1995. URL https://doi.org/10.2307/2532986.

A. Gelman and D. B. Rubin. Inference from iterative simulation using multiple sequences. *Statistical Science*, 7(4): 457–472, 1992. URL https://doi.org/10.1214/ss/1177011136.

D. Ghosh. Semiparametric inferences for association with semi-competing risks data. *Statistics in Medicine*, 25(12): 2059–2070, 2006. URL https://doi.org/10.1002/sim.2327.

B. Han, M. Yu, J. J. Dignam and P. J. Rathouz. Bayesian approach for flexible modeling of semicompeting risks data. *Statistics in Medicine*, 33(29): 5111–5125, 2014. URL https://doi.org/10.1002/sim.6313.

S. Haneuse and K. H. Lee. Semi-competing risks data analysis: Accounting for death as a competing risk when the outcome of interest is nonterminal. *Circulation: Cardiovascular Quality and Outcomes*, 9(3): 322–331, 2016. URL https://doi.org/10.1161/CIRCOUTCOMES.115.001841.

J. J. Hsieh, W. Wang and A. A. Ding. Regression analysis based on semicompeting risks data. *Journal of the Royal Statistical Society B*, 70(1): 3–20, 2008. URL https://doi.org/10.1111/j.1467-9868.2007.00621.x.

C. H. Jackson. Flexsurv: A platform for parametric survival modeling in R. *Journal of Statistical Software*, 70(8): 1–33, 2016. URL https://doi.org/10.18637/jss.v070.i08.

C. H. Jackson. Multi-state models for panel data: The msm package for R. *Journal of Statistical Software*, 38(8): 1–28, 2011. URL https://doi.org/10.18637/jss.v038.i08.

I. Jazić, D. Schrag, D. J. Sargent and S. Haneuse. Beyond composite endpoints analysis: Semicompeting risks as an underutilized framework for cancer research. *Journal of the National Cancer Institute*, 108(12): djw154, 2016. URL https://doi.org/10.1093/jnci/djw154.

H. Jiang, J. P. Fine and R. Chappell. Semiparametric analysis of survival data with left truncation and dependent right censoring. *Biometrics*, 61(2): 567–575, 2005. URL https://doi.org/10.1111/j.1541-0420.2005.00335.x.

J. P. Klein and M. L. Moeschberger. *Survival analysis: Techniques for censored and truncated data.* 2nd ed Springer-Verlag, 2003.

T. Kneib and A. Hennerfeind. Bayesian semi parametric multi-state models. *Statistical Modelling*, 8(2): 169–198, 2008. URL https://doi.org/10.1177/1471082X0800800203.

A. Król and P. Saint-Pierre. SemiMarkov: An R package for parametric estimation in multi-state semi-Markov models. *Journal of Statistical Software*, 66(6): 1–16, 2015. URL https://doi.org/10.18637/jss.v066.i06.

L. Lakhal, L. P. Rivest and B. Abdous. Estimating survival and association in a semicompeting risks model. *Biometrics*, 64(1): 180–188, 2008. URL https://doi.org/10.1111/j.1541-0420.2007.00872.x.

C. Lee, S. J. Lee and S. Haneuse. Time-to-event analysis when the event is defined on a finite time interval. *Submitted*, 2017a.

K. H. Lee, F. Dominici, D. Schrag and S. Haneuse. Hierarchical models for semicompeting risks data with application to quality of end-of-life care for pancreatic cancer. *Journal of the American Statistical Association*, 111(515): 1075–1095, 2016. URL https://doi.org/10.1080/01621459.2016.1164052.

K. H. Lee, S. Haneuse, D. Schrag and F. Dominici. Bayesian semiparametric analysis of semicompeting risks data: Investigating hospital readmission after a pancreatic cancer diagnosis. *Journal of the Royal Statistical Society C*, 64(2): 253–273, 2015. URL https://doi.org/10.1111/rssc.12078.

K. H. Lee, C. Lee, D. Alvares and S. Haneuse. *SemiCompRisks: Hierarchical Models for Parametric and Semi-Parametric Analyses of Semi-Competing Risks Data.* 2017b. URL https://cran.r-project.org/web/packages/SemiCompRisks/vignettes/SemiCompRisks.pdf. R package version 3.30.

K. H. Lee, V. Rondeau and S. Haneuse. Accelerated failure time models for semi-competing risks data in the presence of complex censoring. *Biometrics*, 73(4): 1401–1412, 2017c. URL https://doi.org/10.1111/biom.12696.

B. Liquet, J. F. Timsit and V. Rondeau. Investigating hospital heterogeneity with a multi-state frailty model: Application to nosocomial pneumonia disease in intensive care units. *BMC Medical Research Methodology*, 12(1): 1–14, 2012. URL https://doi.org/10.1186/1471-2288-12-79.

L. Liu, R. A. Wolfe and X. Huang. Shared frailty models for recurrent events and a terminal event. *Biometrics*, 60(3): 747–756, 2004. URL https://doi.org/10.1111/j.0006-341X.2004.00225.x.

L. Meira-Machado, C. Cadarso-Suárez and J. Uña-Álvarez. Tdc.msm: AnR library for the analysis of multi-state survival data. *Computer Methods and Programs in Biomedicine*, 86(2): 131–140, 2007. URL https://doi.org/10.1016/j.cmpb.2007.01.010.

L. Meira-Machado and J. Roca-Pardiñas. P3state.msm: Analyzing survival data from an illness-death model. *Journal of Statistical Software*, 38(3): 1–18, 2011. URL https://doi.org/10.18637/jss.v038.i03.

R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. *Journal of Computational and Graphical Statistics*, 9(2): 249–265, 2000. URL https://doi.org/10.1080/10618600.2000.10474879.

L. Peng and J. P. Fine. Regression modeling of semicompeting risks data. *Biometrics*, 63(1): 96–108, 2007. URL https://doi.org/10.1111/j.1541-0420.2006.00621.x.

H. Putter, M. Fiocco and R. B. Geskus. Tutorial in biostatistics: Competing risks and multi-state models. *Statistics in Medicine*, 26(11): 2389–2430, 2007. URL https://doi.org/10.1002/sim.2712.

V. Rondeau, Y. Mazroui and J. R. Gonzalez. Frailtypack: An R package for the analysis of correlated survival data with frailty models using penalized likelihood estimation or parametrical estimation. *Journal of Statistical Software*, 47(4): 1–28, 2012. URL https://doi.org/10.18637/jss.v047.i04.

D. J. Spiegelhalter, N. G. Best, B. P. Carlin and A. van der Linde. Bayesian measures of model complexity and fit. *Journal of the Royal Statistical Society B*, 64(4): 583–639, 2002. URL https://doi.org/10.1111/1467-9868.00353.

E. J. Tchetgen Tchetgen. Identification and estimation of survivor average causal effects. *Statistics in Medicine*, 33(21): 3601–3628, 2014. URL https://doi.org/10.1002/sim.6181.

C. Touraine, T. A. Gerds and P. Joly. SmoothHazard: An R package for fitting regression models to interval-censored observations of illness-death models. *Journal of Statistical Software*, 79(7): 1–22, 2017. URL https://doi.org/10.18637/jss.v079.i07.

R. Varadhan, Q. L. Xue and K. Bandeen-Roche. Semicompeting risks in aging research: Methods, issues and needs. *Lifetime Data Analysis*, 20(4): 538–562, 2014. URL https://doi.org/10.1007/s10985-014-9295-7.

W. Wang. Estimating the association parameter for copula models under dependent censoring. *Journal of the Royal Statistical Society B*, 65(1): 257–273, 2003. URL https://doi.org/10.1111/1467-9868.00385.

L. J. Wei. The accelerated failure time model: A useful alternative to the Cox regression model in survival analysis. *Statistics in Medicine*, 11(14-15): 1871–1879, 1992. URL https://doi.org/10.1002/sim.4780111409.

L. C. Wreede, M. Fiocco and H. Putter. Mstate: An R package for the analysis of competing risks and multi-state models. *Journal of Statistical Software*, 38(7): 1–30, 2011. URL https://doi.org/10.18637/jss.v038.i07.

J. Xu, J. D. Kalbfleisch and B. Tai. Statistical analysis of illness-death processes and semicompeting risks data. *Biometrics*, 66(3): 716–725, 2010. URL https://doi.org/10.1111/j.1541-0420.2009.01340.x.

Y. Ye, J. D. Kalbfleisch and D. E. Schaubel. Semiparametric analysis of correlated recurrent and terminal events. *Biometrics*, 63(1): 78–87, 2007. URL https://doi.org/10.1111/j.1541-0420.2006.00677.x.

D. Zeng, Q. Chen, M. H. Chen and J. G. Ibrahim. Estimating treatment effects with treatment switching via semicompeting risks models: An application to a colorectal cancer study. *Biometrika*, 99(1): 167–184, 2012. URL https://doi.org/10.1093/biomet/asr062.

D. Zeng and D. Y. Lin. Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. *Biometrics*, 65(3): 746–752, 2009. URL https://doi.org/10.1111/j.1541-0420.2008.01126.x.

J. L. Zhang and D. B. Rubin. Estimation of causal effects via principal stratification when some outcomes are truncated by “death.” *Journal of Educational and Behavioral Statistics*, 28(4): 353–368, 2003. URL https://doi.org/10.3102/10769986028004353.

Y. Zhang, M. H. Chen, J. G. Ibrahim, D. Zeng, Q. Chen, Z. Pan and X. Xue. Bayesian gamma frailty models for survival data with semi-competing risks and treatment switching. *Lifetime Data Analysis*, 20(1): 76–105, 2014. URL https://doi.org/10.1007/s10985-013-9254-8.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Alvares, et al., "SemiCompRisks: An R Package for the Analysis of Independent and Cluster-correlated Semi-competing Risks Data", The R Journal, 2019

BibTeX citation

@article{RJ-2019-038, author = {Alvares, Danilo and Haneuse, Sebastien and Lee, Catherine and Lee, Kyu Ha}, title = {SemiCompRisks: An R Package for the Analysis of Independent and Cluster-correlated Semi-competing Risks Data}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {1}, issn = {2073-4859}, pages = {376-400} }