orthoDr is a package in R that solves dimension reduction problems using orthogonality constrained optimization approach. The package serves as a unified framework for many regression and survival analysis dimension reduction models that utilize semiparametric estimating equations. The main computational machinery of orthoDr is a first-order algorithm developed by (Wen and Yin 2012) for optimization within the Stiefel manifold. We implement the algorithm through Rcpp and OpenMP for fast computation. In addition, we developed a general-purpose solver for such constrained problems with user-specified objective functions, which works as a drop-in version of optim(). The package also serves as a platform for future methodology developments along this line of work.
Dimension reduction is a long-standing problem in statistics and data
science. While the traditional principal component analysis
(Jolliffe 1986) and related works provide a way of reducing the
dimension of the covariates, the term “sufficient dimension reduction”
is more commonly referring to a series of regression works originated
from the seminal paper on sliced inverse regression (Li 1991). In
such problems, we observe an outcome
Although there are celebrated theoretical and methodological advances,
estimating
The goal of our orthoDr
package is to develop a computationally efficient optimization platform
for solving the semiparametric estimating equation approaches proposed
in (Ma and Zhu 2013b), (Sun et al. 2017) and possibly any future work
along this line. Revisiting the rank preserving problem of
The purpose of this article is to provide a general overview of the orthoDr package (version 0.6.2) and provide some concrete examples to demonstrate its advantages. orthoDr is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=orthoDr and GitHub at https://github.com/teazrq/orthoDr. We begin by explaining the underlying formulation of the estimating equation problem and the parameter updating scheme that preserves orthogonality. Next, the software is introduced in detail using simulated data and real data as examples. We further demonstrate an example that utilizes the package as a general purpose solver. We also investigate the computational time of the package compared with existing solvers. Future plans for extending the package to other dimension reduction problems are also discussed.
To give a concrete example of the estimating equations, we use the
semiparametric inverse regression approach defined in (Sun et al. 2017)
to demonstrate the calculation. Following the common notations in the
survival analysis literature, let
However, this substantially increase the computational burden since the
double kernel estimator requires
which requires only
The algorithm works in the same fashion as a regular gradient decent,
except that we need to preserve the orthogonality at each iteration of
the update. As described in (Wen and Yin 2012), given any feasible point
There are several main functions in the orthoDr package:
orthoDr_surv
, ortho_reg
and ortho_optim
. They are corresponding to
the survival model described perviously (Sun et al. 2017), the
regression model in (Ma and Zhu 2012a), and a general constrained
optimization function, respectively. In this section, we demonstrate the
details of using these main functions, illustrate them with examples.
The orthoDr_surv
function implements the optimization problem defined
in Equation (9), where the kernel estimations and various
quantities are implemented and calculated within C++. Note that in
addition, the method defined previously, some simplified versions are
also implemented such as the counting process inverse regression models
and the forward regression models, which are all described in
(Sun et al. 2017). These specifications can be made using the method
parameter. A routine call of the function orthoDr_surv
proceed as
orthoDr_surv(x, y, censor, method, ndr, B.initial, bw, keep.data,
control, maxitr, verbose, ncore)
x
: A matrix or data.frame for features (numerical only).y
: A vector of observed survival times.censor
: A vector of censoring indicators.method
: The estimating equation method used.
"dm"
(default): semiparametric inverse regression given in
(4)."dn"
: counting process inverse regression."forward"
: forward regression model with one structural
dimensional.ndr
: The number of structural dimensional. For method
= "dn"
or "dm"
, the default is 2. For method
= "forward"
only one
structural dimension is allowed, hence the parameter is suppressed.B.initial
: Initial B
values. Unless specifically interested,
this should be left as default, which uses the computational
efficient approach (with the CPSIR()
function) in
(Sun et al. 2017) as the initial. If specified, must be a matrix
with ncol(x)
rows and ndr
columns. The matrix will be processed
by Gram-Schmidt if it does not satisfy the orthogonality constrain.bw
: A kernel bandwidth, assuming each variables have unit
variance. By default we use the Silverman rule-of-thumb formula
(Silverman 1986) to determine the bandwidth
silverman(n, d)
function
in our package.keep.data
: Should the original data be kept for prediction?
Default is FALSE
.control
: A list of tuning variables for optimization, including
the convergence criteria. In particular, epsilon
is the size for
numerically approximating the gradient, ftol
, gtol
, and btol
are tolerance levels for the objective function, gradients, and the
parameter estimations, respectively, for judging the convergence.
The default values are selected based on (Wen and Yin 2012) .maxitr
: Maximum number of iterations. Default is 500.verbose
: Should information be displayed? Default is FALSE
.ncore
: Number of cores for parallel computing when approximating
the gradients numerically. The default is the maximum number of
threads.We demonstrate the usage of orthoDr_surv
function by solving a problem
with generated survival data.
# generate some survival data with two structural dimensions
R> set.seed(1)
R> N = 350; P = 6; dataX = matrix(rnorm(N*P), N, P)
R> failEDR = as.matrix(cbind(c(1, 1, 0, 0, 0, 0, rep(0, P-6)),
+ c(0, 0, 1, -1, 0, 0, rep(0, P-6))))
R> censorEDR = as.matrix(c(0, 1, 0, 1, 1, 1, rep(0, P-6)))
R> T = exp(-2.5 + dataX %*% failEDR[,1] + 0.5*(dataX %*%
+ failEDR[,1])*(dataX %*% failEDR[,2]) + 0.25*log(-log(1-runif(N))))
R> C = exp( -0.5 + dataX %*% censorEDR + log(-log(1-runif(N))))
R> Y = pmin(T, C)
R> Censor = (T < C)
# fit the model
R> orthoDr.fit = orthoDr_surv(dataX, Y, Censor, ndr = 2)
R> orthoDr.fit
[,1] [,2]
[1,] -0.689222616 0.20206497
[2,] -0.670750726 0.19909057
[3,] -0.191817963 -0.66623300
[4,] 0.192766630 0.68605407
[5,] 0.005897188 0.02021414
[6,] 0.032829356 0.06773089
To evaluate the accuracy of this estimation, a distance function
distance()
can be used. This function calculates the distance between
the column spaces generated by the true
distance(s1, s2, method, x)
s1
: A matrix for the first column space (e.g., s2
: A matrix for the second column space (e.g.,
method
:
"dist"
: the Frobenius norm distance between the projection
matrices of the two given matrices, where for any given matrix
"trace"
: the trace correlation between two projection matrices
"canonical"
: the canonical correlation between
"sine"
: the sine angle distance x
: The design matrix NULL
), required only if
method
"canonical"
is used.We compare the accuracy of the estimations obtained by the
method =``"dm"
and "dn"
. Note that the "dm"
method enjoys double
robustness property of the estimating equations, hence the result is
usually better.
# Calculate the distance to the true parameters
R> distance(failEDR, orthoDr.fit$B, "dist")
[1] 0.1142773
# Compare with the counting process inverse regression model
R> orthoDr.fit1 = orthoDr_surv(dataX, Y, Censor, method = "dn", ndr = 2)
R> distance(failEDR, orthoDr.fit1$B, "dist")
[1] 0.1631814
The orthoDr_reg
function implements the semiparametric dimension
reduction methods proposed in (Ma and Zhu 2012a). A routine call of
the function orthoDr_reg
proceed as
orthoDr_reg(x, y, method, ndr, B.initial, bw, keep.data, control,
maxitr, verbose, ncore)
x
: A matrix or data.frame for features (numerical only).y
: A vector of observed continuous outcome.method
: We currently implemented two methods: the semiparametric
sliced inverse regression method ("sir"
), and the semiparametric
principal Hessian directions method ("phd"
).
"sir"
: semiparametric sliced inverse regression method solves
the sample version of the estimating equation
"phd"
: semiparametric principal Hessian directions method that
estimates ndr
: The number of structural dimensional (default is 2).B.initial
: Initial B
values. For each method, the initial values
are taken from the corresponding traditional inverse regression
approach using the dr package. The obtained matrix will be
processed by Gram-Schmidt for orthogonality.bw
, keep.data
, control
, maxitr
, verbose
and ncore
are
exactly the same as those in the orthoDr_surv
function.To demonstrate the usage of orthoDr_reg
, we consider the problem of
dimension reduction by fitting a semi-PHD model proposed by
(Ma and Zhu 2012a).
R> set.seed(1)
R> N = 100; P = 4; dataX = matrix(rnorm(N*P), N, P)
R> Y = -1 + dataX[,1] + rnorm(N)
R> orthoDr_reg(dataX, Y, ndr = 1, method = "phd")
Subspace for regression model using phd approach:
[,1]
[1,] 0.99612339
[2,] 0.06234337
[3,] -0.04257601
[4,] -0.04515279
The estimation equations of the dimension reduction problem in the survival and regression settings usually have a complicated form. Especially, multiple kernel estimations are involved, which results in difficulties in taking derivatives analytically. As an alternative, numerically approximated gradients are implemented using OpenMP. A comparison between a single core and multiple cores (4 cores) is given in the following example. Results from 20 independent simulation runes are summarized in Table 1. The data generating procedure used in this example is the same as the survival data used in Section 3.1. All simulations are performed on an i7-4770K CPU.
R> t0 = Sys.time()
R> dn.fit = orthoDr_surv(dataX, Y, Censor, method = "dn", ndr = ndr,
+ ncore = 4, control = list(ftol = 1e-6))
R> Sys.time() - t0
# of cores | ||
---|---|---|
2-3 | 1 | 4 |
3.9831 | 1.2741 | |
12.7780 | 3.4850 |
ortho_optim
is a general purpose optimization function that can
incorporate any user defined objective function ortho_optim
is similar to the
widely used optim()
function. A routine call of the function proceed
as
ortho_optim(B, fn, grad, ..., maximize, control, maxitr, verbose)
B
: Initial B
values. Must be a matrix, and the columns are
subject to the orthogonality constrains. It will be processed by
Gram-Schmidt if not orthogonal.fn
: A function that calculates the objective function value. The
first argument should be B
. Returns a single value.grad
: A function that calculate the gradient. The first argument
should be B
. Returns a matrix with the same dimension as B
. If
not specified, a numerical approximation is used....
: Arguments passed to fn
and grad
besides B
.maximize
: By default, the solver will try to minimize the
objective function unless maximize = TRUE
.maxitr
, verbose
and ncore
works in the same way
as introduced in the previous sections.To demonstrate the simple usage of ortho_optim
as a drop-in function
of optim()
, we consider the problem of searching for the first
principle component for a data matrix.
# an example of searching for the first principal component
R> set.seed(1)
R> N = 400; P = 100; X = scale(matrix(rnorm(N*P), N, P), scale = FALSE)
R> w = gramSchmidt(matrix(rnorm(P), P, 1))$Q
R> fx <- function(w, X) t(w) %*% t(X) %*% X %*% w
R> gx <- function(w, X) 2*t(X) %*% X %*% w
# fit the model
R> fit = ortho_optim(w, fx, gx, X = X, maximize = TRUE, verbose = 0)
R> head(fit$B)
[,1]
[1,] 0.01268226
[2,] -0.09065592
[3,] -0.01471700
[4,] 0.10583958
[5,] -0.02656409
[6,] -0.04186199
# compare results with the prcomp() function
R> library(pracma)
R> distance(fit$B, as.matrix(prcomp(X)$rotation[, 1]), type = "dist")
[1] 1.417268e-05
The ManifoldOptim (Martin et al. 2016) package is known for solving optimization problems on manifolds. We consider the problem of optimizing Brockett cost function (Huang et al. 2018) on the Stiefel manifold with objective and gradient functions written in R. The problem can be stated as
where
R> n = 150; p = 5; set.seed(1)
R> X <- matrix(rnorm(n*n), nrow=n)
R> X <- X + t(X)
R> D <- diag(p:1, p)
R> f1 <- function(B, X, D) { Trace( t(B) %*% X %*% B %*% D ) }
R> g1 <- function(B, X, D) { 2 * X %*% B %*% D }
R> b1 = gramSchmidt(matrix(rnorm(n*p), nrow=n, ncol=p))$Q
R> res2 = ortho_optim(b1, fn = f1, grad = g1, X, D)
R> head(res2$B)
[,1] [,2] [,3] [,4] [,5]
[1,] -0.110048632 -0.060656649 -0.001113691 -0.03451514 -0.063626067
[2,] -0.035495670 -0.142148873 -0.011204859 0.01784039 0.129255824
[3,] 0.052141162 0.015140614 -0.034893426 0.02600569 0.006868275
[4,] 0.151239722 -0.008553174 -0.096884087 0.01398827 0.132756189
[5,] -0.001144864 -0.056849007 0.080050182 0.23351751 -0.007219738
[6,] -0.140444290 -0.112932425 0.082197835 0.18644089 -0.057003273
Furthermore, we compare the performence with the ManifoldOptim
package, using four optimization methods: "LRBFGS"
, "LRTRSR1"
,
"RBFGS"
and "RTRSR1"
(Huang et al. 2018). We wrote the same required
functions for the Brockett problem in R. Further more, note that
different algorithms implements slightly different stoping criterion, we
run each algorithm a fixed number of iterations with a single core. We
consider three smaller settings with
We found that "LRBFGS"
and our orthoDr package usually achieve the
best performance, with functional value decreases the steepest in the
log scale. In terms of computing time, "LRBFGS"
and orthoDr
performers similarly. Although "LRTRSR1"
has similar computational
time, its functional value falls behind. This is mainly because the
theoretical complexity of second-order algorithms is similar to first
order algorithms, both are of order "LRBFGS"
or our orthoDr package regarding the efficiency of
the algorithm. However, first order algorithms may have an advantage
when developing methods for penalized high-dimensional models.
From left to
right, top to bottom: p = 5, 10, 20 and 50
respectively.
From left to right, top to bottom: p = 5, 10, 20 and 50 respectively.
iteration | ManifoldOpthm | orthoDr | |||||
---|---|---|---|---|---|---|---|
LRBFGS | LRTRSR1 | RBFGS | RTRSR1 | ||||
5 | 250 | 0.053 | 0.062 | 0.451 | 0.452 | 0.065 | |
10 | 500 | 0.176 | 0.201 | 4.985 | 5.638 | 0.221 | |
20 | 750 | 0.526 | 0.589 | 28.084 | 36.142 | 0.819 | |
50 | 1000 | 2.469 | 2.662 | – | – | 6.929 | |
5 | 250 | 0.403 | 0.414 | 7.382 | 7.426 | 0.423 | |
10 | 500 | 1.234 | 1.305 | 57.047 | 67.738 | 1.332 | |
20 | 750 | 3.411 | 3.6 | – | – | 3.974 | |
50 | 1000 | 13.775 | 14.43 | – | – | 19.862 |
We use the Concrete Compressive Strength (Yeh 1998) dataset as
an example to further demonstrate the orthoDr_reg
function and to
visualize the results. The dataset is obtained from the UCI Machine
Learning Repository.
Concrete is the most important material in civil engineering. The
concrete compressive strength is a highly nonlinear function of age and
ingredients. These ingredients include cement, blast furnace slag, fly
ash, water, superplasticizer, coarse aggregate, and fine aggregate. In
this dataset, we have
R> concrete_data = read.csv(choose.files())
R> X = as.matrix(concrete_data[,1:8])
R> colnames(X) = c("Cement", "Blast Furnace Slag", "Fly Ash", "Water",
"Superplasticizer", "Coarse Aggregate", "Fine Aggregate", "Age")
R> Y = as.matrix(concrete_data[,9])
R> result = orthoDr_reg(X, Y, ndr = 2, method = "sir", maxitr = 1000,
+ keep.data = TRUE)
R> rownames(result$B) = colnames(X)
R> result$B
[,1] [,2]
Cement 0.08354280 -0.297899899
Blast Furnace Slag 0.27563507 0.320304097
Fly Ash 0.82665328 -0.468889856
Water 0.20738201 0.460314093
Superplasticizer 0.43496780 0.540733516
Coarse Aggregate 0.01141892 0.011870495
Fine Aggregate 0.02936740 -0.004718979
Age 0.02220664 -0.290444936
Using the algorithm proposed by (Wen and Yin 2012) for optimization on the Stiefel manifold, we developed the orthoDr package that serves specifically for semi-parametric dimension reductions problems. A variety of dimension reduction models are implemented for censored survival outcome and regression problems. In addition, we implemented parallel computing for numerically appropriate the gradient function. This is particularly useful for semi-parametric estimating equation methods because the objective function usually involves kernel estimations and the gradients are difficult to calculate. Our package can also be used as a general purpose solver and is comparable with existing manifold optimization approaches. However, since the performances of different optimization approaches could be problem dependent, hence, it could be interesting to investigate other choices such as the “LRBFGS” approach in the ManifoldOptim package.
Our package also serves as a platform for future methodology
developments along this line of work. For example, we are currently
developing a personalized dose-finding model with dimension reduction
structure (Zhou and Zhu 2018). Also, when the number of covariates
Xin Zhang’s research was supported in part by grants DMS-1613154 and CCF-1617691 from U.S. National Science Foundation. Ruoqing Zhu’s research was supported in part by grant RB19046 from UIUC.
orthoDr, Rcpp, RcppArmadillo, ManifoldOpthm
HighPerformanceComputing, NumericalMathematics
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Zhu, et al., "orthoDr: Semiparametric Dimension Reduction via Orthogonality Constrained Optimization", The R Journal, 2019
BibTeX citation
@article{RJ-2019-006, author = {Zhu, Ruoqing and Zhang, Jiyang and Zhao, Ruilin and Xu, Peng and Zhou, Wenzhuo and Zhang, Xin}, title = {orthoDr: Semiparametric Dimension Reduction via Orthogonality Constrained Optimization}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {2}, issn = {2073-4859}, pages = {24-37} }