SIHR: Statistical Inference in High-Dimensional Linear and Logistic Regression Models

We introduce the R package SIHR for statistical inference in high-dimensional generalized linear models with continuous and binary outcomes. The package provides functionalities for constructing confidence intervals and performing hypothesis tests for low-dimensional objectives in both one-sample and two-sample regression settings. We illustrate the usage of SIHR through simulated examples and present real data applications to demonstrate the package’s performance and practicality.

Prabrisha Rakshit (Rutgers, The State University of New Jersey), Zhenyu Wang (Rutgers, The State University of New Jersey), Tony Cai (University of Pennsylvania), Zijian Guo (Rutgers, The State University of New Jersey)
2025-05-20

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2024-028.zip

CRAN packages used

SIHR, hdi, DoubleML, selectiveInference, glmnet

CRAN Task Views implied by cited packages

CausalInference, Econometrics, MachineLearning, Survival

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Rakshit, et al., "SIHR: Statistical Inference in High-Dimensional Linear and Logistic Regression Models", The R Journal, 2025

BibTeX citation

@article{RJ-2024-028,
  author = {Rakshit, Prabrisha and Wang, Zhenyu and Cai, Tony and Guo, Zijian},
  title = {SIHR: Statistical Inference in High-Dimensional Linear and Logistic Regression Models},
  journal = {The R Journal},
  year = {2025},
  note = {https://doi.org/10.32614/RJ-2024-028},
  doi = {10.32614/RJ-2024-028},
  volume = {16},
  issue = {3},
  issn = {2073-4859},
  pages = {27-45}
}