A random forest-based approach for survival curves comparing: principles, computational aspects and asymptotic time complexity analysis
Lubomír Štěpánek, Filip Habarta, Ivana Malá, Luboš Marek
DOI: http://dx.doi.org/10.15439/2021F89
Citation: Proceedings of the 16th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 25, pages 301–311 (2021)
Abstract. The log-rank test and the Cox proportional hazards model can be used to compare survival curves, but both rely on strict statistical assumptions. In this study, we introduce a novel, assumption-free method, based on a random forest algorithm, that can compare two or more survival curves. The proportion of the random forest's trees with sufficient complexity is close to the test's p-value estimate. Pruning the trees in the model changes their complexity and, in turn, both the method's robustness and its statistical power. The results are confirmed by a simulation study that varies the survival curves and the level of tree pruning.