A random forest-based approach for survival curves comparing: principles, computational aspects and asymptotic time complexity analysis

Lubomír Štěpánek; Filip Habarta; Ivana Malá; Luboš Marek

DOI: http://dx.doi.org/10.15439/2021F89

Citation: Proceedings of the 16th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 25, pages 301–311 (2021)

Abstract. The log-rank test and Cox's proportional hazards model can be used to compare survival curves, but both are limited by strict statistical assumptions. In this study, we introduce a novel, assumption-free method based on a random forest algorithm that is able to compare two or more survival curves. The proportion of the random forest's trees with sufficient complexity is close to the test's p-value estimate. Pruning the model's trees modifies their complexity and, thus, both the method's robustness and its statistical power. The results are confirmed by a simulation study that varies the survival curves and the tree-pruning level.
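For context, the classical baseline named in the abstract can be written out in a few lines. The following Python sketch computes the standard two-sample log-rank statistic using only the standard library. It is not the authors' random forest method. The function name `logrank_test` is hypothetical, and the sketch assumes exactly two groups labelled 0 and 1, with subjects censored at time t still counted as at risk at t.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample log-rank test (illustrative sketch).

    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if the subject was censored
    groups : group label of each subject, 0 or 1 (assumed two groups only)
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = list(zip(times, events, groups))
    # Distinct times at which at least one event was observed
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t
        at_risk_groups = [g for ti, _, g in data if ti >= t]
        n = len(at_risk_groups)
        n1 = sum(1 for g in at_risk_groups if g == 1)
        # Events occurring exactly at time t (total and in group 1)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)

        # Expected group-1 events under the null of equal hazards
        expected1 = d * n1 / n
        observed_minus_expected += d1 - expected1
        # Hypergeometric variance, with the usual correction for tied events
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        # No information to compare the groups
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `logrank_test([1, 2], [1, 1], [0, 1])` returns a statistic of 1.0 and a p-value of about 0.317. Computing the chi-square tail via `math.erfc` avoids a SciPy dependency; for real analyses, an established implementation such as the one in the R `survival` package is preferable.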
