Non-parametric comparison of survival functions with censored data: A computational analysis of greedy and Monte Carlo approaches
Lubomír Štěpánek, Filip Habarta, Ivana Malá, Luboš Marek
DOI: http://dx.doi.org/10.15439/2024F223
Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 725–730 (2024)
Abstract. Comparing two survival functions, which describe the probability of not experiencing an event of interest by a given time point in two different groups, is a typical task in survival analysis. There are several well-established methods for comparing survival functions, such as the log-rank test and its variants. However, these methods often come with rigid statistical assumptions. In this work, we introduce a non-parametric alternative for comparing survival functions that is nearly free of assumptions. Unlike the log-rank test, which requires the estimation of hazard functions derived from (or facilitating the derivation of) survival functions and assumes a minimum number of observations to ensure asymptotic properties, our method models all possible scenarios based on the observed data. Individuals in the compared groups may experience an event of interest at specific time points or may be censored, i.e., they might experience the event outside the observed time points. The modeled scenarios include those in which the compared survival functions differ at least as much as observed, which allows us to calculate the p-value directly. Censoring is treated as a form of noise that increases the range of scenarios to be calculated and evaluated, so focusing on all scenarios where survival probabilities differ at least as much as observed usually requires computationally intensive calculations. Therefore, to estimate the p-value, we compare a greedy approach that computes all possible scenarios in which the groups' survival functions differ as observed or more, with a Monte Carlo simulation of these scenarios, alongside a traditional approach based on the log-rank test. Our proposed method reduces the type I error rate, enhancing its utility in studies where robustness against false positives is critical. We also analyze the asymptotic time complexity of both proposed approaches.
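The idea of estimating a p-value by simulating scenarios that differ at least as much as the observed data can be illustrated with a minimal sketch. The abstract does not specify the authors' scenario model, so the code below is not their method: it swaps in a generic label-permutation Monte Carlo test on Kaplan-Meier curves. The function names (`kaplan_meier`, `surv_at`, `curve_distance`, `monte_carlo_p_value`) and the distance statistic (mean absolute difference between the two curves) are assumptions made for illustration only.

```python
import random


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all individuals sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def surv_at(curve, t):
    """Evaluate the step function S(t) given a Kaplan-Meier curve."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def curve_distance(times_a, events_a, times_b, events_b):
    """Assumed statistic: mean absolute gap between curves on the pooled time grid."""
    km_a = kaplan_meier(times_a, events_a)
    km_b = kaplan_meier(times_b, events_b)
    grid = sorted(set(times_a) | set(times_b))
    return sum(abs(surv_at(km_a, t) - surv_at(km_b, t)) for t in grid) / len(grid)


def monte_carlo_p_value(times_a, events_a, times_b, events_b, n_sim=2000, seed=0):
    """Share of random relabelings whose curves differ at least as much as observed."""
    rng = random.Random(seed)
    observed = curve_distance(times_a, events_a, times_b, events_b)
    pooled = list(zip(list(times_a) + list(times_b), list(events_a) + list(events_b)))
    n_a = len(times_a)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # random reassignment of group labels
        grp_a, grp_b = pooled[:n_a], pooled[n_a:]
        d = curve_distance(
            [t for t, _ in grp_a], [e for _, e in grp_a],
            [t for t, _ in grp_b], [e for _, e in grp_b],
        )
        if d >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    p = monte_carlo_p_value(
        [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
        [10, 11, 12, 13, 14], [1, 1, 1, 1, 1],
        n_sim=500,
    )
    print(f"estimated p-value: {p:.4f}")
```

An exhaustive counterpart would enumerate every relabeling instead of sampling, which grows combinatorially with the group sizes; this is the kind of trade-off between exact enumeration and simulation that the abstract refers to.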