A random forest-based approach for survival curves comparison: principles, computational aspects, and asymptotic time complexity analysis

The log-rank test and Cox’s proportional hazard model can be used to compare survival curves but are limited by strict statistical assumptions. In this study, we introduce a novel, assumption-free method based on a random forest algorithm able to compare two or more survival curves. A proportion of the random forest’s trees with sufficient complexity is close to the test’s p-value estimate. The pruning of trees in the model modifies trees’ complexity and, thus, both the method’s robustness and statistical power. The discussed results are confirmed using a simulation study, varying the survival curves and the tree pruning level.


I. INTRODUCTION
C OMPARING two or more survival curves is relatively common in many applied areas such as biomedicine, econometrics, management, and others. When the curves are statistically significantly different, it may help treat the groups that are the curves built by in appropriate (separate) ways.
As typical for survival analysis, the variable of our interest usually describes a (time) development of proportions of individuals who have not experienced the event of interest yet (until each considered time point) in each consecutive time point of the time period of our interest.
Such a time development is commonly plotted using orthogonal polygonal lines, also known as survival curves in a twodimensional (survival) plot, sometimes called Kaplan-Meier plot [1]. Since groups of individuals that are about to be compared have their own time developments of non-experiencing the event of interest, one Kaplan-Meier survival plot may include more than only one curve, as shown in Fig. 1.  Regardless of the total length of the time period of interest, it has in principle be finite and, consequently, we cannot get any piece of information whether the individuals not experiencing the event of interest in the time period would register the event after an end of the period, or not. That is why the time-to-event (survival) data are also called right censored data.
Since the experiencing of the event of interest is usually irreversible in time within the scope of classical survival analysis (the event is, e. g., a death, diagnosis, bankruptcy, failure, etc.), a subject registering the event whenever in the time period of the interest, continues to stay in this state till the end of the referenced time (the time of the right censoring). Thus, the survival curves are monotonous and nonincreasing. Intuitively, when the development of the event of interest differs between two groups, we may expect their survival curves are hardly similar and relatively far from each other within each considered time point.
If there are two survival curves to be statistically compared, the log-rank test as a tool of choice is usually performed [2] and commonly implemented, e. g. in R language and statistical environment [3] and its library survival [4]. However, a usage of the log-rank test is limited by statistical assumptions, that (i) censoring should not affect anyhow the observed events and (ii) the censoring should occur equally or at least near-equally in both compared groups, generating the survival curves. Also, (iii) the group's sizes are expected to be large enough, enabling the log-rank test's χ 2 statistics to fulfill its asymptotic properties.
To overcome the limitations done by the statistical assumptions of the original log-rank test or to increase its robustness or statistical power, several modifications of the traditional log-rank test were published. The first approach is to modify the hazard functions slightly, i. e., functions of rates of events based on fixed proportions of the events in the past, and relax their assumptions to increase their robustness as suggested by [5]. Then, another option is to introduce new covariates (variables) to enrich the model comparing two survival curves and increase the robustness, published in [6]. Also, employing various weighting schemes for individual observations, usually growing significance for earlier ones, may increase the statistical power of the test as investigated originally in [7] and then improved in [8], [9] and [10]. Finally, robust combinatorial and exact calculations of all possible combinations of the event experiencing and non-experiencing subjects for given total numbers of subjects in the compared groups are researched in [11] and [12] and using survival curves' finite combinatorial geometry in [13].
An advantage of the latter approach, based on robust combinatorial computations when comparing two survival curves, is that makes possible an estimation of asymptotic time complexity, as is in details commented by [14], [15], [16] and partly by [13], too.
When one wants to statistically compare three or more survival curves, there is an option to use Cox's proportional hazard model or a score-rank test based on Cox's model. Unfortunately, Cox's proportional model is also limited -it assumes that hazard proportions for each group are constant across all considered time points, which is often not met in practice. Some more robust versions of Cox's model were derived to minimize the violation of the constant hazards' ratio by real-world data, e. g. based on an idea of stratification of each group into subgroups according to their hazard similarity; however, those advanced models are usually limited by other, more complex assumptions [17], though.
The decision (or regression) trees and random forests are classical algorithms used for classification or regression problems. An idea to apply decision trees and random forest on survival tasks and right-censored, time-to-event data originate from [18], but initial thoughts rather aimed to a robust estimation of hazard functions' parameters, e. g. Nelson-Aalen estimator etc. The decision trees and random forests are naturally assumption-free robust, and fully non-parametric, especially in comparison to the log-rank test or the Cox's proportional hazard model, which is a property also utilized in this study and by the proposed alternative method for survival curves comparison.
The proceeding proceeds as follows. Firstly, in the section Traditional methods for survival curves comparing and random forests revisited, we shortly remind the fundamental principles of the log-rank test, Cox's proportional hazard model, decision trees, and random forests. We also discuss assumptions and limitations of the named methods that create room for new approaches that are less dependent on statistical assumptions.
Then, in the section The proposed method for survival curves comparing, we introduce a novel alternative for two or more survival curves comparing, based on random forestbased generating of multiple decision trees, using variables derived from original time-to-event data of compared groups of individuals in their nodes. The level of the trees' pruning is adjustable as a hyperparameter; it enables to control a complexity of the trees, i. e., an average number of nodes and leaves per tree in the forest. If a given tree in the random forest is able to classify whatever new observation in each of the groups (described by its survival curve), i. e., there exists at least one leaf node for each group assigning the observation to such a group, that tends to be contradictory the null hypothesis, claiming there are no statistical differences between the groups (and their survival curves). A proportion of the trees with sufficient complexity to all trees in the forest serves by definition as an estimate of p-value as would be analogously 1 returned by the traditional log-rank test, i. e., a conditional probability of collecting data as extreme or even more given there is no difference between the survival curves. Since the p-value is partially determined by the proportion of sufficiently complex trees to all trees of the random forest, the level of pruning may affect the robustness or statistical power of the proposed random forest-based inference test, as is discussed more in details later.
The asymptotic time complexity of the p-value estimation, assuming the random forest model building, is then derived, and, finally, in the section Simulation study, a preliminary simulation study is performed to confirm the theoretically derived properties of the method. Besides others, the introduced approach offers a way how to compare more than two survival curves without any assumptions needed to be met.

II. TRADITIONAL METHODS FOR SURVIVAL CURVES
COMPARING AND RANDOM FORESTS REVISITED Firstly, we remind principles and assumptions of the logrank test and Cox's proportional hazard model that facilitates a better understanding of their limitations, which, consequently, opens room for improvements in survival curves comparing. We also recapitulate the logic of the decision trees and random forest heavily equipped in the proposed method for survival curves comparison.

A. Principles, assumptions, and limitations of the log-rank test
Principles of the log-rank test. Let's assume k distinct time points where the event of interest could take a place; the j-th time point is marked as t j , where j ∈ {1, 2, 3, . . . , k}, and all the time points are ordered in a tuple (t 1 , t 2 , . . . , t k ) T . Also, let's suppose there are two groups of subjects, marked by subscripts 1 and 2, respectively. For each of the time points, let's say for the j-th one (t j ) there are r 1,j and r 2,j individuals at risk (they have not experienced the event of interest yet or have been censored) in the group 1 and group 2, respectively, and d 1,j and d 2,j individuals who experienced the event in the group 1 and group 2, respectively. Thus, following the previous logic, we can construct a (contingency) table I.
The log-rank test checks the null hypothesis H 0 that both groups experienced identical rates of the events of interest in time (also called hazard functions) [2], conditional on fixed rates in the past are the same. Under the null hypothesis H 0 , the observed numbers of individuals experiencing the events could be considered as random variables D 1,j and D 2,j following a hypergeometric distribution with parameters (r j , r i,j , d j ) for both i ∈ {1, 2}. Thus, the expected value of the variable D i,j is E(D i,j ) = r i,j dj rj and variance is var(D i,j ) = which follows under H 0 a χ 2 distribution with 1 degree of freedom, χ 2 log-rank ∼ χ 2 (1). For feasible large r j , i. e. at least r j ≥ 30, a square root of χ 2 log-rank follows a standard normal distribution, χ 2 log-rank ∼ N (0, 1 2 ). Since χ 2 log-rank ∼ χ 2 (1), the statistics χ 2 log-rank can be uniquely transformed into p-value, which stands for a conditional probability of obtaining the test statistics χ 2 log-rank at least as extreme as the statistics actually observed, under the assumption that the null hypothesis H 0 reflects the reality.
Assumptions and limitations of the log-rank test. The right censoring of the data should not affect the occurrences of the event of interest in both groups anyhow. Also, the proportions of censored observations are supposed to be of (nearly) equal size in both groups. Otherwise, the test statistic χ 2 log-rank calculated using (1) could be biased for i = 1, or for i = 2.
Then, putting together the equation (1), so the test statistic χ 2 log-rank follows a χ 2 distribution, and the table I, both the initial total number of individuals r 0 at risk and initial number r 0 − d 0 not experiencing the event, should be large enough. Otherwise, so-called Cochrane criteria for minimal sample size for χ 2 tests are not met and the χ 2 log-rank statistics could not fulfill the χ 2 asymptotic properties; or, analogously, both the numerator and denominator of the statistics (1) are relatively small and an estimate of the χ 2 log-rank statistics is numerically unstable.
All the named issues may decrease the robustness or statistical power of the log-rank test.
Furthermore, by investigating the denominator of the equation (1), we can easily realize the test statistic χ 2 log-rank is the highest when the denominator k j=1 var(D i,j ) is as low as possible given the values d i,j and r i,j for all i ∈ {1, 2} and j ∈ {1, 2, 3, . . . , k}. This holds just when the proportions r1,j rj = r1,j r1,j +r2,j and r2,j rj = r2,j r1,j +r2,j are both constant (and mutually different enough) across all the time points (t 1 , t 2 , . . . , t k ) T , and then the log-rank test is the most statistically powerful, i. e. its ability to reject the null hypothesis H 0 claiming the survival curves are equivalent, when they are in fact different, is maximal possible. That is common issue decreasing the test power -the mentioned proportions are typically not constant when a "trend" of the survival curves change a lot, when the curves change their mutual distance or when they even cross themselves one or more times.

B. Principles, assumptions, and limitations of Cox proportional hazard model
Principles of Cox proportional hazard model. The Cox proportional hazard model is frequently used to model relationships between the hazard function of the event of interest, defined as a probability that a subject experiences the event of interest in a small time interval, given that the individual survived up to the beginning of the interval, and explanatory variables. If one of the explanatory variables is categorical, thus dividing an entire sample into two or more groups, then the Cox proportional hazard model could serve as a method for statistical comparing of more than two groups and their survival curves. The hazard function h(t) depending on explanatory variables as suggested by Cox [19], follows for individual i form where h 0 (t) is the baseline hazard function, β T = (β 0 , β 1 , β 2 , . . . ) T is a vector of estimated linear coefficients to explanatory variables and T is a vector of values of the explanatory variables for group i. The formula (2) could be after exponentiation rewritten also as by which we can see for two groups 1 and 2 that , thus, the hazard ratio for any two groups 1 and 2 is forced to be constant, considering the model (2) and a fact that once estimated coefficientsβ T = (β 0 ,β 1 ,β 2 , . . .) T and input data x i = (1, x i,1 , x i,2 , . . .) T are given, therefore constant. The parameters in the Cox model (2) can be estimated by a partial likelihood [20].
When exists j ∈ {1, 2, 3, . . .} so that β j is a linear coefficient of a categorical variable classifying observations into two or more groups (with their survival curves), then one can consider the Cox approach as an alternative for the log-rank test with the exception there are more than two survival curves to be compared. Wald t-tests indicate significant statistical differences between the categorical variable levels, thus also in groups' survival curves.
Assumptions and limitations of Cox proportional hazard model. However, while Cox's regression is widely used for event prediction in survival analysis or for comparing more than two survival curves, it has rigid statistical limitations [21]. Particularly, Cox's model assumes that ratios of hazards for any two subjects (individuals or groups) are constant across all time points; that is why the model is called "proportional hazard". However, real survival data often violate this assumption. For instance, supposing two survival curves for two groups as in Fig. 2, such that the curves cross each other, their hazards could not be proportionally constant. Even more, when one of the curves drops to zero while the other levels off similarly to Fig. 3, also, the ratio of the hazard functions could not be constant.

C. Principles of the random forests
Before we conclude up basic principles of the random forests' algorithm, we remind fundamental pieces of knowledge about decision trees that build up a random forest model.
Principles of the decision trees. The decision trees (also called classification trees) that belong to the CART family of trees (classification and regression trees) are sets of rules that partition the hyperspace of all explanatory variables into disjunctive hyper-rectangles and fit simple (constant) models there, each time minimizing a given criterion [22].
More specifically, the decision trees classify an observation depicted by a vector of values The logic behind a tree induction is described by the flowchart in Fig. 4. Initially, one root node is set, and the tree induction algorithm searches for a node decision rule, i. e. such an explanatory variable and a logical formula containing the explanatory (splitting) variable and its relationship to some constant or subset that minimizes a given criterion. When the optimal node rule is found, the node rule enables to split (binary partition) the dataset into two parts following the logic of the slitting variable and splitting point (the first part contains values larger than or equal to the splitting point, the other contains the rest of dataset). Two new child nodes for Let σ(•) j be a proportion of a target class j in all observations constrained by rules coming from the root till the node n t . If the node n t is a leaf one, it classifies into the class j * so that j * = argmax j∈{1,2,...,m} {σ(•) j }.
The given criterion minimized in searching for node n t rule is an impurity measure, Q nt (T ), such as misclassification error We can easily see that the higher the σ(•) j as a proportion of a target class j in the node n t is, the lower whatever kind of the named impurity measures is, as expected.
Following the logic of the top-down induction of a decision tree depicted in Fig. 4, a final tree cannot have lower than maximal possible complexity; even a leaf node including only two observations of two different target classes is once more split into two child leaf nodes. To overcome this issue, overfitting, besides some naive approaches like a fixed maximal number of nodes per a tree, etc., a procedure called pruning is frequently applied. The pruning is based on numerical estimating of the statistics cost-complexity function following the form where {n t } is a set of leaf nodes of the tree and {x nt } is a set of all observations constrained by rules coming from the root till the node n t . The idea is to find a subtree T κ so that T κ ⊂ T for a given κ that minimizes the statistics C κ (T ), i. e. T κ = argmin T nt∈{nt} |{x nt }| · Q nt (T ) + κ · |{n t }| . The κ ≥ 0 is a hyperparameter (a tuning parameter) and governs the trade-off between a tree complexity or size (low values of κ) and goodness of fit to the data (large values of κ).
Principles of the random forests. Once we can generate classification trees as described above, construction of a random forest is relatively easy. Random forests are finite sets of (distinct) decision trees so that each tree classifies an observation depicted by a vector of values x i = (x i,1 , x i,2 , . . . , x i,k ) T for k explanatory variables into one of m < ∞ target classes [18]. The eventual classification into the final class is done using a voting scheme -the final class j * ∈ {1, 2, . . . , m} is the one that a subset of the random forest's trees classifying just into the class j * is the largest one among all subsets of the random forest's trees. More technically, j * = argmax j∈{1,2,...,m} {# of trees classifying into the class j}. In case of a tie, i. e. there are two or more target classes the forest's trees would classify with maximum frequency into, one of them is picked randomly.
A bit different in the random forest's tree induction is the fact that only k * < k variables are considered as possible splitting variables in each searching for the node rule. Instead, the subset of k * variables of the original set of all k explanatory variables is selected randomly using bootstrapping to ensure the pre-selected k * variables are as much uncorrelated as possible. Other details of the trees inductions are the same as described above. A flowchart of the random forest model building is in Fig. 5.
Assumptions and limitations of the trees and forests. There are neither other technical assumptions nor limitations of the random forests usage worth to be discussed.

COMPARING
We introduce the novel method for statistical comparison of two or more time-to-event developments of individuals' groups, depicted by their survival curves.
Firstly, data that are on the input of the method have to be transformed. Each individual is originally described using their group affiliation, a time to event of interest (or to censoring), and whether they experienced the event of interest (or have been censored). Then, using the original data, for each individual, a sequence of weighted point estimates of probabilities that they did not experience the event of interest in a given time point and the group affiliation is created. That enables introducing new variables (their number is equal to the number or all considered time points) that are used as splitting variables in tree inductions when a random forest model is built.
Once the data are transformed, a random forest model is constructed. Each tree of the random forest either can classify into two or more classes that are represented as the group affiliations, or cannot to classify into any classes at all (then it is necessarily a root node tree), based on its complexity (size). See also Fig. 6.
The more trees of sufficient complexity able to classify into the classes (equal to the groups of individuals, described by their survival curves melted into the transformed variables as mentioned above) are in the forest, the more likely we can reject the null hypothesis that there is no difference between the survival curves (or the groups of individuals' time-to-event development). Thus, a proportion of trees that classify into all the classes, to all the trees of the random forest is very close to a point estimate of the p-value, i. e. the probability we incorrectly reject the null hypothesis of no difference between the survival curves, assuming the null hypothesis is true. Thus, the p-value is a probability of a wrong decision and should be as low as possible whenever we consider rejecting the null hypothesis. The first type error rate, i. e. the incorrect rejection of the null hypothesis when it is true, can be controlled by setting the parameter κ of the random forest's tree complexity (or the tree pruning).
So, the proposed method fulfills all feasible demands on inference testing. We also discuss some of the method's properties, particularly its asymptotic time complexity. The first type error rate is simulated in the simulation study with varying κ tuning parameters. The introduced method is able to compare more than two survival curves, and since it utilizes a random forest tree-based algorithm, it is practically assumption-free. This is where it surpasses both the log-rank test and Cox's regression. 6. An example of a root node tree (on the left) not capable to classify into any class unambiguously, and an example of a tree with "sufficient" complexity (on the right) able to classify into two classes (j = 1 and j = 2).

A. Data transformation and preparation for random forest model building
Initial time-to-event survival data includes n observations; for each of them, we have a piece of information about the time to the event of interest (or to the censoring) and whether the event of interest or the censoring occurred. By adopting the mathematical notation from the section about the log-rank test, for each considered time point t j , where j ∈ {1, 2, . . . , k} and k ∈ N, we can calculate for the group i, where i ∈ {1, 2, . . . , m}, a proportion r i,j of individuals that are at risk (of the event of interest or the censoring) in the j-th time point. Similarly, one can estimate a for the group i, where i ∈ {1, 2, . . . , m}, a proportion d i,j of individuals who experienced the event of interest (or the censoring) in the j-th time point. Putting those estimates together, for the group i, where i ∈ {1, 2, . . . , m}, we can make a point estimate of a probabilityp i,j that an individual from the group would not experience the event of interest (or the censoring) in the j-th time point, sop Such an estimate is made k-times for all time points {t 1 , t 2 , . . . , t k }, by getting (p i,1 ,p i,2 , . . . ,p i,k ), but such a vector of values is common for all individuals of the group i. However, it could be personalized using an operator δ ν,j for ν-th individual, where ν ∈ {1, 2, . . . , n}, following the form assuming that ν-th individual belongs to the group i. So by modifying the formula (4) using the operator δ ν,j we get The logic of the formula (5) enables to get mutually distinct vectors of values (δ ν,1pi,1 , δ ν,2pi,2 , . . . , δ ν,kpi,k ) T for each individual in the group i, which increases natural variability of the data.
Finally, still assuming that ν-th individual belongs to the group i, where ν ∈ {1, 2, . . . , n}, there are n new vectors (δ ν,1pi,1 , δ ν,2pi,2 , . . . , δ ν,kpi,k ) T that could be arranged in a matrix of n rows and k columns, which creates a new dataset suitable as an input for the decision tree induction; the k variables could serve as possible splitting variables in the trees' nodes. The j-th variable of the dataset could be interpreted as a personalized point estimate of probability of non-experiencing the event of interest. The (k + 1)-th variable in the dataset is a target one -categorical variable describing a group affiliation i ∈ {1, 2, . . . , m} of each observation 2 .

B. Construction of the random forest model behind the novel method
The random forest model is built following the algorithms sketched in Fig. 4 and Fig. 5. Variables used as node splitting variables come from the newly created dataset, containing k "explanatory" variables and a target one, as described more in the previous subsection.
Number t ∈ N as a count of the trees in the random forest as well as the level of the trees' pruning determined by parameter κ ≥ 0 may vary, as is more explained later.

C. Statistical inference behind the novel method
As already mentioned, the main purpose of the introduced method is to statistically compare two or more survival curves depicting a time-to-event development of distinct groups of individuals. Intuitively, when a large number of the (adequately pruned) trees involved in the random forest model is able to classify into two or more classes, i. e. groups determined by their survival curves, then it is hard to suppose the groups and their survival curves are (statistically) without any difference.
Similarly to the log-rank test or the Cox's regression, let the null hypothesis H 0 claim that there is no statistical difference between the m > 1 survival curves 3 , and let the alternative hypothesis H 1 claim the contradiction, so  Whenever the log-rank test or the Cox's model based on Wald t-test rejects -based on test statistics -the appropriate 2 Make a note that across the entire paper, the mathematical notation is consistent -there are m groups depicted by their survival curves, but there are also m target classes of decision trees. Furthermore, there are k time points, and for each of them, a new variable is created within the transformation to the new dataset, thus containing k variables. Finally, the bootstrap behind the random forest model construction also pre-selects k * < k node variables. 3 Two or more survival curves; in general m ∈ {2, 3, 4, . . .} curves.
null hypothesis H 0 in favor of the alternative hypothesis H 1 , is equivalent to a situation the test's p-value is lower than or equal to an apriori set level of significance α, usually equal to 0.05. Since the introduced method is in practice assumptionfree and non-parametric, the only way to evaluate the statistical inference about the null hypothesis is to estimate the p-value and compare it to the previously set significance level α. By definition, the p-value is a probability of gaining data evidence at least as extreme as the data evidence actually observed, under the assumption the null hypothesis is true. Let t c be a number of trees in the random forest that are in contradiction to the null hypothesis under the null hypothesis. The random forest contains exactly t trees. We can easily realize that, given the value for the κ parameter, the value of t c is equal to the number of all the trees classifying into more than only one class (which is naturally in contradiction to the null hypothesis). Let the n c (τ ) be a number of classes the tree τ classifies into. Then we can derive t c = |{∀ tree ∈ random forest : n c (tree) ≥ 2.}| Then, assuming all trees are inducted randomly regardless of their complexity, the p-value is estimated byp so that p = P (getting data at least as extreme as the observed | H 0 ) = = P (|{∀ tree ∈ random forest : Thus, from the formula (6) results that the p-value's estimate is equal to the fraction of 1− tc−1 t . That result is also intuitive. If the initial number t c of trees in the random forest that are complex enough and classify into two or more classes (and more groups with their survival curves) is in general low, then such a random forest as an entire model is not "so much" in contradiction to the null hypothesis, claiming there are no differences between the classes (and survival curves). Finishing the idea, since the t c is relatively low, then the fraction p-value = 1 − tc−1 t is relatively large, close to 1 and unlikely to be lower than α(= 0.05) which is required for the null hypothesis rejection. On the other hand, when the initial value of t c is large, i. e. there are many trees in the forest with sufficient complexity classifying into two or more classes (and thus, standing against the null hypothesis), then -because of the large value of t c -the fraction p-value = 1 − tc−1 t is relatively low and likely below the level α. That likely results in the null hypothesis rejection.
The number of trees t in the random forest determines maximum decimal precision of the p-value estimate. When the precision of d decimal digits is required for the p-value estimate, then t has to be t > 10 d or better t > 10 d+1 to ensure the next-to-last digit (as the d-th decimal digit) is feasibly estimated. The κ parameter determines how complex the trees in the random forest would be, i. e. how significant the pruning of the trees should be. Inspecting the formula (3), we can simply realize that if κ = 0, then there is no cost for large tree complexity and the trees in the random forest are generally very complex (of large size). Then, whenever there are at least two observations in the transformed dataset so the they are assigned to different two groups, all the trees (because of the unlimited complexity) in the forest would classify those observations into their groups (classes), i. e. that for each tree τ is n c (τ ) ≥ 2, which results into the equity t c = t and, thus, If p-value ≈ 0, then also p-value ≈ 0 < α which, consequently, tends to rejection of the null hypothesis, very likely a false rejection that increases the first error type rate. However, high chance of the null hypothesis rejection means also the high statistical power, i. e. the rejection of the null hypothesis when this is not true.
If κ > 0, then in general the trees' complexity (size) decreases and also not all of the trees are complex enough to classify into more than one class (the are only root node trees); this means that there are trees τ in the random forest so that n c (τ ) ≤ 1, and, finally, t c < t. So, p-value estimate is p-value = 1 − tc−1 t > 0 and it could be, but also could not be below α.
To conclude this, low values of κ tend to decrease values of p-value and increase the statistical power and the first type error rate, and vice versa. However, the more exact relationship between κ and α could be only roughly estimated using simulations due to the stochastic character of the random forests.

D. A brief asymptotic time complexity analysis and fundamental approaches on the p-value estimation
An atomic unit of the random forest model is a decision tree, inducted following the flowchart 4 and algorithm 1. As long as there is a node containing data of ≥ 2 classes, constrained by all node rules coming from root to the node, the data splitting and growing of the tree continues. If the classes in the data are well balanced as well as the growing tree, the splitting partitions the subdatasets roughly in halves, and the average depth of the tree would be log n and the time complexity would be Θ(log n), assuming one split of a node takes 1 time unit. However, on the other hand, when the classes are not well balanced across the dataset, the splitting cuts the subdatasets into 1 and n − 1 observations, which takes n steps in total and the depth of the tree is n. Consequently, the asymptotic time complexity is Θ(n), assuming one split of a node takes 1 time unit.
Within each node splitting, both for a splitting variable among k variables and through the sample size n is searched, the time complexity Θ(•) of a decision tree building is somewhere in between being in Θ(k · n · log n) (the best-case scenario) and Θ(k · n · n) (the worst-case scenario), so that Θ(kn log n) ≤ Θ(•) ≤ Θ(kn 2 ). When a random forest model containing t trees is constructed, the tree induction as introduced above is repeated t times. That being said, the asymptotic time complexity Θ(••) of a random forest model building is in between One model of the random forest provides one (point) estimate of the p-value using the formula (6). In comparison, the estimation of the χ 2 statistics using the formula (1) takes only Θ(2k + 1) time units since is based on a ratio of two summations of k elements. Fortunately, the time complexity (7) is still polynomial. Furthermore, the building of the random forest with the complexity of (7) could be parallelized; then, asymptotic memory complexity rather than the time complexity could become an issue. In theory, if the random forest building would be parallelized into ℓ ≤ t independent slave processes each inducting a bunch of t ℓ trees, the time complexity (7) would be reduced to Θ t ℓ kn log n ≤ Θ(••) ≤ Θ t ℓ kn 2 . Eventually, for ℓ = t, the random forest building could take the same computing time as only one single tree induction, When we want to estimate the p-value rather using a confidence interval than only using a point, we need to repeat the random forest building many times, let us say f ≫ 0 times. As a result, we get a set of random forests that might also be called a primeval superforest of random forests. The

308
PROCEEDINGS OF THE FEDCSIS. ONLINE, 2021 primeval superforest of random forests construction is of a time complexity Θ(• • •), so that However, for a given dataset, a point estimate of the p-value is usually supposed to suffice for purposes of routine statistical inference. The primaveral superforest of random forests of the complexity (8) may be applied rather for experimental reasons when e. g., a posterior distribution of the p-values is about to be investigated.

IV. SIMULATION STUDY
We compared the log-rank test and the proposed method using several simulations of many pairs of survival curves to get preliminary simulated results, although the method -in contrast to the log-rank test -can compare more than only two survival curves. The curves in pairs were assumed they were not significantly different. We calculated the first type error rates, i. e., rates of false test results that two statistically non-different survival curves are (falsely) detected as different. Also, the lower value of the first type error is, the more robust such a method is. The simulation was repeated for different κ parameter values to illustrate how the value of κ determines the first type error rates.
For generating of the pairs of survival curves, we applied the negatively exponential survival function as follows, where ε is a random white noise term following a standard normal distribution, ε ∼ N (0, 1 2 ), and ρ(•) is a function rounding its argument to the nearest multiplier of 0.01 using a half rule, e. g. σ(0.012) = 0.01, σ(0.350) = 0.35 or σ(0.048) = 0.05. A group of negatively exponential survival functions following the formula s(t) = ρ e − 5+ε 200 t is in Fig. 7.  There were η = 1000 pairs of significantly non-different survival curves generated in total, and for each κ ∈ {0.1, 0.3, 0.5, 0.7, 0.9}, the curves were compared using the log-rank test and the above-proposed method. The number of trees in each random forest was always t = 1000. Numbers of cases where p-value was lower than or equal to α = 0.05 regardless of the method were summed up, by which we got the point estimates of the first type error rates, as illustrated in table II. The simulation study was performed using R programming language and environment [3]. More on numerical applications of R language to various areas is in [23]- [27]. While the log-rank test returned a point estimate of the first type error rate about 0.050 (regardless of κ since the χ 2 statistics following the formula (1) is not a function of the κ), point estimates of the first type error rates output by the introduced method progressively decreased with increasing value of κ, see table II. What is more, the proposed method seems to be more robust than the log-rank test for large values of κ, based on the simulations above.

V. CONCLUSION REMARKS
Survival curves could be compared by the log-rank test when they are only two or by the Cox proportional hazard model if there are more than two curves. However, both methods are limited by statistical assumptions.
We introduced a novel, assumption-free method for survival curves comparison based on a random forest algorithm. Firstly, it requires deriving new variables using the point estimates of modified (personalized) probabilities of non-experiencing the event of interest across all time points. Using those variables as node splitting ones, the random forest model can be built. A subtraction between 1 and a proportion of trees with sufficient complexity, capable of classifying into two or more classes, i. e. groups determined by their survival curves, to all trees of the forest, is a point estimate of p-value of the proposed method. Parameter κ determines the random forest trees' complexity, and, thus, by increasing the parameter, the first type error rate decreases, and robustness of the method increases, as was also illustrated within the simulation study. The asymptotic time complexity of the random forest-based method is higher than the one for the log-rank test but still polynomial and could be parallelized, too. The random forest-based method seems to overcome the risk of violations of statistical assumptions of the traditional techniques comparing survival curves and, furthermore, could compare more than two survival curves. Eventually, the method and its computational optimization could also inspire a new R package development.