Let's Estimate all Parameters as Probabilities: Precise Estimation Using Chebyshev's Inequality, Bernoulli Distribution, and Monte Carlo Simulations

Regarding the parameter estimation task, besides the time effectiveness of the simulation, parameter estimates are required to be precise enough. Usually, the estimates are Monte Carlo-simulated using variability pre-estimated within a small initial sample. The problem with pre-estimated variability, however, is that it can be estimated imprecisely or, even worse, underestimated, resulting in estimation bias. In this work, we address this issue and suggest estimating all parameters as probabilities. Since a probability is not only finite but bounded above by 1, the estimator's variability within the Monte Carlo simulation and estimation process is theoretically upper-bounded, using the upper-bounded variance of the Bernoulli and binomial distributions together with Chebyshev's inequality. The variability thus cannot be underestimated or estimated inaccurately, and the estimate's precision is ensured up to a given decimal digit with very high probability. If there is a known process that expresses the parameter of interest in terms of a probability, we can compute how many iterations of the Monte Carlo simulation are needed to ensure the parameter estimate reaches a given level of precision. We also analyze the asymptotic time complexity of the proposed estimation strategy and illustrate the approach on a short case study of estimating the $\pi$ constant.


I. INTRODUCTION
In this work, we focus on the estimation of parameters of a non-probabilistic fashion, e.g., simulated estimates of claim amounts in actuarial science [1] or simulated numbers of patients at risk of disease recurrence [2]. Typically, these parameters are hardly derivable analytically and are thus estimated using Monte Carlo simulation and the following logic [3]. Firstly, an initial Monte Carlo simulation with a number of iterations (100, 1000, and so on) generating the parameter estimate is run; the parameter estimates from the individual iterations are averaged, and their standard deviation is calculated. Then, applying the central limit theorem, a confidence interval for the parameter is constructed, and the main Monte Carlo simulation is repeated so many times that the interval is no wider than a given precision. A problem of the abovementioned approach lies in the estimation of the parameter's variability within the initial Monte Carlo simulation. If the variability is underestimated, the confidence interval is falsely narrower than it should be, and the precision is, in fact, lower than expected. To overcome this issue, we refine the simulation logic: firstly, we find a function of the parameter equal to some probability, which is then simulated using Monte Carlo simulation. Since a probability has a theoretically based upper bound, its variability is upper-bounded. Then, we use the properties of the Bernoulli distribution to bound the largest possible variability of the parameter expressed as a probability, and Chebyshev's inequality to enumerate the number of iterations keeping the parameter estimate's precision. Due to Chebyshev's inequality, we do not need the assumption of the parameter estimate's normality, which makes the proposed approach more robust.

II. A TRADITIONAL APPROACH TO MONTE CARLO SIMULATION AND ESTIMATION OF PARAMETERS OF NON-PROBABILISTIC FASHION
Let us assume a parameter θ of a non-probabilistic fashion that can be estimated n times using point estimates θ1, θ2, ..., θn. Then, calculation of the estimates' average and standard deviation, i.e.,

θ̄ = (1/n) Σ_{i=1}^{n} θi  and  σθ = √( (1/(n − 1)) Σ_{i=1}^{n} (θi − θ̄)² ),

is feasible. To estimate parameter θ using Monte Carlo simulation on a given level of precision 1 − ε, where ε ≳ 0, reached with probability 1 − α, one needs to know the number of iterations n of the simulation [3].

A. Principles of the traditional approach to Monte Carlo simulation and estimation of parameters of non-probabilistic fashion
Adopting the mathematical notation from the previous section, typical values of ε and α are, e.g., ε = 0.001 and α = 0.05, respectively. A traditional approach to parameter θ estimation follows.

(i) Choose n0 for an initial Monte Carlo simulation that pre-estimates parameter θ using individual estimates θ0,1, θ0,2, ..., θ0,n0. Typically, n0 is chosen as n0 = 100 or n0 = 1000 or similar.
(ii) Calculate an average and a standard deviation of the pre-estimated parameter as

θ̄0 = (1/n0) Σ_{i=1}^{n0} θ0,i  and  σ0,θ = √( (1/(n0 − 1)) Σ_{i=1}^{n0} (θ0,i − θ̄0)² ).

(iii) Applying Ljapunov's central limit theorem [4], parameter θ should lie in the interval

( θ̄0 − u1−α/2 · σ0,θ/√n, θ̄0 + u1−α/2 · σ0,θ/√n )    (1)

in (1 − α)n of n total cases, thus approximately with probability 1 − α, where u1−α/2 is the (1 − α/2)-th quantile of the standard normal distribution.
(iv) The number of iterations n of the main Monte Carlo simulation, outputting parameter estimates θ1, θ2, ..., θn, is chosen to keep precision 1 − ε with probability 1 − α, so the confidence interval's half-length from formula (1) should be less than or equal to ε, thus

u1−α/2 · σ0,θ/√n ≤ ε,    (2)

and, equivalently, the number of needed iterations is

n ≥ ( u1−α/2 · σ0,θ / ε )².    (3)

(v) Finally, parameter θ is estimated using θ̂ = (1/n) Σ_{i=1}^{n} θi, believed to keep precision 1 − ε with probability 1 − α.
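As a minimal sketch, steps (i)–(v) above can be implemented as follows. The code is an illustration in Python (the paper's own experiments use R), and `simulate_once` is a hypothetical stand-in for any simulation producing a single point estimate of θ; the quantile u1−α/2 is hard-coded for α = 0.05.

```python
import math
import random
import statistics

def simulate_once():
    # Hypothetical one-iteration simulation of theta; here a noisy
    # draw around the true value 10.0 serves as a stand-in.
    return random.gauss(10.0, 2.0)

def traditional_mc(simulate_once, eps, n0=100):
    # (i)-(ii) initial simulation: pre-estimate the standard deviation
    pilot = [simulate_once() for _ in range(n0)]
    sigma0 = statistics.stdev(pilot)
    u = 1.959963984540054  # u_{1-alpha/2} for alpha = 0.05
    # (iv) number of iterations so the half-length of (1) is <= eps, formula (3)
    n = math.ceil((u * sigma0 / eps) ** 2)
    # (v) main simulation and final estimate
    return sum(simulate_once() for _ in range(n)) / n, n

random.seed(1)
theta_hat, n = traditional_mc(simulate_once, eps=0.05)
```

Note that `n` inherits the randomness of the pilot sample through `sigma0`, which is exactly the stochastic term the limitations below concern.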

B. Limitations of the traditional approach to Monte Carlo simulation and estimation of parameters of non-probabilistic fashion
Although the abovementioned approach works in general and is commonly applied, it can suffer from not meeting the asymptotic properties assumed by Ljapunov's central limit theorem when the confidence interval from formula (1) is constructed. This might happen particularly for low values of n or very high demands on precision, e.g., when ε < 10⁻⁶. On a more practical note, inspecting formula (3), if the parameter's standard deviation σθ is underestimated by σ0,θ, i.e., when σ0,θ < σθ, then the number n of iterations needed to keep the imprecision ≤ ε is also underestimated, which may result in imprecise, i.e., wrong (!) decimal digits starting at the i-th digit behind (or before) the decimal point, where i = ⌊|log10(ε)|⌋, if ε < 1 (or ε > 1, respectively).

C. The asymptotic time complexity of the traditional approach to Monte Carlo simulation and estimation of parameters of non-probabilistic fashion
Obviously, if one iteration of the Monte Carlo simulation takes τ units of time, then, since the simulation is run twice, firstly with n0 iterations and secondly with n ≥ (u1−α/2 · σ0,θ / ε)² iterations as comes from formula (3), the total asymptotic time complexity of the procedure, Θ(†), is

Θ(†) = τ ( n0 + ( u1−α/2 · σ0,θ / ε )² ),    (4)

so, while Θ(†) is linear in the n0 and n terms, it is quadratic in the σ0,θ and 1/ε terms.

III. A PROPOSED APPROACH TO MONTE CARLO SIMULATION AND ESTIMATION OF PARAMETERS OF (NON-)PROBABILISTIC FASHION
Let us suppose a parameter θ of a non-probabilistic fashion. Besides the traditional approach to θ estimation as introduced above, we may assume a link function f(·) so that f(θ) has a dimension of probability, i.e.,

f(θ) = P(T),    (5)

where P(·) is a probability function as comes from a σ-algebra, and T is a random event or a proposition consisting of random events. If, occasionally, θ were a priori a probability, then the link function f(·) is an identity, i.e., f(θ) = θ, and the approach below still works; that is why we put the prefix "non-" into brackets in the section title. Thus, to estimate parameter θ of the (non-)probabilistic fashion, keeping precision 1 − ε with probability 1 − α, let us first assume a random variable X following the Bernoulli distribution with argument P(T), i.e., f(θ) (the probability of success). A sum of n independent Bernoulli trials follows the binomial distribution with arguments n (number of trials) and f(θ) (probability of success in each trial). After collecting n estimates Xi coming from the abovementioned Bernoulli distribution, we calculate X̄ = (1/n) Σ_{i=1}^{n} Xi to estimate parameter f(θ). The number of trials n, i.e., the number of iterations of the Monte Carlo simulation, is determined beforehand using Chebyshev's inequality, also considering the terms of precision, 1 − ε, and confidence probability, 1 − α.

A. Mathematical and statistical preliminaries of the proposed approach to Monte Carlo simulation and estimation
As we have seen, we need to revisit the Bernoulli and binomial distributions and Chebyshev's inequality, together with their statistical properties. Let us start with the Bernoulli and binomial distributions.

Definition 1 (Bernoulli distribution). A random variable X follows the Bernoulli distribution with an argument 0 ≤ p ≤ 1, if

X = 1, with probability p,
X = 0, with probability 1 − p.

Formally, we write X ∼ Bernoulli(p).
Definition 2 (Binomial distribution). A random variable X follows the binomial distribution with arguments n ∈ N and 0 ≤ p ≤ 1, if X is a sum of n independent variables, each following the Bernoulli distribution with the argument 0 ≤ p ≤ 1 (i.i.d.). Formally, we write X ∼ binomial(n, p).
Lemma 2 (Binomial distribution's expected value and variance). Let random variable X follow the binomial distribution with arguments n and p. Then the expected value of X is

E(X) = np,    (6)

and also

var(X) = np(1 − p).    (7)

Lemma 3 (Binomial distribution's maximum variance). Let random variable X follow the binomial distribution with arguments n and p. Then the maximum possible variance of X is var(X) = n/4.

Proof. Putting p = 1/2 + δ, where −1/2 ≤ δ ≤ 1/2, according to lemma 2 and formula (7),

var(X) = np(1 − p) = n(1/2 + δ)(1/2 − δ) = n(1/4 − δ²).    (8)

Thus, generally var(X) ≤ n/4, and var(X) = n/4 = n(1/4 − 0²) for δ = 0, so if and only if p = 1/2 + δ = 1/2 + 0 = 1/2. □
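Lemma 3 is also easy to verify numerically: scanning p over a grid, the binomial variance n·p(1 − p) peaks at p = 1/2 with value n/4. A minimal Python sketch (the paper's own experiments use R):

```python
# Numerical check of lemma 3: the binomial variance n*p*(1-p)
# is maximized at p = 1/2, where it equals n/4.
n = 10
variances = {p / 100: n * (p / 100) * (1 - p / 100) for p in range(101)}
p_star = max(variances, key=variances.get)

assert p_star == 0.5               # the maximizing success probability
assert variances[p_star] == n / 4  # the maximum variance n/4
```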
Finally, let us revisit Markov's and Chebyshev's inequalities [5], which enable us to derive the number of needed iterations of the Monte Carlo simulation.
Theorem 1 (Markov's inequality). Let X be a non-negative random variable with expected value E(X). For a > 0,

P(X ≥ a) ≤ E(X)/a.    (9)

Proof. Surely, since a > 0 and X is non-negative,

E(X) ≥ E(X · 1[X ≥ a]) ≥ a · E(1[X ≥ a]) = a · P(X ≥ a),

and, finally, dividing both sides by a yields P(X ≥ a) ≤ E(X)/a. □

Theorem 2 (Chebyshev's inequality). Let X be a random variable with expected value E(X) and non-zero and finite variance, 0 < var(X) < ∞. For b > 0,

P(|X − E(X)| ≥ b) ≤ var(X)/b².    (10)

Proof. If we realize that var(X) = E((X − E(X))²) and formally put X ≡ (X − E(X))² and a ≡ b² into formula (9) of Markov's inequality, we directly get Chebyshev's inequality,

P((X − E(X))² ≥ b²) = P(|X − E(X)| ≥ b) ≤ var(X)/b². □

B. Number of needed iterations of Monte Carlo simulation for parameter estimation keeping the estimate's given precision
Let us assume a random variable X following the Bernoulli distribution with argument P(T), i.e., X ∼ Bernoulli(P(T)). Thus, the probability of a success in each Bernoulli trial is P(T) = f(θ). If we repeat the Bernoulli trial n times, based on definition 1, we get a random variable Σ_{i=1}^{n} Xi, where Σ_{i=1}^{n} Xi ∼ binomial(n, f(θ)) with E(Σ_{i=1}^{n} Xi) = nf(θ). Applying Chebyshev's inequality (10) to Σ_{i=1}^{n} Xi, we can simplify the right-hand side using lemma 3, i.e., var(Σ_{i=1}^{n} Xi) ≤ n/4. Let us set b ≡ nε; then we get

P(|Σ_{i=1}^{n} Xi − nf(θ)| ≥ nε) ≤ var(Σ_{i=1}^{n} Xi)/(nε)² ≤ (n/4)/(n²ε²) = 1/(4nε²),

i.e., equivalently, dividing inside the absolute value by n,

P(|X̄ − f(θ)| ≥ ε) ≤ 1/(4nε²),    (11)

where X̄ = (1/n) Σ_{i=1}^{n} Xi, and by setting the probability's uncertainty as 1/(4nε²) ≤ α,

n ≥ 1/(4αε²).    (12)

Formula (11) tells us that the probability of getting a distance between the parameter f(θ) and its estimate X̄ greater than ε is at most 1/(4nε²). Thus, to keep the imprecision ≤ ε with probability at least 1 − α, i.e., to keep 1/(4nε²) ≤ α, we need n ≥ 1/(4αε²) iterations of the Monte Carlo simulation, and, unlike (3), formula (12) does not include a stochastic term.
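For concreteness, formula (12) is deterministic and trivial to evaluate; e.g., for ε = 0.001 and α = 0.05 it prescribes n = 1/(4 · 0.05 · 0.001²) = 5,000,000 iterations. A minimal Python helper (the paper's own experiments use R):

```python
import math

def iterations_needed(eps, alpha):
    # Formula (12): n >= 1 / (4 * alpha * eps**2); note there is
    # no stochastic term, unlike formula (3).
    return math.ceil(1 / (4 * alpha * eps ** 2))

# e.g., precision eps = 0.001 with confidence probability 1 - alpha = 0.95
n = iterations_needed(eps=0.001, alpha=0.05)  # 5,000,000 iterations
```

Halving ε quadruples the required number of iterations, reflecting the quadratic 1/ε term in the time complexity discussed below.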

C. A scheme of the proposed approach to Monte Carlo simulation and estimation of parameters
The previous paragraphs, and particularly formulas (11) and (12), suggest a Monte Carlo simulation for not only probabilistic-like parameters, keeping non-underestimated precision, that consists of the following steps.

(i) Setting the tuning parameters of the simulation: precision 1 − ε and probability 1 − α.
(ii) Assuming formula (5), constructing a generative Bernoulli process X ∼ Bernoulli(P(T)). Since we do not know parameter θ's value, we estimate it using the link function f and a different random process, known from theory, with outcome P(T), where P(T) = f(θ).
(iii) Repeating the Bernoulli process n times, where n ≥ 1/(4αε²), and collecting the outcomes X1, X2, ..., Xn.
(iv) Finally, averaging the outcomes, X̄ = (1/n) Σ_{i=1}^{n} Xi, obtaining the estimate of f(θ) on precision level 1 − ε with probability 1 − α.

An algorithm for the proposed Monte Carlo simulation is in Algorithm 1.
Algorithm 1: The proposed approach to Monte Carlo simulation and estimation of parameters of not only probabilistic fashion
Data: generative Bernoulli process with probability of success P(T), link function f ensuring that f(θ) = P(T), precision 1 − ε, probability 1 − α
Result: estimate of f(θ) on precision level 1 − ε with probability 1 − α
1 X ← ( )                      // a vector for estimates saving
2 n ← ⌈1/(4αε²)⌉               // # of iterations
3 for i ← 1 to n do
4     Xi ← outcome of the generative Bernoulli process
5     append Xi to X
6 end
7 return X̄ = (1/n) Σ_{i=1}^{n} Xi

D. The asymptotic time complexity of the proposed approach to Monte Carlo simulation and estimation of parameters
The simulation is repeated n times, where n ≥ 1/(4αε²), as comes from formula (12). Assuming one iteration of the Monte Carlo simulation takes τ time units, the total asymptotic time complexity of the procedure, Θ(‡), is

Θ(‡) = τn ≥ τ/(4αε²),    (13)

so, while Θ(‡) is linear in the n and 1/α terms, it is quadratic in the 1/ε term. To compare the asymptotic time complexity Θ(†) from formula (4) for the traditional estimation procedure and Θ(‡) from formula (13) for the proposed one, note that both 1/α and u1−α/2 decrease as α increases; however, for α ≤ 0.05, u1−α/2 ≳ 2, so u²1−α/2 ≈ 4, while 1/α ≥ 20. So, while the traditional approach is "faster" in terms of time complexity, it may suffer from falsely underestimating the parameter estimate's variability.
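To make the comparison concrete, a small Python sketch (hypothetical values; the paper's own experiments use R) contrasts the two iteration counts, taking σ0,θ at its theoretical maximum of 1/2 for a probability-valued estimate (lemma 3 with n = 1):

```python
import math

def n_traditional(sigma0, eps, u=1.959963984540054):
    # Formula (3); u is u_{1-alpha/2} for alpha = 0.05
    return math.ceil((u * sigma0 / eps) ** 2)

def n_proposed(eps, alpha):
    # Formula (12)
    return math.ceil(1 / (4 * alpha * eps ** 2))

eps, alpha = 0.001, 0.05
nt = n_traditional(sigma0=0.5, eps=eps)  # about 9.6e5 iterations
np_ = n_proposed(eps, alpha)             # about 5.0e6 iterations
# The ratio nt/np_ is roughly u^2 * sigma0^2 * 4 * alpha ~ 0.19, so the
# traditional count is smaller, but it hinges on a stochastic, possibly
# underestimated sigma0 -- while n_proposed is a deterministic guarantee.
```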

E. Keeping the first k decimal digits precise in the proposed approach to Monte Carlo simulation and estimation
Due to avoiding the variability issue, which comes from lemma 3 and Chebyshev's inequality (10), an appropriate setting of the precision level 1 − ε can ensure the first k decimal digits are correctly estimated within the proposed simulation and estimation approach. Inspecting formula (11), we can realize that X̄ − ε ≤ f(θ) ≤ X̄ + ε with probability 1 − α. Assuming the link function f is increasing and invertible, also f⁻¹(X̄ − ε) ≤ θ ≤ f⁻¹(X̄ + ε), and we can estimate the real imprecision level εθ for parameter θ, i.e., not only for f(θ), as

εθ = ( f⁻¹(X̄ + ε) − f⁻¹(X̄ − ε) ) / 2.    (14)

IV. THE PROPOSED APPROACH TO MONTE CARLO SIMULATION AND ESTIMATION APPLIED: π ESTIMATION

Revisiting the well-known example of π constant estimation using Monte Carlo simulation, let us assume a quarter circle with a radius of 1 as in Fig. 1. For a random point A = [x, y] in the unit square around the quarter circle, where [x, y] ∈ ⟨0, 1⟩², the generative Bernoulli process X ∼ Bernoulli(P(T)) here returns number 1 if A lies in the quarter circle (in gray color in Fig. 1), otherwise it returns 0. Thus, the random event is T = {A ∈ quarter circle | A ∈ unit square} and

P(T) = (π · 1²/4) / 1² = π/4.

So, the Bernoulli process enables us to estimate f(θ) = π/4, which implies the link function f as f(η) = η/4. Both for the traditional and the proposed approach, we repeated the Monte Carlo simulation m = 100 times to evaluate how likely the k-th decimal digit is incorrect, with k ∈ {1, 2, 3}. We set the probability level α = 0.05 and the real imprecision level εθ so that the k-th decimal digit is kept. The number of simulation iterations was estimated using formulas (3) and (12). The initial number of iterations for the traditional approach, needed for pre-estimating the estimate's standard deviation σ0,θ, was n0 = 100. We used the R programming language and environment [6] for the Monte Carlo simulations. There are more numerical applications of the R language to various fields in [7]–[9].
Results are in Table I.While the traditional approach did not always ensure the precise k-th digit, particularly (but rarely, in ≤ α = 0.05 = 5 % of all cases) for k = 2 and k = 3, the proposed approach kept the k-th digit's precision every time.
Unlike the proposed method, whose number of iterations contains no stochastic term (see formula (12)), the traditional one may suffer from a possible underestimation of the initial estimate's variability σ0,θ and, consequently, of the needed number n of Monte Carlo iterations (see formula (3)).
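The proposed scheme for the π example can be sketched in a few lines of Python (the paper's own experiments use R). Here ε refers to the precision of f(θ) = π/4, so the real imprecision of π itself is εθ = 4ε, obtained by inverting the link f(η) = η/4:

```python
import math
import random

def bernoulli_trial():
    # Random event T: a uniform point A = [x, y] in the unit square
    # falls inside the quarter circle, so P(T) = pi/4.
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0

def estimate_pi(eps, alpha):
    n = math.ceil(1 / (4 * alpha * eps ** 2))  # formula (12)
    successes = sum(bernoulli_trial() for _ in range(n))
    return 4 * successes / n                   # invert the link f(eta) = eta/4

random.seed(2023)
pi_hat = estimate_pi(eps=0.005, alpha=0.05)    # n is about 200,000 trials
```

With ε = 0.005 and α = 0.05, the estimate is guaranteed to be within εθ = 0.02 of π with probability at least 0.95, without any pre-estimated variability.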

Fig. 1. A quarter circle in a unit square enabling estimation of the π/4 parameter using Monte Carlo simulation of many points such as A = [x, y] ∈ ⟨0, 1⟩².

TABLE I. PROPORTIONS OF CASES WHEN THE k-TH DIGIT WAS INCORRECT OUT OF m = 100 REPETITIONS (MARKED AS r) OF MONTE CARLO SIMULATION.

V. CONCLUSION REMARKS

We introduced an alternative approach to Monte Carlo estimation, refining all estimated parameters as probabilities. That enables us to apply Bernoulli trials with upper-bounded variability of the estimate and Chebyshev's inequality for a robust estimate of the number of iterations needed to ensure the estimate's precision on a given probability level.

VI. ACKNOWLEDGMENT

This research is supported by grant IG410023 IGA no. 50/2023 provided by the Internal Grant Agency of the Prague University of Economics and Business.