Sensitivity Study of a Large-Scale Air Pollution Model on the Bulgarian Petascale Supercomputer Discoverer

The focus of this study is the optimal use of high performance computing in the area of environmental security (air pollution transport, in particular). To be established as reliable simulation tools, contemporary mathematical models of air pollution transport must include a fairly large set of chemical and photochemical reactions. The investigations and the numerical results reported in this paper have been obtained with a large-scale mathematical model called the Danish Eulerian Model (DEM). To optimize applications of DEM in various important scientific, social and economic areas, it is of great importance to simplify the model as much as possible while preserving the high reliability of its output results. A careful sensitivity analysis is needed in order to decide how such simplifications can be made. It is also important to analyze the influence of variations of the initial conditions, the boundary conditions, the rates of some chemical reactions, etc. on the model results, in order to make sound assumptions about the possible simplifications. The sensitivity analysis version of DEM (SA-DEM) was created for these purposes. Its computational complexity is considerably higher, a real challenge for today's top-performance supercomputers. SA-DEM has been implemented on the new Bulgarian petascale supercomputer Discoverer, part of the European High Performance Computing Joint Undertaking (EuroHPC JU), which is building a network of eight powerful supercomputers across the European Union (three pre-exascale and five petascale). The results of some scalability experiments with SA-DEM on Discoverer are presented here.
They are compared with similar experiments performed on the Mare Nostrum III supercomputer at the Barcelona Supercomputing Centre, the most powerful supercomputer in Spain at that time, since upgraded to the pre-exascale Mare Nostrum V, also part of the EuroHPC JU infrastructure.


I. INTRODUCTION
Environmental security is rapidly becoming a significant topic of present interest all over the world. Many comprehensive scientific studies are needed to analyze carefully the most important physical and chemical processes during the transport of air pollutants and their transformations along the way. Performing such complicated procedures effectively requires joint research and collaboration between experts in the fields of environmental modeling, numerical analysis and scientific computing.
The aim of the present work is to propose a new mechanism for investigating the sensitivity of the calculated concentration levels of important pollutants (like nitrogen dioxide NO2 and especially ozone O3) to variations of the rates of the involved chemical reactions, in a real-life scenario of air pollution transport over Europe with the Unified Danish Eulerian Model (UNI-DEM).
When investigating various highly complex engineering, physical, environmental, social and economic systems, it is important to measure the relations that describe the effect on the output results when the input conditions change.
Sensitivity analysis (SA) is the study of how uncertainty in the output of a model can be apportioned to different sources of uncertainty in the model input [20]. Two classes of sensitivity analysis are considered in the existing literature: local SA and global SA. Local SA studies how small variations of the inputs around given values change the value of the output. Global SA takes into account the whole variation range of the inputs and apportions the output uncertainty to the uncertainty in the input factors.
In general, several sensitivity analysis techniques are available [20]. Most existing methods for performing SA rely heavily on special assumptions about the behavior of the model (such as linearity, monotonicity and additivity of the relationship between input factors and model output). Among the quantitative methods, variance-based methods are the most often used [19]. The main idea of these methods is to evaluate how the variance of an input, or of a group of inputs, contributes to the variance of the model output.
Computational tasks arising in the treatment of large-scale air pollution models are enormous. It is highly desirable to simplify the model as much as possible while keeping a high level of reliability of its results. Sensitivity analysis is rather helpful in deciding where and how simplifications can be made. On the other hand, it is important to analyze the influence of variations of the initial conditions, the boundary conditions and/or the chemical rates on the model results in order to make the right assumptions about the simplifications to be implemented. Such an analysis can give valuable information for making reliable and reasonable simplifications, or can identify parameters and mechanisms whose accuracy should be improved, because the model results are very sensitive to variations of these parameters and mechanisms. Thus, the goals are: (i) improving the model, (ii) increasing the reliability of the results, and (iii) identifying processes that must be studied more carefully. The rest of the paper is organized as follows. In Section II the general concept of global sensitivity analysis is introduced in terms of the ANOVA high-dimensional model representation. In Section III the Danish Eulerian Model is described, including its high-performance parallel code UNI-DEM and its special sensitivity analysis version SA-DEM. In Section IV numerical results from scalability experiments with SA-DEM on two of the most powerful supercomputers in Europe are given. Finally, some conclusions are drawn.

II. GLOBAL SENSITIVITY ANALYSIS

A. Global Sensitivity Indices
When the sensitivity of the concentrations calculated by UNI-DEM (or any other deterministic mathematical model) is studied, it is convenient to introduce some stochastic variables and equations.
It is assumed that the mathematical model can be presented as a model function

u = f(x), where x = (x1, x2, ..., xd) ∈ U^d ≡ [0; 1]^d     (1)

is a vector of input parameters with a joint probability density function (p.d.f.) p(x) = p(x1, ..., xd). In general, real problems are characterized by multiple outputs; here it is assumed that a scalar output is given. It is also assumed that the input variables are independent (non-correlated) and that the density function p(x) = p(x1, x2, ..., xd) is known, even if the xi are not actually random variables. This implies that the output u is also a random variable, as a function of the random vector x, with its own p.d.f. It is reasonable to introduce an indicator that measures the importance of the influence of a given input parameter on the output. The main indicator for a given input parameter xi, i = 1, ..., d (normalized between 0 and 1), is defined as

S_i = D[E[u|x_i]] / D[u],     (2)

where D[E[u|x_i]] is the variance of the conditional expectation of u with respect to x_i and D[u] is the total variance of u. This indicator is named first-order sensitivity index by Sobol' [23] and correlation ratio by McKay [13]. A brief review of the measures of importance used in variance-based methods for sensitivity analysis is given in [3]. The total sensitivity index [10] provides a measure of the total effect of a given parameter, including all possible joint terms between that parameter and all the others. The total sensitivity index (TSI) of an input parameter x_i, i ∈ {1, ..., d}, is defined in the following way [10], [23]:

S_i^tot = S_i + Σ_{l1 ≠ i} S_{i l1} + Σ_{l1 < l2, l1, l2 ≠ i} S_{i l1 l2} + ... + S_{i l1 ... l_{d-1}},     (3)

where S_i is the main effect (first-order sensitivity index) of x_i and S_{i l1 ... l_{j-1}} is the j-th order sensitivity index for parameter x_i (two-way interactions for j = 2, three-way interactions for j = 3, and so on), 2 ≤ j ≤ d. The higher-order terms describe the interaction effects between the input parameters x_{i1}, ..., x_{iν}, ν ∈ {2, ..., d}, on the output variance. For practical computations the set of input parameters is usually classified according to their TSI [3]: a parameter x_i is considered very important if 0.8 < S_i^tot, important if 0.5 < S_i^tot < 0.8, unimportant if 0.3 < S_i^tot < 0.5, and irrelevant if S_i^tot < 0.3. In subsection II-B it is shown how the sensitivity indices S_{l1 ... lν} are defined via the variances of conditional expectations (see (6)-(8)). It is often reasonable to assume (see [12], [17]) that relatively small subsets of the input variables in high-dimensional models have the main impact on the output. The high-dimensional sums can then be neglected when many practical problems are studied. This means that one can preferably use the low-order indices, but should be able to control the contribution of the higher-order terms.
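As an illustration of definition (2), the following self-contained Python sketch estimates the first-order index S_1 by plain Monte Carlo for a hypothetical toy function u = x1 + 2 x2 on the unit square (a stand-in for a model output, not part of UNI-DEM; analytically S_1 = 0.2 and S_2 = 0.8):

```python
import random

random.seed(42)

def model(x1, x2):
    # hypothetical toy output; analytically S1 = 0.2, S2 = 0.8
    return x1 + 2.0 * x2

N, M = 500, 500

# total variance D[u] by plain Monte Carlo
u = [model(random.random(), random.random()) for _ in range(2 * N)]
mean_u = sum(u) / len(u)
D = sum((v - mean_u) ** 2 for v in u) / (len(u) - 1)

# variance of the conditional expectation E[u | x1]:
# outer loop samples x1, inner loop averages out x2
cond = []
for _ in range(N):
    x1 = random.random()
    cond.append(sum(model(x1, random.random()) for _ in range(M)) / M)
mean_c = sum(cond) / N
D_cond = sum((c - mean_c) ** 2 for c in cond) / (N - 1)

S1 = D_cond / D  # close to the analytic value 0.2
```

This brute-force double loop is only for illustration; the far cheaper scheme actually used for SA-DEM-type computations is described in subsection II-B.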

B. The Sobol' Approach
The Sobol' method is one of the most often used variance-based methods. To the best of our knowledge, the Sobol' sensitivity measure [23] was first published in [22]. An important advantage of this method is that it makes it possible to compute not only the first-order indices, but also indices of higher order, in a way similar to the computation of the main effects. The total sensitivity index can be calculated with just one Monte Carlo integral per factor.
The method for global SA applied here is based on a decomposition of an integrable model function f in the d-dimensional factor space into terms of increasing dimensionality:

f(x) = f_0 + Σ_{ν=1}^{d} Σ_{l1 < ... < lν} f_{l1 ... lν}(x_{l1}, x_{l2}, ..., x_{lν}),     (4)

where f_0 is a constant. The total number of summands in equation (4) is 2^d (see [25]) and, in general, this so-called high-dimensional model representation [23] is non-unique. But if each term is chosen to satisfy the condition

∫_0^1 f_{l1 ... lν}(x_{l1}, x_{l2}, ..., x_{lν}) dx_{lk} = 0,   1 ≤ k ≤ ν,     (5)

then the representation (4) is unique. The representation (4) is called the ANOVA representation of the model function f(x) [24].

PROCEEDINGS OF THE FEDCSIS, WARSAW, POLAND, 2023
The functional decomposition (4) of f over [0; 1]^d, the ANOVA (ANalysis Of VAriance) representation, has been studied by many authors [2], [9], [21], [26]. Sobol' has proven [22] that under assumption (5) the decomposition (4) is unique, and the functions on the right-hand side are defined in a unique way by multidimensional integrals [24]:

f_0 = ∫ f(x) dx,
f_{l1}(x_{l1}) = ∫ f(x) Π_{k ≠ l1} dx_k − f_0,
f_{l1 l2}(x_{l1}, x_{l2}) = ∫ f(x) Π_{k ≠ l1, l2} dx_k − f_{l1}(x_{l1}) − f_{l2}(x_{l2}) − f_0,

and so on for the higher-order terms. An additional essential property of the terms in the ANOVA representation is their mutual orthogonality:

∫ f_{l1 ... lν}(x_{l1}, ..., x_{lν}) f_{m1 ... mμ}(x_{m1}, ..., x_{mμ}) dx = 0   for (l1, ..., lν) ≠ (m1, ..., mμ).

This follows from the fact that the two subsets of indices differ from one another in at least one element, and the corresponding integral vanishes for that index due to (5).
The quantities

D = ∫ f²(x) dx − f_0²,   D_{l1 ... lν} = ∫ f²_{l1 ... lν}(x_{l1}, ..., x_{lν}) dx_{l1} ... dx_{lν}     (6)

are called variances (total and partial variances, respectively), and are obtained after squaring and integrating over U^d the equality (4), on the assumption that f(x) is a square integrable function (thus all terms in (4) are also square integrable functions). Therefore, the total variance of the model output is partitioned into partial variances [22] in the same way as the model function in the unique ANOVA decomposition:

D = Σ_{ν=1}^{d} Σ_{l1 < ... < lν} D_{l1 ... lν}.     (7)

The use of terms of probability theory is based on the following interpretation: the input parameters are random variables distributed in U^d, which defines f_{l1 ... lν}(x_{l1}, x_{l2}, ..., x_{lν}) also as random variables with the variances (6). For example, f_{l1} is presented by a conditional expectation, f_{l1}(x_{l1}) = E[u|x_{l1}] − f_0, and respectively D_{l1} = D[f_{l1}(x_{l1})] = D[E[u|x_{l1}]]. Based on the above assumptions about the model function and the output variance, the quantities

S_{l1 ... lν} = D_{l1 ... lν} / D,   1 ≤ ν ≤ d,     (8)

are called Sobol' global sensitivity indices [22], [24]. For ν = 1 this formula coincides with (2), and the so-defined measures correspond to the main effects of the input parameters as well as to the interaction effects. Using the definition of these measures as ratios of variances and dividing (7) by D, it is easy to show that the Sobol' global sensitivity indices satisfy

S_{l1 ... lν} ≥ 0   and   Σ_{ν=1}^{d} Σ_{l1 < ... < lν} S_{l1 ... lν} = 1.

Based on the results discussed above, it is clear that the mathematical treatment of the problem of providing global sensitivity analysis consists in evaluating the total sensitivity indices (3) and, in particular, the Sobol' global sensitivity indices (8) of the corresponding order. That leads to computing multidimensional integrals of the type

I = ∫_Ω g(x) p(x) dx,

where g(x) is a square integrable function in Ω and p(x) ≥ 0 is a probability density function such that ∫_Ω p(x) dx = 1. This means that in the general case one needs to compute 2^d integrals of type (6) to obtain S_i^tot. As discussed earlier, the basic assumption underlying the representation (4) is that the main features of the model functions (1) describing typical real-life problems can be presented by low-order subsets of the input variables [12], [17], that is, by the constant and the terms of first and second order. Thus the high-dimensional sums (corresponding to higher-order interaction effects) in (4) can normally be neglected. Therefore, based on this assumption, one can assume that the dimension of the initial problem can be reduced.
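The variance partition (7)-(8) can be checked exactly on a small example. For the purely illustrative function f(x1, x2) = x1 x2 on the unit square (not related to DEM), all ANOVA terms and partial variances are available in closed form, and the sensitivity indices sum to one:

```python
from fractions import Fraction as F

# f(x1, x2) = x1*x2 on U^2 = [0,1]^2
f0 = F(1, 4)              # f0 = ∫∫ x1 x2 dx1 dx2 = 1/4
D = F(1, 9) - f0 ** 2     # D = ∫∫ (x1 x2)^2 dx − f0^2 = 7/144

# E[f | x1] = x1/2, so f_1(x1) = x1/2 − 1/4 and D_1 = Var(x1/2) = 1/48
D1 = F(1, 48)
D2 = F(1, 48)             # by symmetry in x1 and x2
D12 = D - D1 - D2         # interaction variance from (7); equals 1/144

S1, S2, S12 = D1 / D, D2 / D, D12 / D   # Sobol' indices (8)
```

Exact rational arithmetic (the `fractions` module) makes the check airtight: S1 = S2 = 3/7, S12 = 1/7, and S1 + S2 + S12 = 1 as required.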
Nevertheless, calculating the integrals defined by formulas (6) requires integration of a different integrand for each index, which is inefficient in terms of computational cost. A procedure for computing the global sensitivity indices that overcomes this disadvantage has been proposed by Sobol' [24]. Consider an arbitrary set of m variables (1 ≤ m ≤ d − 1): y = (x_{k1}, ..., x_{km}), 1 ≤ k1 < ... < km ≤ d, and let z be the set of the d − m complementary variables. Thus x = (y, z).
Let K = (k1, ..., km). The variances corresponding to the subsets y and z can be defined as

D_y = Σ_{ν=1}^{m} Σ_{(i1 < ... < iν) ∈ K} D_{i1 ... iν},   D_z = Σ_{ν=1}^{d−m} Σ_{(i1 < ... < iν) ∈ K̄} D_{i1 ... iν},     (10)

where K̄ denotes the complement of the subset K in the set of all parameter indices. The first sum in (10) is extended over all subsets (i1, ..., iν) whose indices all belong to K. The total variance corresponding to the subset y is then D_y^tot = D − D_z, and it is extended over all subsets (i1, ..., iν), 1 ≤ ν ≤ d, where at least one index belongs to K. The procedure for computing the global sensitivity indices is based on the following representation of the variance D_y:

D_y = ∫ f(x) f(y, z') dx dz' − f_0²

(see [24]), where z' is an independent copy of z. The last equality allows one to construct a Monte Carlo algorithm for evaluating f_0, D and D_y. With independent sample points ξ = (η, ζ) and ξ' = (η', ζ'):

(1/N) Σ_{j=1}^{N} f(ξ_j) → f_0,
(1/N) Σ_{j=1}^{N} f²(ξ_j) → D + f_0²,
(1/N) Σ_{j=1}^{N} f(η_j, ζ_j) f(η_j, ζ'_j) → D_y + f_0².

For example, for m = 1, y = {x_{l1}}, l1 ∈ {1, ..., d}, and z is the set of the remaining d − 1 variables. It is important to estimate the computational cost of computing the sensitivity indices, in order to be able to compare this approach with other existing approaches. The computational cost of estimating all first-order (m = 1) and total sensitivity indices via the scheme proposed by Sobol' is N(2d + 1) model function evaluations (N model runs for f_0, dN model runs for the first-order terms, and dN model runs for the total effect terms), where N is the sample size and d is the number of input parameters. It should be noted that the most frequently used variance-based methods, such as the Sobol' method and FAST (Fourier Amplitude Sensitivity Test), and their improved versions, have a computational cost proportional to dN for estimating all main and total effects of the input parameters (see [18]).
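The Monte Carlo scheme above can be sketched in a few lines of Python. The test function below is a hypothetical additive stand-in (not a DEM output) with known indices S_1 = 0.2 and S_2 = 0.8; with N samples the sketch uses exactly N(2d + 1) model evaluations, matching the cost estimate in the text:

```python
import random

random.seed(1)

def model(x):
    # hypothetical additive test function; analytically S1 = 0.2, S2 = 0.8
    return x[0] + 2.0 * x[1]

d, N = 2, 10000
A = [[random.random() for _ in range(d)] for _ in range(N)]
B = [[random.random() for _ in range(d)] for _ in range(N)]
fA = [model(x) for x in A]              # N runs -> estimates of f0 and D
f0 = sum(fA) / N
D = sum(v * v for v in fA) / N - f0 * f0

S, S_tot = [], []
for i in range(d):
    # first-order index: keep x_i from sample A, take the complement z from B
    fAB = [model(B[j][:i] + [A[j][i]] + B[j][i + 1:]) for j in range(N)]
    D_y = sum(fA[j] * fAB[j] for j in range(N)) / N - f0 * f0
    S.append(D_y / D)
    # total index: keep the complement z from A, take x_i from B -> gives D_z
    fBA = [model(A[j][:i] + [B[j][i]] + A[j][i + 1:]) for j in range(N)]
    D_z = sum(fA[j] * fBA[j] for j in range(N)) / N - f0 * f0
    S_tot.append(1.0 - D_z / D)
# total cost: N + dN + dN = N(2d + 1) model evaluations
```

For this additive function the interaction terms vanish, so the total indices approximately coincide with the first-order ones, which is exactly what the estimates show.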
The computation of higher-order interaction effects can be performed by an iterative process. For example, for y = (x_{l1}, x_{l2}) one has D_y = D_{l1} + D_{l2} + D_{l1 l2}, and S_{l1 l2} = (D_y − D_{l1} − D_{l2}) / D can be obtained assuming that the corresponding first-order sensitivity indices have already been computed.

III. DESCRIPTION AND PARALLEL IMPLEMENTATIONS OF THE DANISH EULERIAN MODEL (DEM)

DEM is a powerful large-scale air pollution model with more than 30 years of development history [28], [15], [16], [29]. Over the years it has been successfully applied in various long-term environmental studies. A large number of physical and chemical processes in the atmosphere should be taken into account, which are mathematically represented by a complex PDE system. To simplify it, a proper splitting procedure is applied. As a result, the initial system is replaced by several simpler systems (submodels), connected with the main physical and chemical processes. These systems have to be solved over a large spatial domain, as the pollutants migrate quickly over long distances, driven by the dynamics of the atmosphere, especially at high altitudes. There they are exposed to changes of temperature, light and other conditions over an extremely wide range, and so are the rates of most chemical reactions. One of the major sources of difficulty is the dynamics of the atmospheric processes, which requires a small time step (at least for the chemistry submodel) in order to obtain a stable numerical solution of the corresponding system. All this makes the treatment of large-scale air pollution models a tough and heavy computational task. It has always been a serious challenge, even for the fastest and most powerful state-of-the-art supercomputers [7], [29].
The Danish Eulerian Model (DEM) [27], [28] is mathematically represented by the following system of partial differential equations:

∂c_s/∂t = −∂(u c_s)/∂x − ∂(v c_s)/∂y − ∂(w c_s)/∂z
          + ∂/∂x (K_x ∂c_s/∂x) + ∂/∂y (K_y ∂c_s/∂y) + ∂/∂z (K_z ∂c_s/∂z)
          + E_s + Q_s(c_1, c_2, ..., c_q) − (k_1s + k_2s) c_s,   s = 1, ..., q,

where the following notation is used:
q - number of equations (= number of chemical species);
c_s - concentrations of the chemical species considered;
u, v, w - components of the wind along the coordinate axes;
K_x, K_y, K_z - diffusion coefficients;
E_s - emissions in the space domain;
k_1s, k_2s - coefficients of dry and wet deposition, respectively (s = 1, ..., q);
Q_s(c_1, c_2, ..., c_q) - non-linear functions describing the chemical reactions between the species.

A. Splitting into submodels and domain decomposition
The above rather complex system is split into three subsystems (submodels), according to the major physical and chemical processes, as well as to the numerical methods applied in their solution. These are (i) the horizontal advection and diffusion submodel, (ii) the chemistry, emissions and deposition submodel, and (iii) the vertical exchange submodel, respectively. The discretization of the spatial derivatives in the right-hand sides of these submodels results in three large systems of ordinary differential equations.
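The splitting idea can be illustrated on a deliberately tiny example (hypothetical, not DEM code): the scalar equation dc/dt = −k c + E, a "deposition plus emission" toy problem, is split into two submodels treated one after the other within each outer time step, and the result is compared with the exact solution:

```python
import math

# toy scalar equation dc/dt = -k*c + E ("deposition" + "emission"),
# solved by sequential splitting within each outer time step
k, E = 1.0, 2.0
dt, T = 1.0e-3, 1.0

c = 0.0
for _ in range(int(round(T / dt))):
    c = c * math.exp(-k * dt)  # submodel 1: decay term, solved exactly
    c = c + E * dt             # submodel 2: emission term, forward Euler

exact = (E / k) * (1.0 - math.exp(-k * T))  # exact solution at t = T
error = abs(c - exact)
```

Sequential (Lie) splitting is first-order accurate in dt, so the error here stays well below one percent of the solution; in DEM the same idea lets each submodel (advection-diffusion, chemistry, vertical exchange) be treated with the numerical method and time step best suited to it.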
Chemical reactions play a significant role in the model. Moreover, both the non-linearity and the stiffness of the equations are mainly introduced by the chemistry (see [30]). On the other hand, this is one of the models of atmospheric chemistry in which the chemical processes are described in great detail and in a very accurate way. The chemical scheme used in the model is the well-known condensed CBM-IV (Carbon Bond Mechanism), proposed in [8] and enhanced in [28] by adding several reactions for handling the ammonia-ammonium transformations in the atmosphere. It includes 35 pollutants and 116 chemical reactions, 69 of which are time-dependent and the remaining 47 time-independent. The scheme is suitable and adequate for studying cases of high concentrations of chemical species.

Another crucial point on the way towards an efficient numerical solution of the submodels is the domain decomposition technique. This is a natural way to achieve distributed-memory parallelization of any numerical problem over a large spatial domain. In some cases, however, like the advection-diffusion equations in particular, there is always a certain overhead due to the treatment of the boundary conditions. Minimizing this overhead is a key point of efficient optimization. On the other hand, optimization should not restrict the portability of the parallel implementation, as the intensive development of computer technology inevitably leads to regular upgrades or complete replacement of outdated hardware. Standard parallel programming tools such as MPI and OpenMP (for the distributed / shared memory models) are used in order to preserve the portability of the code. An important issue for efficient parallel optimization is also the load balance. Sometimes the MPI barriers, used to force synchronization between the processes in data transfer commands, do not allow a good load balance. This obstacle can be avoided to some extent by using non-blocking communication routines from the MPI standard library.
More details about the numerical methods applied to solve these systems can be found in [1], [11], [28].

B. Parallelization strategy
The MPI standard library is used as the main parallelization tool. MPI (Message Passing Interface) was initially developed as a standard communication library for distributed-memory computers. Later, having proven to be efficient, portable and easy to use, it became one of the most popular parallelization tools for application programming. It can now be used on a much wider class of parallel systems, including shared-memory computers and clustered systems (each node of the cluster being a separate shared-memory machine), and thus provides a high level of portability of the code.
In the case of DEM, the MPI parallelization is based on partitioning of the space domain [15], [16]. The space domain is divided into sub-domains, one per MPI task, and each MPI task works on its own sub-domain. Within a time step there is no data dependency between the MPI tasks in the chemistry and the vertical exchange stages. This is not so in the advection-diffusion stage: partitioning the spatial grid between the MPI tasks requires overlapping of the inner boundaries and exchange of certain boundary values on the neighboring subgrids for proper treatment of the boundary conditions. The sub-domains are usually too large to fit into the fastest cache memory of the corresponding CPU. In order to achieve good data locality, the smaller computational tasks are grouped in chunks (where appropriate) for more efficient cache utilization. An input parameter CHUNKSIZE is provided, which controls the amount of short-term reusable data in order to reduce the transfer between the cache and the main (slower) memory. It should be tuned with respect to the cache size of the target machine.
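The role of CHUNKSIZE can be sketched as follows (illustrative Python, not the actual DEM code, and the per-cell "chemistry" operation is a made-up placeholder): the per-cell work is applied chunk by chunk, so that all operations on a chunk happen while its data is still cache-resident:

```python
def chemistry_step(conc, chunksize):
    """Apply per-cell 'chemistry' work chunk by chunk (toy operation)."""
    out = [0.0] * len(conc)
    for start in range(0, len(conc), chunksize):
        chunk = conc[start:start + chunksize]
        # all work on this chunk happens while its data is cache-hot
        out[start:start + chunksize] = [c * 0.9 + 0.1 for c in chunk]
    return out
```

The numerical result is independent of the chunk size; only the memory-access pattern changes. This is precisely why a parameter like CHUNKSIZE can be tuned freely per target machine without affecting the model output.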
More detailed descriptions of the main computational stages of DEM and of the parallelization techniques used in each of them can be found in [1], [4], [15], [16], [28], [29], [30].

IV. NUMERICAL EXPERIMENTS WITH SA-DEM ON THE PETASCALE SUPERCOMPUTERS IBM MARENOSTRUM III IN BARCELONA, SPAIN, AND DISCOVERER (EUROHPC JU) IN SOFIA, BULGARIA

Results of scalability experiments with the 2-D fine-resolution grid version of SA-DEM on two of the most powerful supercomputers in Europe are shown in Tables I and II in this section. Some values of the user-defined parameters of SA-DEM used in the experiments on both machines are as follows:

TABLE I: TIME (T) IN SECONDS, SPEED-UP (Sp) AND PARALLEL EFFICIENCY (E) OF SA-DEM (FINEST GRID) ON THE SPANISH SUPERCOMPUTER IBM MARENOSTRUM III AT BSC, BARCELONA

TABLE II: TIME (T) IN SECONDS, SPEED-UP (Sp) AND TOTAL EFFICIENCY (E) OF SA-DEM ON THE EUROHPC JU SUPERCOMPUTER DISCOVERER IN SOFIA, BULGARIA

The storage system of Discoverer has the following configuration:
- Number of metadata modules: 11 SSD;
- Data module: net capacity 2.03 PB; performance 20 GB/s; 164 HDD + 11 SSD storage elements of 6 TB and 1.92 TB each, respectively; 10 CPU cores per server; 150 GB main memory per server (DDR4, 2666 MT/s); 4 x GigE RJ45 interfaces to the control network for OS access and hardware management; 4 x HDR100 IB / 100GbE ports to the bulk data (RDMA) network (the same ports as for the metadata).

The results of the numerical experiments performed show that:
- The parallel MPI implementation of SA-DEM is well balanced, portable, and runs efficiently on some of the most powerful supercomputers in Europe, including the Bulgarian petascale supercomputer Discoverer, part of the EuroHPC JU network.
- The efficiency and the speed-up are higher in the computationally intensive stages. In particular, the chemistry stage (which does not need any communication between the tasks) has an almost linear overall speed-up. The advection stage scales quite well too, taking into account the unavoidable computational overhead due to the overlapping boundaries of the partitioning.
- The time for the computationally intensive stages is further reduced, in relation to the number of threads, in the hybrid MPI-OpenMP code with the lower OpenMP level of parallelism switched on, which can be exploited at core level within a node.
- Further attention should be paid to the optimization of the I/O processes, in order to reduce the slowdown of the execution on a large number of nodes.

V. CONCLUSIONS

Sensitivity analysis, and particularly the results reported in this work, has an important twofold role: on the one hand, for verification and/or improvement of mathematical models, and on the other hand, for reliable expert interpretation of the main effects, interaction effects and higher-order interaction effects of the input parameters on the model output. Variance-based analysis is a useful tool for an advanced investigation of the relationships between model parameters, output results and the internal mechanisms regulating the system under consideration. By identifying the chemical reactions most important for the model output, specialists from various applied fields (chemistry, physics) may obtain valuable information for improving the model, which will in turn increase the reliability and robustness of its predictions.

TZVETAN OSTROMSKY ET AL.: SENSITIVITY STUDY OF A LARGE-SCALE AIR POLLUTION MODEL ON THE BULGARIAN PETASCALE SUPERCOMPUTER DISCOVERER
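For reference, the quantities reported in Tables I and II are related by Sp(p) = T(1)/T(p) and E(p) = Sp(p)/p. A minimal sketch with purely illustrative timings (placeholders, not measurements from the tables, which are not reproduced here):

```python
def speedup_and_efficiency(t_serial, t_parallel, p):
    # Sp = T(1)/T(p); E = Sp/p (E = 1.0 means perfect linear scaling)
    sp = t_serial / t_parallel
    return sp, sp / p

# hypothetical timings: 1000 s on 1 task, 40 s on 32 tasks
sp, eff = speedup_and_efficiency(1000.0, 40.0, 32)
```

A value of E close to 1 indicates near-linear scaling, as observed in the paper for the communication-free chemistry stage; values noticeably below 1 reflect overheads such as the boundary exchange of the advection stage or I/O.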