Optimum Large Sensor Data Filtering, Networking and Computing

Abstract—In this paper we consider filtering and processing large data streams in intelligent data acquisition systems. It is assumed that raw data arrive in discrete events from a single expensive sensor. Not all raw data, however, comprise records of interesting events, and hence some part of the input must be filtered out. The intensity of filtering is an important design choice because it determines the complexity of the filtering hardware and software, as well as the amount of data that must be transferred to the following processing stages for further analysis. This, in turn, dictates the needs for communication and computational capacity in the following stages. In this paper we analyze the optimum intensity of filtering and its relationship with the capacity of the following processing stages. A set of generic filtering intensity, data transfer, and processing archetypes is modeled and evaluated.


I. INTRODUCTION
Sensor technology has developed rapidly over the past decades. Numerous surveys can be found, each limited to a specific sensor technology type. An important research topic is the computational and communication integration of multiple sensors in a network [1], [2], [3], [4], [5]. There is an implicit assumption here that the economics of constructing, deploying and operating such a system are such that the use of multiple sensors is plausible within project budget constraints.
In this paper we envision a problem with a different emphasis where there is a single, very expensive "sensor". Examples of such a device include detectors for particle accelerator experiments such as the CERN Large Hadron Collider [6], or the recently announced Electron Ion Collider [7] to be built at Brookhaven National Laboratory. Another example is the use of a large radar-equipped drone for monitoring ocean traffic. A final example is the use of imaging satellites for ocean, weather, environmental monitoring and earth resources sensing [8]. Note that what we generically consider a "sensor" may consist of a large number of individual sensing elements (as in a particle detector), but we refer to the ensemble as a sensor.
The commonality in all these examples is that the sensor increasingly can generate raw data at rates that are much faster than the data can be downloaded over communication channel(s) onto servers for processing. Moreover, not all data collected by the sensor is valuable enough to be retained for further processing. The generally proposed solution for this problem is to have onboard pre-processing of the data at the detector, the drone or the satellite. This processing is not classical data compression but rather the use of data analysis, machine learning (ML) or other broadly understood artificial intelligence techniques to pre-process the raw data into a much smaller processed summary that is more amenable to transmission. Thus, only interesting particle tracks, ship tracks and weather patterns may be transmitted, while the majority of the raw data is not. For simplicity of exposition we will say that the ML algorithm serves as a filter on the raw data, independently of the field of origin of the actually applied filtering technique. Clearly this situation has similarities with edge computing, where sensors/actuators at the network edge do local processing in order to reduce the amount of network traffic to distant cloud facilities. Again, the specification of a single sensor here gives our problem a unique character.
In this paper we analyze the intensity of data filtering in the first stages of data processing necessary for the shortest time to obtain results. The intensity of filtering is important for the following reasons: (i) filtering algorithms are often hardware-implemented and, consequently, have costly realizations; their changes are less flexible than in software (especially in remote posts like satellites); (ii) the higher the intensity of filtering, the more complex the applied algorithm, which results in longer filtering time and/or a more extensive hardware system; (iii) the size of the data emerging from the filtering stage determines the needs for capacity in the further stages of the data processing pipeline, where more sophisticated algorithms are executed; (iv) and, vice versa, the speed of communication between the stages of the data pipeline and the processing in the stages following the initial filtering have an impact on the required intensity of raw data filtering. For the purpose of analyzing systems of the above nature, an extensive set of single expensive sensor (SES) models for the filtering and processing problem will be examined. These cases are meant to illustrate the modeling and solution possibilities and be representative in a generic way. A common-sense expectation is that a high intensity of initial data filtering reduces the amount of data analyzed in the pipeline and thus reduces the time to obtain results. However, more intensive filtering is also more time-consuming. Thus, due to the interaction between the nonlinear speed of filtering and the complexity of processing the data in the later stages, the processing time may have a minimum at a certain filtering intensity. We investigate such minima in this paper. Furthermore, options for combining advanced processing algorithms of various complexity classes in the data processing workflow are analyzed.
Our models are largely tractable. Parts of the evaluation of the models use concepts from the theory of divisible (i.e. partitionable) loads, a well-established concept that provides elegant solutions, particularly for linear models [9], [10], [11], [12], [13]. The divisibility of the loads means that big volumes of data are processed and the discrete units of data are small in relation to the whole data size. It is also assumed that the loads can be divided into parts processed independently in parallel.
The remainder of this paper is organized as follows. In the next section the related literature is outlined. The filtering and parallel processing problem is formally defined in Section III. Section IV is dedicated to the analytical derivation of the formulas guiding the selection of the optimum filtering intensity. Results of numerical modeling of filtering and processing systems are provided in Section V. The last section is dedicated to conclusions. The notations are summarized in Table I.

II. RELATED WORK
In the literature on sensor networks, particularly wireless sensor networks, there are general surveys [14] and surveys on specific technological aspects of sensor networks such as transport and routing [15], fault detection [16], security solutions [17], optimizing sensor-source geometries and minimizing the number of sensors [18], the use of swarm intelligence for performance optimization [19] and numerous applications.
There has been some work on analytical models of sensor data generation, communication and computation. For instance, an early work is [1].

Fig. 1. Data filtering and processing system architecture.

Further studied aspects include compression [5], [20], [21], energy minimization [22] and the case of tree data gathering networks [2]. The case when the load is processed in a pipeline fashion has been studied in [23], [24]. Most work to date has involved multiple sensors, unlike the single expensive sensor paradigm of this paper. An LHC data acquisition system [6] is a good example of a single expensive sensor with data filtering and parallel processing. In the LHC, protons circulate in bunches in opposing beams that cross each other, resulting in collisions at a 40 MHz frequency. Only data from particle collisions with sufficient energy and momentum are allowed to proceed from the so-called level-1 trigger to the second stage of processing (the so-called high-level trigger) for further reconstruction of particle trajectories and analysis.

III. PROBLEM FORMULATION
It is assumed that there is a two-stage workflow: (1) the first stage performs sensor data filtering, (2) the second stage conducts further data processing. An overview of the system architecture is shown in Fig. 1. The data from the sensor arrive in discrete events, each delivering V units of data. The arriving chunk of data is intercepted in the input buffer, and all filtering algorithms use this buffer. The same buffer is used to send the filtered data to the second stage of processing. The use of a single buffer in this model is an aggregate representation of the specialized buffer architectures that may be needed in practice to handle the influx, staging and filtering of massive amounts of data. The data chunks may arrive repetitively; in that case it is assumed that at most V units of data arrive in a chunk once every T time units. The first stage filtering is done in linear time, but the intensity of filtering, i.e. the fraction F of the initial data transferred to the second stage, is related to the speed of filtering.
In the second stage, data processing more intricate than filtering is conducted, requiring machines with substantial computing power, memory and storage. Hence, these can be dedicated data centers or cloud systems. The machines running in the second stage will be referred to as servers or processors. Many different algorithms may be executed in parallel in the second stage. For example, different algorithms discovering unrelated artifacts may be run in parallel. In such a case it is assumed that each specific data-processing algorithm receives the same data set, has its own set of processors, and all the algorithms are executed independently of each other. The longest path in the data processing always goes through the processor(s) running the most time-consuming algorithm. Consequently, in the following we analyze only one of the possibly many parallel paths in the data processing, namely the longest one.

A. First Stage Filtering Intensity and Complexity
The intensity of filtering is expressed by a fraction F ∈ (0, 1] controlling the amount of produced results. The amount of results delivered from stage 1 to stage 2 is FV, where V is the size of the data injected into the first stage from the sensor. This volume of data is filtered in time A_0 V. It is assumed that the intensity of filtering F and the speed of filtering are interdependent. Precisely, the inverse of the filtering speed is some function A_0(F). Depending on the application, A_0(F) may assume various forms. Here we list a few possibilities:

Case 1: A_0(F) = 1/F^c, where c > 0 is some constant. This kind of relationship may emerge as a consequence of iterative filtering. Let i be the number of iterations executed on each data unit. Then A_0 and F can be expressed as

F = f^i,    (1)
A_0 = a^i,    (2)

where f ∈ (0, 1) is the fraction of data remaining after each iteration of filtering, and f and a > 1 are some given constants determined by the filtering algorithm. This means that an iterative filtering algorithm is executed on each data unit, and extending the algorithm by each new iteration takes exponentially longer to process a data unit. This can be the case when each data unit (e.g. a picture) is rectified with increasing resolution. From equation (1), we get i = ln F / ln f. From (2), ln A_0 = i ln a, and hence, ln A_0 ln f = ln F ln a. Equivalently, we have ln A_0 = ln F (ln a / ln f), and hence, A_0 = F^{ln a / ln f}. Since f ∈ (0, 1) and a > 1, we have ln a / ln f < 0 and A_0(F) = 1/F^c, where c = -ln a / ln f > 0.
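The Case 1 derivation can be checked numerically. The sketch below (with illustrative constants f and a, not values from the paper) iterates the per-step relations F = f^i, A_0 = a^i and confirms the closed form A_0(F) = 1/F^c with c = -ln a / ln f:

```python
import math

def filtering_params(f, a, i):
    """Iterative filtering sketch: each of the i iterations keeps a
    fraction f of the data and multiplies the per-unit time by a."""
    F = f ** i      # remaining data fraction, cf. eq. (1)
    A0 = a ** i     # reciprocal of filtering speed, cf. eq. (2)
    return F, A0

f, a = 0.5, 1.3                    # illustrative constants
c = -math.log(a) / math.log(f)     # predicted exponent, c > 0
for i in range(1, 8):
    F, A0 = filtering_params(f, a, i)
    assert abs(A0 - 1.0 / F ** c) < 1e-9 * A0   # A0 = 1/F^c holds
```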
Case 2: A_0(F) = c ln(1/F). Again all filtering iterations are executed on each data unit, each iteration takes the same time and reduces the output data size f ∈ (0, 1) times. Then, as in the previous case, F = f^i and i = ln F / ln f, but since each iteration takes the same time a, we have A_0 = ai, and hence we obtain

A_0 = a ln F / ln f = c ln(1/F), where c = -a / ln f > 0.

Case 3: A_0(F) = c(1 - ln F). Suppose the filtering is a sieve, i.e., the filtering algorithm sifts data in some buffer and with each iteration part of the data is dropped. Suppose 1/j-th of the data present in the buffer is removed in iteration j. Thus, after i iterations the remaining fraction is F = ∏_{j=2}^i (1 - 1/j) = 1/i, and since the time of an iteration is proportional to the amount of data still in the buffer, the total filtering time per data unit is approximately a(1 + ln i) = a(1 - ln F) = c(1 - ln F), where a = c is some constant.
Case 4: A_0(F) = c(1 - F). The data filtering is a sieve again, and after i iterations the remaining data size is F = 1 - bi, where b ∈ (0, 1) is the constant fraction of the initial data removed in each iteration. Hence, i = (1 - F)/b and, with a constant iteration time a, A_0 = ai = c(1 - F), where c = a/b.

Let us observe that in all the above cases A_0(F) decreases as F grows, which means that more intensive filtering (F decreases) requires longer computation (A_0(F) increases).

B. Inter-Stage Communication
The flow of results from stage 1 to stage 2 can be organized in a number of ways. Here we assume two alternatives:

A. Sequential communication - the transfers to the servers follow one another, so the time to transfer x_i bytes to server i and x_j bytes to server j is C_i x_i + C_j x_j, where C_i is the reciprocal of the speed of the link to server i (e.g. in sec/byte).

B. Parallel communication - stage 1 to server connections are mutually independent. The time to transfer x_i bytes from stage 1 to server i and x_j bytes to server j is equal to max{C_i x_i, C_j x_j}.
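The two alternatives reduce to one-liners. A minimal sketch with illustrative reciprocal link speeds C_i and transfer sizes x_i (values chosen for illustration only):

```python
def seq_transfer_time(C, x):
    """Sequential: transfers to the servers queue one after another."""
    return sum(Ci * xi for Ci, xi in zip(C, x))

def par_transfer_time(C, x):
    """Parallel: independent links, the slowest transfer dominates."""
    return max(Ci * xi for Ci, xi in zip(C, x))

C = [2.0, 1.0, 4.0]    # reciprocal link speeds, sec/byte (illustrative)
x = [10.0, 30.0, 5.0]  # bytes sent to each server
assert seq_transfer_time(C, x) == 70.0   # 20 + 30 + 20
assert par_transfer_time(C, x) == 30.0   # max(20, 30, 20)
```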

C. Computational Complexity of the Second Stage
Computational complexity of the second stage depends on the executed algorithms and the necessary result integration and storage. We adopt the divisible load theory assumption [10], [12], [13] that the data processed in the second stage is arbitrarily divisible and can be processed in parallel. Potential ways of parallelization are determined by a particular algorithm. Since the variety of possible second stage algorithms and ways of parallelizing them seems unlimited, we will consider a limited set of archetype algorithms as examples of typical computational complexity functions: Linear - The time to process x units of data is A_i x on processor i. Typical examples are searching for patterns, compression, message digest (e.g. MD5) calculation or scoring data units (as in the LHC example). Here we assume that the result collection time is negligible because the size of the output data is small, or the results are left on the servers and storing is included in the algorithm run time. Since result return is neglected, it can be shown that in the optimum schedule all servers must finish computations at the same moment [10], [12], [13].
Loglinear - Sorting is a typical example of an algorithm with loglinear complexity. Classic sequential sorting algorithms like heapsort or quicksort have complexity O(n log n), where n is the number of sorted items. We will assume that a parallel version of these algorithms consists in splitting the volume of data into parts, sorting the parts in parallel, and then sequentially merging the results. Thus, on m identical processors the complexity of this method would be O((n/m) log(n/m) + n log m).
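The split-sort-merge scheme can be prototyped in a few lines. A sketch follows (a single-process stand-in; in the model the m chunks would be sorted by m servers in parallel):

```python
import heapq

def split_sort_merge(data, m):
    """Sort m chunks independently (the parallel part), then m-way merge
    the sorted runs (the sequential part, about log m work per item)."""
    size = (len(data) + m - 1) // m               # chunk size, about n/m
    runs = [sorted(data[i:i + size]) for i in range(0, len(data), size)]
    return list(heapq.merge(*runs))               # O(n log m) merge

assert split_sort_merge([5, 3, 8, 1, 9, 2, 7], m=3) == [1, 2, 3, 5, 7, 8, 9]
```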
Quadratic - Computing a similarity matrix can serve as an example of an algorithm with quadratic computational complexity. Its sequential complexity is O(n^2). In the case of parallel processing, the square area of work, e.g. a similarity matrix, may be partitioned into ℓ × ℓ squares distributed to the processors, where the integer ℓ is a tunable parameter of the algorithm. There are ℓ^2 tiles, each of size n^2/ℓ^2. Sending and processing one square takes O(n^2/ℓ^2) time. If equal numbers of ℓ^2/m tiles are assigned to each of the m processors, then receiving and processing them can be executed in time O(n^2/ℓ^2 × ℓ^2/m), which is O(n^2/m). Note that the tile distribution may be different, that is, it may depend on the communication and computing speeds of the processors.
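The tile accounting is easy to sanity-check. A sketch, assuming (as in the text) that m divides ℓ^2 so each server receives exactly ℓ^2/m tiles:

```python
def tile_work_per_server(n, l, m):
    """Each of the l*l tiles covers (n/l)^2 entries of the n x n work
    area; with l*l/m tiles per server the per-server work is n^2/m."""
    assert (l * l) % m == 0        # choose l so the tiles split evenly
    tiles_per_server = (l * l) // m
    return tiles_per_server * (n / l) ** 2

# 36 tiles over 4 servers -> 9 tiles per server, i.e. n^2/4 entries each:
assert abs(tile_work_per_server(n=1000, l=6, m=4) - 1000 ** 2 / 4) < 1e-6
```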

IV. OPTIMUM FILTERING INTENSITY
Our goal in this section is to derive closed-form solutions (i.e. formulas) linking the optimum filtering intensity F with other system parameters so as to minimize the time required to transmit and process all the data. In many cases the obtained formulations are not amenable to analytic solutions. In such situations further study is delegated to the numerical modeling described in the next section.

A. Parallel Communication, Linear Second Stage
In the optimum schedule, all servers communicate and process in parallel, finishing at the same time. Hence, we have

(C_i + A_i) α_i = (C_j + A_j) α_j, for i, j = 1, ..., m,    (3)
∑_{i=1}^m α_i = FV.    (4)

In the above equation system all processors communicate and compute in the same interval by (3), and all work is done by (4). From (3) we get

α_i = (FV/K) · 1/(C_i + A_i),

where

K = ∑_{j=1}^m 1/(C_j + A_j)

is a constant. The schedule length is a sum of filtering, communication and processing times:

T = A_0(F)V + (C_i + A_i) α_i = A_0(F)V + FV/K.

Thus, in order to minimize T, we have to minimize the function

t(F) = A_0(F) + F/K.    (7)

We will now compute the optimum value of F for the considered functions A_0(F). Note that practical values of F belong to some interval [F_min, F_max] ⊂ (0, 1]. Thus, if the computed optimum value F* is larger than F_max, we should set F = F_max. Similarly, if F* < F_min, then the smallest possible value of F should be chosen.

1) A_0(F) = 1/F^c, where c > 0 is a constant. Then,

t(F) = 1/F^c + F/K and t′(F) = -c/F^{c+1} + 1/K.

Thus, t(F) is minimized when t′(F) = 0, i.e., for

F* = (cK)^{1/(c+1)}.

2) A_0(F) = c ln(1/F), where c > 0 is a constant. We have

t(F) = c ln(1/F) + F/K and t′(F) = -c/F + 1/K.

Hence, t(F) is minimized when F* = cK.

3) A_0(F) = c(1 - ln F), where c > 0 is a constant. Then,

t(F) = c(1 - ln F) + F/K and t′(F) = -c/F + 1/K.

The minimum value of t(F) is obtained, as in the previous case, for F* = cK.

4) A_0(F) = c(1 - F), where c > 0 is a constant. Then,

t(F) = c(1 - F) + F/K,    (22)
t′(F) = 1/K - c.    (23)

Thus, t′(F) does not depend on F. If c > 1/K, then t(F) is decreasing, and the maximum possible value of F should be chosen. If c < 1/K, then t(F) is increasing and the smallest possible F should be selected.
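The closed forms for filtering cases 1 and 2 can be cross-checked by brute force. A sketch, assuming the normalized objective t(F) = A_0(F) + F/K with an aggregate constant K (the constants below are illustrative):

```python
import math

def argmin_grid(fun, lo=1e-4, hi=1.0, steps=100_000):
    """Brute-force minimizer of fun over a fine grid on [lo, hi]."""
    return min((lo + (hi - lo) * k / steps for k in range(steps + 1)), key=fun)

K, c = 2.0, 0.3                                   # illustrative constants
t1 = lambda F: F ** (-c) + F / K                  # case 1: A0 = 1/F^c
t2 = lambda F: c * math.log(1 / F) + F / K        # case 2: A0 = c*ln(1/F)

F1, F2 = argmin_grid(t1), argmin_grid(t2)
assert abs(F1 - (c * K) ** (1 / (c + 1))) < 1e-3  # predicted (cK)^(1/(c+1))
assert abs(F2 - c * K) < 1e-3                     # predicted cK
```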

B. Sequential Communication, Linear Second Stage
We assume that the sensor communicates with the processors in the order of their identifiers. Hence, we have

A_{i-1} α_{i-1} = (C_i + A_i) α_i, for i = 2, ..., m,    (24)
∑_{i=1}^m α_i = FV.

Equations (24) mean that communication to and computation on processor i are performed in parallel with computation on processor i - 1. It follows implicitly that processor i is started after activating processor i - 1. Hence, we obtain

α_i = α_1 ∏_{j=2}^i A_{j-1}/(C_j + A_j), with α_1 = FV/K′, where K′ = 1 + ∑_{i=2}^m ∏_{j=2}^i A_{j-1}/(C_j + A_j),

and the schedule length is

T = A_0(F)V + (C_1 + A_1) α_1 = A_0(F)V + FV (C_1 + A_1)/K′,

so that minimizing T amounts to minimizing

t(F) = A_0(F) + F (C_1 + A_1)/K′.    (28)

Equation (28) has the same form as (7), and hence, the considerations from Section IV-A can also be applied in the case of sequential communication.
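The chain of overlap conditions fixes the load parts up to scaling. A sketch of the resulting split (the recursion α_i = α_{i-1} A_{i-1}/(C_i + A_i) is an assumption following standard divisible load reasoning, with illustrative rates):

```python
def seq_chain_parts(FV, A, C):
    """Split load FV so that computation on server i-1 exactly overlaps
    communication to plus computation on server i (divisible load style):
    A[i-1]*a[i-1] = (C[i] + A[i])*a[i]; then rescale so parts sum to FV."""
    a = [1.0]
    for i in range(1, len(A)):
        a.append(a[-1] * A[i - 1] / (C[i] + A[i]))
    scale = FV / sum(a)
    return [ai * scale for ai in a]

parts = seq_chain_parts(100.0, A=[2.0, 2.0, 2.0], C=[1.0, 1.0, 1.0])
assert abs(sum(parts) - 100.0) < 1e-9
for i in range(1, len(parts)):      # every overlap condition holds
    assert abs(2.0 * parts[i - 1] - 3.0 * parts[i]) < 1e-9
```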

C. Parallel Communication, Loglinear Second Stage
In the case of loglinear complexity of the second stage, it is assumed that the processing consists of three steps: parallel communication, parallel processing of the chunks of data, and sequential merging of the results. The latter can be executed in time FVC_M log m, where FV is the amount of data that has to be collected, C_M is the reciprocal of the merging speed (e.g. in sec/byte), which takes into account the speed of communication between the servers providing the data to merge and the merging server, and log m is a factor representing the time to elect the smallest value among the m servers in the merging step. The former two steps (parallel communication and processing) take the same time T_1 on all servers. Hence, we have:

C_i α_i + A_i α_i ln α_i = T_1, for i = 1, ..., m,    (29)
∑_{i=1}^m α_i = FV.    (30)

Let us define

y_i = ln α_i + C_i/A_i.

We have

A_i α_i y_i = A_i α_i ln α_i + C_i α_i = T_1.

Recall that if ye^y = x then y = W(x), where W is the Lambert function [25]. Since α_i = e^{y_i - C_i/A_i}, we have y_i e^{y_i} = (T_1/A_i) e^{C_i/A_i}, and thus

y_i = W((T_1/A_i) e^{C_i/A_i}).

Hence, the load chunk sizes are:

α_i = T_1/(A_i y_i) = T_1/(A_i W((T_1/A_i) e^{C_i/A_i})),    (33)

and (30) can be written as

∑_{i=1}^m T_1/(A_i W((T_1/A_i) e^{C_i/A_i})) = FV.    (36)

The Lambert function cannot be expressed in terms of elementary functions, but T_1 can be found numerically by solving (36). The above equation is easier to solve in homogeneous systems because all processors have the same parameters and the load to process is split equally. Then, each processor receives a load of size α_i = FV/m and equation (36) becomes:

T_1 = A(FV/m) ln(FV/m) + CFV/m.    (37)

The schedule length including filtering, communication and processing is

T(F) = A_0(F)V + A(FV/m) ln(FV/m) + CFV/m + FVC_M log m.    (40)

We will now compute the optimum value of F for which T is minimum.
1) A_0(F) = 1/F^c, where c > 0 is a constant. Then,

T′(F) = -cV/F^{c+1} + (AV/m)(ln(FV/m) + 1) + CV/m + VC_M log m,

and T(F) is minimized when T′(F) = 0.

2) A_0(F) = c ln(1/F), where c > 0 is a constant. We have

T′(F) = -cV/F + (AV/m)(ln(FV/m) + 1) + CV/m + VC_M log m,

and T(F) is minimized when T′(F) = 0.

3) A_0(F) = c(1 - ln F), where c > 0 is a constant, is dealt with in the same way as the previous case because the derivatives of c(1 - ln F) and c ln(1/F) are both equal to -c/F, and T(F) is minimized when T′(F) = 0.
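The Lambert-function chunk-size formula can be verified with a few lines of code. A sketch implementing the principal branch of W by Newton's method (a dependency-free stand-in for scipy.special.lambertw) and checking that the resulting chunk size balances the per-server busy time:

```python
import math

def lambert_w(x, iters=60):
    """Principal branch of the Lambert function (w*e^w = x) for x > 0,
    computed by Newton's method."""
    w = math.log(1.0 + x)                 # reasonable start for x > 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1.0))
    return w

def chunk_size(T1, A, C):
    """Load a with C*a + A*a*ln(a) = T1 (per-server busy time in the
    loglinear model), via a = T1 / (A * W((T1/A) * e^(C/A)))."""
    return T1 / (A * lambert_w((T1 / A) * math.exp(C / A)))

A, C, T1 = 2.0, 1.0, 500.0
a = chunk_size(T1, A, C)
assert abs(A * a * math.log(a) + C * a - T1) < 1e-6   # busy time matches
```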

D. Sequential Communication, Loglinear Second Stage
In this case the communications and the parts of processed load are linked by the system of equations:

A_{i-1} α_{i-1} ln α_{i-1} = C_i α_i + A_i α_i ln α_i, for i = 2, ..., m,    (47)
∑_{i=1}^m α_i = FV.    (48)

Equations (47) ensure that the work on processor i - 1 is processed in parallel with the communication to and computation on processor i. This set of nonlinear equations does not seem to have an easy analytical solution. Therefore, we resort to numerical methods to solve (47)-(48) and find the F for which the processing time is minimum.

E. Parallel Communication, Quadratic Second Stage
As mentioned in Section III, we assume that the quadratic amount of work is shared between the m processors. This amount of work can be split into ℓ^2 work units (tiles), each of size (FV/ℓ)^2. We will assume that ℓ is large, and hence the work can be divided sufficiently flexibly, as in the linear case. Yet, mind that the amount of work, i.e. the data to be processed, grows proportionately to (FV)^2. Furthermore, a homogeneous system is considered. Similarly to the linear case (Section IV-A), results are not explicitly merged. Let α_i denote the number of tiles assigned to processor i. We have

(2CFV/ℓ + A(FV/ℓ)^2) α_i = T_2, for i = 1, ..., m,    (49)
∑_{i=1}^m α_i = ℓ^2,

where 2CFV/ℓ is the time to send the two FV/ℓ-long blocks of data needed by a tile, and A(FV/ℓ)^2 is the time to process a tile. Equations (49) mean that communication and processing are performed in the same interval on all processors. Since the system is homogeneous, α_i = ℓ^2/m, for i = 1, ..., m, and the whole schedule length is

T(F) = A_0(F)V + (2CFV/ℓ + A(FV/ℓ)^2) ℓ^2/m = A_0(F)V + 2CℓFV/m + A(FV)^2/m.

We will now compute the optimum value of F for which T is minimum.

1) A_0(F) = 1/F^c, where c > 0 is a constant. Then,

T′(F) = -cV/F^{c+1} + 2CℓV/m + 2AFV^2/m.    (52)

Hence, T(F) is minimum when T′(F) = 0. Unfortunately, equation (52) does not seem to have an easy analytical solution for T′(F) = 0 and has to be solved numerically.

2) A_0(F) = c ln(1/F), where c > 0 is a constant. We have

T′(F) = -cV/F + 2CℓV/m + 2AFV^2/m.

Again, T(F) is minimum when T′(F) = 0, which reduces to the quadratic equation 2AVF^2 + 2CℓF - cm = 0, and hence

F* = (-Cℓ + √(C^2ℓ^2 + 2AVcm))/(2AV).

3) A_0(F) = c(1 - ln F), where c > 0 is a constant, is dealt with in the same way as the previous case because the derivatives of c(1 - ln F) and c ln(1/F) are equal.

4) A_0(F) = c(1 - F), where c > 0 is a constant. Then,

T′(F) = -cV + 2CℓV/m + 2AFV^2/m.

Hence, T(F) is minimum when F = (mc - 2Cℓ)/(2AV).
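The closed form for filtering case 4 can be cross-checked numerically. A sketch, assuming a homogeneous schedule length of the form T(F) = c(1-F)V + 2CℓFV/m + A(FV)^2/m (the cost terms are the reconstruction used here, and the constants below are illustrative, not taken from the paper's experiments):

```python
def T_quad(F, V, A, C, c, l, m):
    """Filtering time for case 4 plus per-server tile input and work."""
    return c * (1 - F) * V + 2 * C * l * F * V / m + A * (F * V) ** 2 / m

V, A, C, c, l, m = 100.0, 0.5, 1.0, 7.0, 10, 10   # illustrative values
F_star = min((k / 10_000 for k in range(1, 10_001)),
             key=lambda F: T_quad(F, V, A, C, c, l, m))
# predicted optimum F* = (mc - 2Cl)/(2AV) = 0.5 for these values
assert abs(F_star - (m * c - 2 * C * l) / (2 * A * V)) < 1e-3
```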

F. Sequential Communication, Quadratic Second Stage
We have a set of equations determining the work distribution:

A(FV/ℓ)^2 α_{i-1} = 2CFV α_i/ℓ + A(FV/ℓ)^2 α_i, for i = 2, ..., m,    (59)
∑_{i=1}^m α_i = ℓ^2.    (60)

Unfortunately, this set of nonlinear equations does not seem to have an easy analytical solution for the F minimizing the schedule length. Therefore, we resort to numerical methods to solve (59)-(60) and find the F for which the processing time is minimum.

V. NUMERICAL MODELING
This section is dedicated to showing tendencies in the system parameters when the filtering intensity is optimum with respect to the minimum total processing time. In the cases not amenable to representation by a closed-form formula, the optimum value of F was found using the Python method scipy.optimize.fsolve. We assume that the amount of input data is V = 1E6. We analyze recurring patterns in performance rather than particular numbers. Therefore, only representative examples of the cases introduced in Section III-A are extensively discussed. For simplicity, only homogeneous systems are analyzed.

Fig. 2 presents the relationship between the retained data fraction F and the schedule length T in filtering case 2, i.e. A_0(F) = c ln(1/F), for m = 100 and several combinations of A, C and c values. The smallest value of F for which T is shown in Fig. 2 is 0.01, because F must be greater than 0. When A = C = 5 and c = 0.001, filtering is fast, while data transfer and processing in the second stage are rather slow. Hence, the smaller the amount of data retained, the shorter the obtained schedule. Contrarily, when A = C = 1 and c = 0.3, filtering is slow, while data transfer and processing are rather fast. In consequence, a larger F (i.e. lower filtering intensity) results in a smaller schedule length T. In the remaining two presented cases, the optimum value of F is neither the minimum possible (close to 0) nor the maximum possible (1). For A = C = 5 and c = 0.2, the best value of F is 0.2, and for A = C = 2, c = 0.25 it is 0.65.

Fig. 3 shows how the optimum value of the retained data fraction F* depends on the number m of second stage processors, for case 1 (A_0(F) = 1/F^c) with A = 5 and C = 2. When m grows, parallel data transfer and processing take less time in comparison to the filtering stage. Therefore, a larger fraction of data should be retained, in order to decrease the filtering time. Naturally, the optimal filtering intensity decreases when filtering is slow, i.e., for large c. In particular, when c = 1E-1 and m ≥ 70, no filtering should take place.
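Where the stationarity condition T′(F) = 0 has no closed form, a numerical root finder is used. The sketch below is a dependency-free stand-in for scipy.optimize.fsolve (bisection, assuming a sign change of T′ on the bracket), shown on filtering case 1 of the parallel-linear model, where the answer is known analytically (constants are illustrative):

```python
def bisect_root(g, lo, hi, tol=1e-10):
    """Find F with g(F) = 0 by bisection; g must change sign on [lo, hi]."""
    assert g(lo) * g(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# T(F) = V/F^c + F*V/K  =>  T'(F) = V*(-c*F^(-c-1) + 1/K)
c, K, V = 0.25, 1.5, 1e6
dT = lambda F: V * (-c * F ** (-c - 1) + 1.0 / K)
F_star = bisect_root(dT, 1e-3, 1.0)
assert abs(F_star - (c * K) ** (1 / (c + 1))) < 1e-6   # analytic optimum
```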

A. Parallel Communication, Linear Second Stage
The total processing time resulting from filtering the optimum size of data for different values of c and m is depicted in Fig. 4. Naturally, the schedule length decreases when more processors are used. This effect is stronger when c is large. Indeed, in this case, decreasing the filtering intensity (F increases, A_0(F) decreases), which is possible because of using a larger number of processors, has a large impact on the filtering time.

B. Sequential Communication, Linear Second Stage
When communication is sequential, a smaller number of second stage processors can be effectively used than in the case of parallel communication. Therefore, in Fig. 5, we present the schedule lengths obtained for different values of F and network parameters, m = 10, and filtering case 1. In general, the visible tendencies are similar to the ones in Fig. 2. However, even when A = C = 1 and c = 0.3, the optimum value of F is much smaller than 1, i.e. filtering must be more intensive than for parallel communication. The best among the values analyzed here is 0.4. Indeed, sequential communication is a bottleneck, and even very costly filtering can be beneficial, because it decreases the communication time.
The optimum data fractions F* for A = 5 and C = 2 are presented in Fig. 6. The values are much smaller than in the case of parallel communication (see Fig. 3). As we already explained, intensive filtering decreases the communication time, which dominates in the schedule length. The fraction of retained data grows with increasing m at a slower pace than in the case of parallel communication.
The optimum schedule lengths are shown in Fig. 7. Using a larger number of servers results in a shorter processing time, but the impact of changing m is smaller than in Fig. 4, because using more processors m does not decrease the communication time. Recall that in the case of a linear second stage and filtering case 4, the function T(F) was always monotone (Section IV-A, equations (22), (23)). It can be seen in Fig. 8 that when the second stage complexity is loglinear, T(F) is also monotone for many choices of network parameters, but not for all of them. In particular, when A = 10, C = 1, c = 1 and C_M = 0.01, the best among the analyzed values of F is 0.45.

C. Parallel Communication, Loglinear Second Stage
In filtering cases 1 and 2, the impact of increasing the number of processors m on the optimum value of F and the schedule length is similar as for linear processing complexity. The main difference is that the time of merging the results also influences the outcome. When C_M is large, the merging stage becomes a bottleneck. Hence, more intensive filtering is required to reduce its duration and thus to obtain an optimum schedule.

D. Sequential Communication, Loglinear Second Stage
The differences between systems with sequential and parallel communication in the case of a loglinear second stage are similar to those present when the second stage is linear. Since sequential communication is a bottleneck, optimum schedules are obtained by more intensive data filtering. Fig. 9 shows that even if m is large and filtering is slow, the fraction of retained load should be at most several percent in filtering case 1. The optimum fractions obtained for cases 2, 3 and 4 are even smaller.

E. Parallel Communication, Quadratic Second Stage
When the second stage complexity is quadratic, intensive data filtering is required to obtain a short schedule by decreasing the duration of processing. It can be seen in Fig. 10, representing case 1 of the filtering complexity, that only when c is really large (i.e. c > 1) may it not be beneficial to decrease the data size as much as possible. In cases 2, 3 and 4, the fraction of retained data should practically always be as small as possible.

F. Sequential Communication, Quadratic Second Stage
When communication is sequential and the second stage complexity is quadratic, very intensive filtering should be used, even if it is costly. For all combinations of parameter values we studied, T(F) is an increasing function (see Fig. 11). Although the optimum fraction F* increases slightly with growing m, it stays below 0.01 for all settings we tested. Taking into account the practical limitations on F, this means that the smallest possible amount of data should be retained.

VI. CONCLUSIONS
In this paper we analyzed the impact of data filtering intensity on the performance of systems with a single expensive sensor.

The analysis covered two-stage systems with generic representations of filtering algorithms, communication patterns and data processing methods. It appears that, due to the interaction of the nonlinear complexity of filtering, the transmission time and the further data processing stages, there exists a filtering intensity which is optimum for the overall processing performance. These optima were investigated both analytically and computationally. It appeared that the communication subsystem and the complexity of the second stage algorithm have a large impact on the first stage filtering intensity. The ability of certain combinations of the system designs to scale is very limited. In systems with sequential communication, the gains from using parallel processors in the second stage diminish quickly because data transfer easily becomes a bottleneck. Processing with algorithms of high complexity (loglinear, quadratic) should be delegated to even further stages of data processing workflows because it incurs needs for filtering intensities which may be hard to realize. Thus, by exposing scalability issues, we demonstrated in this paper that designers of workflows with data filtering and distributed processing should strive for parallel data transfers and linear processing algorithms when handling large volumes of data from the sensors.

Fig. 2. Schedule length T as a function of the retained data fraction F, filtering case 2 (A_0(F) = c ln(1/F)), m = 100, several combinations of A, C and c values.

Fig. 8. Schedule lengths T for different values of F and network parameters, parallel communication, loglinear second stage, filtering case 4 (A_0(F) = c(1 - F)).

TABLE I
SUMMARY OF NOTATIONS

α_i   size of the load part assigned to stage 2 processor i [byte]
A_0   reciprocal of the first stage processing speed [e.g. sec/byte]