Identifying Hidden Influences of Traffic Incidents' Effect in Smart Cities

The road network of a big city is a complex system that is hard to analyze, and accurately quantifying the interactions between nonadjacent road segments is a serious challenge. In this paper we present a novel method that determines the effects (the time delay and the level of correlation) that distinct road segments of a smart city's road network have on each other. To reveal these relationships, we investigate unexpected events such as traffic jams or accidents. This analysis can provide significant insight for improving the operation of currently widespread traffic prediction algorithms.


I. INTRODUCTION
Nowadays smart city services are becoming more widespread than ever, as cities are growing and becoming more and more crowded as a result of urbanization and the growth of the world population. The rapid progress of urbanization has improved the lives of many people, but it has also brought remarkable challenges, such as traffic congestion, which can lead to increased energy and fuel consumption and enormous emission of pollutants [1]. These phenomena heavily and directly impact the health, quality of life and life expectancy of city dwellers. According to [2], laboratory studies indicated that transport-related air pollution may increase the risk of developing allergies and can exacerbate symptoms, particularly in susceptible subgroups, while [3] showed that traffic jams increase the risk of heart attacks.
Intelligent management systems, such as the Advanced Traffic Management System (ATMS) and the Intelligent Transportation System (ITS), can help overcome or significantly reduce the impact of such negative effects on city dwellers. The forecasts of these systems can support traffic control centers in managing the road network and allocating resources systematically, for example by opening or closing lanes, dynamically pricing parking places or adapting the traffic lights to the current traffic trends.
In vehicle navigation, knowing the traffic forecasts for different routes during route planning is advantageous, as the devices will be able to calculate more efficient routes and reduce travel time. Insight into the vehicular flows of smart cities could make searching for parking spaces much easier and faster. It could also provide added value for emerging V2X based traffic control systems, which can play an important role in the route planning of self-driving cars.
In the literature, numerous models are used for traffic flow prediction; however, the road network of a big city is a complex system that is hard to analyze, in which unexpected events can significantly decrease the correctness of the results of the prediction models.
Every predicted value is composed of a predictable component and an error [4], which includes both the prediction error and the unpredictable part of the uncertainty. Thus the predictable value is derived from the deterministic part and the predictable part of the uncertainty. It follows that the predictability of the traffic flow depends on whether the model is able to predict the uncertainty part or not. Fortunately, many prediction models can be prepared for handling the uncertainty part by integrating different external data sources. The uncertainty is influenced by many factors, such as weather conditions, mass events, road constructions, road closures, accidents, etc. By considering these external environmental factors, the error of prediction models can be decreased.
In this paper, a new method is presented that aims to reduce the previously mentioned uncertainties. The algorithm focuses on unexpected events, such as traffic accidents, which can have a negative impact on traffic prediction. By investigating the effect of these phenomena, the algorithm is able to:
• identify neighboring road segments that could be affected by the event,
• determine the level of correlation between the road segment affected by the accident and its neighboring road segments,
• quantify the time delay: the time needed for the effects of the accident to propagate to neighboring road segments.
By fusing this information with real-time traffic information, the prediction models will be able to provide more accurate predictions even when an unexpected event happens nearby.
The paper is organized as follows. In Section II, various prediction techniques aimed at reducing the error of prediction models are introduced. This section also contains a short summary of usable data sources. In Section III, the algorithm is presented in detail, followed by a case study simulation in Section IV. Section V concludes the paper and points out the current weaknesses and future improvements of the algorithm.

II. TRAFFIC PREDICTION METHODS IN SMART CITIES

A. Prediction Techniques
Several prediction models have been proposed [5] that perform well under regular conditions; however, an unexpected event can significantly decrease the quality of their traffic forecasts. The reason is that the predictability of the traffic flow depends on whether the model is able to predict the uncertainty part of the traffic flow with the required precision. The uncertainty is influenced by many factors, such as weather, events, road constructions, lighting conditions, etc. Incorporating external environmental factors and fused data [6] into the model is crucial to decrease the prediction error and increase the predictable part of the uncertainty.
Traffic is predictable in the sense that it does not vary significantly during weekdays and during most months of the year [7]. Similar results were obtained in [8], which found a relatively high daily predictability of traffic conditions despite the absence of any a priori knowledge of drivers' origins and destinations and despite the quite different travel patterns on weekdays and weekends.
A neurowavelet prediction algorithm was proposed in [9] to forecast the hourly traffic flow considering the effect of rainfall. In that article, the authors use a stationary wavelet transform to reveal the correlation between different weather conditions and changes in the traffic flow. In [10], it was examined whether road usage at a particular location determines the impact of various weather conditions. The study showed that precipitation, cloudiness and wind speed reduce traffic intensity, while high temperatures and hail significantly increase traffic intensity.
Other papers concentrated on the spatio-temporal properties of traffic flow by using different Autoregressive Integrated Moving Average (ARIMA) model variants [11]-[13] or K-Nearest Neighbors (KNN) models [14], [15], while others employed Convolutional Neural Networks (CNN) for this purpose [16]-[18]. A deep learning based prediction model was also presented for spatio-temporal data [17]. This prediction model uses spatial and temporal relations and integrates global information (such as the day of the week, meteorological conditions, etc.) to decrease the uncertainty.
The relation between traffic predictability and the prediction time horizon was investigated in [4] by examining spatio-temporal traffic relationships using the cross-correlation function (CCF). The time lag calculated by the CCF can be used to determine the prediction time horizon, and the cross-correlation coefficient can be used to identify the spatial relations that can be exploited in prediction.
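As an illustration of the cross-correlation idea mentioned above, the following minimal sketch scans a range of time lags and returns the lag with the highest Pearson correlation between an upstream and a downstream series. The function names (pearson, best_ccf_lag) and the sample data are hypothetical and are not taken from [4].

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def best_ccf_lag(upstream, downstream, max_lag):
    """Return (lag, coefficient) maximizing the correlation between
    upstream[t] and downstream[t + lag] for lag = 0..max_lag."""
    best_lag, best_coeff = 0, -1.0
    for lag in range(max_lag + 1):
        y = downstream[lag:]
        n = min(len(upstream), len(y))
        if n < 2:
            break
        coeff = pearson(upstream[:n], y[:n])
        if coeff > best_coeff:
            best_lag, best_coeff = lag, coeff
    return best_lag, best_coeff

# Hypothetical data: the downstream series repeats the upstream peak two samples later.
upstream = [10, 10, 30, 50, 30, 10, 10, 10, 10, 10]
downstream = [10, 10, 10, 10, 30, 50, 30, 10, 10, 10]
lag, coeff = best_ccf_lag(upstream, downstream, max_lag=4)
print(lag, round(coeff, 3))
```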
The solutions enumerated in this section aim to handle the previously listed uncertainties of traffic flows. However, we have not found significant work aimed at increasing the predictability of traffic flow through the investigation of unexpected events such as accidents or congestion in uncontrolled traffic flow. In this paper, we present a novel method that exploits these events to measure the time delay and calculate the level of influence between distinct road segments.

B. Data sources
In the first generation of ATMSs and ITSs [19], the data sources were various presence sensors in fixed positions, able to detect the presence of nearby vehicles. Initially inductive loop detectors were the most popular, but nowadays a wide variety of sensors has become available [20].
Recently, the advent of GPS-equipped smartphones and vehicles has given rise to a relatively new type of data source that can supplement presence-type sensors, either to gather more detailed information or to obtain data about roads that are not yet covered by presence sensors.
Our method can leverage both types of data sources, but in the case of GPS traces a preprocessing step must be inserted before building the data model.

III. METHODOLOGY
The road network of a big city is a complex system that is hard to analyze, and the accurate quantification of interactions between nonadjacent road segments is an almost unsolvable task, because the individual decisions of thousands of drivers make the interactions invisible. However, an unexpected event can be used to reveal hidden connections between road segments, because such events always appear as outliers in the time series of the investigated data type (traffic speed, traffic flow count, travel time, etc.). It follows that if the emergence of an unexpected event is known, the ripple of that event can be observed through the network, and thus the hidden correlations become observable. As an analogy, one can think of a road network as a huge black box system with one input and numerous outputs. If the system is fed with an unusual input, the inner behavior of the system can be inferred from the outputs.
In this section, the Algorithm for Identifying Hidden Influences (AIHI) is introduced in detail; it aims to exploit unexpected traffic events to reveal the previously mentioned connections. In Subsection III-A, the graph based data model suitable for the analysis of unexpected events is presented, then the distinct steps of the algorithm are explained in Subsection III-B.

A. Data model
In our solution, the road network is modeled with a directed flow graph $FG = (CP, FE)$. Each node $cp \in CP$ represents a control point, which is a special point of the road network where traffic measurement is feasible with different types of traffic sensors (such as inductive loop detectors, radar sensors, audio sensors, etc.). A control point itself does not store any traffic data; it only measures the traffic flow at its position.
The directed flow edges $fe \in FE$ represent a link between two adjacent control points $(cp_{src}, cp_{dst})$, where there is at least one lane between the two control points in the direction of traffic. A time series $ts$, which stores historical traffic flow data, is assigned to every directed flow edge $fe$. The time series $ts$ of flow edge $fe$ contains the measurements provided by the destination control point $cp_{dst}$ of $fe$. For instance, if there is a directed flow edge between control points $cp_1$ (source) and $cp_2$ (destination), denoted by $fe_{1,2}$, then the edge $fe_{1,2}$ contains the measurements of control point $cp_2$.
Besides the data from fixed position sensors, the data model is also able to utilize GPS traces, if virtual control points are defined and the trace data is aggregated at these points.
Figure 1 depicts an example of the traffic graph interpretation; it shows how the data model has to be interpreted on a simple road structure. Figure 1a illustrates a simple road network with control points, which are marked by $cp_i$ identifiers. In Figure 1b, the directed flow graph of the same road network is visualized. The $cp_i$ identifiers in this figure are identical to those in Figure 1a, and there is a directed flow edge $fe_{x,y}$ in the graph if control points $cp_x$ and $cp_y$ are connected in the road network in the direction of traffic.
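The data model described above can be sketched as a small Python structure. This is only a minimal illustration under our own assumptions: the class and method names (FlowEdge, FlowGraph, record, downstream, upstream) are hypothetical and do not come from the AIHI implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FlowEdge:
    """Directed flow edge; ts holds the measurements taken at the dst control point."""
    src: str
    dst: str
    ts: list = field(default_factory=list)

class FlowGraph:
    """Directed flow graph FG = (CP, FE)."""

    def __init__(self):
        self.control_points = set()
        self.edges = {}  # (src, dst) -> FlowEdge

    def add_edge(self, src, dst):
        self.control_points.update((src, dst))
        self.edges[(src, dst)] = FlowEdge(src, dst)
        return self.edges[(src, dst)]

    def record(self, cp, value):
        """Append a measurement of control point cp to every edge ending at cp."""
        for edge in self.edges.values():
            if edge.dst == cp:
                edge.ts.append(value)

    def downstream(self, edge):
        """Edges that start where the given edge ends (forward direction)."""
        return [e for e in self.edges.values() if e.src == edge.dst]

    def upstream(self, edge):
        """Edges that end where the given edge starts (backward direction)."""
        return [e for e in self.edges.values() if e.dst == edge.src]

fg = FlowGraph()
fe_12 = fg.add_edge("cp1", "cp2")
fe_23 = fg.add_edge("cp2", "cp3")
fg.record("cp2", 17)  # a count measured at cp2 belongs to fe_12 only
print(fe_12.ts, fe_23.ts)
```

Storing the time series on the edge rather than on the control point mirrors the convention that the measurements of $cp_{dst}$ describe the flow arriving along that edge.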
We also have to deal with the time resolution of our model. The different fixed position or GPS sensors sample the measured quantity with different frequencies depending on their settings, whereas the traffic analysis requires a homogeneous sampling frequency. Different aggregation intervals are used in the literature for this purpose, depending mainly on the task at hand. Generally, very narrow intervals, for example 10 seconds, are noisy and carry little meaning. We have found that the most common intervals are 5-10 minutes [21], [22], but many papers also claim that longer intervals, such as a quarter or half an hour, would be more effective [16], [23]. For our model, we chose a 30-second interval, because intervals that are too long could hide important features of an unexpected event and the noisiness at the 30-second scale does not affect the correctness of the AIHI.
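The aggregation into homogeneous intervals can be illustrated with the following minimal sketch, which counts raw vehicle detections per 30-second bin. The function name aggregate_counts, its parameters and the sample timestamps are assumptions made for this example only.

```python
def aggregate_counts(timestamps, start, end, interval=30):
    """Count detections per fixed-width bin covering [start, end).

    timestamps: detection times in seconds; start, end, interval: integer seconds.
    """
    n_bins = (end - start + interval - 1) // interval
    counts = [0] * n_bins
    for t in timestamps:
        if start <= t < end:
            counts[int((t - start) // interval)] += 1
    return counts

# Hypothetical detection times (seconds) reported by one control point.
detections = [1.2, 4.0, 29.9, 30.0, 61.5, 62.0, 88.0]
counts = aggregate_counts(detections, start=0, end=90)
print(counts)  # [3, 1, 3]
```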

B. The steps of the algorithm

1) Initialization:
Let us denote the time series of flow edge $fe_{src,dst}$ by $ts_{src,dst}$, in which the data can be described as an ordered sequence of discrete measurements:
$$ts_{src,dst} = (m_1, m_2, \dots, m_n) \quad (1)$$
where $m_t$ is the measured traffic count value at time $t$ at control point $cp_{dst}$.
The entry point of the algorithm is a directed flow edge $fe$ whose associated time series $ts$ shows an unexpected event in an arbitrary time interval $(t_{start}, t_{end})$. The interval $(t_{start}, t_{end})$ contains an unexpected event if a statistically significant change can be detected in the behavior or shape of $ts$ between $t_{start}$ and $t_{end}$ compared to the historical average $ts_{hist}$; in other words, $ts$ contains an anomalous part compared to $ts_{hist}$.
There are two methods to discover such time intervals. The easiest way is when we have a priori knowledge about the unexpected events that occurred, such as accidents or road closures, as these can be used directly as the input of the AIHI algorithm. The other possible approach is to execute an extensive search for unexpected events in the raw historical dataset. This approach requires a classification model that is able to decide whether a time series $ts_{src,dst}$ contains an unexpected event or not.

Algorithm 1 (AIHI)
Input:
• $FG$: Directed flow graph of the road network
• $fe$: The investigated directed edge
• $t_{start}$: The start time of the unexpected event
Output: The effects of the event organized in a tree structure
The AIHI algorithm needs the following three inputs:
• the flow graph $FG$ of the road network,
• the flow edge $fe$, which was the source of the unexpected event,
• the time $t_{start}$ of the emergence of the unexpected event on $fe$.
Using these three inputs, the algorithm follows the effect of the unexpected event through the traffic flow graph and determines the flow edges that could be affected by the input event.
2) Processing of a job: The AIHI algorithm (Algorithm 1) uses a job pool based approach in which the whole investigation task is separated into smaller independent subtasks. Here, a subtask is responsible for investigating whether the time series $ts$ of the currently examined flow edge $fe$ is affected by the source unexpected event. To run a job, the following elements are necessary:
• a flow edge $fe$,
• the time series $ts_{an}$, which contains the whole anomalous part identified by the source job,
• the emergence time $t_{start}$ of the anomaly in the source job,
• the direction $dir$ of the spread of the event (forward or backward), because unexpected events in the road network can have an effect in both directions.
To start the algorithm, the first job is created from the entry point. The entry point contains all necessary job input parameters except the time series $ts_{an}$; thus the anomalous part of the time series $ts$ of flow edge $fe$ has to be calculated with the findAnomaly function (Section III-B4). After that, the execution of the algorithm can start, and jobs are processed until the pool is empty.
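The job pool approach can be sketched as a simple queue-driven traversal. The sketch below is a simplified illustration and not the AIHI implementation: the lag search and the anomaly check are replaced by a hypothetical is_affected callback, the spreading direction is omitted, and a seen set (our own assumption) prevents an edge from being queued twice.

```python
from collections import deque

def run_job_pool(adjacency, entry_edge, is_affected):
    """Follow an event through the flow graph with a job pool.

    adjacency: dict mapping an edge id to the ids of its adjacent edges.
    is_affected: callback(edge, source_edge) -> bool, a stand-in for the
        best lag search and the anomaly check of a job.
    Returns a dict child_edge -> source_edge describing the spreading tree.
    """
    tree = {}
    seen = {entry_edge}
    pool = deque([(entry_edge, None)])  # a job is (edge, source edge)
    while pool:
        edge, source = pool.popleft()
        # The entry point is affected by definition; other edges are checked.
        if source is not None and not is_affected(edge, source):
            continue
        tree[edge] = source
        for nxt in adjacency.get(edge, []):
            if nxt not in seen:
                seen.add(nxt)
                pool.append((nxt, edge))
    return tree

# Hypothetical graph: the event on fe12 reaches fe23 and fe35 but not fe24.
adjacency = {"fe12": ["fe23", "fe24"], "fe23": ["fe35"], "fe24": ["fe45"]}
affected = {"fe23", "fe35"}
tree = run_job_pool(adjacency, "fe12", lambda edge, source: edge in affected)
print(tree)
```

Because fe24 is not affected, its neighbor fe45 is never queued, which reflects how the investigation stops along branches where no anomaly is found.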
A job executes these steps:
1) Find the best fitting time lag best_lag between the time series $ts$ of flow edge $fe$ and the anomalous time series $ts_{an}$, starting from $t_{start}$ in the chosen direction $dir$ (see Algorithm 2).
2) Check whether an anomalous event can be identified starting from best_lag (see Algorithm 3).
3) If an anomalous event is detected in the second step, the adjacent flow edges of the investigated flow edge $fe$ are put into the job pool as new jobs whose source job is the current job.
3) Find best lag: The BestFitLag function is responsible for determining the start of an anomalous event in the time series $ts$ of the flow edge job.fe of the current job.
We can assume that the shape of the anomalous time series $ts_{an}$ is quite similar to the anomaly observed in the actual time series $ts$ of job.fe. To measure this similarity we defined a new distance function, called shape_dist (Equation 2). Contrary to other distance functions such as Manhattan, Euclidean or Dynamic Time Warping [24], it measures the similarity of the time series' shapes by differentiating the changes in the different time series.
The shape_dist function is calculated with increasing lag = 0, 1, 2, ... between the time series $ts$ and $ts_{an}$. It can be assumed that while the delay is increasing, the distance between the two time series decreases until the best fit is reached. Thus, if the calculated distance values start to increase, the likely best delay has been reached, because adjacent road segments show high correlation in general. However, the calculated delay can sometimes be just a local minimum, so simulated annealing is applied to find the global optimum.

PROCEEDINGS OF THE FEDCSIS. POZNAŃ, 2018

Algorithm 2: BestFitLag
Input:
• $ts_{an}$: The time series containing the whole anomaly part
• $ts$: The time series of the flow edge $fe$
• $t_{start}$: The emergence time of the anomaly in the time series
• $dir$: The direction of the spread of the event (forward or backward)
Output:
• bestlag: The best fitting time lag
• influence: The level of influence
… ≤ last do
    lastlag ← lastlag + i;
end
bestlag = i;
length = len($ts_{an}$);
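The lag search can be illustrated with the following minimal sketch. Two points are assumptions of this example and not the definitions used by AIHI: the shape distance is taken to be the mean absolute difference between the first differences of the two series (an illustrative stand-in, not necessarily Equation 2), and the lags are scanned exhaustively up to a hypothetical max_lag instead of applying simulated annealing. Only the forward direction is handled.

```python
def diffs(ts):
    """First differences of a series: the change between consecutive samples."""
    return [b - a for a, b in zip(ts, ts[1:])]

def shape_dist(a, b):
    """Illustrative shape distance: mean absolute difference of first differences."""
    da, db = diffs(a), diffs(b)
    n = min(len(da), len(db))
    if n == 0:
        return float("inf")
    return sum(abs(x - y) for x, y in zip(da, db)) / n

def best_fit_lag(ts_an, ts, t_start, max_lag):
    """Scan lag = 0..max_lag and return (best_lag, distance) of the best fit."""
    best_lag, best_dist = 0, float("inf")
    for lag in range(max_lag + 1):
        window = ts[t_start + lag : t_start + lag + len(ts_an)]
        if len(window) < len(ts_an):
            break  # not enough samples left to compare the whole anomaly
        dist = shape_dist(ts_an, window)
        if dist < best_dist:
            best_lag, best_dist = lag, dist
    return best_lag, best_dist

# Hypothetical data: the same dip appears three samples later at a higher flow level.
ts_an = [50, 30, 10, 10, 30, 50]
ts = [58, 62, 60, 59, 61, 55, 35, 15, 15, 35, 55, 59, 61, 60]
best_lag, dist = best_fit_lag(ts_an, ts, t_start=2, max_lag=5)
print(best_lag, dist)
```

Because only the changes are compared, the dip is matched even though the second series runs at a different absolute level, which is the property that distinguishes a shape-based distance from the Manhattan or Euclidean distance.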
Furthermore, the best distance value can be used to express the influence between the two flow edges; however, a transformation is required that maps the $[0, \infty)$ range of the shape_dist function onto $[1, 0]$. Higher values mean a stronger influence, while lower values mean a weaker influence. Equation 4 is designed for this purpose, based on the distance given by Equation 3:
$$distance = shape\_dist(ts_{an}, shift(ts, lastlag)) \quad (3)$$
The length parameter of Equation 4 equals the length of $ts_{an}$.
4) Find anomaly: The findAnomaly function (Algorithm 3) is responsible for determining whether an anomalous part starting from $t_{start}$ can be identified. If an anomalous part is found, the function also returns its length.
The findAnomaly function exploits the observation that the error of a prediction model significantly increases when [...] This point in the time series is length apart from $t_{start}$; therefore the length of the anomaly can be calculated. If the length equals zero, $ts$ does not contain an anomaly.

Algorithm 3: findAnomaly
Input:
• $ts$: The time series of the flow edge $fe$
• $ts_{hist}$: The historical average time series of the flow edge $fe$
• $t_{start}$: The emergence time of the anomaly in the time series
Output:
• is_anomaly: Whether the input time series $ts$ contains an anomaly from $t_{start}$
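A simplified version of this idea can be sketched as follows. The sketch assumes that the historical average serves as the prediction and that a fixed, hypothetical threshold on the absolute error marks a sample as anomalous; the actual criterion of Algorithm 3 may differ.

```python
def find_anomaly(ts, ts_hist, t_start, threshold):
    """Return (is_anomaly, length) for the anomalous part starting at t_start.

    A sample counts as anomalous while its absolute deviation from the
    historical average exceeds the threshold (an assumption of this sketch).
    """
    length = 0
    for observed, expected in zip(ts[t_start:], ts_hist[t_start:]):
        if abs(observed - expected) <= threshold:
            break  # the error returned to its normal level
        length += 1
    return length > 0, length

# Hypothetical data: flow drops sharply from index 2 and recovers at index 6.
ts_hist = [50] * 10
ts = [51, 49, 20, 10, 15, 30, 48, 50, 51, 49]
is_anomaly, length = find_anomaly(ts, ts_hist, t_start=2, threshold=10)
print(is_anomaly, length)  # True 4
```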

IV. CASE STUDY
Having implemented the AIHI, in this section we demonstrate its ability to follow the spread of the effect of a traffic accident through a traffic flow graph.
At first, we searched for datasets containing both traffic flow data and traffic accidents, but unfortunately no such publicly available traffic flow dataset existed at the time of writing. Because of this, a simulation framework was used for the evaluation.
As the simulation framework, we chose Simulation of Urban Mobility (SUMO), a free and open traffic simulation suite that has been available since 2001. SUMO allows the modeling of intermodal traffic systems, including roads, vehicles, public transport and pedestrians. SUMO includes a wealth of supporting tools, which handle tasks such as route finding, visualization, road network import from OpenStreetMap and emission calculation.
In our scenario, besides the real traffic flow data, a traffic accident had to be generated. The authors of [25] used traffic lights for this purpose, so this approach was applied in our simulation as well. The accident is simulated by leaving only one of the four available lanes of a road open.
In the simulation, high-rank roads in Budapest's suburbs are examined. Seven control points (using inductive loop detectors) were placed on the map as depicted in Figure 4. We simulated five hours of traffic flow, which was similar to the afternoon rush hours. The vehicles are simulated with speedFactor = normc(1, 0.1, 0.2, 2), which means that 95% of the vehicles drove between 80% and 120% of the legal speed limit. In the beginning we simulated normal traffic flow, then a one-hour-long traffic accident was inserted after two and a half hours near $cp_2$, so the effect of the accident could first be identified on flow edge $fe_{1,2}$ (Figure 2).
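The 95% figure follows from the parameters of the speed factor distribution, read here as a normal distribution with mean 1 and deviation 0.1 that is truncated to [0.2, 2]. The short check below computes the share of vehicles between 0.8 and 1.2 analytically; the helper names normal_cdf and truncated_share are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(x, mean, dev):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (dev * sqrt(2.0))))

def truncated_share(lo, hi, mean=1.0, dev=0.1, tmin=0.2, tmax=2.0):
    """Share of a normal(mean, dev) truncated to [tmin, tmax] that lies in [lo, hi]."""
    total = normal_cdf(tmax, mean, dev) - normal_cdf(tmin, mean, dev)
    inside = normal_cdf(min(hi, tmax), mean, dev) - normal_cdf(max(lo, tmin), mean, dev)
    return inside / total

share = truncated_share(0.8, 1.2)
print(round(share, 4))  # about 0.9545, i.e. roughly 95% within two deviations
```

The truncation bounds lie eight and ten deviations away from the mean, so they have practically no effect on this share.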
As described in Subsection III-B1, the entry point of AIHI was the flow edge $fe_{1,2}$, with $t_{start}$ set to the start time of the simulated traffic accident. The result of AIHI is displayed in Figure 5. The spreading tree of the accident shows that the farther away the control points were, the larger the detected time lags became, while the influences decreased, as expected. The change in the pattern of the anomalous part between $cp_4$ and $cp_5$ is also visualized in Figure 6.

V. CONCLUSION
The road network of a big city is a complex system that is hard to analyze, and the accurate quantification of interactions between nonadjacent road segments is an almost unsolvable task, because the individual decisions of thousands of drivers make the interactions invisible. In this paper a novel algorithm (AIHI) has been presented that is able to exploit unexpected traffic events to reveal the hidden connections between nonadjacent road segments and provides the following information:
• the nearby road segments that could be affected by the event,
• if a road segment is affected, the exact level of influence between the affected road segment and the road segment of the accident,
• the time delay: the time between the event and its detection on the affected road segments.
By combining these new information sources with real-time traffic information, the accuracy of prediction models that are able to integrate external environmental variables can be increased.
We have demonstrated the capabilities of AIHI with simulations on a real road network, the results of which can be depicted as the spreading tree of the accident.

Fig. 1: An example of the traffic graph model interpretation. In Figure 1a, the original road network is depicted with control points, while Figure 1b shows the road network's graph representation.

Fig. 2: Visualization of anomaly behavior compared to predicted flow.

Fig. 5: Visualization of the AIHI's result. The examined control point was $cp_2$ (red dot), where the time lag is zero and the influence is 1.0. Other influenced control points in the road network identified by AIHI are marked with blue dots.