QueuePredict – accurate prediction of queue length in public service offices on the basis of Open Urban Data APIs

This paper presents methods to predict the number of people waiting in queues in the District Offices of the City of Warsaw. On the basis of real-time queue length information exposed as part of the Open City Data portal, we can predict the number of people in a given time frame, which can then be used to estimate the expected waiting time. This information is important and useful for the many people who visit district offices every day. The methods presented in the paper can be used to build new value-added smart city services.


I. INTRODUCTION
Queue management systems are popular tools in many organizations focused on mass customer service. When services are operated by a limited number of support staff and, at the same time, have to deal with a large and often unpredictably bursting number of visitors, queues can be expected. Queue management systems (QMSs) can be useful in organizing the queues and very often in shortening them. QMSs have become a popular solution for crowd management all over the world; in Poland they are especially popular in retail trade (e.g. in pharmacies), financial services (banks), electricity suppliers and public offices (post offices, District Offices, Registry Offices etc.).
In this paper we focus on the QMS data provided by the City of Warsaw as a part of its Open Urban Data platform [1]. The remaining part of this paper is organized as follows. First, the queue management systems present in the citizen servicing areas are described together with the QMS API. Next, several applications that use the API are described, with emphasis on their functionality and usefulness. Sections V and VI describe the data collection process and the prediction model assumptions, respectively. Finally, the prediction model evaluation and a description of future work are provided.

II. EXISTING SOLUTIONS
Citizen servicing offices of the City of Warsaw are served by multiple queue systems. When a citizen enters the office, he or she gets an appropriate ticket at a kiosk and then waits until the ticket number appears on the wallboard that manages the queue. Warsaw offices mainly use QMSs delivered by the Swedish company Qmatic [2]. The installed solutions allow a ticket to be transferred between queues [3] when, for example, the citizen must pay for documents at the cash desk. Another available function is ticket reservation, e.g. using a web page. In this case, the person who booked the ticket has to enter the office five minutes before the scheduled time and confirm his or her presence using a dedicated function at the kiosk [4]. Unconfirmed tickets are automatically canceled and removed from the system. It should be pointed out that the queue management systems currently used in Warsaw do not provide citizens with a predicted queue length. Instead, only current queue parameters [5], such as the number of waiting people or the expected waiting time, are presented.

III. QUEUE DATA EXPOSITION
In addition to the queue displays in offices and on web pages mentioned in Section II, the City of Warsaw also provides Application Programming Interfaces (APIs) to its QMSs. These REST-like APIs are part of the Open Data portal [1] and cover QMSs serving 8 city offices. The list of the offices, as well as the number of distinct queues registered within each office, is presented in Table I. The Open Urban Data platform used by the City of Warsaw was developed within the MUNDO project (Apps4Warsaw) [6]. The platform collects data from the QMSs once per minute.

IV. EXISTING APPLICATIONS
A number of applications using the QMS API of the City of Warsaw have been developed over the past 2 years. The very first one was 'StaczKolejkowy' [8], which was developed within the Business Intelligence Hackathon API (BIHAPI) contest in 2014 [7]. Another example use of the API is the e-kolejka application, which competed in BIHAPI 2015. Both are mobile applications for Android-based smartphones that allow users to visualize queues on the screen and notify them when their ticket is being called. It should be emphasized that 'StaczKolejkowy' was one of the winners of the 2014 BIHAPI contest.
The Queue API was also popular among participants of the Apps4Warsaw contest. The application concepts submitted to the competition and the prototypes built in the development phase of this contest are listed in Table II.

V. DATA COLLECTION
In order to properly handle the task of predicting the queue length, two datasets were collected between 1 April 2016 and 7 May 2016. For each of the datasets, real-time queue data for a total of 681 queues defined in 6 district offices (named Q1, …, Q6) were collected using a 1-minute polling interval. The data were then split into two groups: the data from 1 April to 30 April formed the learning set, while the data collected in May were used to evaluate the accuracy of the prediction model.
It was observed that a queue length of 0 (zero) is reported very frequently by the API. Moreover, internal server errors were reported for about 5% of queries.
For each of the sets, only queue data for working hours were used in further analyses.
Overall, our dataset contained about 16 million valid individual queue length entries.
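The filtering and splitting steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the record layout `(timestamp, queue_id, length)`, the opening hours of 8:00 to 16:00 on Monday to Friday, and the function names are assumptions made for the example. Only the split date (April for learning, May for testing) comes from the text.

```python
from datetime import datetime, time

# Assumed opening hours; the paper does not state the actual working hours.
OPEN, CLOSE = time(8, 0), time(16, 0)
# Split point taken from the text: April 2016 is the learning set, May 2016 the test set.
TEST_START = datetime(2016, 5, 1)


def in_working_hours(ts):
    """True for Monday-Friday timestamps inside the assumed opening hours."""
    return ts.weekday() < 5 and OPEN <= ts.time() < CLOSE


def split_records(records):
    """Drop failed polls and off-hours samples, then split by date.

    Each record is a hypothetical (timestamp, queue_id, length) tuple;
    length is None when the API answered with a server error.
    """
    learning, test = [], []
    for ts, queue_id, length in records:
        if length is None or not in_working_hours(ts):
            continue
        (test if ts >= TEST_START else learning).append((ts, queue_id, length))
    return learning, test


records = [
    (datetime(2016, 4, 4, 9, 0), "Q1", 3),     # Monday in April: learning set
    (datetime(2016, 4, 4, 9, 1), "Q1", None),  # failed poll: dropped
    (datetime(2016, 4, 4, 20, 0), "Q1", 0),    # office closed: dropped
    (datetime(2016, 5, 4, 10, 0), "Q1", 2),    # Wednesday in May: test set
]
learning, test = split_records(records)
```

Splitting by date rather than at random keeps the evaluation data strictly later than the learning data, which matches how a deployed predictor would be used.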

VI. QUEUE PREDICTION
As mentioned in Section V, individual records of queue length are often equal to 0. Therefore we decided that, instead of predicting such an individual state, our methods would estimate the mean queue length in a given time frame. This means that the raw datasets described in Section V had to be aggregated using time-based windows. We decided to use non-overlapping aggregation windows of two lengths: 5 minutes and 1 hour. The idea of data aggregation is illustrated in Fig. 2. For every window we calculated the mean queue length; in the example in Fig. 2, the value assigned to window #1 is 1.4.
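The window aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `aggregate_mean`, the alignment of windows to midnight, and the sample values are hypothetical. The samples were chosen only so that the first window averages to 1.4; they are not the values shown in Fig. 2.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_mean(samples, window_minutes):
    """Average queue length over non-overlapping windows.

    samples: iterable of (timestamp, length) pairs for one queue.
    Returns {window_start: mean_length}, with windows aligned to midnight.
    """
    buckets = defaultdict(list)
    for ts, length in samples:
        minutes = ts.hour * 60 + ts.minute
        start_minute = minutes - minutes % window_minutes
        day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        buckets[day_start + timedelta(minutes=start_minute)].append(length)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Illustrative one-minute samples (not taken from the paper's data).
base = datetime(2016, 4, 4, 9, 0)
lengths = [1, 2, 1, 2, 1, 0, 0, 3, 0, 0]
samples = [(base + timedelta(minutes=i), n) for i, n in enumerate(lengths)]

means = aggregate_mean(samples, window_minutes=5)
```

Calling the same function with `window_minutes=60` would give the 1-hour aggregation.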
In order to predict the queue length we decided to use the Random Forests algorithm [13]. The input data for the prediction of queue length in time t were:
• the aforementioned average queue length for the k past aggregation windows, denoted t-1, t-2, …, t-k,
• the day of week (d) of aggregation window t,
• the time (hour and minutes) of aggregation window t.
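The input vector described above can be assembled as in the following sketch. The function name `build_examples`, the feature order and the sample values are assumptions made for illustration. The sketch also assumes that the windows are consecutive with no gaps, which a real pipeline would have to check at day boundaries.

```python
from datetime import datetime, timedelta


def build_examples(window_means, k):
    """Turn a chronological list of (window_start, mean_length) into (features, target) pairs.

    Features for window t: the k previous means (oldest first),
    then day of week (0 = Monday), hour and minute of window t.
    """
    examples = []
    for t in range(k, len(window_means)):
        start, target = window_means[t]
        lags = [mean for _, mean in window_means[t - k:t]]
        features = lags + [start.weekday(), start.hour, start.minute]
        examples.append((features, target))
    return examples


base = datetime(2016, 4, 4, 9, 0)  # a Monday
series = [(base + timedelta(minutes=5 * i), m)
          for i, m in enumerate([1.4, 0.6, 2.0, 3.2])]
examples = build_examples(series, k=2)
```

With k = 2, the first usable window is the third one, because the first two windows have too little history to form a full input vector.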

Because the methodology is based on averaged values, which are real numbers rather than discrete counts, we used Random Forests for a regression task.
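A Random Forest regression on lagged window means could look like the sketch below. It assumes scikit-learn's `RandomForestRegressor` as the implementation; the paper does not say which library or hyperparameters were used. The data are synthetic, the values of k and `n_estimators` are illustrative, and the calendar features (day of week and time) are left out for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for aggregated queue data: a slowly varying pattern plus noise.
rng = np.random.default_rng(0)
series = 2.0 + np.sin(np.arange(400) / 10.0) + rng.normal(0.0, 0.1, 400)

k = 3  # number of past windows used as inputs (illustrative value)
X = np.array([series[t - k:t] for t in range(k, len(series))])
y = series[k:]

# Chronological split: earlier windows for learning, later ones for testing.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
predicted = model.predict(X_test)
```

Because each tree predicts an average of learning-set targets, the forest's output always stays within the range of mean queue lengths seen during learning.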
For experimental verification of our method, we used three combinations of window size and k that reflect two use case scenarios: short-term prediction for people who are a short distance from the district offices, and long-term prediction for people traveling long distances to reach the office. Each combination is defined by an aggregation window length and a number of historical observations k.

VII. PREDICTION MODEL EVALUATION
The prediction accuracy of the proposed method, defined as the proportion of predictions with an error of less than 0.1 person/timeframe, varies between 36.68% and 88.16%, depending on the district office. The mean prediction error is 0.714 person/timeframe in the worst case and 0.096 person/timeframe in the best case. It is important to note that the distribution of error varies in time, as can be observed in the example in Fig. 3.
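The two figures of merit used here, the share of predictions within 0.1 person/timeframe and the mean prediction error, can be computed as in the following sketch. It assumes that the error is the absolute difference between the real and the predicted window mean, which the text does not state explicitly. The function name and the sample values are made up for illustration.

```python
def evaluate(actual, predicted, tolerance=0.1):
    """Return (accuracy, mean_error) for paired lists of window means.

    accuracy: share of windows whose absolute error is below `tolerance`.
    mean_error: mean absolute error in persons per time frame.
    """
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    accuracy = sum(e < tolerance for e in errors) / len(errors)
    mean_error = sum(errors) / len(errors)
    return accuracy, mean_error


# Made-up values for illustration only.
actual = [0.0, 1.4, 2.0, 0.6]
predicted = [0.05, 1.0, 2.0, 1.6]
accuracy, mean_error = evaluate(actual, predicted)
```

The two measures complement each other: a few large misses on peaks can raise the mean error noticeably while leaving the accuracy almost unchanged.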
It can also be observed that almost all peaks present in the real data were predicted, while there are a few false predictions, i.e. peaks that were predicted by our algorithm but never observed in the real-life data.
Our algorithm achieves the best results when the shorter aggregation window of 5 minutes is used, regardless of the number of historical observations analyzed. The explanation is that a typical peak in queue length lasts around 30 minutes, so with a longer window individual peaks are averaged out, producing noisier data on which prediction is less accurate.
It should also be noted that for the purpose of the experiment presented in this paper we excluded data from hours when offices were closed (i.e. when queue length was always 0). This means that the prediction evaluation was performed in a realistic use-case scenario.
A detailed comparison of the results obtained for all three test cases is provided in Tables III, IV and V, respectively.

VIII. FUTURE WORKS
Currently, we put emphasis on the development of accurate and computationally efficient prediction methods for queue data. Nevertheless, the proposed methods are designed to be easily transformed into a QueuePredict API, which would be a highly useful extension to the urban APIs already exposed.
Moreover, future work on prediction methods will focus on the use of additional data, including but not limited to:
• weather data and weather forecasts,
• public transport data,
• long-term observation of queue usage,
• prediction of queue state in time steps far beyond the near future (i.e. predicting queue length several hours or days ahead).

IX. SUMMARY AND CONCLUSION
This paper presented an evaluation of a Random Forests regression model in the task of predicting the average queue length in the District Offices of the City of Warsaw. The queue data are exposed by the Open Urban Data portal of the City of Warsaw and were collected for 6 district offices.
Prediction models were developed in three different scenarios representing different real-life use cases. The prediction accuracy of the developed models is as high as 88.16%, with a prediction error as low as 0.096 person/timeframe, when short-term prediction is considered. Long-term prediction yields significantly worse results, with a maximum prediction accuracy of 72.96% and an average prediction error of 0.180 person/timeframe.

Fig. 2. Time-based windows used to aggregate raw queue data; a window of 5 minutes is used.

Fig. 3. An example comparison of real (blue) and predicted (red) queue length for a 5-minute aggregation window for one of the district offices.