
Traffic4cast Competition Track at NeurIPS 2019
Vancouver Convention Center, Canada
December 14, 2019
The session featured a summary of the Traffic Map Movie Forecasting challenge. The challenge is based on a high-resolution, real-world traffic data set with over 100 billion points, provided by HERE Technologies. Traffic data is sampled on 100 m x 100 m squares every 5 minutes throughout a year in three large cities; traffic volume, speed, and direction are encoded as different color channels. The goal of the competition is to predict the next 15 minutes of traffic.
More than 400 users downloaded the traffic data set; 40 teams participated in the competition, and 30 of them beat the baseline. We received 600 complete submissions containing over 3,200 predictions. The top three teams and three selected contestants presented their results at the meeting.
An extended discussion, in which contestants presented their work in detail, was held as a full-day symposium on December 15, 2019, following the NeurIPS meeting.
Program
Session Highlights

1. Introduction.
Dr David Kreil is a founding director at IARAI and has been organizing large scientific data analysis contests for well over a decade (cf. www.camda.info). He introduced the competition and explained how the shorter NeurIPS Competition Track meeting related to the full-day Traffic4cast Symposium: the short NeurIPS presentations covered highlights and were followed by an awards ceremony, while in-depth presentations and discussions took place at the Traffic4cast Symposium the next day.

2. Real-world Data Shaping the Algorithms of Modern AI.
Dr Sepp Hochreiter is a founding director at IARAI and a professor at Johannes Kepler University Linz. Introducing the session, he discussed the importance of real-world data for the advancement of AI.
"Our goal is to apply AI methods to improve the quality of life, of humanity. Traffic and, more generally, mobility are important issues in densely populated cities. At the IARAI, we focus on the major challenges for the future of humanity. Besides mobility, these challenges encompass climate change and pollution, where we need a detailed model of the world, at the level of street maps. The data, including location of factories and localization of air pollution, and population clustering based on social media and mobile GPS – all these are being collected now. Using these data with the help of AI we will go in the direction of smart cities and their control, from traffic lights, to air pollution, and beyond…”

3. Traffic Map Movies vs Real Movies – the Art of Predicting Moving Images.
Dr Leonid Sigal, professor at the University of British Columbia, discussed the differences, similarities, and opportunities for synergy between analyzing traffic movies and real-world videos.
“Traffic movies and real videos have a similar representation, as they are tensors with dimensions of width x height x 3 (number of channels) x time, while the semantics of the color channels is, of course, different. In addition, at the low level, traffic movies and real videos both have spatial and temporal coherence, as there is continuity in their sources. This similarity suggests that some of the methods used to predict how events will unfold in real movies can be applied to traffic movies as well. One common approach is called conditional video generation. The differences lie in the background and camera, which are static in traffic movies, and in the higher resolution of traffic movies…”

4. Traffic4Cast – Aligning Short-Term Traffic State Forecasting and Video Frame Prediction.
Dr David Jonietz, from HERE Technologies, Zurich, described a novel approach to traffic forecasting that represents traffic data as temporal sequences of activity on cartographic pixel maps.
“Human activities are distributed across space and time, creating the need to be mobile, which is facilitated by the transportation infrastructure. A mismatch between supply and demand in the transportation system leads to negative social, economic, and ecological consequences. This is why traffic management is important. At HERE Technologies, we aim to improve urban traffic conditions by helping to balance supply and demand. To do this, we provide a variety of tools for traffic monitoring, forecasting, and optimization. Short-term traffic forecasting, i.e., the prediction of variables that describe the state of traffic, plays a key role in traffic management. Traffic forecasting has been an active field for the past 50 years; however, it is still considered an unsolved and non-trivial task because of a set of challenges. Specifically, despite the spatial and temporal regularity of traffic, it is affected by a multitude of random events, such as road accidents or closures. Traffic forecasts also influence drivers’ decisions and thus modify future traffic states. In addition, road networks show unique and complex spatial dependency patterns. In our approach, traffic data are represented as temporal sequences of cartographic maps. We anticipate that, similar to video prediction, neural networks will be able to pick up the underlying rules from these data.”

5. Traffic Map Movie Prediction Using U-Net with DenseNet.
Dr Sungbin Choi, Seoul, who provided the top-ranked submission on the leaderboard, discussed traffic map prediction using U-Net-based deep convolutional neural networks.
His approach employed a U-Net-based architecture in which each convolutional block contained four layers, each connected to all subsequent layers, as in a dense convolutional network (DenseNet). An ensemble model was built by concatenating the outputs of four trained base model variants. Several alternative U-Net-based architectures were tested: a fully convolutional DenseNet, a residual network (ResNet), and a simple convolutional network. Although the ResNet variant performed best, the DenseNet variant was implemented because of computational complexity. Image augmentation by random translation and flipping was also tested but did not noticeably improve performance.
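The dense connectivity described above can be illustrated with a minimal pure-Python sketch of a four-layer dense block applied at a single pixel. This is not Choi's implementation: each hypothetical layer here is a random 1x1 linear map followed by ReLU, whereas DenseNet layers normally use 3x3 convolutions with batch normalization, and the growth rate of 8 is an arbitrary assumption. The point is the bookkeeping: each layer sees the concatenation of the block input and all earlier outputs, so the channel count grows linearly.

```python
import random


def make_pointwise_layer(in_channels, growth, rng):
    """Hypothetical stand-in for a conv layer: a random 1x1 linear map plus ReLU."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_channels)]
               for _ in range(growth)]

    def layer(features):
        return [max(0.0, sum(w * f for w, f in zip(row, features)))
                for row in weights]
    return layer


def dense_block(features, num_layers=4, growth=8, seed=0):
    """Each layer receives the block input plus all earlier layer outputs,
    and its own output is appended to that stack."""
    rng = random.Random(seed)
    stack = list(features)
    for _ in range(num_layers):
        layer = make_pointwise_layer(len(stack), growth, rng)
        stack = stack + layer(stack)  # concatenate (DenseNet), not add (ResNet)
    return stack


pixel = [0.2, 0.5, 0.1]  # e.g. volume, speed, heading at one cell
out = dense_block(pixel)
print(len(out))  # 3 input channels + 4 layers * 8 new channels = 35
```

Concatenation keeps every earlier feature map directly available to later layers, which is the defining difference from the additive shortcuts of a ResNet.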

6. Augmenting the Traffic4cast Challenge 2019 with Spatio-Temporal Context.
Henry Martin of the MIE-Lab, ETH Zurich, who provided the second-ranked submission on the leaderboard, discussed the integration of spatial and temporal context information into convolutional neural network models.
In his approach, the data was transformed by stacking different time steps as additional channels, thus reducing data dimensionality. A separate U-Net-based model with a double convolution in each layer was then trained for each city. Because most of the data lie along city streets, the team attempted to extract the street network from the data by thresholding the summed images. A mask-based graph convolutional network (GCN) was then trained on the transformed images. As the data show strong temporal dependencies, the team also tested a conditional U-Net model in which temporal features, such as day of week and hour of the day, were incorporated at the lowest U-Net level. Several other network architectures were also evaluated; interestingly, the original U-Net-based model gave the best predictions.
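The two preprocessing ideas in this approach, folding time steps into channels and deriving a street mask from summed activity, can be sketched in a few lines of pure Python. The nested-list layout movie[t][y][x][c], the function names, and the zero threshold are assumptions for illustration; the team's actual threshold and tensor library are not stated.

```python
def stack_time_as_channels(movie):
    """movie[t][y][x] is a list of C channel values.
    Returns grid[y][x] holding the T*C values of all time steps, oldest first."""
    T, H, W = len(movie), len(movie[0]), len(movie[0][0])
    return [[[value for t in range(T) for value in movie[t][y][x]]
             for x in range(W)]
            for y in range(H)]


def street_mask(movie, threshold=0.0):
    """Marks cells whose activity, summed over all frames and channels, exceeds threshold."""
    T, H, W = len(movie), len(movie[0]), len(movie[0][0])
    return [[sum(sum(movie[t][y][x]) for t in range(T)) > threshold
             for x in range(W)]
            for y in range(H)]


# Toy movie: 12 frames (one hour at 5-minute sampling) of a 2x2 grid, 3 channels.
# Only cell (0, 1) ever carries traffic.
movie = [[[[0.0, 0.0, 0.0], [t + 1.0, 0.5, 0.25]],
          [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]] for t in range(12)]
stacked = stack_time_as_channels(movie)
print(len(stacked[0][1]))  # 12 frames * 3 channels = 36
print(street_mask(movie))  # [[False, True], [False, False]]
```

After stacking, a plain 2D network sees one 36-channel image per sample instead of a 4D tensor, and the mask restricts attention to cells that ever carry traffic.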

7. CrevNet: Conditionally Reversible Video Prediction.
Wei Yu, University of Toronto, who provided the third-ranked submission on the leaderboard, introduced a Conditionally Reversible Network with reversible predictive modules.
Wei Yu outlined how his approach preserves information, reduces memory consumption, and increases computational efficiency. The model used a reversible architecture comprising a two-way auto-encoder (encoder and decoder) and a recurrent predictor. The input video frames were reshaped and split channel-wise into two groups, which were passed through the encoding pass of the auto-encoder for feature extraction and then into the predictor. The predictor contained multiple reversible predictive modules (RPMs) that capture spatial and temporal dependencies. In each RPM, all standard convolutions were replaced with layers from the convolutional recurrent network family (such as convolutional or spatiotemporal Long Short-Term Memory), and a soft attention mechanism enabled the model to focus on moving objects instead of the static background. The transformed high-level features produced by the predictor were passed back through the decoding pass to yield a prediction.
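Reversibility is what lets such a model avoid storing intermediate activations: the input of a block can be recomputed exactly from its output. The sketch below shows the generic additive coupling used in reversible networks such as RevNet, applied to a channel vector split into two groups. It is a simplification, not CrevNet's code; the functions f and g are hypothetical placeholders for the convolutional recurrent layers and attention inside the actual RPMs.

```python
import math


def f(v):
    """Hypothetical residual function; it does not itself need to be invertible."""
    return [math.tanh(0.5 * a + 0.1) for a in v]


def g(v):
    """Second hypothetical residual function."""
    return [0.3 * a * a for a in v]


def coupling_forward(x):
    """Split the channels into two halves and mix them with additive coupling."""
    assert len(x) % 2 == 0, "channel count must be even to split into two groups"
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1 + y2


def coupling_inverse(y):
    """Recover the input from the output, so activations need not be stored."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1 + x2


x = [0.4, -1.2, 0.7, 2.0]
y = coupling_forward(x)
x_rec = coupling_inverse(y)
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # floating-point round-off only
```

Because f and g are only ever evaluated, never inverted, they can be arbitrarily complex, which is how reversible models keep expressive layers while preserving all input information.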

8. Building an Effective Large-Scale Traffic State Prediction System.
Zhichen Liu, School of Transportation, Southeast University, China, presented a selected contribution by Yang Liu et al.
Their model was designed to achieve relatively high prediction performance with limited computational resources. It was based on a U-Net-like architecture, with a mask instead of spatial attention to address data sparsity. To capture spatial correlations, the receptive field was enlarged by pooling rather than by increasing the convolutional kernel size. Each city was divided into five sub-regions, which allowed the model to be optimized for each sub-region while keeping training feasible on limited resources. Through manual feature engineering, features such as the time continuity of traffic and the 24-hour periodicity of its states were encoded as additional input channels.
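The trade-off between pooling and larger kernels follows from standard receptive-field arithmetic: each layer with kernel size k adds (k - 1) times the current cumulative stride to the receptive field, and a pooling layer multiplies that cumulative stride. A small calculator, with layer configurations chosen purely for illustration (not the authors' architecture), makes the comparison concrete.

```python
def receptive_field(layers):
    """layers: sequence of (kernel_size, stride) pairs, applied in order.
    Returns the receptive field, in input pixels, of one output unit."""
    field, jump = 1, 1  # jump = input-pixel distance between adjacent units
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


conv3 = (3, 1)
pool2 = (2, 2)
print(receptive_field([conv3, conv3, conv3]))         # 7
print(receptive_field([conv3, pool2, conv3, conv3]))  # 12
print(receptive_field([(7, 1)]))                      # 7
```

With 100 m cells, 7 pixels span about 700 m and 12 pixels about 1.2 km; the 2x2 pooling step adds no weights, whereas a single 7x7 kernel needs more than five times the weights of a 3x3 kernel.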

9. Spatiotemporal Tile-based Attention-Guided LSTMs for Traffic Video Prediction.
Tu Nguyen, Daimler Autonomous Services, presented a selected contribution.
His model used spatiotemporal Long Short-Term Memory (LSTM) units for the encoder and decoder components. The LSTM memory modules were modified with elements of 2D convolutional LSTMs to preserve spatial information, and features of gated LSTMs were added to capture spatiotemporal dynamics. To process the high-resolution traffic data, movie frames were split into tiles, focusing the analysis on local spatial correlations. A tile-based mini-batch method improved training efficiency. Finally, a tile-based, cross-frame global additive attention layer was introduced.
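Tiling a frame and grouping the tiles into mini-batches can be sketched as follows. The tile size, batch size, and function names are hypothetical, and the sketch assumes the frame dimensions are divisible by the tile size (in practice the frame would be padded first); it does not model the LSTM units or the attention layer.

```python
def split_into_tiles(frame, tile_h, tile_w):
    """frame[y][x] -> list of ((y0, x0), tile) with non-overlapping tiles.
    Assumes the frame size is divisible by the tile size."""
    height, width = len(frame), len(frame[0])
    assert height % tile_h == 0 and width % tile_w == 0
    tiles = []
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            tile = [row[x0:x0 + tile_w] for row in frame[y0:y0 + tile_h]]
            tiles.append(((y0, x0), tile))
    return tiles


def tile_minibatches(tiles, batch_size):
    """Group tiles into mini-batches so each step processes only local patches."""
    return [tiles[i:i + batch_size] for i in range(0, len(tiles), batch_size)]


frame = [[10 * y + x for x in range(6)] for y in range(4)]  # toy 4x6 frame
tiles = split_into_tiles(frame, 2, 3)
print(len(tiles))                       # 4
print(tiles[1])                         # ((0, 3), [[3, 4, 5], [13, 14, 15]])
print(len(tile_minibatches(tiles, 3)))  # 2
```

Keeping each tile's (y0, x0) offset allows the per-tile predictions to be stitched back into a full frame afterwards.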

10. Recurrent Auto-encoder with Skip Connections and Exogenous Variables for Traffic Forecasting.
Pedro Herruzo Sánchez, Polytechnic University of Catalonia, Barcelona, presented a selected contribution.
His model was based on a U-Net-like architecture that predicts the next frame from the previous one. He adapted this architecture to sequence-to-sequence prediction: a sequence of any length was accepted by applying the encoder iteratively and concatenating its outputs. A recurrent encoder thus accumulated the temporal information of the input sequence, while a recurrent decoder produced the embedded predictions. These predictions were then mapped back to the original space by a further decoder with skip connections from each encoder layer, considering only the last frame of the input sequence. In addition, external data such as time of day, day of week, and weather were introduced into the model by concatenation in the embedded space. A data sampling technique was used to speed up sampling and increase batch diversity.
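The key property of the recurrent encoder is that a sequence of any length folds into a fixed-size state, to which exogenous context can then be concatenated. The sketch below illustrates this with a hypothetical two-feature frame encoder and a tanh recurrence in place of the real convolutional layers; the normalized 5-minute slot (288 slots per day) and the one-hot day of week are assumed encodings, and weather is omitted.

```python
import math


def encode_frame(frame):
    """Hypothetical per-frame encoder: two summary features of a flattened frame."""
    return [sum(frame) / len(frame), max(frame)]


def recurrent_encode(frames, decay=0.5):
    """Fold a sequence of any length into a fixed-size state, oldest frame first."""
    state = [0.0, 0.0]
    for frame in frames:
        features = encode_frame(frame)
        state = [math.tanh(decay * s + e) for s, e in zip(state, features)]
    return state


def add_exogenous(state, slot_of_day, day_of_week):
    """Concatenate context in embedded space: normalized 5-minute slot
    (0..287) and a one-hot day of week (0 = Monday)."""
    day_one_hot = [1.0 if d == day_of_week else 0.0 for d in range(7)]
    return state + [slot_of_day / 287.0] + day_one_hot


short = recurrent_encode([[0.1, 0.4], [0.2, 0.3], [0.0, 0.9]])
longer = recurrent_encode([[0.1, 0.4]] * 12)
print(len(add_exogenous(short, 100, 2)), len(add_exogenous(longer, 100, 2)))  # 10 10
```

Both a 3-frame and a 12-frame input produce the same embedding size, so the downstream decoder does not depend on the input length.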

11. The Making Of – Compiling the Large-scale Real-world Data for Traffic4cast.
Moritz Neun, HERE Technologies, Zurich, described how the high-resolution traffic data of the Traffic4cast Competition 2019 was obtained.
Traffic data was collected in three large cities with different topologies: Berlin, Istanbul, and Moscow. Traffic movies were produced by pixel-level aggregation of moving probe point data. Spatial aggregation started from the boundary of a city, which was divided into cells of roughly 100 m x 100 m (a 0.001-degree grid). The data in each cell was then aggregated into 5-minute frames with three (RGB) channels representing traffic volume (R), speed (G), and direction (B). Traffic volume and speed were capped at minimum and maximum values; heading direction was the most challenging property to encode as an average value. The data in each cell varied widely depending on the types of roads and the number of intersections. The collected data archive was then made accessible through the Apache Drill parallel SQL query engine to compile the public competition data.
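The aggregation step can be sketched for a handful of probe points. The 0.001-degree cells and 5-minute frames come from the text; everything else is an assumption for illustration, including the 130 km/h speed cap, the row and column orientation, and the use of a circular mean for heading. The circular mean shows why heading is hard to average: a naive mean of 350° and 10° is 180°, pointing the wrong way.

```python
import math
from collections import defaultdict

CELL_DEG = 0.001  # grid resolution from the text (about 100 m)
FRAME_SEC = 300   # 5-minute frames


def cell_key(lat, lon, t_sec, north, west):
    """Map a probe point to (frame, row, col); rows count south from the
    northern boundary, columns east from the western boundary (assumed)."""
    row = math.floor((north - lat) / CELL_DEG)
    col = math.floor((lon - west) / CELL_DEG)
    return int(t_sec // FRAME_SEC), row, col


def aggregate(points, north, west, max_speed=130.0):
    """points: (lat, lon, t_sec, speed, heading_deg) tuples.
    Returns {(frame, row, col): (volume, mean_capped_speed, mean_heading_deg)}."""
    acc = defaultdict(lambda: [0, 0.0, 0.0, 0.0])  # count, speed sum, sin sum, cos sum
    for lat, lon, t_sec, speed, heading in points:
        a = acc[cell_key(lat, lon, t_sec, north, west)]
        a[0] += 1
        a[1] += min(max(speed, 0.0), max_speed)  # clip speed to [0, max_speed]
        a[2] += math.sin(math.radians(heading))
        a[3] += math.cos(math.radians(heading))
    # Circular mean of headings, with negative angles wrapped into the 0..360 range.
    return {key: (n, s / n, math.degrees(math.atan2(si, co)) % 360.0)
            for key, (n, s, si, co) in acc.items()}


# Two probes in the same cell and frame, heading roughly north (350 and 10 degrees).
pts = [(52.5995, 13.3005, 10, 50.0, 350.0), (52.5996, 13.3004, 40, 200.0, 10.0)]
cells = aggregate(pts, north=52.6, west=13.3)
print(cells)
```

For the two toy probes this yields one cell with a volume of 2, a mean capped speed of 90.0, and a heading within floating-point round-off of 0° (north).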

12. Significance and Relevance of Leaderboard Ranks.
Aleksandra Gruca, Silesian University of Technology, Poland, presented an analysis of the significance of competition rank tables.
The competition leaderboard ranks were based on each team's mean squared error, combined over all cities and all time frames. The goal of this study was to assess whether the differences in the competition results were statistically significant. This required comparing multiple algorithms over multiple data sets, a typical setting in the assessment of machine learning methods. The null hypothesis that all algorithms were equivalent was tested. To achieve sufficient power, each city and each day were treated as separate data sets, and a non-parametric Friedman test was performed, which rejected the null hypothesis. To analyze the statistical differences between the top 10 methods, post-hoc tests on mean ranks with multiple-testing correction for false discovery rate control were applied. They showed that the differences between the top-ranked methods were significant, whereas the remaining methods separated into significantly different groups of similar performance.
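The Friedman statistic itself is simple to compute. In the formulation commonly used for comparing classifiers over multiple data sets, with k algorithms, N data sets, and R_j the average rank of algorithm j, it is χ²_F = 12N / (k(k+1)) · (Σ_j R_j² - k(k+1)²/4), which is approximately chi-square distributed with k - 1 degrees of freedom. The sketch below computes it from a table of errors; the toy numbers are invented for illustration, and the post-hoc tests and false discovery rate correction are not shown.

```python
def rank_row(errors):
    """Rank algorithms on one data set: lowest error gets rank 1,
    tied algorithms share the average of their positions."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0.0] * len(errors)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of 1-based positions
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[n][j]: error of algorithm j on data set n (e.g. one city or day).
    Returns (average ranks, Friedman chi-square statistic with k-1 dof)."""
    n_sets, k = len(table), len(table[0])
    rank_rows = [rank_row(row) for row in table]
    avg = [sum(r[j] for r in rank_rows) / n_sets for j in range(k)]
    chi2 = 12 * n_sets / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2


# Toy MSE table: 4 data sets x 3 algorithms; algorithm 0 always wins.
table = [[0.010, 0.012, 0.015],
         [0.020, 0.021, 0.030],
         [0.011, 0.013, 0.014],
         [0.009, 0.010, 0.016]]
avg_ranks, chi2 = friedman_statistic(table)
print(avg_ranks)  # [1.0, 2.0, 3.0]
print(chi2)       # 8.0
```

Here χ²_F = 8.0 exceeds the 5% critical value of about 5.99 for 2 degrees of freedom, so the toy null hypothesis of equivalent algorithms would be rejected. Working with ranks rather than raw errors is what makes the test non-parametric.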

13. The Traffic4cast Competition in Numbers.
Shia Shi, HERE Technologies, Seattle.
In this competition, 40 teams submitted results. More than half of the teams made fewer than 10 submissions each, while several teams made over 50. Submission times showed two peaks: one following the opening of the competition and another preceding the deadline. Shia Shi investigated how the number of submissions affected results and found that making several submissions was important to success, but results were not directly correlated with the number of submissions. Shia also observed that submission scores improved significantly over time, with the biggest changes within the first month. Finally, based on the mean squared error per city, Berlin received the best forecasts, while Moscow was the most difficult city for predicting traffic.

14. Picking the Right Loss – Choice of an Appropriate Metric.
Michael Kopp, a founding director at IARAI and data scientist at HERE Technologies, discussed potential alternatives to the mean squared error as a prediction metric.
The loss function for the competition leaderboard was the mean squared error over all cells of the real and predicted images. One reason for choosing this metric was to bring two communities together: the movie prediction/completion community and the traffic forecasting community. A drawback of this approach is the purely local nature of the metric: a prediction (e.g., of high traffic volume) that is misplaced by only one cell is penalized twice, once for the cell where the traffic actually occurred and once for the cell where it was predicted. A potential alternative metric could be inspired by the Fréchet Inception Distance (FID), which measures image similarity based on the distance between distributions of deep features. Another option could be inspired by generative adversarial networks (GANs), using the discriminator as a loss function that tests the images for consistency. Finally, transfer learning between cities could serve as the underlying task for an effective loss metric.
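The double-penalty effect can be reproduced on a single row of five cells. The values below are a made-up example, not competition data.

```python
def mse(truth, prediction):
    """Mean squared error over all cells, as used for the leaderboard."""
    return sum((t - p) ** 2 for t, p in zip(truth, prediction)) / len(truth)


truth = [0.0, 0.0, 1.0, 0.0, 0.0]      # one busy cell on a row of the map
shifted = [0.0, 0.0, 0.0, 1.0, 0.0]    # right volume, placed one cell too far east
all_quiet = [0.0] * 5                  # predicts no traffic at all

print(mse(truth, shifted))    # 0.4: the miss and the false alarm are both counted
print(mse(truth, all_quiet))  # 0.2: the "empty" forecast scores better
```

Under a purely local metric, the nearly correct forecast scores worse than predicting no traffic at all, which is the motivation for feature-based or adversarial alternatives.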