Data Origin

All time series contain transportation data, including highway traffic, traffic of cars in tunnels, traffic at automatic payment systems on highways, passenger traffic on subway systems, domestic aircraft flights, shipping imports, border crossings, pipeline flows and rail transportation. The data comprise time series of different structure and frequency, grouped into homogeneous datasets. All time series are empirical; they start at different dates and have different lengths. The final out-of-sample values of each series are withheld in order to evaluate the contestants' forecasts ex ante.
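Since the final values of each series are withheld, a participant can imitate this ex ante evaluation on the released history: hold back the last few observations, fit on the rest, and score the forecasts against the held-back part. The minimal Python sketch below assumes a hypothetical horizon of 3, a naive (last-value) benchmark and the mean absolute error. None of these are the competition's official horizons, methods or error measure, which this section does not specify, and the series values are illustrative only.

```python
def holdout_split(values, horizon):
    """Split a series into a fitting part and its last `horizon` points."""
    if not 0 < horizon < len(values):
        raise ValueError("horizon must be between 1 and len(values) - 1")
    return values[:-horizon], values[-horizon:]


def naive_forecast(train, horizon):
    """Repeat the last observed value `horizon` times (naive benchmark)."""
    return [train[-1]] * horizon


def mean_absolute_error(actual, forecast):
    """Average absolute difference between held-back values and forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative series only; hypothetical horizon of 3 observations.
series = [112, 118, 132, 129, 121, 135, 148, 148]
train, test = holdout_split(series, horizon=3)
forecast = naive_forecast(train, horizon=3)
print(mean_absolute_error(test, forecast))
```

Any other method can be substituted for `naive_forecast`; comparing it against such a simple benchmark on the same holdout gives a rough idea of whether it adds value before the real out-of-sample evaluation.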

Datasets

The competition allows you to compete on a selection of 18 datasets of 11 time series each. These 18 datasets are predicted in 3 distinct tournaments to be held in 2009 and 2010. Each of the three tournaments includes 6 datasets of 11 homogeneous time series, each dataset with a different time series frequency. The frequencies comprise low-frequency time series of yearly data (NNG-A), quarterly data (NNG-B) and monthly data (NNG-C), and high-frequency time series of weekly data (NNG-D), daily data (NNG-E) and hourly data (NNG-F).

Each set of 6 datasets represents a complete tournament, which allows the forecasting accuracy of a particular method to be evaluated across up to 66 time series of different frequencies. Participants can choose to participate in only a single dataset (i.e. 11 series) or multiples thereof, a single time series frequency across all tournaments (i.e. 33 series) or multiples thereof, a complete tournament (i.e. 66 series) or, ideally, all tournaments and all time series!

Dataset	Tournament 1	Tournament 2	Tournament 3	Dataset Winners
NNG-A - Yearly	1.A 11 series	2.A 11 series	3.A 11 series	x.A 33 time series
NNG-B - Quarterly	1.B 11 series	2.B 11 series	3.B 11 series	x.B 33 time series
NNG-C - Monthly	1.C 11 series	2.C 11 series	3.C 11 series	x.C 33 time series
NNG-D - Weekly	1.D 11 series	2.D 11 series	3.D 11 series	x.D 33 time series
NNG-E - Daily	1.E 11 series	2.E 11 series	3.E 11 series	x.E 33 time series
NNG-F - Hourly	1.F 11 series	2.F 11 series	3.F 11 series	x.F 33 time series
Tournament Winner	1.x winner 66 series	2.x winner 66 series	3.x winner 66 series	Grand Total Winner 198 time series
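The table is a cross product of 3 tournaments and 6 frequency codes, with 11 series per cell. The short Python sketch below enumerates the dataset identifiers in the table's "<tournament>.<code>" style and recomputes the series counts for each way of participating. The names `FREQUENCIES`, `DATASETS` and `series_count` are hypothetical and not part of any official competition tooling.

```python
from itertools import product

# Frequency codes as listed in the table (NNG-A ... NNG-F).
FREQUENCIES = {
    "A": "Yearly",
    "B": "Quarterly",
    "C": "Monthly",
    "D": "Weekly",
    "E": "Daily",
    "F": "Hourly",
}
TOURNAMENTS = (1, 2, 3)
SERIES_PER_DATASET = 11

# Hypothetical identifiers in the table's style, e.g. "2.C".
DATASETS = [f"{t}.{code}" for t, code in product(TOURNAMENTS, FREQUENCIES)]


def series_count(tournament=None, code=None):
    """Number of series in all datasets matching the optional filters."""
    selected = [
        d for d in DATASETS
        if (tournament is None or d.split(".")[0] == str(tournament))
        and (code is None or d.split(".")[1] == code)
    ]
    return len(selected) * SERIES_PER_DATASET


print(len(DATASETS))               # 18 datasets
print(series_count(code="C"))      # 33 monthly series (x.C)
print(series_count(tournament=1))  # 66 series in tournament 1 (1.x)
print(series_count())              # 198 series in total
```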

To limit the effort of building models for the competition, the datasets of each tournament will be released sequentially, with 2 datasets of a tournament released every 3 months. Releasing the datasets in these three stages (of 2 datasets each) allows you to focus your time and attention on each set separately. Datasets C and E are similar in structure to the monthly and daily data of the NN3 and NN5 competitions respectively, in order to reflect experiences and lessons from past competitions and to allow participants to test their previously developed algorithms on this new but similar data.

All datasets can then be downloaded and forecast in consecutive stages following the NN GC Instructions (see also above).

Time Series Structure

Depending on its frequency, a time series may contain a number of patterns, ranging from no seasonality to multiple overlaying seasonalities, as well as local trends, structural breaks, outliers, and zero and missing values. These patterns are often driven by a combination of unknown and unobserved causal forces tied to the underlying yearly calendar, such as recurring seasonal periods, bank holidays, or special events of different length and magnitude of impact, with different lead and lag effects. Missing observations and true zero demand are both provided as "0" values, so the two cannot be distinguished in the data. Below are some time series plots from the dataset:

Fig. 1. Example time series of transportation data measured as yearly, quarterly, monthly, weekly, daily or even hourly time series.
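Because missing observations and true zeros share the same "0" encoding, any treatment of zeros is a modelling choice. One hypothetical heuristic, sketched below in Python, assumes that every zero is a missing value and replaces it with the observation one seasonal period earlier (e.g. a period of 4 for quarterly data, or 7 for daily data with a weekly cycle). This is only sensible for series where true zero demand is implausible, such as high-volume highway traffic; for series with genuine zeros it would distort the data.

```python
def impute_zeros_seasonally(values, period):
    """Replace zeros with the value observed one seasonal period earlier.

    Hypothetical heuristic: it ASSUMES every zero is a missing observation,
    which is wrong for series that contain true zero demand.
    Zeros within the first `period` points have no earlier value and are
    left unchanged.
    """
    filled = [float(v) for v in values]
    for i in range(period, len(filled)):
        if filled[i] == 0.0:
            # Reads the already-filled value, so consecutive missing seasons
            # carry the last non-zero seasonal observation forward.
            filled[i] = filled[i - period]
    return filled


# Illustrative quarterly series (period 4) with two zeros in the second year.
quarterly = [10, 20, 30, 40, 12, 0, 33, 0]
print(impute_zeros_seasonally(quarterly, period=4))
# → [10.0, 20.0, 30.0, 40.0, 12.0, 20.0, 33.0, 40.0]
```

Simply dropping zeros is usually not an option for equally spaced data, since it shifts the seasonal position of every later observation; a seasonal fill keeps the calendar alignment intact.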