Review of kaggle time series competition

Inspired by Learnings from Kaggle’s Forecasting Competitions by Casper Solheim Bojer & Jens Peder Meldgaard (2020), I surveyed the top 3 solutions of past Kaggle time series competitions from 2014 to 2024.

If you find a new time series competition, please let me know by opening an issue.

❤️ Support This Project

If you find this project helpful, please consider supporting it with a small donation!

Table of Contents

List of competitions
Top 3 most voted EDAs
Top 3 solutions

List of competitions

#	Year	Title	Data size
1	2014	Walmart Recruiting - Store Sales Forecasting	3.22MB
2	2015	Walmart Recruiting II: Sales in Stormy Weather	9MB
3	2015	Rossmann Store Sales	39.85MB
4	2016	Predicting Red Hat Business Value	26.74MB
5	2017	Web Traffic Time Series Forecasting	611.85MB
6	2018	TalkingData AdTracking Fraud Detection Challenge	11.27GB
7	2018	Corporación Favorita Grocery Sales Forecasting	479.88MB
8	2018	Recruit Restaurant Visitor Forecasting	27.3MB
9	2018	Google Analytics Customer Revenue Prediction	35.9GB
10	2019	LANL Earthquake Prediction	10.42GB
11	2019	Two Sigma: Using News to Predict Stock Movements	Not available
12	2019	ASHRAE - Great Energy Predictor III	2.61GB
13	2020	University of Liverpool - Ion Switching	146.08MB
14	2020	M5 Forecasting - Accuracy	450.47MB
15	2020-2021	Jane Street Market Prediction	Not available
16	2020-2021	Acea Smart Water Analytics	3.45MB
17	2021	Google Brain - Ventilator Pressure Prediction	698.79MB
18	2022	Optiver Realized Volatility Prediction	2.73GB
19	2022	G-Research Crypto Forecasting	3.12GB
20	2022	Ubiquant Market Prediction	18.55GB
21	2022	American Express - Default Prediction	50.31GB
22	2022-2023	GoDaddy - Microbusiness Density Forecasting	10.93MB

Top 3 most voted EDAs

EDA is one of the best ways to learn the characteristics of the data given in each competition, so the top 3 most voted EDAs are listed for each one.

1. Walmart Recruiting - Store Sales Forecasting

> Go to the top

2. Walmart Recruiting II: Sales in Stormy Weather

> Go to the top

NA

Top 3 solutions

1. Walmart Recruiting - Store Sales Forecasting

> Go to the top

Pos	Methods	FE	Ensemble	Split	Code	Discussion
1	Time Series Models: • STLF/ETS • STLF/ARIMA • Seasonal ARIMA • Fourier ARIMA	• SVD preprocessing • Time series decomposition • Correlation-based NN pooling • Holiday period adjustment	Average of 6 time series models: • 3 simple models • 3 advanced models	Department-wise pooling across stores	💻	🔊
2	Statistical Methods: • ARIMA with STL decomposition • ETS with STL decomposition • Naive method with STL decomposition • UCM Machine Learning Methods: • Random Forest • Linear Regression • K Nearest Regression • Principal Component Regression	• Used week of the year (1-52) • Filled missing values with 0 • Different holiday weighting for stores with high growth rate vs. stores without high growth	Weighted average and trimmed mean of 6 methods*	Individual models for each store- department combination (~3600 total)	💻	🔊
3	Simple year-over-year approach (no statistical or ML models): • Prior year sales as base predictor • Holiday week alignment • Weighted average of prior weeks • Store & department trend factors • Manual trend coefficients	• Calendar date matching between years • Special handling for moving holidays • Store-specific trend factors • Department-specific trend factors • Minimal "warm day" adjustment	None - single model with minor adjustments	Individual predictions for each store- department combination	💻	🔊
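
The ensembling strategies named in the table (a plain average of several forecasts, and a trimmed mean) can be illustrated with a minimal sketch. The function names and the sample numbers below are hypothetical and are not taken from the winning solutions; the sketch only shows how a trimmed mean dampens one outlying model compared with a plain average.

```python
from statistics import mean


def trimmed_mean(values, trim=1):
    """Average after dropping the `trim` smallest and `trim` largest values."""
    if len(values) <= 2 * trim:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


def combine_forecasts(forecasts, trim=1):
    """Combine per-model forecasts (one list per model, same horizon).

    Returns two lists: the plain average and the trimmed mean per time step.
    """
    horizon = len(forecasts[0])
    simple = [mean(model[t] for model in forecasts) for t in range(horizon)]
    trimmed = [
        trimmed_mean([model[t] for model in forecasts], trim)
        for t in range(horizon)
    ]
    return simple, trimmed


# Hypothetical weekly sales forecasts from six models over a 3-week horizon.
forecasts = [
    [100.0, 110.0, 120.0],
    [102.0, 108.0, 118.0],
    [98.0, 112.0, 122.0],
    [101.0, 109.0, 121.0],
    [99.0, 111.0, 119.0],
    [160.0, 110.0, 120.0],  # this model is far off in week 1
]
simple, trimmed = combine_forecasts(forecasts)
print(simple)
print(trimmed)
```

In week 1 the plain average is pulled up by the outlying model, while the trimmed mean stays close to the other five forecasts.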

2. Walmart Recruiting II: Sales in Stormy Weather

> Go to the top

Pos	Methods	FE	Ensemble	Split	Code	Discussion
1	PPR (Projection Pursuit Regression) for curve fitting + Lasso regression with vowpal wabbit	• Weekday, weekend flags, holiday flags • Item/store numbers • Date features (year, month, day) • Black Friday period flags • Weather features (precipitation>0.2, temperature departure>8 or <-8) • Moving average • Handling of consecutive zeros • Feature interactions (AB, AC, BM, CM, BK, CK)	Single model	• Excluded item/stores with all zeros • Excluded data from 2013-12-25 • Excluded data where moving average was zero • Excluded dates with too many consecutive zeros on both sides	💻	🔊
2	-	-	-	-	NA	NA
3	-	-	-	-	NA	NA
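
Several of the 1st place features are simple threshold and calendar flags. The following is a rough sketch of how such flags could be derived for one row; the thresholds (precipitation > 0.2, temperature departure > 8 or < -8) come from the table above, while the function name, argument names, and dictionary keys are assumptions made for illustration and do not reflect the original code.

```python
from datetime import date


def make_flags(day, precip_total, temp_departure):
    """Build calendar and weather flags for one (date, weather) record.

    `precip_total` and `temp_departure` are hypothetical argument names
    for the precipitation and temperature-departure readings.
    """
    return {
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "weekday": day.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": day.weekday() >= 5,  # Saturday or Sunday
        "rain_flag": precip_total > 0.2,
        "temp_departure_flag": temp_departure > 8 or temp_departure < -8,
    }


flags = make_flags(date(2013, 12, 21), precip_total=0.35, temp_departure=-9.0)
print(flags)
```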

3. Rossmann Store Sales

> Go to the top

Pos	Code	Discussion
1	NA	🔊
2	NA	NA
3	💻	🔊

4. Predicting Red Hat Business Value

> Go to the top

Pos	Methods	FE	Ensemble	Split	Code	Discussion
1	Two-level approach: • 1st level: XGBoost classifier on aggregated data • 2nd level: XGBoost with leakage exploitation	• TF-IDF features for categorical variables • Group level aggregations • Group_1 ID value • Number of activities/people in group • Min/max dates • First/last activity in timeline	Average of: • Best LB model • Best CV model	• Removed group_1=17304 (30% of train data) • Used Distinct operator for group_1's with 3000+ rows • 5-fold unstratified CV based on people file	NA	🔊
2	Mixture of three models: • Logistic regression • kNN • XGBoost (based on public scripts) + Probabilistic interpolation model for groups	• Histogram-based features (fuzzy binary encoding) • Date-based probabilistic interpolation • Leaderboard feedback	Weighted average: • 0.4 (linear model) • 0.25 (kNN) • 0.25 (public script) • 0.1 (0.5 constant)	NA	NA	🔊
3	XGBoost + extensive group_1 and date trick exploitation	• Modified leave-one-out encoding for char_10 • Group_1 as continuous feature • Time-based confidence levels for leak exploitation • Group-person level leak simulation for training data	Post-processing: • 90% weight to most extreme prediction within a group • 10% weight to raw prediction • Rule-based overrides for ML shortcomings	NA	NA	🔊
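
The 2nd place weighted average and the 3rd place post-processing can be sketched as follows. The weights (0.4, 0.25, 0.25, 0.1 and 90%/10%) are those listed in the table; everything else is an assumption for illustration. In particular, "most extreme prediction within a group" is interpreted here as the prediction furthest from 0.5, which may differ from what the author actually did.

```python
def blend(linear, knn, public, constant=0.5):
    """Weighted average of three model outputs and a constant (2nd place weights)."""
    return [
        0.4 * l + 0.25 * k + 0.25 * p + 0.1 * constant
        for l, k, p in zip(linear, knn, public)
    ]


def pull_toward_group_extreme(preds, groups, weight=0.9):
    """Mix each prediction with the most extreme one in its group.

    Assumption: "most extreme" means furthest from 0.5.
    """
    extreme = {}
    for p, g in zip(preds, groups):
        if g not in extreme or abs(p - 0.5) > abs(extreme[g] - 0.5):
            extreme[g] = p
    return [weight * extreme[g] + (1 - weight) * p for p, g in zip(preds, groups)]


# Hypothetical probabilities for three rows belonging to two groups.
blended = blend([0.9, 0.2, 0.6], [0.8, 0.3, 0.5], [0.95, 0.1, 0.7])
adjusted = pull_toward_group_extreme([0.9, 0.6, 0.2], ["a", "a", "b"])
print(blended)
print(adjusted)
```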

5. Web Traffic Time Series Forecasting

> Go to the top

Pos	Code	Discussion
1	💻	🔊
2	💻	🔊
3	💻	🔊

6. TalkingData AdTracking Fraud Detection Challenge

> Go to the top

Pos	Code	Discussion
1	NA	🔊
2	NA	🔊
3	NA	🔊

7. Corporación Favorita Grocery Sales Forecasting

> Go to the top

Pos	Code	Discussion
1	💻 💻	🔊
2	NA	🔊
3	NA	🔊

8. Recruit Restaurant Visitor Forecasting

> Go to the top

Pos	Methods	Ensemble	Split	Code	Discussion
1	LightGBM	-	-	💻 💻	NA
2	-	-	-	NA	NA
3	-	-	-	NA	NA

9. Google Analytics Customer Revenue Prediction

> Go to the top

Pos	Methods	Ensemble	Split	Code	Discussion
1				💻	🔊
2				NA	🔊
3	-	-	-	NA	NA

10. LANL Earthquake Prediction

> Go to the top

Pos	Code	Discussion
1	💻 💻	🔊
2	NA	🔊
3	NA	🔊

11. Two Sigma: Using News to Predict Stock Movements

> Go to the top

NA

12. ASHRAE - Great Energy Predictor III

> Go to the top

Pos	Methods	Code	Discussion
1	CatBoost LightGBM MLP	NA	🔊
2	XGBoost LightGBM CatBoost Feed-forward Neural Network	NA	🔊
3	CNN LightGBM CatBoost	NA	🔊

13. University of Liverpool - Ion Switching

> Go to the top

Pos	Code	Discussion
1	NA	🔊
2	💻	🔊 🔊
3	NA	🔊 🔊

14. M5 Forecasting - Accuracy

> Go to the top

Pos	Methods	Code	Discussion
1	LightGBM	NA	💻
2	LightGBM	NA	💻
3	DeepAR	NA	💻

15. Jane Street Market Prediction

> Go to the top

Pos	Methods	FE	Ensemble	Split	Code	Discussion
1	XGBoost NN				💻	NA
3	49 layers MLPs	No	15 ensembles of NN		NA	🔊

NA for Pos #2

16. Acea Smart Water Analytics

> Go to the top

NA

17. Google Brain - Ventilator Pressure Prediction

> Go to the top

Pos	Methods	Ensemble	Split	Code	Discussion
1	LSTM Transformer	single architecture	KFold	💻	🔊
2	Stacked LSTM	ensembled by 7 models	KFold	NA	🔊
3	Conv1d Stacked LSTM	random seed average	Stratified K-Folds	NA	🔊

18. Optiver Realized Volatility Prediction

> Go to the top

Pos	Methods	FE	Ensemble	Split	Code	Discussion
1	LightGBM MLP CNN		equally weighted average	GroupKFold	💻	🔊
3	LightGBM MLP TabNet		equally weighted average	KFold	💻	🔊

NA for Pos #2

19. G-Research Crypto Forecasting

> Go to the top

Pos	Methods	FE	Ensemble	Split	Code	Discussion
1	-	-	-	-	NA	NA
2	LightGBM		Single model		NA	🔊
3	LightGBM		Single model		💻 💻	🔊

20. Ubiquant Market Prediction

> Go to the top

Pos	Methods	Ensemble	Split	Code	Discussion
1	LightGBM TabNet	Average of (LGBM x 5 Folds) + (TabNet x 5 Folds)	PurgedGroupTimeSeries TimeSeriesSplit KFold	NA	🔊
2	LightGBM	-	Purged K-Fold cross validation with embargo	NA	🔊
3	6-layer transformer	5 seeds ensemble	-	NA	🔊
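
For readers unfamiliar with purged K-fold cross validation with an embargo, here is a simplified sketch of the general idea, assuming samples are already sorted by time. It is not the code used by any of the solutions above: the function name and the fixed-size `purge` and `embargo` gaps are assumptions, and real implementations usually derive the purged range from label overlap rather than a fixed count.

```python
def purged_kfold(n_samples, n_splits=5, purge=0, embargo=0):
    """Yield (train, test) index lists over time-ordered samples.

    Each test fold is a contiguous block. Training indices exclude the
    block itself, the `purge` samples just before it, and the `embargo`
    samples just after it, to limit leakage across the boundary.
    """
    fold_size, remainder = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        stop = start + fold_size + (1 if k < remainder else 0)
        test = list(range(start, stop))
        train = [
            i for i in range(n_samples)
            if i < start - purge or i >= stop + embargo
        ]
        yield train, test
        start = stop


splits = list(purged_kfold(10, n_splits=5, purge=1, embargo=1))
for train, test in splits:
    print("train:", train, "test:", test)
```

With 10 samples and 5 folds, the second fold tests on indices 2 and 3, and the training set drops index 1 (purged) and index 4 (embargoed) in addition to the test block.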

21. American Express - Default Prediction

> Go to the top

Pos	Methods	Ensemble	Code	Discussion
1	LightGBM GRU	Ensembled by 4 models	💻	🔊
2	LGB/XGB/CTB NN		NA	🔊
3	LGB/CTB	Ensembled by 3 models	NA	🔊

22. GoDaddy - Microbusiness Density Forecasting

> Go to the top

Ongoing

Pos	Methods	Code	Discussion
1	Linear regression	💻	🔊
2
3
