Predicting-NBA-Salaries

This project highlights two important data science skills: cleaning data and building machine learning models.

What are NBA Advanced Statistics?

According to the NBA's official stats FAQ, “Advanced Stats are a way to study basketball through objective analysis. It is a more in-depth way to look at a simple box score, and more accurately evaluates the skill and production of a player or team.”

Data

The dataset used in this project comes from Kaggle, uploaded by the user Ai Shaojun. It contains advanced and non-traditional stats from the 2017-2018 NBA season. While this dataset was a good fit for the project, a lot of exploratory data analysis (EDA) was needed to fully clean and prepare it for machine learning.

EDA and Data Cleaning

Here is one example of the EDA and data cleaning involved:

Every year the NBA holds a draft in which a total of 60 players are picked by the 30 NBA teams. The problem with this dataset is that it lists some players as being drafted 62nd. Upon further investigation, I realized that the dataset records undrafted players as 62nd picks. To fix this, I created a new column that assigns a 0 if the player was undrafted and a 1 if the player was drafted.
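The fix described above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names `draft_number` and `drafted` and the sample rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; "draft_number" is an assumed column name,
# not necessarily the one used in the Kaggle dataset.
players = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "draft_number": [1, 60, 62, 62],
})

# The NBA draft has 60 picks, so any value above 60 marks an undrafted player.
MAX_DRAFT_PICK = 60

# 1 if the player was drafted, 0 if undrafted.
players["drafted"] = (players["draft_number"] <= MAX_DRAFT_PICK).astype(int)

print(players)
```

Comparing against the number of picks rather than the literal value 62 also catches any other out-of-range placeholder, though that is a design choice for this sketch rather than something the dataset is known to need.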

Regression Models

Building regression models was the next step in this project. Unfortunately, none of the linear, KNN, random forest, or XGBoost regression models performed well. After a train/test split of the data, the models were all heavily overfit and yielded very poor R^2 values. I then took the two best models, random forest and XGBoost, and combined them in a voting regression model. These were the testing metrics I got out of the model:

The model has an RMSE of over 4 million, which is a large error for salary predictions.
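As a rough sketch of the voting approach, the snippet below combines two tree-based regressors with scikit-learn's `VotingRegressor` and compares train and test R^2 to check for overfitting. It rests on several assumptions: the data is synthetic (generated with `make_regression`), and `GradientBoostingRegressor` stands in for XGBoost so the example depends only on scikit-learn. It does not reproduce the project's results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real project uses the Kaggle NBA stats instead.
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Average the predictions of the two base models.
# GradientBoostingRegressor is a swapped-in substitute for XGBoost here.
voter = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
    ("gb", GradientBoostingRegressor(random_state=42)),
])
voter.fit(X_train, y_train)

# A large gap between train and test R^2 is a sign of overfitting.
train_r2 = r2_score(y_train, voter.predict(X_train))
test_r2 = r2_score(y_test, voter.predict(X_test))
test_rmse = np.sqrt(mean_squared_error(y_test, voter.predict(X_test)))

print(f"train R^2: {train_r2:.3f}")
print(f"test R^2:  {test_r2:.3f}")
print(f"test RMSE: {test_rmse:.2f}")
```

RMSE is in the same units as the target, so with salary as the target an RMSE of 4 million corresponds to an error on the order of 4 million in salary terms.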

Neural Network

Using the Keras library from TensorFlow, I built a neural network in the hope of getting a better model. Unfortunately, this was not the case.

These were the MSE and RMSE values the model reported for each epoch. Both are far worse than the values from the voting regressor. More epochs could be run to reduce these numbers, but the metrics would only improve in small increments, and it would take a long time to catch up to the voting regressor.
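For reference, RMSE is simply the square root of MSE, so per-epoch MSE values can be converted to RMSE to compare them with the voting regressor on the same scale. The sketch below shows the conversion; the MSE values in it are made-up placeholders for illustration, not the numbers this project produced.

```python
import math

# Placeholder per-epoch MSE values; NOT the project's actual results.
epoch_mse = [4.0e14, 2.5e14, 1.0e14]

# RMSE is the square root of MSE and is in the same units as the target.
epoch_rmse = [math.sqrt(mse) for mse in epoch_mse]

for epoch, (mse, rmse) in enumerate(zip(epoch_mse, epoch_rmse), start=1):
    print(f"epoch {epoch}: MSE={mse:.3e}  RMSE={rmse:,.0f}")
```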

Conclusion

While advanced and non-traditional statistics are useful for many other purposes, predicting or assigning a player's salary based on them is not one of those purposes.

This is probably why most people use traditional statistics, such as points per game, assists per game, and three-point percentage, to predict an NBA player's salary.
