I will predict the 2023 NBA MVP using Machine Learning

By far the best Machine Learning Model for predicting the NBA MVP

TheJK
8 min read · Sep 1, 2022
Photo by Michael Weinstein on Dribbble

MVP Prediction 2023 — Results

Introduction

Apart from winning the NBA championship, the greatest team success in the league, every kid who plays basketball dreams of becoming the NBA’s MVP (Most Valuable Player) and being considered the best basketball player in the world for a year. In addition to the prestige and attention that this title carries, you get the honor of being named alongside the greatest in the sport of basketball.
With a total of 6 MVP titles, Kareem Abdul-Jabbar is the sole record holder. He is also known for arguably the most unguardable shot in NBA history, the ‘Skyhook.’ No player before or after him has mastered the move so perfectly.
Just one title behind him sit two absolute legends of the sport. One of them is Bill Russell, arguably the most successful basketball player of all time. He won 11 NBA championships in his 13-year career, an almost unbeatable record.
The other is the basketball GOAT (the Greatest of All Time): LeBron James. Just kidding, we are talking about Michael ‘Air’ Jordan. Few others have become such a gigantic brand as the legendary Chicago Bulls player with the number 23. He was a major factor in making basketball popular in America and in making the NBA the world’s most successful organization in the sport. Even decades after his career ended, fans and experts still talk about the dominance he showed on both ends of the court.

The most unguardable shot, the most successful basketball player of all time and the GOAT: the history of the NBA shows that the title stands for greatness. Recently, however, the voting process that determines the MVP has come under criticism. One of the big issues is voter fatigue.
Fans and players alike have complained about voter fatigue. It has become common for voters to pick a player with a compelling narrative rather than the candidate with the best statistical resume. When the same player has won the award several years in a row, some voters get bored and want to let somebody else win for a change, even if that player is still the best in the game. Consequently, the MVP is not just the player who played the best regular season, but rather the player who makes the NBA more interesting for fans, right?

I claim that this is not correct. I’ll show you how I used Machine Learning to develop an emotionless MVP vote based purely on statistics. After reading this article, you will understand the secret ingredients that make an MVP and an all-time great basketball player.
Since a detailed explanation of every step I took would go beyond the scope of this article, it focuses on the exciting data science part, a short evaluation of the model and an analysis of charts highlighting the most important statistics. If you want to study the code and all results in detail, check out the GitHub Repository.

Machine Learning Predictions for the 2018–2022 Seasons

NBA Data & Machine Learning

Where did I get the data?

The source of my database is Basketball-Reference. On this great website you can find all the important statistics about the NBA and its history, which I extracted by automated HTML parsing.

Python libraries: urllib, numpy, pandas, requests, bs4
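
The HTML parsing step could look roughly like the sketch below. To stay dependency-free it uses the standard library's html.parser instead of bs4, and it parses an inline, made-up table rather than a real Basketball-Reference page. The class name, function name, column names and values are all placeholders, not the code from the repository.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup; a real page would have to be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>G</th><th>PTS</th></tr>
  <tr><td>Player A</td><td>74</td><td>27.1</td></tr>
  <tr><td>Player B</td><td>68</td><td>30.6</td></tr>
</table>
"""

players = parse_table(SAMPLE_HTML)
print(players[0])
```

All values come back as strings, so a real pipeline would still need to convert the numeric columns before modelling.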

Where do I store the data?

All data on teams and their regular seasons, as well as data on all NBA players, was aggregated and loaded into a database. Since the statistics were not fully tracked until the 80s and the data schema changes depending on the season, I decided to use a cloud-based MongoDB database (a document-oriented NoSQL database).

Python libraries: urllib, numpy, pandas, pymongo, bs4
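
To illustrate why a document store fits a season-dependent schema, here is a small sketch of how stat rows might be turned into documents. The field names, values and the to_documents helper are made up for illustration. The actual insert (for example with pymongo's insert_many on a collection) is left out so that the sketch runs without a database.

```python
def to_documents(rows, season):
    """Turn stat rows into documents, dropping statistics that were not tracked.

    Missing values are represented by None and simply left out, so documents
    from different seasons can carry different sets of fields.
    """
    documents = []
    for row in rows:
        doc = {key: value for key, value in row.items() if value is not None}
        doc["season"] = season
        documents.append(doc)
    return documents


# Hypothetical rows: the older season lacks a statistic the newer one has.
old_rows = [{"player": "Player A", "PTS": 24.0, "STL": None}]
new_rows = [{"player": "Player B", "PTS": 27.5, "STL": 1.3}]

docs = to_documents(old_rows, 1970) + to_documents(new_rows, 2022)
for doc in docs:
    print(sorted(doc))
```

In a relational table the untracked statistic would need a nullable column; here the older document just has fewer fields.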

How is my Machine Learning Model defined?

The next paragraph is for the data geeks among us. The value to be predicted by the model was taken from the MVP-Voting-Table. The “Share” column represents the result of the MVP voting for a season (here: 2022). The final model is therefore a regression model that predicts this value and is evaluated with typical metrics such as R², MAE and MSE.

Share = MVP Share Score = PtsWon/PtsMax

mvp-voting-table with mvp share score calculation
MVP-Voting-Table / Season: 2022
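
As a minimal sketch, the regression target could be built from the voting table as follows. The rows are made up and are not the real 2022 table. Giving players without any votes a share of 0.0 is an assumption of this sketch, not something stated above.

```python
def mvp_share(pts_won, pts_max):
    """Share = PtsWon / PtsMax, a value between 0 and 1."""
    if pts_max <= 0:
        raise ValueError("pts_max must be positive")
    return pts_won / pts_max


def build_targets(players, voting_rows):
    """Map every player to a share; players without votes get 0.0 (assumption)."""
    shares = {
        row["player"]: mvp_share(row["pts_won"], row["pts_max"])
        for row in voting_rows
    }
    return {player: shares.get(player, 0.0) for player in players}


# Made-up voting rows, not the real 2022 table.
voting_rows = [
    {"player": "Player A", "pts_won": 800, "pts_max": 1000},
    {"player": "Player B", "pts_won": 250, "pts_max": 1000},
]
targets = build_targets(["Player A", "Player B", "Player C"], voting_rows)
print(targets)
```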

Python libraries: matplotlib, numpy, pandas, pymongo, seaborn, shap, sklearn, xgboost
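
For readers who want to see what the three evaluation metrics measure, here is a plain-Python sketch. sklearn.metrics offers r2_score, mean_absolute_error and mean_squared_error for the same purpose. The share values below are made up and are not model output.

```python
def regression_metrics(y_true, y_pred):
    """Return R², MAE and MSE for two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R² is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "mae": mae, "mse": mse}


# Made-up shares for four player-seasons.
true_share = [0.0, 0.5, 1.0, 0.5]
pred_share = [0.0, 0.5, 0.5, 0.5]
metrics = regression_metrics(true_share, pred_share)
print(metrics)
```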

What are my results?

To my delight, I managed to predict the last 5 NBA MVPs using a tree-based Regressor. To understand the Regressor’s decision-making process, I analyzed it using the SHAP library. The two following charts show the results of this detailed model investigation. They are followed by three additional PowerBI charts that illustrate the overall picture of the results.

  1. SHAP Importance Plot
  2. SHAP Summary Dot Plot
  3. PowerBI Top15 Player-Seasons
  4. PowerBI Heat Table
  5. PowerBI Comparison Chart

The SHAP Importance Plot shows which statistics, also called features in the Machine Learning context, have the highest influence on the Regressor. The features WS and WS/48 have the greatest influence, i.e. they are the most important (Win Shares is a player statistic that attempts to divvy up credit for team success among the individuals on the team; WS/48 expresses it per 48 minutes played). The third most important feature is OBPM (Offensive Box Plus/Minus, a box-score-based estimate of a player’s offensive impact compared to the league average).
And so on…

regressor analyse: most important features
SHAP Importance Plot
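
The ranking in such an importance plot is commonly based on the mean absolute SHAP value of each feature. The sketch below reproduces that aggregation in plain Python on made-up SHAP values. It does not call the shap library, and the numbers are not taken from the model.

```python
def shap_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values: one row per player-season, one column per feature.
    """
    n_rows = len(shap_values)
    importance = {}
    for col, name in enumerate(feature_names):
        importance[name] = sum(abs(row[col]) for row in shap_values) / n_rows
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values for three player-seasons and three features.
features = ["WS", "OBPM", "TOV"]
values = [
    [0.25, -0.125, 0.0],
    [-0.5, 0.125, 0.125],
    [0.75, 0.125, -0.125],
]
ranking = shap_importance(values, features)
print(ranking)
```

Because absolute values are averaged, a feature counts as important whether it pushes predictions up or down.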

The SHAP Summary Dot Plot can be seen as a more detailed form of the first plot. The features are still sorted by their impact on the Regressor in descending order, but the plot also shows how the magnitude of each individual feature value affects the prediction.
The Regressor will predict a higher MVP Share Score if…

  • WS, WS/48, OBPM, PER, VORP, DWS, DBPM… have a high value
    [advanced player statistics that show the greatness of a player, for more detail click here, google or send a message]
  • Rk_Conference and Rk_Season have a low value
    [the player’s team finished the season very well; low ranking value = many wins]
  • Overall has a high value
    [the win-loss ratio of the player’s team is high, i.e. many wins]
  • FG, FTA, 2P, TOV, PTS… have a high value
    [basic player statistics that show the greatness of a player, for more detail click here, google or send a message]
regressor analyse: most important features in detail
SHAP Summary Dot Plot

The PowerBI Top15 Player-Seasons chart shows the Regressor’s predictions for all NBA player seasons from 2018 to 2022. According to the Regressor, James Harden played the best season of the last 5 years. It is also worth noting that the 5 seasons that were awarded the MVP are all among the top 6 individual seasons. This chart clearly shows that the model is very good at distinguishing MVP-worthy seasons from slightly worse but still exceptionally good seasons.

regressor analyse: top 15 mvp share scores in the last 5 seasons
PowerBI Top15 Player-Seasons

The PowerBI Heat Table (season: 2022) shows the top 10 players sorted by predicted MVP Share Score. In addition, the 10 most important features are color-coded according to their magnitude. The combination of SHAP Summary Dot Plot and Heat Table illustrates the nuances that differentiate Nikola Jokić (predicted and true MVP) from the other players.

regressor analyse: mvp share score and features in detail
PowerBI Heat Table

The PowerBI Comparison Chart (season: 2022) compares the real MVP Share Score with the values predicted by the model. On the one hand, it is noticeable that the predicted values for the top 3 players do not come close to the real values of the vote. On the other hand, the players who received only a few votes score significantly higher in the model’s prediction. Both effects can be explained by the voting system. Since I don’t want to explain the whole voting system here, here is just a short summary.

Statistically, many players have a very good season, so they would be closer together in the MVP voting if only these statistics were taken into account. But the voting system works differently: a player who is a voter’s first choice gets significantly more points than a player who only finishes 3rd on that ballot. In reality, there is often minimal difference between the places, but the points awarded tell a different story.
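
The effect can be sketched with the NBA’s ballot weights: each voter ranks five players, who receive 10, 7, 5, 3 and 1 points. In the made-up example below, ten hypothetical voters all submit the same order. Even if the five players were statistically almost equal, their shares would range from 1.0 down to 0.1.

```python
# Points per ballot position in NBA MVP voting: 1st to 5th place.
POINTS = (10, 7, 5, 3, 1)


def tally(ballots):
    """Return each player's share: points won / maximum possible points."""
    pts_max = POINTS[0] * len(ballots)
    points = {}
    for ballot in ballots:
        for place, player in enumerate(ballot):
            points[player] = points.get(player, 0) + POINTS[place]
    return {player: won / pts_max for player, won in points.items()}


# Made-up ballots: ten voters who all agree on the same order.
ballots = [("A", "B", "C", "D", "E")] * 10
shares = tally(ballots)
print(shares)
```

A regression model trained on statistics alone has no reason to reproduce such steep steps, which fits the gap seen in the comparison chart.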

Nevertheless, the model can correctly determine the MVP and usually also the top 3 players of a season.

regressor analyse: real mvp share score vs predicted mvp share score
PowerBI Comparison Chart
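
Whether the model “gets the MVP right” can be checked by taking the player with the highest predicted share in each season. A minimal sketch with made-up predictions and winners follows; the function names are illustrative and not taken from the repository.

```python
def predicted_mvps(predictions):
    """Return the player with the highest predicted share for each season.

    predictions: iterable of (season, player, predicted_share) tuples.
    """
    best = {}
    for season, player, share in predictions:
        if season not in best or share > best[season][1]:
            best[season] = (player, share)
    return {season: player for season, (player, _) in best.items()}


def mvp_hit_rate(predictions, actual_mvps):
    """Fraction of seasons in which the predicted MVP matches the real one."""
    picks = predicted_mvps(predictions)
    hits = sum(picks.get(season) == mvp for season, mvp in actual_mvps.items())
    return hits / len(actual_mvps)


# Made-up predictions and winners for two seasons.
predictions = [
    (2021, "Player A", 0.61), (2021, "Player B", 0.48),
    (2022, "Player A", 0.40), (2022, "Player C", 0.55),
]
actual = {2021: "Player A", 2022: "Player B"}
print(predicted_mvps(predictions), mvp_hit_rate(predictions, actual))
```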

Conclusion

Voter fatigue is what makes MVP voting untrustworthy for many people. However, feeding an emotionless machine with statistics and letting it predict the MVP points to the opposite: in the five seasons examined, the MVP was always the player who provided the best statistics and played the most efficient basketball. The credibility of the machine can be seen especially in the PowerBI Top15 Player-Seasons chart. Each MVP simply played the best NBA season of his year. Even James Harden, with his outstanding performance in the 2019 season, couldn’t beat Giannis Antetokounmpo.
The final result: using Machine Learning, all 5 NBA MVPs of the last 5 seasons could be predicted correctly. This track record makes me absolutely confident that I can correctly predict next year’s NBA MVP as well. In the end, only one question remains.

What is the secret formula to become NBA MVP?

The analysis can be roughly summarized in 4 formula components.

  1. The team plays much better basketball with the player than without him.
    [WS, WS/48]
  2. The player must play an absolutely elite regular season with enough playing time (40+ games). Above all, he must deliver an outstanding offensive performance, but a solid defensive performance is also important.
    [advanced & basic player statistics]
  3. The team must have a good spirit and play a good regular season.
    [Rk_Conference, Rk_Season, Overall]
  4. The player must be the undisputed go-to guy on the team (someone who can get you the basket your team needs in the game’s closing seconds, when every eye in the arena is on him).
    [explained on the basis of the advanced & basic player statistics]

Feedback & Questions

If you have any feedback, please check out my email address on my Website or connect with me on LinkedIn. If you want to study the code and all results in detail, check out the GitHub Repository.


TheJK

Data Engineer/Analyst from Germany. I enjoy programming and publishing my side projects. Found to be bulletproof by JxmyHighroller.