NBA Hall of Fame Probability Analysis (Logistic Regression in R)

Psst: you can find the RMD version with complete R code in my BitBucket repository.

The Naismith Memorial Basketball Hall of Fame is the highest career honor a player can achieve after retirement, with roughly 180 players inducted as of 2019. To be eligible for the Hall of Fame, a player must be retired from the game of basketball for 3 full years.

Some players, such as Kobe Bryant and LeBron James, are clear locks for the Hall well before they retire or their 3-year waiting period ends; neither is eligible or inducted at the time of writing. There are no serious debates about the fate of players of this caliber, but when it comes to other star players who never quite rose to that level of superstardom or had a shorter peak, things get interesting.

Basketball-reference.com has a well-known Hall of Fame probability page, and I decided to try my own hand at ranking the NBA’s past and current players on their Hall probability. Their formula uses the following 5 predictor variables:

  • Height
  • NBA Championships
  • NBA Leaderboard Points
  • NBA Peak Win Shares
  • All-Star Game Selections

A little explanation on some of these: height has a negative coefficient, so, all else being equal, shorter players have better odds of making the Hall. Leaderboard points are calculated by seeing who finished in the top 10 in each statistical category for a given year (points, rebounds, assists, minutes played, steals, and blocks).
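To make the leaderboard idea concrete, here is a small sketch in R. The scoring rule (10 points for 1st place down to 1 point for 10th) is my assumption for illustration and is not stated above, and the ranks are made up.

```r
# ASSUMPTION: 10 points for a 1st-place finish down to 1 point for 10th,
# and nothing outside the top 10. The real scoring rule may differ.
leaderboard_points <- function(rank) {
  ifelse(rank >= 1 & rank <= 10, 11 - rank, 0)
}

# Made-up league ranks for one player in one season, one per category.
ranks <- c(points = 2, rebounds = 15, assists = 7,
           minutes = 1, steals = 30, blocks = 11)

# Only the top-10 finishes (points, assists, minutes) contribute.
season_points <- sum(leaderboard_points(ranks))
season_points
```

Summing `season_points` over every season of a career would give a cumulative total, which is why this kind of variable tends to favor long careers.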

For my dataset, I will focus more on career totals and accolades. I’ll be using roughly 2,000 players who were drafted into the NBA from 1970 onward. ABA awards such as championships and All-ABA selections have been merged into their respective NBA categories for the sake of simplicity. Here are the most important variables to note:

  • Games played
  • Minutes played
  • Total Points/Rebounds/Assists
  • Career Field Goal %
  • Career 3PT %
  • Career Free Throw %
  • Career win shares
  • Career box plus/minus
  • Career value over replacement
  • All Star Selections
  • All Rookie Selection
  • All Star MVP
  • Scoring/Assist/Block/Steal/Rebound Champion (most points, etc. in a single season)
  • All NBA Selections
  • NBA Championships
  • All Defensive Selections
  • Rookie of the Year Award
  • Finals MVP
  • Sixth Man Award
  • Defensive Player of the Year
  • Most Improved Award

These numbers will only count stats accumulated in the NBA. No international tournaments or overseas accomplishments will factor in. This is less of a problem today as most elite players come to the NBA to start their career, but in previous decades this was far from the case. All statistics used were gathered from Basketball-reference.
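Folding the ABA awards into the NBA categories, as mentioned earlier, could look something like the sketch below. The players, values, and column names are hypothetical placeholders, not the ones used in my RMD.

```r
# Hypothetical accolade counts; all names and values are placeholders.
players <- data.frame(
  player            = c("Player A", "Player B"),
  nba_championships = c(1, 0),
  aba_championships = c(2, NA),  # NA = never played in the ABA
  all_nba           = c(5, 3),
  all_aba           = c(4, NA),
  stringsAsFactors  = FALSE
)

# Treat a missing ABA value as zero so the sums stay defined.
na_to_zero <- function(x) ifelse(is.na(x), 0, x)

# Fold each ABA column into its NBA counterpart.
players$championships <- players$nba_championships +
  na_to_zero(players$aba_championships)
players$all_league <- players$all_nba + na_to_zero(players$all_aba)

players[, c("player", "championships", "all_league")]
```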

A New Hall of Fame Model

After working with the data (full process disclosed in this RMD), I’ve come up with the following logistic regression model:

This model depends more upon career accolades than the Basketball Reference model. Let’s see how this model rates the probability of players already in the Hall.

As we can see, most players in the Hall of Fame have a probability greater than 90%, which is to be expected since we built the model on this data. There’s a small cluster of players at the low end of the probability spectrum, largely made up of international players who didn’t play much in the NBA.
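If you want to try something similar yourself, here is a minimal sketch of fitting a logistic regression with `glm()` and scoring players with it. To be clear, this is not my actual model: the data is simulated, and the three predictors (`all_star`, `all_nba`, `minutes_k`) and the coefficients used to simulate induction are assumptions chosen only to make the example run.

```r
set.seed(42)
n <- 500

# Simulated stand-in data; NOT the real player dataset.
toy <- data.frame(
  all_star  = rpois(n, 1),           # All-Star selections
  all_nba   = rpois(n, 0.5),         # All-NBA selections
  minutes_k = runif(n, 1, 40)        # career minutes, in thousands
)

# ASSUMED "true" relationship, used only to simulate who gets inducted.
lin <- -4 + 0.9 * toy$all_star + 1.2 * toy$all_nba - 0.03 * toy$minutes_k
toy$inducted <- rbinom(n, 1, plogis(lin))

# Logistic regression: a binomial GLM with the default logit link.
fit <- glm(inducted ~ all_star + all_nba + minutes_k,
           data = toy, family = binomial)

# Predicted Hall of Fame probability for every player.
toy$prob <- predict(fit, type = "response")

# Average predicted probability for non-inductees (0) and inductees (1).
tapply(toy$prob, toy$inducted, mean)
```

`predict(..., type = "response")` returns probabilities between 0 and 1 rather than log-odds, which is what a chart of probabilities for inducted players would be built from.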

Current Hall of Fame Snubs

Our model finds 6 players who are currently Hall of Fame eligible but not inducted and who have a probability greater than 0.5:

The models vary quite a bit on most of these players. Interestingly, for both of the shorter players (Billups and Hardaway) my model gives the lower probability, whereas for everyone else, all of whom are taller, my model gives a much higher probability. BBref’s height coefficient must be significant!
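The filter behind a snub list like this one is short. Here is a rough sketch, assuming a data frame with one row per player and hypothetical `eligible`, `inducted`, and `prob` columns; the rows are invented and don't correspond to real players.

```r
# Hypothetical scored players; all values are invented.
scored <- data.frame(
  player   = c("Player A", "Player B", "Player C", "Player D"),
  eligible = c(TRUE, TRUE, FALSE, TRUE),
  inducted = c(FALSE, TRUE, FALSE, FALSE),
  prob     = c(0.82, 0.97, 0.91, 0.12),
  stringsAsFactors = FALSE
)

# Snubs: eligible, not yet inducted, and more likely than not to get in.
snubs <- subset(scored, eligible & !inducted & prob > 0.5)

# Sort from most to least likely.
snubs <- snubs[order(-snubs$prob), ]
snubs$player
```

Flipping the condition to `!eligible & prob > 0.5` gives the list of projected future Hall of Famers instead.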

Projecting Future Hall of Famers

Here’s a list of players who are not currently eligible for the Hall of Fame but have a probability greater than 0.5 from my model:

Here we see some pretty stark differences between the two models. One trend I notice is that my model rewards younger players a lot more than the BBref model does. Since BBref takes cumulative leaderboard points into account, it takes a player more years of playing to climb the ladder.

My model, on the other hand, rewards players who earn more accolades in less time, as there is a negative coefficient on minutes played. For example, a player making 3 All-NBA teams in 6 seasons played (Giannis) is more impressive than someone making 4 All-NBA teams in 15 seasons played.
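Here is a quick sketch of how a negative minutes coefficient produces that effect. The coefficients below are invented for illustration and are not the ones from my fitted model; I'm also assuming roughly 2,500 minutes per season to turn seasons into career minutes.

```r
# ASSUMED coefficients, for illustration only (not the fitted model).
beta <- c(intercept = -3.0, all_nba = 1.5, minutes_k = -0.08)

# Logistic model: probability = plogis(linear predictor).
hof_prob <- function(all_nba, minutes_k) {
  plogis(beta[["intercept"]] +
         beta[["all_nba"]] * all_nba +
         beta[["minutes_k"]] * minutes_k)
}

# 3 All-NBA teams in about 6 seasons (~15,000 minutes).
young <- hof_prob(all_nba = 3, minutes_k = 15)

# 4 All-NBA teams in about 15 seasons (~37,500 minutes).
veteran <- hof_prob(all_nba = 4, minutes_k = 37.5)

c(young = young, veteran = veteran)
```

With these made-up numbers, the extra All-NBA selection is outweighed by the penalty on the additional minutes, so the younger player comes out ahead.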

Final Thoughts

After analyzing and comparing my probabilities with BBref’s, I think there are some things to like and dislike about my model. BBref’s model does a better job of answering “if this player were to stop playing tomorrow, what are the odds?”, whereas mine reads more like a projection of Hall of Fame probability because of the minutes played parameter.
