Predicting Number of Wins
I was bored today and ran a regression model (with a stepwise selection procedure) to see what team stats (season averages) were most associated with the number of NBA wins in a season. I used data from the 2008-2009 regular season.
Outcome: Number of Wins
Variables considered: FGA, FGM, FG%, 3PTA, 3PTM, 3PT%, FTM, FTA, FT%, Total Rebounds, Offensive Rebounds, Defensive Rebounds, Turnovers, Steals, Assists, Personal Fouls, Points
Final Model:
Predicted # of Wins = -41.22 + 2.71*FGM - 4.02*FGA + 0.60*3PTA + 1.64*3PT% + 5.32*TOTREB + 1.82*AST - 5.48*TO + 7.11*STL
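For anyone who wants to plug numbers in, here is a minimal Python sketch of the formula above. The dictionary keys and the function names `predict_wins` and `marginal_effect` are placeholders chosen for illustration. It assumes every input is a per-game season average and that 3PT% is on the same scale used when the model was fit (the post does not say whether that is, for example, 36.5 or 0.365).

```python
# Coefficients copied from the Final Model line; the names are illustrative.
INTERCEPT = -41.22
COEFFICIENTS = {
    "FGM": 2.71,
    "FGA": -4.02,
    "3PTA": 0.60,
    "3PT%": 1.64,   # scale (percent vs. fraction) is an assumption
    "TOTREB": 5.32,
    "AST": 1.82,
    "TO": -5.48,
    "STL": 7.11,
}


def predict_wins(team_stats):
    """Return predicted season wins from a dict of per-game team averages."""
    missing = [name for name in COEFFICIENTS if name not in team_stats]
    if missing:
        raise KeyError(f"missing stats: {missing}")
    return INTERCEPT + sum(
        coef * team_stats[name] for name, coef in COEFFICIENTS.items()
    )


def marginal_effect(stat, change=1.0):
    """Change in predicted wins when one stat moves and the rest stay fixed."""
    return COEFFICIENTS[stat] * change


# One fewer turnover per game, everything else unchanged:
print(marginal_effect("TO", -1.0))  # 5.48
```

Because the inputs are per-game averages, the output is already a full-season win total; nothing gets multiplied by the number of games remaining.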
This simple model fits the data quite well, maybe too well (R-squared = 0.98). My next step is to see what the model looks like using data from the previous 3-5 seasons.
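One way to get a feel for the "maybe too well" worry is to run a selection procedure on pure noise and see how much R-squared it produces by chance. The sketch below is not the procedure used for this post: it swaps in a plain forward selection that greedily maximizes R-squared, and it assumes 30 teams, the 17 distinct candidate stats listed above, and 8 predictors kept, as in the final model. All names are illustrative.

```python
import random


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    design = [[1.0] * len(y)] + columns
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in design] for ci in design]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in design]
    beta = solve(gram, rhs)
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(candidates, y, n_keep):
    """Greedily add the candidate column that raises R-squared most."""
    chosen = []
    for _ in range(n_keep):
        remaining = [j for j in range(len(candidates)) if j not in chosen]
        best = max(
            remaining,
            key=lambda j: r_squared([candidates[k] for k in chosen + [j]], y),
        )
        chosen.append(best)
    return chosen, r_squared([candidates[k] for k in chosen], y)


random.seed(0)
n_teams, n_candidates, n_keep = 30, 17, 8
# Random "stats" and random "wins": there is no real relationship to find.
noise_stats = [[random.gauss(0, 1) for _ in range(n_teams)] for _ in range(n_candidates)]
noise_wins = [random.gauss(0, 1) for _ in range(n_teams)]
chosen, r2 = forward_select(noise_stats, noise_wins, n_keep)
print(f"R-squared from noise alone, {n_keep} of {n_candidates} predictors: {r2:.2f}")
```

Whatever R-squared this prints is what selection alone can manufacture with this little data. Real box-score stats carry genuine signal, so this does not show the model is wrong, only that a single-season fit should be checked against seasons it was not fit on.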
Applying the current 2009-2010 season stats to the above model, the Timberwolves will win 1.28 games! Ouch! For comparison, the Lakers will win 51.8 games.
The Wolves need a lot of help shooting the ball, dishing assists, and decreasing TOs. Steals look great!
Comments
That's quite a model
Couple of questions/notes
If it predicts the Wolves to win 1.28 games, it clearly has problems at one extreme at least. That doesn’t necessarily mean it’s not useful, since many regression models break down at unlikely extremes, but it’s still worth figuring out why it breaks down like that. Perhaps you just need to add a constant for results that are below a certain number.
But it also might be a case where it just doesn’t work without a full season’s data, since you are essentially running an additive formula. When you say it “predicts” a team to win this many or that many games, it’s not clear what you mean. How do you take the data for this season that you have and plug it into the model such that it produces a win total for a full season? Are you just multiplying today’s results by the amount of season still left?
Another question: are you really using 3pt % in your formula? Every other number is a counting number (FGA, STL, etc.); when you throw a percentage into this sort of thing it makes me nervous.
by Eric in Madison on Nov 20, 2009 6:50 PM CST reply actions
I think the Wolves will win more than 2 games this year, but looking at their team stats (current averages per game) they are pretty horrible. The model is attempting to predict the number of wins for a full season, based on the 2008-2009 season. There will certainly be outliers. I used this data:
Plugging this season’s data into the model assumes that the current team stats will remain constant through the season (not likely). 3 pt % remained significant in the model. I think it would be better to include data from 3-5 seasons to construct the model and also consider some other team stats.
Actually not that far off
While the total is on the low side, if the lineup does not change for the rest of the season, this team may not win ten games. If Love remains sidelined and Al takes longer to improve than expected, they are a JV NBA squad.
I’m sticking with my original prediction of 22 wins.
Offensive Rebounds
I’m surprised that Offensive Rebounds didn’t make it into the formula, as I’ve always considered them an important factor. Did the model just not find them significant?
