
Talking NBA Point Shaving with Jonathan Gibbs

July 30, 2007 1:11 PM

As we just discussed here on TrueHoop, Jonathan Gibbs wrote his senior thesis at Stanford about point shaving in the NBA. Through rigorous research that has been praised by some respected experts in the field, Gibbs demonstrated that over the last 14 years, NBA scores have been consistent with the idea that there is a small amount of point shaving. 

I asked him to explain his work, which he did in an email conversation. (Academic language alert!)

I guess my first question is what were you looking for, and what did you find?
I undertook this project in September 2006 as my senior honors economics thesis at Stanford. I was inspired by Justin Wolfers' paper regarding point shaving in college basketball and wanted to examine similar data for the NBA. My goal was to study past NBA betting lines and game results empirically, looking for indications of possible point shaving. To do this, I created a dataset covering the past 14 NBA seasons. Analyzing that data in a style similar to Wolfers' yielded results consistent with what one would expect from point shaving: heavily favored teams fail to cover the spread a statistically significant share of the time.
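As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the thesis's actual code: the function name, the 12-point cutoff for "heavily favored," and the toy games are all hypothetical, and a real analysis would use the full dataset of betting lines and results.

```python
import math

def cover_rate_z(games, min_spread=12.0):
    """Return (n, cover_rate, z) for favorites laying at least min_spread points.

    games: iterable of (spread, favorite_margin) pairs, where spread is the
    number of points the favorite was expected to win by and favorite_margin
    is its actual margin (negative if it lost). Pushes are dropped.
    The 12-point default is an assumed cutoff, not one taken from the thesis.
    """
    covered = [margin > spread for spread, margin in games
               if spread >= min_spread and margin != spread]
    n = len(covered)
    if n == 0:
        raise ValueError("no games at or above min_spread")
    rate = sum(covered) / n
    # z-score against the 50% cover rate expected from an unbiased line
    z = (rate - 0.5) / math.sqrt(0.25 / n)
    return n, rate, z

# Made-up games purely for illustration: (spread, favorite's final margin)
toy_games = [(13, 10), (14, 20), (12, 11), (15, 15), (5, 9), (12.5, 12), (16, 18)]
n, rate, z = cover_rate_z(toy_games)
print(f"{n} heavy-favorite games, cover rate {rate:.0%}, z = {z:.2f}")
```

With only a handful of games the z-score means little; the approach only becomes informative over thousands of games.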

In pushing beyond what Wolfers did for NCAA basketball, I created a second data set utilizing game data from the 2001-2007 NBA seasons. My thinking was that the easiest time for a player, coach or referee to influence the amount by which a team wins is in the last five minutes of the game.

Therefore, I collected the score differential for each game with 5, 4, 3, 2, and 1 minutes to go. Analyzing this data, I found that, when controlling for the score at the end of the game, the favored team's probability of covering the spread decreases as the spread increases.

Additionally, heavily favored teams are expected to exceed the spread by a greater amount relative to more evenly matched teams.

Therefore, the data suggests a trade-off in the betting market: heavily favored teams are less likely to cover the spread, but when they do, they are likely to exceed the spread by a larger amount. This is consistent with the notion of point shaving.

As a huge basketball fan, it was my hope that I would find nothing. However, the data said otherwise.

Wow, that's amazing. This might not be a fair question, but how many games per season are statistically fishy? Would three or four a season be enough to skew the data in the manner you found? 15? 100? Also, of course the NBA is going to attack the credibility of these numbers. What assurances can you offer that these numbers are solid? Are they, in some manner, auditable?
First, I must say that I did not create an explicit structural model of point shaving's incidence, which would also have to include failed attempts to shave points and over-shaves resulting in losses. That being said, the data suggests that across the fourteen-year sample approximately five games per year were influenced by point shaving. It was the volume of data used, 15,859 games, that allowed so few games to show up as a statistically significant anomaly.
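The role of sample size can be made concrete with a short Python sketch. It is a generic illustration of how the precision of an estimated cover rate improves with more games, not the test used in the thesis; the function name and the 50% baseline are assumptions.

```python
import math

def cover_rate_standard_error(n_games, p=0.5):
    """Standard error of an observed cover rate when the true rate is p."""
    return math.sqrt(p * (1 - p) / n_games)

# Precision improves with the square root of the number of games.
for n in (100, 1_000, 15_859):
    se = cover_rate_standard_error(n)
    print(f"{n:>6} games: standard error = {se * 100:.2f} percentage points")
```

The larger the sample, the smaller the deviation from 50% that can be distinguished from chance.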

As for assurances that these numbers are solid, I must say that as a basketball fan, I hope I am wrong.

As for their credibility, this project was undertaken as a senior thesis. It has not been published, nor peer-reviewed beyond my advisor and the professors I asked to read it. I am in the process of cutting it down so that I can submit it, in the hopes that it will be published, but that is far from a guarantee. I am more than willing to share my work with anyone; in fact, I hope I can get useful comments to improve it. Nonetheless, I feel that the work I did is solid, particularly for a 22-year-old recent graduate.

What do you say to those who will argue that when one team is way ahead in the closing minutes, what happens on the court is totally meaningless, especially for the team that is leading? It might just not generate ANY meaningful data, right? Is it a shocker that the leading team might consistently underperform expectations in that part of the game?
Basically, you're talking about the issue of garbage time. There are lots of justifications for why the leading team might consistently underperform at the end of the game. For example, it seems natural that when one team is winning by a lot and the game is about to end, it would not feel the same sense of urgency as the team that is losing. Similarly, it may substitute out its starters earlier, or some other event may occur that gives the losing team a brief advantage. I do not disagree with any of those ideas; however, I do have several responses to that avenue of discussion.

First, since garbage time is a game phenomenon, it should exist in a measurable quantity. Second, it is not exactly a revolutionary idea, and thus one would think that NBA fans, and especially gamblers on the NBA, would know about it. Put those two notions together, and my expectation is that the NBA point spread gambling market should be able to correct for it. That is, if one team is actually 15 points better than the other, but everyone knows that a team up by 15 with one minute to go can only be expected to win by 13, then 13 is where the betting line should be set.

Now, aside from that expectation, which may or may not be true, the data in my analysis suggests that garbage time may be having a significant impact.

Point spread betting markets function as prediction markets, meaning that the point spread is the most efficient predictor of a game's outcome, the result of publicly and privately held information being turned into prices. Thus, the point spread should be strongly correlated with each game's final outcome.

However, the point spread should have no causal effect, as its presence or absence should have no effect on the game's result. This is where the garbage time data becomes interesting. What the data shows is that games with large spreads end just below the point spread significantly more often than just above it, producing what looks like a misshapen bell curve. This means that if the point spread is 12, games end up at 10 and 11 more often than at 13 and 14, while if the spread is 15, games end up at 13 and 14 more often than at 16 and 17. Since the data shows the margin relative to the point spread instead of the actual game margin, this information speaks to the relative value of one side of the point spread compared with the other.
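The just-below versus just-above comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not the thesis's method: the function name, the two-point window, and the toy games are hypothetical.

```python
def near_spread_counts(games, window=2):
    """Count games finishing just below vs. just above the spread.

    games: iterable of (spread, favorite_margin) pairs.
    window: how many points on either side of the spread count as "near"
    (an assumed value chosen to match the 10-11 vs. 13-14 example).
    Games landing exactly on the spread, or outside the window, are ignored.
    """
    below = above = 0
    for spread, margin in games:
        diff = margin - spread  # margin measured relative to the spread
        if -window <= diff < 0:
            below += 1
        elif 0 < diff <= window:
            above += 1
    return below, above

# Made-up games purely for illustration: (spread, favorite's final margin)
toy_games = [(12, 10), (12, 11), (12, 13), (15, 14),
             (15, 13), (15, 17), (12, 25), (15, 15)]
below, above = near_spread_counts(toy_games)
print(f"just below the spread: {below}, just above: {above}")
```

If the line had no influence on play, the two counts should be roughly equal over a large sample; a persistent tilt toward "just below" is the asymmetry being described.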

The regression analyses performed on this data back up this information by suggesting a trade-off within the betting market. When controlling for the play during the last five minutes of the game, the greater the point spread is, the smaller the probability of covering that point spread. However, when controlling for the play during the last five minutes of the game, games with the largest point spreads are expected to exceed the point spread by the largest amount. This implies that in large-spread games, the favored team either destroys the betting line or just barely dips below it. This suggests that the betting market is trying to equilibrate the disparate levels of talent between the two teams while accounting for the possibility of point shaving.

Thus, while one would expect garbage time to be seemingly random or slightly shaded toward the underdog, it is the responsiveness of the final game score to the invisible line drawn by the point spread that is suggestive of point shaving.

I know you are an NBA fan. Have you watched any of those heavy favorite games? Have you ever seen anything fishy in the closing minutes? (I realize we're outside your area of expertise here.) And do you have any suggestions about how the NBA might try to identify point shavers in the future?
First, games with large point spreads are most likely between good teams and not-so-good teams, not exactly the games most likely to be on TV. However, even if they were, one game can tell you very little. Sometimes, a turnover is just a turnover. (However, I feel slacking on defense, as in getting caught up in a pick, or not running out on a shooter, would be a much easier way to shave points.) The reason the data is able to provide any insight is that small systematic patterns can be detected over a large enough sample size.

This same strength of the data in being able to find biases over large samples is also its weakness, as it makes it quite difficult to find individual perpetrators. If a team won 5 straight games but failed to cover the spread, would that be enough? 10? Fishy perhaps, but not proof by a long shot. Even then, distinguishing between a player, coach, or referee would be quite difficult. This whole drama has made it clear that I was looking in the area where coaches and players have the greatest control. It seems referees can most easily affect the over/under, which I did not study. However, within that arena it's quite a difficult problem to sort out who may be doing what.

League-Wide Issues, Basketball History, Tim Donaghy
