Revisiting Scorekeeper Bias and Score/Venue Adjustments for Hits
Does adjusting for rink and score/venue effects improve the signal in team hit rates?
This season’s NHL trade deadline is just three weeks away, which means playoff and Cup contenders are in the market for players who will help their rosters survive the intensity and grind of the playoffs. In practice, that has often meant getting bigger and tougher, i.e., adding players who hit more, much to the chagrin of the analytics community. But are hits a good proxy for physicality?
Hits are often treated with derision in the analytics community, but the obsession with hitting in the playoffs has persisted among front offices. Recent research has also suggested that the game may be shifting, with physicality becoming more important over the past half decade or so. As a statistic, though, hits are problematic.
It’s well established that hits are one of the most subjective stats in hockey, with hit counts varying widely between arenas. Michael Schuckers and Brian Macdonald examined this phenomenon in 2014, but for reasons that are unclear to me, rink/scorekeeper adjustments have not become common practice, though Micah Blake McCurdy does account for rink effects in his shot rate model. Additionally, adjusting shot metrics for score and venue (home/away) effects has long been standard practice, using a method developed by McCurdy, but to my knowledge there is no source for score- and venue-adjusted hits.
This piece is a prelude to a deeper dive into the relationship between physicality and playoff success. In this article, I compare the in-season predictive ability of hit rates adjusted for rink effects, and for score and venue using two different methods, against unadjusted hit rates, in an attempt to account for some of the subjectivity of the stat.
Methodology
I decided to test adjusting hit counts using both Schuckers and Macdonald’s method and McCurdy’s method, comparing different combinations of adjustments against unadjusted hit rates.
Schuckers and Macdonald’s Method (Ridge Regression)
Schuckers and Macdonald describe a process that uses ridge regression to estimate the scorekeeper bias in each rink. Their regression targets the natural log of each team’s 5-on-5 hits per 60 minutes in a game, with terms for the team, the opponent, the rink, the home team’s score differential per 60 minutes, an indicator for whether the team is the home team, and a “homer” effect for each rink.
For Schuckers and Macdonald’s method, I mostly followed the steps laid out in their paper, with two differences. First, I used the average score differential from the hitting team’s perspective instead of the home team’s. My reasoning was that if venue and score are independent variables, then the hitting team’s behavior is affected by its own score. Second, I used the 5-on-5 score differential per 60 minutes instead of the all-situations figure. In my view, if the target is the 5-on-5 hit rate, then the 5-on-5 score difference is the relevant predictor. This likely has a limited effect, since most of a typical game is played at 5-on-5 anyway.
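The regression described above can be sketched in Python using NumPy’s closed-form ridge solution. This is a minimal sketch under stated assumptions, not the code behind this analysis. The field names, the hypothetical `fit_rink_ridge` helper, the penalty `lam=1.0`, modelling the homer effect as a rink × home indicator, and dropping zero-hit games (whose log is undefined) are all my own illustrative choices.

```python
import numpy as np

def fit_rink_ridge(games, teams, lam=1.0):
    """Sketch of a Schuckers/Macdonald-style ridge fit for one season.

    games: list of dicts with hypothetical keys 'team' (hitting team),
      'opponent', 'rink' (home team's code), 'is_home', 'score_diff_per60'
      (hitting team's 5-on-5 goal differential per 60), 'hits', 'toi_min'.
    Returns (intercept, coefficients); exp(coef) is a multiplicative effect.
    """
    games = [g for g in games if g["hits"] > 0]  # log(0) is undefined; dropped here
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Column blocks: team | opponent | rink | homer (rink x home) | score | home
    X = np.zeros((len(games), 4 * n + 2))
    y = np.empty(len(games))
    for r, g in enumerate(games):
        X[r, idx[g["team"]]] = 1.0
        X[r, n + idx[g["opponent"]]] = 1.0
        X[r, 2 * n + idx[g["rink"]]] = 1.0
        if g["is_home"]:
            X[r, 3 * n + idx[g["rink"]]] = 1.0  # homer effect (assumed form)
            X[r, 4 * n + 1] = 1.0               # home indicator
        X[r, 4 * n] = g["score_diff_per60"]
        y[r] = np.log(60.0 * g["hits"] / g["toi_min"])  # log hits per 60
    # Center X and y so the intercept is left unpenalized by the ridge solve
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta
```

Exponentiating a rink coefficient gives a multiplicative effect on recorded hit rates. This sketch does not show how those effects are normalized into the percentage rink adjustments discussed in this piece.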
I ran a separate regression for each season, which yielded the following season-by-season adjustment coefficients for each team.
Notably, the 2023-24 season has a much smaller range of adjustment coefficients, grouped much more closely around the median, indicating that the scorekeeper effect was reduced that season. In my opinion, this is further evidence of the growing role of player-tracking technology in box score stats, which I previously explored for shot locations.
For fun, I’ve included the season-by-season rink adjustments for each team below. An adjustment of over 100% means that hits are undercounted in that rink, while an adjustment less than 100% means that hits are overcounted.
Because I ran a separate regression for each season, each season has its own adjustment coefficients for score and venue (home/away). Charting these by season shows how score and venue effects vary year to year, and whether they trend or stay relatively stable.
The score coefficient is fairly steady across all seasons, hovering around 103%. In my view, this is an encouraging sign that score effects are consistent, so I feel comfortable simply averaging the coefficients across all seasons and using 1.03 raised to the hitting team’s goal differential as the score adjustment.1 I capped the score difference at ±3, as gaps larger than that are rare and the effect does not continue to compound neatly beyond that.
The venue effect is different, though. From 2009-10 to 2013-14, the effect was about 93%, but there’s a noticeable jump from 2014-15 onward, with the home adjustment staying closer to 96%. The jump is large enough, and the coefficients stable enough on either side of it, that I used the average for each period. This resulted in home adjustments of 0.928 for 2009-10 through 2013-14 and 0.962 for 2014-15 through 2023-24.
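The score adjustment, including the ±3 cap, fits in a few lines of Python. The function names are hypothetical. Applying the factor hit by hit, using the goal differential at the time of each hit, is my assumption about how it would be used.

```python
def score_adjustment(goal_diff, coef=1.03, cap=3):
    """coef ** goal_diff from the hitting team's perspective, gap capped at +/- cap."""
    capped = max(-cap, min(cap, goal_diff))
    return coef ** capped

def score_adjusted_hits(goal_diffs):
    """Hypothetical helper: weight each hit by the score adjustment when it occurred."""
    return sum(score_adjustment(d) for d in goal_diffs)
```

Here `score_adjustment(2)` gives roughly 1.06 and `score_adjustment(-2)` roughly 0.94. Any lead larger than three goals is treated as a three-goal lead.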
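The next sketch combines the era-based venue adjustment with the score adjustment. It rests on two assumptions, both by analogy with the rink adjustment (where values over 100% scale hit counts up). First, each adjustment is a multiplier on raw hits. Second, the multipliers combine multiplicatively. The away team is treated as the baseline of 1.0, as in the regression, and `adjusted_hits_per60` is a hypothetical helper.

```python
def venue_adjustment(season_start_year, is_home):
    """Era-based home adjustment; the away team is the regression baseline (1.0)."""
    if not is_home:
        return 1.0
    # Averages for 2009-10 to 2013-14 and for 2014-15 to 2023-24
    return 0.928 if season_start_year <= 2013 else 0.962

def adjusted_hits_per60(hit_goal_diffs, toi_min, season_start_year, is_home, rink_adj=1.0):
    """Weight each hit by rink x venue x score adjustments (assumed multiplicative)."""
    venue = venue_adjustment(season_start_year, is_home)
    total = sum(rink_adj * venue * 1.03 ** max(-3, min(3, d)) for d in hit_goal_diffs)
    return 60.0 * total / toi_min
```

For example, `venue_adjustment(2013, True)` returns 0.928 for the 2013-14 season, while any 2014-15 or later home game gets 0.962.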
HockeyViz Method
McCurdy’s method is much simpler to calculate. For a given score state, the adjustment for the home (or away) team is half the total number of events in that state divided by the number of events recorded by the home (or away) team. As with the regression-derived score and venue effects, I looked at each season’s coefficients first to check for any significant trends. Below are the score/venue coefficients for the home team. Note that the home adjustments appear much closer to 1 than the regression coefficients. That’s because the regression treats the away team as the baseline with a coefficient of 1, while McCurdy’s method assigns the away team a coefficient greater than 1 in a neutral score state.
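McCurdy’s calculation can be sketched in a few lines of Python. Two details are illustrative assumptions. One is the input format: one `(home_lead, is_home_hit)` pair per recorded hit. The other is the ±3 cap on the home team’s lead, which mirrors the cap used for the regression and not necessarily what HockeyViz does.

```python
from collections import Counter

def hockeyviz_coefficients(hits, cap=3):
    """Per-score-state (home_coef, away_coef) pairs, McCurdy-style.

    hits: iterable of (home_lead, is_home_hit) tuples, one per recorded hit
    (hypothetical input format). home_lead is capped at +/- cap (an assumption).
    """
    home, away = Counter(), Counter()
    for home_lead, is_home_hit in hits:
        state = max(-cap, min(cap, home_lead))
        if is_home_hit:
            home[state] += 1
        else:
            away[state] += 1
    coefs = {}
    for state in sorted(set(home) | set(away)):
        half = (home[state] + away[state]) / 2
        # Half the state's total hits divided by each team's own count;
        # a side with no hits in that state gets None
        coefs[state] = (half / home[state] if home[state] else None,
                        half / away[state] if away[state] else None)
    return coefs
```

By construction, multiplying each team’s hit count in a score state by its coefficient makes the two adjusted totals equal. For example, 110 home hits and 90 away hits in tied games give a home coefficient of about 0.91 and an away coefficient of about 1.11.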
Although it’s a little difficult to see on this chart, the home coefficient in tied games (no score effect) follows the same pattern as the venue effect from the regression, with the 2009-10 to 2013-14 seasons displaying a notably stronger effect. The pattern is easier to see in the up-by-one coefficient, which is below 1 in every season before 2014-15 and above 1 in every season after, with the exception of 2018-19. As with the regression-derived coefficients, I calculated two sets of score and venue adjustments, one covering 2009-10 through 2013-14 and the other covering 2014-15 through 2023-24, which are listed below.
Results
To test the predictive value of each set of adjustments, I followed a testing method similar to the one outlined in McCurdy’s original score adjustment work. For each team game, I calculated raw and adjusted hit rates per 60 minutes using each adjustment individually and the combination of the rink adjustment with each score/venue adjustment. Next, for each team-season, I took a sample of 40 regular season games, divided it into two subsets of 20 games (subset A and subset B), and calculated the total hit rate for each subset. I repeated this process 1,000 times for each team-season and took the correlation across all pairs of subsets.
I repeated this exercise with two other splits, because the ultimate focus of this exercise is the playoffs. The first used 5 regular season games to predict 35 regular season games. The second used 36 regular season games to predict 4 playoff games, the largest possible playoff sample that includes every playoff team in a given season.
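The within-season resampling test can be sketched in Python with NumPy. Several details are assumptions: the input layout, the `split_half_r2` name, and reporting R-squared as the square of the Pearson correlation. The playoff variant, where subset B is drawn from playoff games instead, is not shown.

```python
import numpy as np

def split_half_r2(team_seasons, n_a=20, n_b=20, n_iter=1000, seed=0):
    """team_seasons: list of (hits, toi_min) per-game sequences, one pair per team-season.

    Repeatedly samples n_a + n_b games without replacement, splits them into
    subsets A and B, and returns the squared correlation of the pooled rates.
    """
    rng = np.random.default_rng(seed)
    rates_a, rates_b = [], []
    for hits, toi in team_seasons:
        hits, toi = np.asarray(hits, float), np.asarray(toi, float)
        for _ in range(n_iter):
            pick = rng.choice(len(hits), size=n_a + n_b, replace=False)
            a, b = pick[:n_a], pick[n_a:]
            # Total hits over total ice time, per 60 (not an average of game rates)
            rates_a.append(60.0 * hits[a].sum() / toi[a].sum())
            rates_b.append(60.0 * hits[b].sum() / toi[b].sum())
    r = np.corrcoef(rates_a, rates_b)[0, 1]
    return r ** 2
```

Passing `n_a=5, n_b=35` gives the 5-versus-35 regular season split. Running the function once on raw per-game hits and once on adjusted per-game hits allows the methods to be compared.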
Every adjustment method outperforms raw hit rates, with the combination of rink effects and regression-derived score/venue effects showing the strongest correlation with playoff hit rates. Within the regular season, the combination of rink effects with the HockeyViz score/venue adjustment is the most predictive, though the difference between the two score/venue adjustments is minimal: only 0.002 in R-squared. The simpler HockeyViz method comes impressively close to the regression-derived adjustments and slightly outperforms them within the regular season, a reminder that more complex methods often encounter diminishing returns.
Although the R-squared for predicting playoff hit rates is much lower than within the regular season, the fact that every adjustment outperforms the raw totals suggests that the adjustments succeed in removing some of the noise in raw counts.
Conclusion
Adjusting for rink, score, and venue effects improves the autocorrelation of team hit rates both within the regular season and from the regular season to the playoffs, though there is still quite a bit of noise between the regular season and the playoffs. Next time, I’ll look at how much of the change in play style between the regular season and the playoffs is simply due to which teams make the postseason and how much is due to teams changing the way they play. I’ll then wrap up the series with an in-depth look at how much of an impact physicality has on playoff outcomes.
Meaning that if a team is leading by 2, its score adjustment is 1.03^2, or about 1.06. If it trails by 2, its score adjustment is 1.03^-2, or about 0.94. A tied game has an adjustment of 1.03^0 = 1, i.e., no effect.