Previously I’ve looked at ways to calculate the age and size for teams based on ice time and how teams deploy their players differently on special teams based on age and size. I think enough groundwork has been done now to start trying to answer the important question: does any of this help teams win? I’ll begin by looking at each strength state for the regular season, focusing on:
Shot quality
Shot quantity
Expected goal (xG) differential
Win percentage
Standings points percentage
Win and standings points percentages are self-explanatory, but the rest require some explanation.
Shot quality, as I’m defining it, is the xG value per unblocked shot (Fenwick) above the median for that season. I use unblocked shots because they are the basis of xG models, and I measure against the season median because xG models vary in accuracy from season to season (more on this in a later post). I chose median over mean because shot quality is not normally distributed. When looking at shot quality, I’m not interested in volume at all, since that is already accounted for in xG differential. I just want to see if a team was able to get better quality shots than they gave up when quality is isolated. Therefore, I subtract the quality against value from the quality for value. In other words, if a team averaged 0.1 xG (a ridiculous number) per Fenwick for and 0.05 xG (also a ridiculous number) per Fenwick against, their shot quality rating will be 0.05.
Shot quantity and expected goal differential follow a similar, but simpler, method. For shot quantity, I just take a team’s shot attempts (Corsi) for per 60 minutes above the median for that season and subtract their shot attempts against rate above the median for that season. If a team takes 10 more attempts per 60 minutes than the median but also gives up 10 more attempts, their shot quantity rating is zero.
I use Evolving-Hockey’s expected goal model as the basis for each of these stats. I am aware of scorer bias and the effect it has had on Evolving-Hockey’s xG model. However, I think it is negligible enough in the aggregate for what I’m trying to do here, which is just a quick analysis of how age and size impact team quality.
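The "for above median minus against above median" arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name net_rating is hypothetical, the league values are made up, and taking a separate median for the "for" and "against" sides is my reading of the method rather than a confirmed detail.

```python
from statistics import median


def net_rating(team_for, team_against, league_for, league_against):
    """Return (for - median for) minus (against - median against).

    league_for and league_against hold one value per team for a single
    season. Using a separate median for each side is an assumption.
    """
    above_for = team_for - median(league_for)
    above_against = team_against - median(league_against)
    return above_for - above_against


# Hypothetical Corsi per 60 rates for one season (made-up numbers).
league_cf60 = [50.0, 55.0, 60.0]
league_ca60 = [50.0, 55.0, 60.0]

# A team 10 attempts above the median both for and against nets out to zero.
quantity = net_rating(65.0, 65.0, league_cf60, league_ca60)

# Hypothetical xG per Fenwick values, reusing the exaggerated 0.1 and 0.05.
league_xg_per_ff = [0.05, 0.06, 0.07]
quality = net_rating(0.10, 0.05, league_xg_per_ff, league_xg_per_ff)

print(quantity, round(quality, 4))
```

When the two medians are equal they cancel, so the rating reduces to the plain "for minus against" difference.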
I use the score and venue adjusted versions for all of these stats.
For each of the weighted averages I looked at in the previous post, I’ll look at the team’s difference relative to the average for that season, using the team and season level data we put together in the last post. The reason for this is that as the league’s composition changes, it makes more sense to me to look at the performance of a team relative to the teams they were actually playing against. I opted to use raw numbers instead of ranks for my analysis because if a team was unusually old, light, etc. I wanted to capture that.
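As a rough sketch of the "difference relative to the season average" step, assuming one row per team per season: the field names, the helper name relative_to_season_mean, and the sample values below are all hypothetical and only show the shape of the calculation.

```python
from collections import defaultdict
from statistics import mean


def relative_to_season_mean(rows, key):
    """Add '<key>_vs_avg': the row's value minus its season's mean."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row[key])
    season_mean = {season: mean(vals) for season, vals in by_season.items()}
    return [
        {**row, key + "_vs_avg": row[key] - season_mean[row["season"]]}
        for row in rows
    ]


# Made-up weighted average ages for two teams over two seasons.
rows = [
    {"season": "2019-20", "team": "A", "age": 27.0},
    {"season": "2019-20", "team": "B", "age": 29.0},
    {"season": "2020-21", "team": "A", "age": 26.0},
    {"season": "2020-21", "team": "B", "age": 27.0},
]

adjusted = relative_to_season_mean(rows, "age")
for row in adjusted:
    print(row["season"], row["team"], row["age_vs_avg"])
```

Keeping the raw difference instead of a rank preserves how far from the pack an unusually old or light team actually was.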
OK, let’s get started!
All Situations
First, a quick explanation of the graph above, since there will be three more of these. Each square is the correlation between the biographical metric (y-axis) and on-ice metric (x-axis). The value is overlaid on each square, and the correlation is indicated with color as well. The yellower the square is, the more positive the correlation between the biographical metric and the on-ice metric; the more purple, the more negative the relationship. Greenish-blue means little to no relationship between x and y. The number of asterisks denotes whether a correlation is significant at a p-value of less than 0.05, 0.01, or 0.001. A quick glance at this chart shows a lot of greenish-blue, a sign that the relationships are not particularly strong. Indeed, almost all the correlations are within 0.1 of zero, with 0.1 being roughly the threshold for statistical significance for this analysis.
Age, however, does seem to indicate that older teams perform slightly better across all the on-ice metrics except shot quality. This is extremely interesting to me, as the league has been trending younger in recent years and new conventional wisdom is that the NHL is a young player’s league.
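To see why a correlation of about 0.1 sits near the significance line, here is a rough sketch of the smallest |r| that clears a two-sided test for a given sample size. It assumes about 400 team-seasons (a guess at the sample size, not a figure from the data) and uses a normal approximation to the t distribution, which is reasonable at that size. The function name critical_r is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| significant at level alpha (two-sided).

    Rearranges t = r * sqrt(n - 2) / sqrt(1 - r**2) for r, using the
    normal quantile in place of the t quantile (fine for large n).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n - 2 + z * z)


# Assumed sample size of roughly 400 team-seasons.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_r(400, alpha), 3))
```

Stricter p-value cutoffs push the required correlation higher, which is why the cells with more asterisks are also the more strongly colored ones.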
At first, my assumption was that tanking/rebuilding teams may be dragging the young teams down, but that really old teams would also be dragged down by being old. This doesn’t appear to be the case though. Although there is a cluster of older teams hanging out in the lower right quadrants of the scatter plots, the oldest teams are actually in the upper right corners.
Looking closer, this is driven largely by five (2007-12) Detroit Red Wings teams that went to the finals twice, plus the 2020-21 Washington Capitals. Older teams also seem to be slightly better at converting underlying metrics into wins and points as the correlations for those metrics are a bit stronger than quantity and xG differential. Those teams, plus the 2016-17 San Jose Sharks team that went to the Final give us seven teams in the top ten oldest with points percentages above 60%.
At the other end of the age spectrum, only the 2008-09 Chicago Blackhawks were above 60% points percentage, and the ten youngest include rebuilding teams like the 2010-11 Edmonton Oilers and 2020-21’s New York Rangers and New Jersey Devils. Keeping in mind that the overall points percentage for NHL teams since 2007 is 55.8%, only three of the ten youngest teams were above average.
Overall, size seems to have a modest, but not statistically significant correlation with team performance whether measured by height, weight, or BMI.
Draft position also has virtually no impact. For fun, here’s a link to tables of the top/bottom 10 teams for height, weight, BMI, and draft position. Washington and Winnipeg are quite tall.
Even Strength
At even strength, the correlations are roughly the same, which is intuitive since the majority of ice time is at even strength and even strength personnel isn’t restricted to the same degree as special teams. One notable difference from all situations though is that height, weight, and BMI all have stronger relationships to shot quantity and xG. The difference is enough that several of them are significant at even strength when they weren’t in all situations.
Of the three newly significant metrics, weight has the strongest correlation with the underlying metrics, and is the only metric that is significant with both shot quantity and xG. By focusing on net differentials instead of breaking up my analysis by offense and defense, I have made it difficult to parse out exactly why this would be, but I think there are two likely explanations. On the offensive side, it seems likely that heavier players would be able to push their way to more dangerous parts of the ice. On defense, heavier players may be able to block out lighter offensive players. I think the offensive explanation is more likely for reasons I’ll get into in a moment, but this is a topic that I’m interested in exploring more at a later time.
Power Play
Power play is probably the least interesting of the strength states, with almost every correlation being weaker than at even strength. The one significant correlation is between age and xG, which seems to be driven by shot quantity.
I interpret this to mean that coaches generally do a good job of selecting the most offensively talented players for the power play, or at least that they ignore size and age in a vacuum when considering which players to put on the ice. Anecdotal cases of coaches putting a big, largely ineffective player on the power play to act as a net front presence are rare enough in the aggregate that they don’t make much of a difference overall.
Penalty Kill
The penalty kill, however, is really interesting. Older, taller, and heavier players are all correlated with having worse underlying numbers on the penalty kill. As I found in my last post, these are exactly the types of players teams opt to use on their penalty kill units too. To me, this looks like teams are over-selecting for certain physical traits on the penalty kill, to the detriment of the unit. Granted the relationship is still pretty minor, but it is statistically significant for all three underlying metrics for both height and weight. Older players also have a statistically significant negative relationship with shot quality, the strongest correlation for shot quality for any biographical metric or strength state.
These findings are why I suspect that the relationship at even strength between size and underlying metrics is driven by offense, not defense. In a situation that essentially isolates defense, being larger and older seems to be a disadvantage. Granted, penalty kill play is different from even strength in a number of ways, so I still think it is worth eventually investigating whether the even strength differentials are driven by better offense, defense, or both.
Predictive Value
To test the predictive value of age and size for team performance, I built three simple linear regression models to predict points percentage for a team using the following predictor variables (all versus league average for that season):
Even strength age, even strength height, and even strength BMI
Even strength age and even strength weight
Even strength age and power play weight
I selected even strength age because it had the highest correlation by itself. For the first regression, I selected height and BMI because they were both significant, but not strongly correlated with each other, although both were strongly correlated with weight. For the second model I selected weight as a replacement for height and BMI because it was correlated with both and also statistically significantly correlated with points percentage. Finally, for the third model I used power play weight specifically since it was more strongly correlated with points percentage than even strength weight.
And the results? These are the adjusted R-squared values for each model:
Model 1: 0.06480
Model 2: 0.06727
Model 3: 0.06705
So, not nothing, but not very strong either. These models only explain roughly 6.5% to 6.7% of the variance in a team’s points percentage from year to year.
A few things stand out. One is that the predicted values fall in a much smaller range than the actual values. This is pretty consistent with most predictive models in hockey since randomness plays a large role in results and can cause teams to over- or underperform their predictions. It also stands to reason that there is a lot of player level information that can still be used to explain team results. Finally, all three models are extremely similar. If they were to be overlaid, there would be some variation, but not much. That’s because all three use even strength age as a predictor and, as the most significant predictor, it influences the predicted output the most.
Looking at the QQ plots of the residuals, they look pretty close to normal. They diverge a bit at the tails, but otherwise are pretty close to a straight line, a sign that a linear model is appropriate for this process. The two outliers at the bottom were Detroit’s spectacular tank job in 2019-20 and Colorado’s nightmare season in 2016-17. At the upper end is Chicago’s ridiculous pace in the shortened 2012-13 season.
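For anyone unfamiliar with adjusted R-squared, it is the ordinary R-squared discounted for the number of predictors in the model. Below is a minimal sketch of the standard formulas in Python. The function names and the sample numbers are illustrative only and are not the output of the models above.

```python
def r_squared(actual, predicted):
    """Share of variance in actual explained by predicted."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Made-up points percentages and model predictions for five teams.
actual = [0.45, 0.50, 0.55, 0.60, 0.65]
predicted = [0.52, 0.53, 0.55, 0.56, 0.59]

r2 = r_squared(actual, predicted)
print(round(r2, 3), round(adjusted_r_squared(r2, len(actual), 2), 3))
```

With hundreds of team-seasons and only two or three predictors the penalty is small, so the adjusted and unadjusted values end up close together.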
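For readers who haven’t built one, the points on a QQ plot can be sketched as below: sorted, standardized residuals paired with the quantiles a normal distribution would produce. This is a generic illustration, not the plotting code used for the charts here. The function name qq_points, the (i + 0.5) / n plotting positions, and the sample residuals are all assumptions.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair theoretical normal quantiles with sorted standardized residuals."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    ordered = sorted((r - mu) / sigma for r in residuals)
    # Plotting positions (i + 0.5) / n are one common convention.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Made-up residuals; real ones would come from a fitted model.
points = qq_points([-2.0, -1.0, 0.0, 1.0, 2.0])
for theo, samp in points:
    print(round(theo, 3), round(samp, 3))
```

If the residuals are close to normal, these pairs fall near the line y = x, and departures at either end show up as the tail divergence described above.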
Conclusions and Next Steps
I think it’s fair to draw the conclusion that age has a moderate positive correlation with team success, though intuitively there’s a definite limit to how far out that relationship can be stretched. Size also appears to matter very slightly. The most surprising finding is that size appears to be negatively correlated to performance on the penalty kill.
Of course, this is the regular season and the playoffs are different. The next part of this series will look at the relationship between team size and team performance in the playoffs.
Fin.
A huge thank you goes to Evolving-Hockey for providing team, player, and even play-by-play data in a clean, easy-to-use format. When I did my last major project I was combing through the NHL play-by-play data up till 2017-18 and it was quite messy. Fortunately, Evolving-Hockey absolutely has the best quality of any of the major hockey stats sites I’ve looked at and it makes hobbyist work like this so much easier. For that reason, although I’ll use their data and make some of my data available, I won’t include anything from their site that can’t also be found on NHL.com or other free sites.