It’s been a while, but the first release of the NHL’s player and puck tracking data has everybody weighing in, so I thought I’d give my thoughts. Most of what I’ve read so far has been about setting expectations for the new data or hopes for future releases. This post, however, will be an examination of the data as it currently exists. This post was originally intended to be a quick overview of the NHL Edge data but expanded to be quite a bit longer than expected, and there’s still more to dig into in future articles, which I note throughout. Let’s get started!
Methodology
Additional Data
In addition to the NHL Edge data, in this post, I also use Natural Stat Trick’s season-level team data for the 2021-22 and 2022-23 seasons for some analysis.
Scraping the Data
Unlike the play-by-play data, the NHL has not made the new Edge data available through an API. Instead, the only way to get a full table of the published Edge data is to go to each page on the Edge website one at a time and pull it that way.
Along the way, I discovered a few interesting things about the site design. First, each team and player season has its own URL. Second, the HTML for the webpage seems to be populated dynamically, meaning making an HTTP GET request does not return any of the data. Furthermore, the website somehow does not use Ajax to query the backing database, so I was unable to mimic visiting the website dynamically. I do not know enough about modern webpage design to guess the framework they use, or if this is meant to discourage scraping the data, but it was discouraging for me. All of this is to say, scraping the website requires physically visiting each webpage in a browser. It is possible to automate this process but scraping the data was still time-consuming.
One note for anyone else interested in scraping the data: the game-by-game charts for distance contain all the data for the chart in JSON format as an HTML attribute, meaning it is relatively easy to get game-by-game distance data. Similarly, the zone time charts contain zone time percentages to several more decimal places than the table to the left does.
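As a rough illustration of pulling chart data out of an attribute, here is a minimal sketch using only the Python standard library. The tag, the attribute name (data-chart), and the JSON field names are hypothetical placeholders, not the actual markup of the Edge site; the HTML string stands in for page source saved from a browser after rendering.

```python
import json
from html.parser import HTMLParser


class ChartAttributeParser(HTMLParser):
    """Collect JSON payloads stored in a named attribute of any tag."""

    def __init__(self, attribute_name):
        super().__init__()
        self.attribute_name = attribute_name
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attribute_name and value:
                self.payloads.append(json.loads(value))


# Hypothetical rendered HTML; the real tag and attribute names will differ.
rendered_html = (
    "<div class='distance-chart' data-chart='"
    '[{"game": "2023-01-05", "distance_miles": 44.2}, '
    '{"game": "2023-01-07", "distance_miles": 46.8}]'
    "'></div>"
)

parser = ChartAttributeParser("data-chart")
parser.feed(rendered_html)
games = parser.payloads[0]
for game in games:
    print(game["game"], game["distance_miles"])
```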
Data Quality
It wouldn’t be the NHL if there weren’t some data quality issues. Looking at a histogram of the total distance skated by each team in every game, two things stand out:
First, and most obvious, is the stack of games with zero distance skated (46 total). Clearly, these are games where the NHL Edge player tracking technology was not working, so I will exclude the TOI from these games when calculating skating speed and rate statistics.
The second noticeable thing is the small group of games around 30 miles skated (28 games between 28 and 32 miles). The vast majority of games fall between 40 and 50 miles skated, and almost no games fall between 32 and 37 miles (7 total). Furthermore, the mean of this group is around 30, roughly two-thirds of the mean of the main distribution. This leads me to believe that during these games, the player tracking technology malfunctioned for a single period. Therefore, I will also exclude the TOI from games with a distance of less than 37 miles (somewhat arbitrary, but roughly halfway between the two groups) when calculating skating speed and rate statistics.
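The exclusion rule amounts to a single threshold filter, since the zero-distance games also fall below the cutoff. A minimal sketch, with made-up game rows and a hypothetical tuple layout:

```python
# Cutoff described above: drop any team-game below 37 miles skated.
MIN_VALID_DISTANCE_MILES = 37.0

# Made-up example rows: (game_id, team distance in miles, skater TOI in hours)
team_games = [
    ("g1", 0.0, 5.0),    # tracking not working at all
    ("g2", 30.1, 5.0),   # likely missing one period
    ("g3", 44.5, 5.0),
    ("g4", 47.2, 5.1),
]

# Keep only games at or above the cutoff; the rest are excluded from
# both the distance and the TOI totals.
valid_games = [g for g in team_games if g[1] >= MIN_VALID_DISTANCE_MILES]
excluded_ids = [g[0] for g in team_games if g[1] < MIN_VALID_DISTANCE_MILES]

print([g[0] for g in valid_games])
print(excluded_ids)
```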
With these games removed, the distribution now looks like this:
Much better! There is still a small outlier group around 37 miles skated, but that may be due to games with very high penalty totals (both sides are slower on special teams than at even strength, as we will see). Either way, the number of games with potential data issues is now small enough that I doubt it will meaningfully influence any conclusions we can draw from the data.
One final note: it appears that NHL Edge includes goalies in the team TOI number (a typical game has roughly six hours of ice time). This means that some statistics, such as average speed, are skewed by having a player who is mostly stationary throughout the game. To get around this, I calculated TOI by summing each player’s TOI for each strength state for each game.
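A sketch of that kind of aggregation is below. It assumes goalies are identified by a position code and simply left out of the sum; the row layout, position codes, and numbers are all made up for illustration.

```python
from collections import defaultdict

# Made-up rows: (game_id, player, position, strength_state, toi_minutes)
player_toi = [
    ("g1", "Skater A", "C", "EV", 18.0),
    ("g1", "Skater B", "D", "EV", 20.0),
    ("g1", "Skater A", "C", "PP", 3.0),
    ("g1", "Goalie Z", "G", "EV", 50.0),
    ("g1", "Goalie Z", "G", "PP", 6.0),
]

# Sum skater TOI per (game, strength state), skipping goalies (assumption).
skater_toi = defaultdict(float)
for game_id, player, position, strength, minutes in player_toi:
    if position == "G":
        continue
    skater_toi[(game_id, strength)] += minutes

print(dict(skater_toi))
```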
Now to find some insights!
Skating Speed
As published, the only information under the Skating Speed section on the NHL Edge portal is the number of “bursts” at various speeds for all strength states. This means that any analysis of “bursts” is limited to all situations stats, which is not ideal, but that is what is available.
To analyze bursts, I converted the total numbers from the Edge website to rate statistics, giving me the average number of bursts from a team per hour. One difference between my analysis and the NHL’s is that instead of binning my bursts, I look at the total number above each MPH cutoff. That means 22+ MPH bursts are included in my 20+ MPH burst rates, and both 22+ and 20-22 MPH bursts are included in my 18+ MPH rates. Looking at a histogram of the distributions for each speed cutoff, the patterns are similar, with a few especially fast teams standing out and the rest of the league grouping around a slower mode.
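The conversion from binned counts to cumulative per-hour rates can be sketched as follows. The bin labels, the function name, and the example numbers are illustrative assumptions rather than values from the Edge site.

```python
def cumulative_burst_rates(binned_counts, toi_hours):
    """Turn binned burst counts into bursts per hour above each cutoff.

    binned_counts uses hypothetical keys "18-20", "20-22", and "22+".
    """
    over_22 = binned_counts["22+"]
    over_20 = over_22 + binned_counts["20-22"]
    over_18 = over_20 + binned_counts["18-20"]
    return {
        "18+": over_18 / toi_hours,
        "20+": over_20 / toi_hours,
        "22+": over_22 / toi_hours,
    }


# Made-up season totals for one team.
rates = cumulative_burst_rates(
    {"18-20": 8000, "20-22": 1500, "22+": 100},
    toi_hours=400,
)
print(rates)
```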
It should come as no surprise that the two teams with the highest burst rates at all speeds are the 2021-22 Colorado Avalanche and the 2022-23 Colorado Avalanche, with both seasons’ Edmonton Oilers making up third and fourth for each cutoff. The 2021-22 Avalanche in particular was an extremely hard-skating team. Additionally, although the NHL is afraid to call out any teams for being slow, we are not! Things are a bit more jumbled at the bottom than at the top, but the Nashville Predators, Arizona Coyotes, and Detroit Red Wings of both seasons consistently make up the bottom of the list for each cutoff. Noticing that the top teams and bottom teams are consistent, I decided to look at year-to-year repeatability.
Not surprisingly, all cutoffs had a strong year-to-year correlation, with the 20+ MPH burst rate having the highest R-squared at 0.79. Some potential causes of changes in burst rates are roster turnover, coaching personnel or style changes, and players getting older. I hope to explore all of these in a future article.
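One way to compute a year-to-year figure like this is sketched below. It assumes the R-squared comes from a simple linear fit of one season’s rate on the other’s, in which case it equals the squared Pearson correlation; the method is not spelled out above, and the rates in the example are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Made-up 20+ MPH burst rates for five teams, listed in the same team order.
season_1 = [4.1, 3.2, 2.9, 3.6, 2.5]
season_2 = [4.3, 3.0, 3.1, 3.4, 2.6]

year_to_year_r2 = r_squared(season_1, season_2)
print(round(year_to_year_r2, 3))
```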
Skating Distance
The NHL’s skating distance data is the most comprehensive by far. As mentioned previously, the bar charts contain game-by-game distances broken down by strength state, making this the best potential area for meaningful analysis. I intend to follow this post up with a much more in-depth exploration of what the skating distance data can tell us, but for now, I will limit my analysis to the same type of preliminary exploration performed on the skating speed data. The most useful way of analyzing distance skated to me seems to be translating distance skated into average MPH for each player by dividing the total distance skated by the total time on ice in hours.
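The speed calculation itself is a single division once TOI is expressed in hours. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def average_speed_mph(distance_miles, toi_minutes):
    """Average skating speed: total distance divided by total TOI in hours."""
    toi_hours = toi_minutes / 60
    return distance_miles / toi_hours


# Made-up example: 45 miles skated over 300 skater-minutes (5 hours) of TOI.
print(average_speed_mph(45.0, 300.0))
```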
Since the NHL was kind enough to break the distance data up by strength state, I could compare the distributions of average team speed by strength state.
Here several items of interest become apparent. First, average speed at even strength is much faster than on either special teams unit. In fact, even the slowest team at even strength is faster than the fastest special teams unit. This makes sense intuitively, since special teams often feature a fair amount of relatively stationary play, either once a power play unit has set up in the zone or as the penalty kill awaits a re-entry after a successful clear. That waiting in position also explains why power plays have a higher average speed than penalty kills: the power play unit has to go all the way down the ice and then bring the puck back up after a clear. However, the fastest penalty kills are faster than the median power play. This absolutely deserves its own article, but a quick glance at the leaderboard for penalty kill speed also suggests faster penalty kills are more effective, with the top teams including Carolina and Calgary.
Furthermore, the distribution of speed at even strength is much tighter than the penalty kill or power play distributions. This seems to indicate to me that there is more variation in power play and penalty kill playing style, which supports the idea that coaching plays a larger role on special teams than at even strength. Nevertheless, this also merits further investigation into coaching effects, which is possible due to the game-by-game data.
Additional areas I plan to investigate in the future include how rest and travel affect team speed, how team speed trends throughout the season, and how team speed correlates to the shot metrics already available to the public.
Next up is year-to-year repeatability for average speed at each strength state.
Interestingly, power play skating speed is the most repeatable year-to-year, but penalty kill is the least. Beyond that, this data suggests that average team speed is less repeatable year-to-year than burst rate. The obvious next step is to compare burst rates to average speed. For this analysis, I switched to all situations average speed, since that is the format of the burst rate data.
I found these results very surprising. I am not sure if this reflects some energy conservation from high burst rate players, meaning they coast in between bursts, or if this is something else, but the higher the cutoff the less significant the relationship between burst rate and average speed becomes. 22+ MPH burst rate is essentially uncorrelated, boasting an R-squared value of just 0.004. I am very much looking forward to investigating whether this holds up at an individual level.
Shot Speed
Similar to bursts, shot speed is aggregated for the entire season. While I did not look at the distribution of fastest speeds for teams (because the fastest speed a single player reached on a single play is likely not useful from an analytical perspective), I did look at the distribution for shot speed. There are two reasons for this. First, shot speed in general seems to fall more into trivia than useful analytics. Second, the distributions together reveal something about the nature of top-end shot speeds.
Fully half of teams across the 2021-22 and 2022-23 seasons failed to record a single 100+ MPH shot. Meanwhile, three teams managed to record double-digit 100+ MPH shots. Those three teams are the New York Islanders in both seasons and the Buffalo Sabres.
Sneak peek for when I look at player-level data: all but one of the shots for those teams were taken by either Ryan Pulock or Tage Thompson.
Another point of interest is the similarity of the distributions for average speed and top speed. However, a glance at the leaderboards for each reveals no obvious connection. The 2021-22 New York Islanders grade out very well for top speed, thanks to Pulock, but come in last at an almost impossibly slow average shot attempt speed of 17.4 MPH (I suspect data collection issues are driving this number way, way down, but it is impossible to tell without more granular data). Meanwhile, the 2022-23 Vancouver Canucks led the league substantially as the only team with an average shot speed of over 60 MPH, but topped out at 98.8 MPH.
Looking at the rates of shots above various cutoffs reveals a similar pattern to the rates of speed bursts by team: the higher the cutoff, the more the outliers stand out. I suspect this is because shooting 90+ MPH is largely an individual skill, so having one or two more players who can reach that threshold greatly increases a team’s overall rate relative to the rest of the league.
I wondered if the rate of shots above a certain level correlated to an ability to get shots through: are high-speed shots uncontested shots? To measure this I looked at the correlation between a team’s shot rate above each speed threshold and that team’s corsi, fenwick, shot, and goals for rates.
Correlations with overall shot rates get weaker as the speed threshold decreases, suggesting to me that the rate of high-speed shots is a separate skill from generating volume of shots. Also notable is the fact that fenwick per 60 has a slightly, but consistently, higher correlation with each speed cutoff than corsi per 60. In other words, the correlation is stronger with unblocked attempts than with all attempts. This seems to validate my suspicion that high-speed shots are uncontested shots. Additionally, shots on goal have a lower correlation, suggesting to me that there is some loss of accuracy on high-speed shots. Lastly, in the 90+ MPH cutoff, the rate of attempts has its highest correlation with goals per 60. I interpret this to mean that in the highest bin, speed has a slight, but positive effect on the odds of a shot becoming a goal. This is one potential improvement tracking data can make to public expected goals models.
Shot Location
I opted not to look at shot location because frankly, with the way the NHL aggregated the data by just giving all strength shots on goal totals from 16 location bins, I cannot see how this is an improvement over the currently available play-by-play data, even with all the potential human error. This section is essentially useless compared to the much better charts available at hockeyviz.com. I plan to completely ignore this section in the future.
Zone Time
Lastly, and perhaps most interestingly, is zone time. Similar to skating distance, the NHL broke down zone time by strength state, so I will focus on even strength for this article. Unfortunately, the data is aggregated for the entire season, but this still allows us to look at one major item of interest: how well public possession metrics correlate with actual zone time. To examine this, I converted the zone time percentages from the NHL Edge website to an offensive zone time percentage, which is offensive zone time divided by total time spent in the offensive and defensive zones, similar to the existing OZ start percentage stat.
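As a sketch of that conversion, neutral zone time is simply dropped from the denominator. The function name and the example percentages are made up for illustration.

```python
def offensive_zone_time_pct(oz_pct, dz_pct):
    """Offensive zone share of time spent in the offensive and defensive zones.

    The inputs are the published zone time percentages; neutral zone time
    is ignored, mirroring how OZ start percentage ignores neutral zone starts.
    """
    return oz_pct / (oz_pct + dz_pct)


# Made-up team: 42% offensive zone, 20% neutral zone, 38% defensive zone.
print(offensive_zone_time_pct(42.0, 38.0))
```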
Corsi is a terrific stand-in for offensive zone time. An R-squared of over 0.75 is about as high as possible in a sport as random as hockey. Somewhat surprisingly, expected goals percentage is nearly as strong, with an R-squared of 0.69. This supports the idea that shot quality does. Goals for and standings points percentage are less correlated with zone time than corsi or expected goals, though the relationship is still very strong at roughly 0.5 R-squared each. In a future post, I will look at which teams are the best at converting zone time into shot attempts and shot quality and which are the best at suppressing them, but this article is already quite long, so I’ll wrap things up here.
Conclusion
The first NHL Edge data release is not going to upend any of the current ideas about NHL analytics. However, there is potential to add more nuance to the analysis of the processes that explain how certain teams or players achieve their results. In future articles, I hope to dig deeper into the data, especially average skating speed, to see what it can reveal about how teams play, how they achieve their results, and which skills reflect player ability and which are system- or deployment-based.