Thursday, February 25, 2010

On Corsi

This post is directed at a certain audience, but perhaps it'll be useful for others.

Corsi is becoming a bit better understood and more mainstream these days, but there remain a lot of folks who are confused by it. This is my attempt to clarify the stat: what it means and how it's useful in analysis.

What it is

Let's start with a simple definition. "Corsi" is the difference between all shots directed at the net for and against at even strength. That is, (shots + blocked shots + goals + missed shots FOR) - (shots + blocked shots + goals + missed shots AGAINST). The purpose of the stat is to measure possession. It is, in fact, a proxy for "zone time". A positive corsi rate = more offensive zone time. Negative = more defensive zone time.
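To make the arithmetic concrete, here's a minimal sketch of the calculation in Python. The class name, function name and all of the counts are made up for illustration; "shots" here means saved shots on goal, so goals are counted separately, as in the definition above.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    """Even-strength shot attempts in one direction (hypothetical container)."""
    shots: int    # saved shots on goal
    blocked: int  # attempts blocked before reaching the net
    goals: int
    missed: int   # attempts that missed the net

    def total(self) -> int:
        # Every puck directed at the net counts, whatever happened to it.
        return self.shots + self.blocked + self.goals + self.missed


def corsi(attempts_for: ShotAttempts, attempts_against: ShotAttempts) -> int:
    """Shot attempts for minus shot attempts against."""
    return attempts_for.total() - attempts_against.total()


# Made-up numbers for a single game.
team_for = ShotAttempts(shots=25, blocked=12, goals=3, missed=10)
team_against = ShotAttempts(shots=20, blocked=9, goals=2, missed=8)

print(corsi(team_for, team_against))
```

With these invented counts the team directed 50 pucks at the net and allowed 39, so its corsi for the game is +11.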

Here's an analogy that might help. Let's say a hockey game is a tug of war. Corsi is how far right or left of center the rope is. On an individual level, it's an expression of which players are really pulling the rope. Therefore, if your team has a positive corsi rate, it means they are spending more time in the offensive zone at even strength. It means they are pulling the rope harder than the opposition.

Why it's useful


In general, two things determine goals for (GF) and goals against (GA) in hockey: volume and frequency. Volume is the number of shots a team generates and allows. Frequency is how often a team scores or allows goals on those shots. What we're learning in the NHL is that the former is far more repeatable and indicative of skill than the latter. Let's put it another way...

Goals are relatively random events in the game. On any given night, 60-80 pucks may be directed at the net at both ends. Maybe 5-10 goals will be scored. As a result, goals are statistically less powerful because the sample size is small. This means that randomness has a far greater influence. And what we've discovered at the NHL level is that percentages (SH% and SV%) tend to regress to the mean over the long term. As a result, a team that is winning via high frequencies is said to be "riding the percentages" and their success is probably based on randomness or "luck".

Another example. We all know that the chance of a flipped coin landing on heads is 50%. However, it's entirely possible that a coin will land on heads 7 or 8 times in a ten-flip sample. This is not indicative of a special coin or special "coin flipping skill". It's variance. As such, we can say with confidence that over, say, 1000 flips, we'll end up very close to the 50-50 split.
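For anyone who wants to check the coin example, here's a small sketch that works out the exact odds for a fair coin. The function name is mine, and the 700-of-1000 threshold is just an arbitrary stand-in for "the same 70% rate over a big sample".

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Exact binomial probability of k or more heads in n flips."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# 7 or more heads in a ten-flip sample: happens about 17% of the time.
print(prob_at_least(7, 10))

# 700 or more heads in 1000 flips: effectively never with a fair coin.
print(prob_at_least(700, 1000))
```

A 70% heads rate turns up in roughly one of every six ten-flip samples, but holding that rate over 1000 flips is essentially impossible. That's the gap between a hot streak and a real ability.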

Volume, or outshooting (corsi), is far more powerful statistically, however, and therefore less skewed by randomness. So, whereas percentages tend to regress to the mean, outshooting is far more stable and therefore more indicative of a team's (or player's) abilities. Corsi's value is being investigated by smarter men than me these days, and the evidence continues to pile up. Corsi correlates strongly with scoring chances. It also correlates highly with outscoring (0.65) over the course of the season. From the latter link, JLikens explains that outshooting explains 40% of the variance in EV scoring. Almost half. That's regardless of things like goaltending ability or the percentage of shots a team has blocked versus what they get on net. It also excludes randomness as we discussed above.
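A quick note on how those two numbers relate. Assuming the 0.65 is an ordinary (Pearson) correlation coefficient, which is my assumption and not something stated in the linked work, the share of variance explained is just the correlation squared. The function name below is made up for illustration.

```python
def variance_explained(r: float) -> float:
    """Share of variance explained by a linear relationship with correlation r."""
    return r ** 2


# Assumes 0.65 is a Pearson correlation between outshooting and outscoring.
print(round(variance_explained(0.65), 2))
```

That comes out to roughly 0.42, which is in line with the 40% figure quoted above.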

Corsi is a long range stat. A team can outshoot the bad guys in a single game or even a series of games and still lose. The hockey gods can be arbitrary. But, eventually, outshooting teams will win more than they lose. And the more time they spend in the offensive zone, the better they are, the more they'll win.

Evaluating individual players with corsi is a little trickier, because circumstances can elevate or sink skaters depending on how they're used. The checking center or shutdown defender who starts every shift in his own zone against superstars is bound to have a lousy rate, for example. But that's probably a discussion for another time.

I hope this helped clarify things.