Snooker Statistics Follow Precise Mathematical Laws

A particular mathematical relationship known as a power law has been observed in many day-to-day situations, from word use frequencies in natural languages to the connectivity distribution in Facebook friendship networks. As it turns out, such a power law can also be found in snooker statistics. And if the amazing Ronnie O’Sullivan produces just a few more centuries, the mathematical correspondence will be even better!

Power laws

In mathematics, a power function involves a given value, call it x, that is multiplied by itself a certain number of times. For example, x multiplied by itself three times would be “x times x times x”, written as “x·x·x”, where the dot (·) indicates multiplication. In mathematical terms this is called “x raised to the power three”, which is written as x^3.

Common examples of such power functions are the area of a square with sides of length x, which is equal to x^2, and the volume of a cube with sides of length x, which is x^3. So, a square with sides of length four has an area of 4^2 = 4·4 = 16, and a cube with sides of length two has a volume of 2^3 = 2·2·2 = 8.

Now, a power law is a mathematical relationship between two variables, say x and y, such that the value of one variable (e.g., y) is proportional to the value of the other variable (x) raised to a certain power, i.e., x^a for some fixed value of a. This value a is called the exponent of the power law. For example, if we were to measure the volume (y) of many cubes with different lengths of their sides (x), we would find a perfectly fitting power law with an exponent a=3. The figure below illustrates this graphically.

The volume (y, vertical axis) of a cube against the length of its sides (x, horizontal axis). The open circles represent actual measurements on different cubes. The solid line represents the mathematical relationship y = x^3, which forms a perfect fit to the observed data.

Power law relationships are found in many common systems and processes, both natural and man-made. For example, power laws are observed in the frequency distribution of the magnitudes of earthquakes, or the sizes of cities in a given country. They also show up in the distribution of connections in networks like Facebook friendships, movie star co-appearances, or the national electricity grid. And they occur in the frequencies of words used in natural languages, or even in computer code.

Finding power laws

In the example figure above, the data is plotted on a linear scale. In other words, each next point along an axis represents a value that is larger than the previous one by a fixed amount. For example, the marked points along the horizontal axis follow the linear sequence 2, 4, 6, 8, 10, where the fixed increase is equal to two.

However, it is also possible to plot the same data on a non-linear scale. An example of such a non-linear scale is a logarithmic scale, which is based on orders of magnitude rather than a linear increase. This means that the value represented by each next point along an axis is the value of the previous one multiplied by a fixed amount.

An example of this can be found in the way that the strength of an earthquake is indicated. A magnitude three (M3) earthquake produces a certain amount of ground shaking. However, a magnitude four (M4) earthquake produces shaking with an amplitude that is ten times larger than that of an M3 earthquake (and releases roughly 32 times as much energy). Similarly, an M5 earthquake shakes ten times as strongly as an M4 earthquake, and thus one hundred times as strongly as an M3 earthquake.
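To make the difference between the two kinds of scale concrete, here is a small Python sketch. The function names linear_ticks and log_ticks are made up for this illustration and do not come from any plotting library.

```python
def linear_ticks(start, step, count):
    """Each tick is the previous one plus a fixed amount."""
    return [start + i * step for i in range(count)]


def log_ticks(start, factor, count):
    """Each tick is the previous one multiplied by a fixed amount."""
    return [start * factor ** i for i in range(count)]


print(linear_ticks(2, 2, 5))  # the 2, 4, 6, 8, 10 sequence from the text
print(log_ticks(1, 10, 5))    # each step is one order of magnitude
```

On a logarithmic axis, the ticks from the second list would be drawn at equal distances from each other, just like the ticks from the first list on a linear axis.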

The figure below shows the same data as in the previous figure, but using such a logarithmic scale on both axes, resulting in a so-called log-log plot. Taking the logarithm of both sides of y = x^a gives log y = a · log x, so the logarithm of y is simply a times the logarithm of x. As a consequence, a power law shows up as a straight line in a log-log plot, with the exponent a as its slope, as the next figure illustrates.

The volume (y, vertical axis) of a cube against the length of its sides (x, horizontal axis) using a logarithmic scale on both axes. The data points (open circles) now fall along a straight line representing a power law with exponent a=3.
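The straight line can also be checked numerically. The sketch below takes the logarithm of the side lengths and volumes of a few cubes and computes the slope between each pair of neighbouring points. The side lengths are simply the tick values mentioned earlier, used here as example data.

```python
import math

sides = [2, 4, 6, 8, 10]
volumes = [x ** 3 for x in sides]

# Position of each cube on the log-log plot.
log_points = [(math.log10(x), math.log10(y)) for x, y in zip(sides, volumes)]

# Slope between each pair of neighbouring points.
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(log_points, log_points[1:])
]

print([round(s, 6) for s in slopes])  # every slope equals the exponent, 3
```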

Unfortunately, most real-world data does not behave as nicely as the volume of a cube. In reality, there is always some noise in the data due to imprecise measurements, random fluctuations, or missing data. An example is given in the next figure, which shows a log-log plot of the frequencies of words used in English (as measured over a large collection of different texts) against their rank, where the most frequent word (“the”) has rank one, the second most frequent word (“be”) rank two, and so on.

As the figure shows, the data points do not fall along a straight line perfectly, but still reasonably well. The solid straight line represents a power law that best fits the given data, which can be calculated using a standard statistical technique known as a regression analysis. This best fit results in an exponent of a=-0.92. The exponent is negative in this case, as the word frequency decreases with increasing rank.

The frequency of words used in English (vertical axis) against their rank (horizontal axis) using a logarithmic scale on both axes. The data points (open circles) closely follow a straight line, although not exactly.

So, the frequency of a word as used in the English language is roughly proportional to its rank raised to the power a=-0.92. It may be difficult to imagine what it means to multiply a number by itself -0.92 times, but mathematically this is perfectly well defined. In other words, the exponent a in a power law does not always need to be a positive whole number.
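One way to see that such an exponent is well defined is that, for any positive number x, raising x to the power a is the same as computing exp(a · ln x). The short sketch below compares that formula with Python's built-in power operator. The helper name power is just an illustration.

```python
import math


def power(x, a):
    """x raised to the power a, for positive x and any real a."""
    return math.exp(a * math.log(x))


rank = 10
a = -0.92

print(power(rank, a))  # via the exponential and the natural logarithm
print(rank ** a)       # via the built-in operator; same value up to rounding
```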

Finally, the regression analysis that calculates the exponent that gives the best fit also provides a measure of accuracy, i.e., how closely the data falls along a straight line. In the case of the volumes of cubes, as we saw above, the accuracy (or “fit”) is obviously 100%, as the data points fall exactly on the line. However, for the word frequencies the accuracy is slightly less: 98%. Still pretty good!
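The sketch below shows one way such a fit could be computed: an ordinary least-squares line through the points of the log-log plot. The article does not say which measure of accuracy was used. This example assumes the coefficient of determination (R squared), which is a common choice. The function name fit_power_law and the small "measurement errors" are made up for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**a on log-log axes.

    Returns (a, c, r_squared); all xs and ys must be positive.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    syy = sum((v - mean_y) ** 2 for v in ly)
    a = sxy / sxx                       # slope of the line = exponent
    c = 10 ** (mean_y - a * mean_x)     # intercept gives the prefactor
    r_squared = sxy ** 2 / (sxx * syy)  # 1.0 means a perfect straight line
    return a, c, r_squared


sides = [2, 4, 6, 8, 10]
exact = [x ** 3 for x in sides]
# Made-up errors of a few percent, purely for illustration.
noisy = [v * f for v, f in zip(exact, [1.05, 0.97, 1.02, 0.95, 1.04])]

print(fit_power_law(sides, exact))  # exponent 3 and a perfect fit
print(fit_power_law(sides, noisy))  # exponent close to 3, fit just below 1
```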

As these examples have shown, to find out whether a given data set follows a power law, we can simply present the data in a log-log plot, and calculate how closely it falls along a straight line. So let’s give this a try with some snooker statistics.

Power laws in snooker statistics

Consider the ranking list of all professional snooker players who have made at least 100 centuries throughout their career. If you are familiar with snooker, you will know that a century break is a score of at least 100 points within one visit to the table (i.e., without missing a shot). There are currently 64 players who have scored at least a “century of centuries”. The table below shows the ten highest ranked players according to this statistic, of course topped by the amazing Ronnie O’Sullivan with more than 940 centuries, and counting!

Rank  Player             Centuries
1     Ronnie O’Sullivan  944
2     Stephen Hendry     775
3     John Higgins       719
4     Neil Robertson     558
5     Judd Trump         528
6     Mark Selby         518
7     Ding Junhui        473
8     Marco Fu           470
9     Shaun Murphy       454
10    Mark Williams      427

Now, if we plot the number of career centuries of these 64 players against their rank in this list, but in a log-log plot, we get the result as shown in the next figure. The straight line represents a power law that gives the best fit to the given data, resulting in an exponent of a=-0.62.

The number of career centuries (vertical axis) against the rank (horizontal axis). The straight line is a fitted power law with exponent a=-0.62.

Note that, as with the word frequency data, the fit is not perfect, especially not at the top of the ranking (i.e., the data points in the top left of the plot). However, according to the regression analysis, the current fit still has an accuracy of 95%. And if Ronnie O’Sullivan produces a few more centuries, the fit will be even better!

Ronnie O’Sullivan in action at the snooker table. Image: DerHexer, CC-by-sa 4.0.

In a similar way, we can look at the number of ranking titles of each player. There are currently 25 players who have obtained at least three ranking titles throughout their career. The table below shows the ten highest ranked players according to this statistic.

Rank  Player             Titles
1     Stephen Hendry     36
2     Ronnie O’Sullivan  33
3     John Higgins       30
4     Steve Davis        28
5     Mark Williams      20
6     Mark Selby         14
7     Ding Junhui        13
8     Neil Robertson     13
9     Jimmy White        10
10    Peter Ebdon        9

If we plot the number of ranking titles of these 25 players against their rank in the list, again in a log-log plot, we get the result as shown in the figure below. The straight line once more represents a power law that gives the best fit to the data, resulting in an exponent of a=-0.96. The accuracy of the fit is slightly less in this case (92%), but still quite good. So the number of ranking titles also appears to follow a power law.

The number of titles (vertical axis) against the rank (horizontal axis). The straight line is a fitted power law.

Finally, if we add up the number of ranking titles of all players from the same country, we get the list shown in the following table.

Rank  Country              Titles
1     England              158
2     Scotland             78
3     Wales                35
4     China                14
5     Australia            13
6     Republic of Ireland  7
7     Northern Ireland     6
8     Hong Kong            3
9     Canada               3
10    Thailand             3
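The adding-up step itself is straightforward. The sketch below applies it to the ten players from the ranking titles table earlier in this article. The nationalities are added here and are not part of that table. Because only ten players are included, the totals come out lower than the full per-country figures.

```python
from collections import Counter

# (player, country, ranking titles); the titles come from the player table,
# the nationalities are added for this illustration.
players = [
    ("Stephen Hendry", "Scotland", 36),
    ("Ronnie O'Sullivan", "England", 33),
    ("John Higgins", "Scotland", 30),
    ("Steve Davis", "England", 28),
    ("Mark Williams", "Wales", 20),
    ("Mark Selby", "England", 14),
    ("Ding Junhui", "China", 13),
    ("Neil Robertson", "Australia", 13),
    ("Jimmy White", "England", 10),
    ("Peter Ebdon", "England", 9),
]

# Add up the titles of all players from the same country.
titles_per_country = Counter()
for _name, country, titles in players:
    titles_per_country[country] += titles

# most_common() sorts by total, which yields the ranking directly.
for rank, (country, total) in enumerate(titles_per_country.most_common(), start=1):
    print(rank, country, total)
```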

Plotting the number of titles per country against the rank in the list, as a log-log plot, gives the result as shown in the next figure. The straight line shows the power law that gives the best fit, with an exponent of a=-2.11. The accuracy is actually better here than for the titles of individual players: 96%.

The number of titles per country (vertical axis) against the rank (horizontal axis). The straight line is a fitted power law.

In conclusion, snooker statistics seem to follow mathematical power laws, just like natural languages, earthquake occurrences, and many types of natural and man-made networks. Why should this be so? Scientists are still arguing about the significance of the occurrence of power laws. On the one hand, there is no particular reason to expect such behavior in many of these situations. On the other hand, the phenomenon seems to be so common that perhaps it has little meaning after all. Either way, it is fascinating to see that snooker statistics do indeed follow precise mathematical laws with a high degree of accuracy. I’m curious to see if the upcoming World Snooker Championship, with its usual display of power shots, will also produce some more power laws.