The 2014/15 overall winner of Fantasy Premier League, Simon March, discusses the spurious correlations between data sets that we FPL managers often pounce upon.
There’s an early episode of the TV show Cheers in which Sam Malone, the lovable lothario bar owner, is given a tip by an MIT professor: whenever the Van Allen radiation belt is in a state of flux, the Boston Celtics always lose, so Sam should bet on their opponents in that night’s game.
Although (spoiler) Sam does actually win the bet, this storyline is a good demonstration of the ‘correlation/causation fallacy’ by which, essentially, we spot a correlation between two sets of data and assume (often incorrectly) one must therefore be the cause of the other.
This fallacy manifests itself all the time in the FPL community and often to our detriment. This article will look at why this fallacy exists, how it affects our FPL decisions and how we can try to avoid it.
Correlation Does Not Equal Causation
If you are among that rare breed of FPL manager who is still invited to parties, a fun topic of conversation might be the spurious correlations between data sets. For example, between 1999 and 2009, the number of movies the actor Nicolas Cage appeared in each year closely tracked the number of people who drowned by falling into a pool when plotted on a graph. Over roughly the same period, the number of people who died after becoming tangled in their bedsheets rose and fell almost in step with the per capita cheese consumption in the USA.
If, while reading those correlations, you instinctively tried to come up with a reason why these things might actually be related, that explains a lot about why the correlation/causation fallacy exists. We humans have evolved to quickly spot patterns as a survival mechanism: ‘more lions on the plain than normal? Probably time to move on’, ‘somebody dies every time we eat these berries, maybe we shouldn’t eat them any more’, and so on.
The problem comes when we try to apply this basic, heuristic-based model of pattern-matching to more complex subjects, such as trying to predict who might score in the next Gameweek, because there’s a good chance it will be insufficiently accurate to be reliable.
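For anyone who wants to see how cheaply such correlations come about, here is a minimal Python sketch. It is an illustration under made-up assumptions, not an analysis of real data: it draws one random 11-point series (standing in for a yearly statistic from 1999 to 2009), compares it against 1,000 other random series that are unrelated by construction, and reports the strongest correlation it finds. The function names, the seed and the number of series are all hypothetical choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_chance_correlation(n_series=1000, n_years=11, seed=0):
    """Strongest |correlation| between one random series and many others.

    Every series is independent noise, so any strong match is coincidence.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print(round(best_chance_correlation(), 2))
```

Because every series here is pure noise, any strong correlation the search turns up is coincidence. The more series you trawl through, and the shorter each one is, the more impressive the best match tends to look.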
As I said, we do this all the time in FPL: ‘Suarez always hauls against Norwich’, ‘Sterling never scores against Man United’, ‘Kane can’t score without Dembele in midfield’, ‘Callum Wilson never scores when he’s in my team’, the list goes on. While many perceived correlations are eventually debunked, many others persist for a long time and are acted upon.
The causal direction between two data sets can also throw us off. For example, in Joe’s ‘Meet the Manager’ series, we consistently see that elite FPL managers tend to take fewer hits than average. Thus, we might draw the conclusion that doing well in FPL requires being stingy with hits. Of course, this may be the case, but it might also be the case that elite FPL managers take fewer hits because they are elite FPL managers: their skill and planning abilities (or persistent luck) mean that they don’t need to.
This is not to say that none of these correlations exist. I mean, Suarez really did seem to always haul against Norwich. What we should be mindful of, however, is that we are almost always dealing with very small sample sizes, small enough for any correlation to be simply a matter of coincidence. Had Suarez played Norwich twice a season for 10 years, he might still have scored well (he is, after all, Suarez and they are, after all, Norwich) but his returns would almost certainly have normalised into something less dramatic than they were after just a handful of meetings.
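The small-sample point can be sketched in a few lines of Python. This is a toy simulation, not real player data: it assumes a hypothetical striker with exactly the same chance of scoring in every game (the `p_score` value of 0.4 is invented), lets him face 19 opponents, and reports the best scoring rate he appears to have against any one of them. The function name and all parameters are assumptions made for illustration.

```python
import random


def best_looking_fixture(n_meetings, n_opponents=19, p_score=0.4, seed=1):
    """Highest games-scored-in rate against any single opponent.

    The striker's true chance of scoring is identical against everyone,
    so any opponent he seems to 'always haul against' is down to luck.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_opponents):
        games_scored = sum(rng.random() < p_score for _ in range(n_meetings))
        rates.append(games_scored / n_meetings)
    return max(rates)


if __name__ == "__main__":
    for meetings in (5, 20, 1000):
        print(meetings, round(best_looking_fixture(meetings), 2))
```

With only a handful of meetings per opponent, the luckiest fixture can look like a guaranteed haul. As the number of meetings grows, even the best-looking fixture drifts back towards the striker's true rate.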
The key lesson from all of this is not to be casual about causality and to really question why an apparent relationship between data sets might exist. It is also important to try and be objective about how robust that relationship might be when it comes to determining the cause of something happening. The more vague the relationship, the more sceptical we should be in trusting its predictive value.
Equally, we should be wary that the relationship might exist but its causal direction might be the opposite of what we think. Getting the direction wrong can, on its own, be enough to undermine any data-driven strategy built on that relationship.
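To make the causal-direction trap concrete, here is a hedged Python sketch with entirely invented numbers. It assumes each simulated manager has a hidden `skill` value, that skilled managers need fewer hits, and, deliberately unrealistically, that hits have no effect on points at all. Every name and parameter is hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_managers(n_managers=2000, seed=2):
    """Return (hits, points) lists for simulated managers."""
    rng = random.Random(seed)
    hits, points = [], []
    for _ in range(n_managers):
        skill = rng.gauss(0, 1)
        # Assumption: better planners need fewer hits (skill drives hits).
        n_hits = max(0.0, 10 - 3 * skill + rng.gauss(0, 2))
        # Assumption: points come from skill and luck only, never from hits.
        season_points = 2000 + 100 * skill + rng.gauss(0, 100)
        hits.append(n_hits)
        points.append(season_points)
    return hits, points


if __name__ == "__main__":
    hits, points = simulate_managers()
    print(round(pearson(hits, points), 2))
```

In this toy world, taking fewer hits does nothing for your score, yet hits and points still come out negatively correlated because skill sits behind both. Copying the elite managers' hit count would not, by itself, make anyone elite.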
As mentioned, a big factor in determining how meaningful the relationship between data sets might be is how much data is actually available. The classic rule of thumb is that the more data there is, the more reliable an indicator of any trends it is likely to be. However, there is rarely any substitute for good ol’ common sense and, if you can’t think of a sensible reason why ‘X’ would realistically cause ‘Y’, then chances are it didn’t.
As always, this is easier said than done as these spurious correlations can be quite pervasive and, indeed, persuasive. Spurs failed to win in Gareth Bale’s first 24 appearances, and there really have been over 20 high-profile deaths on or around the days on which Aaron Ramsey has scored. Yet, in a large enough data set, each of these spurious correlations dissolved, as will most of the ones we use to dictate our FPL strategies. Thus, while they’re sometimes interesting to point out or debate, vague or superficial relationships should never be relied upon as a legitimate basis for FPL decision-making.