Last year I found myself ranked almost 200k in December and decided to change my approach to the game for the remainder of the season.
Focusing more on underlying stats to inform my transfer choices, I finished 2018-19 with an overall rank inside 9k.
After that success, I decided to explore the available stats more deeply as a tool for player selection, digging into the data in the members area.
The aim of my investigation was to attempt to find a set of stats that would better predict returns (goals or assists) than the commonly adopted techniques of “Total FPL Points” or “Form”.
I found the results quite interesting, so I thought I’d share them with the community.
The set-up
In any given gameweek, the top transfers-in are typically those players with decent fixtures who are either near the top of the “Total FPL Points” charts or have recently provided a large number of returns and are in “Form”. Sophisticated FPL players may also consider underlying statistics as a predictor of returns.
Members area data was grouped into 4 gameweeks at a time, which I felt gave meaningful data as it reduced the effect of one-off anomalies and showed trends in performance.
For each player in a given period, the “Total FPL points” was calculated as the cumulative FPL points for that player since the start of the season and the “Form” was calculated as the points over that period. The top 5 players in a period were ranked based on each of these measures. They were also ranked by a stats-based measure described below.
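As a rough illustration, the period grouping and the two baseline rankings could be sketched in Python as below. The `(player, gameweek, points)` row layout and the function names are assumptions made for the example; they are not the format of the members area data.

```python
from collections import defaultdict

PERIOD_LENGTH = 4  # gameweeks per period, as described above


def period_of(gameweek):
    """Map a 1-based gameweek to a 1-based period (GW1-4 -> 1, GW5-8 -> 2, ...)."""
    return (gameweek - 1) // PERIOD_LENGTH + 1


def form_and_total(rows, period):
    """Return {player: (form, total)} as of the end of `period`.

    rows: iterable of (player, gameweek, points) tuples (hypothetical layout).
    form  = points scored within `period` only.
    total = cumulative points from gameweek 1 to the end of `period`.
    """
    form = defaultdict(int)
    total = defaultdict(int)
    for player, gameweek, points in rows:
        p = period_of(gameweek)
        if p <= period:
            total[player] += points
        if p == period:
            form[player] += points
    return {name: (form[name], total[name]) for name in total}


def top_n(scores, index, n=5):
    """Rank players by form (index 0) or total points (index 1), highest first."""
    ranked = sorted(scores, key=lambda name: scores[name][index], reverse=True)
    return ranked[:n]
```

Calling `top_n(form_and_total(rows, 2), 0)` would give the "Form" top 5 after gameweeks 5 to 8, and index `1` would give the "Total FPL Points" top 5 at the same point.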
The stats-based measure
Underlying stats were selected based on their correlation with the target attributes (goals and assists – both from open play). I excluded penalties due to the unexpected impact of VAR, their infrequency and the fact they skew the xG and xA data. Clearly, however, whether a player takes penalties should still influence your final choice when deciding between a few players.
The most correlated statistics with goals scored from open play were: xG non-penalty, Touches – Penalty Box and ICT Threat.
The most correlated statistics with assists from open play were: xA open play, and ICT Creativity.
A simple function of these was created, and each player in a period was ranked on this measure.
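The exact function is not given here, so the following is only one possible shape for it, not the formula used in this analysis. It assumes each stat is standardised across players (a z-score), the goal-related and assist-related stats are averaged separately, and the two parts are added with equal weight. The stat key names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical key names for the stats listed above.
GOAL_STATS = ["xg_non_penalty", "touches_penalty_box", "ict_threat"]
ASSIST_STATS = ["xa_open_play", "ict_creativity"]


def z_scores(values):
    """Standardise a list of numbers; all zeros if there is no spread."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def stats_score(players):
    """players: {name: {stat_name: value}} for one period. Returns {name: score}.

    ASSUMPTION: equal weighting of the goal part and the assist part.
    """
    names = list(players)
    standardised = {
        stat: z_scores([players[n][stat] for n in names])
        for stat in GOAL_STATS + ASSIST_STATS
    }
    scores = {}
    for i, name in enumerate(names):
        goal_part = mean([standardised[s][i] for s in GOAL_STATS])
        assist_part = mean([standardised[s][i] for s in ASSIST_STATS])
        scores[name] = goal_part + assist_part
    return scores
```

Standardising first keeps a stat measured in large numbers (penalty box touches) from swamping one measured in small numbers (xG). Different weights would be just as defensible.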
Population Grouping
Initial analysis of the full population of players showed that prediction success varied most clearly with four factors: 1) position, 2) next period fixture difficulty, 3) player cost and 4) time played.
To assess fixture difficulty, another separate model was built based on xG On Target Conceded, xG Conceded, Open Play Goal Attempts Conceded, ICT Influence, Goals Conceded, Goal Attempts In Box Conceded and Big Chances Conceded. For each period, players with the top 25% most difficult fixtures in the next period were excluded from selection under all three approaches (Total FPL Points, Form and stats-based).
A further exclusion filter was added for time played in last period or next period < 50% (assumes we have information on injuries / selection ahead of a gameweek deadline, which most of the time we should do).
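The two exclusion filters could be sketched as follows. The dictionary keys and the single numeric difficulty score are assumptions for illustration; the fixture model itself is not reproduced here.

```python
def eligible_players(players, hardest_share=0.25, min_minutes_share=0.5):
    """Apply both exclusion filters and return the names that survive.

    players: list of dicts with hypothetical keys
      'name', 'next_fixture_difficulty' (higher = harder),
      'minutes_share_last', 'minutes_share_next' (0.0 to 1.0).
    """
    # Filter 1: drop the players with the hardest 25% of upcoming fixtures.
    by_difficulty = sorted(
        players, key=lambda p: p["next_fixture_difficulty"], reverse=True
    )
    n_excluded = int(len(players) * hardest_share)
    hardest = {p["name"] for p in by_difficulty[:n_excluded]}

    # Filter 2: drop anyone who played under 50% of the time in either
    # the last period or the next period.
    return [
        p["name"]
        for p in players
        if p["name"] not in hardest
        and p["minutes_share_last"] >= min_minutes_share
        and p["minutes_share_next"] >= min_minutes_share
    ]
```

The same filtered pool would then be ranked by each of the three approaches, so that none of them is penalised for picking a player with a bad run of fixtures.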
Four population sets were created: A) Budget midfielders & forwards, B) Mid-priced midfielders & forwards, C) Premium midfielders & forwards and D) Defenders.
Cost groupings for midfielders & forwards were defined by looking at the average goals+assists per player over the season. The resulting groups were “budget” (<6.5m), “mid-priced” (6.5m to 8.9m) and “premium” (>=9m). Averaged over the season, the return per player in each cost group was 0.4, 1.4 and 2.4 respectively per four-gameweek period.
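Expressed as code, the cost bands look like this. It is a minimal sketch with a hypothetical function name, and it assumes prices are given in millions.

```python
def cost_band(price_m):
    """Return the cost group for a midfielder or forward priced at `price_m` million."""
    if price_m < 6.5:
        return "budget"
    if price_m < 9.0:
        return "mid-priced"
    return "premium"
```

For example, `cost_band(8.9)` gives `"mid-priced"` and `cost_band(9.0)` gives `"premium"`.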
The analysis
The three approaches – “Total FPL Points”, “Form” and “Stats-based” – were compared for each period within each population set as predictors of returns for the next gameweek group.
The three measures I looked at for each approach were:
(1) percentage of players in top5* that provided at least 1 return (DEFENDER and BUDGET sets), 2 returns (MID-PRICED) or 3 returns (PREMIUM) in the next period;
(2) the average number of goals these players scored; and
(3) the average number of goals scored by those in the top 5 that failed to provide these returns
*note: top5 was used for ease of analysis – the trends observed in this analysis continued beyond this to the top10 and top20.
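The three measures can be computed from the next-period returns of an approach's top5. The sketch below counts returns (goals plus assists) throughout, which is how the results are reported; the function and variable names are made up for the example.

```python
# Returns needed in the next period for a pick to count as a success.
RETURN_TARGET = {"defender": 1, "budget": 1, "mid-priced": 2, "premium": 3}


def evaluate_picks(next_period_returns, target):
    """Return (hit_rate, avg_when_hit, avg_when_missed) for one set of picks.

    next_period_returns: returns in the next period for each pick.
    An average is None when no pick falls into that group.
    """
    if not next_period_returns:
        raise ValueError("need at least one pick")
    hits = [r for r in next_period_returns if r >= target]
    misses = [r for r in next_period_returns if r < target]
    hit_rate = len(hits) / len(next_period_returns)
    avg_hit = sum(hits) / len(hits) if hits else None
    avg_miss = sum(misses) / len(misses) if misses else None
    return hit_rate, avg_hit, avg_miss
```

Running this once per approach and per period, then comparing the three numbers across approaches, gives the "at least as good as" proportions quoted in the results.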
The Results
Premium midfielders & forwards
On average, the stats-based approach correctly predicted at least 3 returns for the top5-ranked players 61% of the time and was at least as good as the other two approaches in half (Total points) and two-thirds (Form) of the periods.
Under the stats-based approach, the average number of returns in the next period for those top5-ranked players who returned at least 3 times in that period was 3.8, which was at least as good as the other two approaches in three-quarters of the periods.
Under the stats-based approach, the average number of returns in the next period for the top5 players that failed to get 3 returns was 1.2, which was at least as good as the other two approaches in two-thirds (Total points) and three-quarters (Form) of the periods.
In plain English, this means you were more likely to pick a player who would get you 3 returns in the next 4 gameweeks if you used the stats-based approach than if you adopted either of the other two approaches. Further, most of the time you’d get a higher number of returns from these players. Finally, if your pick failed, you were still likely to get a better return using the stats-based approach than the other approaches.
Mid-priced midfielders & forwards
On average, the stats-based approach correctly predicted at least 2 returns for the top5-ranked players 61% of the time and was at least as good as the other two approaches in 88% of the periods.
Under the stats-based approach, the average number of returns in the next period for those “successful” players was 2.9, which was at least as good as the other two approaches in half (Total points) and a quarter (Form) of the periods.
Under the stats-based approach, the average number of returns in the next period for the top5 players that failed to get 2 returns was 0.7, which was at least as good as the other two approaches in three-quarters (Total points) and 88% (Form) of the periods.
Budget midfielders and forwards
On average, the stats-based approach correctly predicted at least 1 return for the top5-ranked players 71% of the time and was at least as good as the other two approaches in three-quarters of the periods.
Under the stats-based approach, the average number of returns in the next period for those “successful” players was 1.7, which was at least as good as the other two approaches in three-quarters (Total points) and two-thirds (Form) of the periods.
Defenders
On average, the stats-based approach correctly predicted at least 1 return for the top5-ranked players 47% of the time and was at least as good as the other two approaches in half (Total points) and three-quarters (Form) of the periods.
Under the stats-based approach, the average number of returns in the next period for those “successful” players was 1.4 which was at least as good as the other two approaches in 88% of the periods.
An example: mid-priced midfielders and forwards
Based on gameweek 5 to 8 data, the stats-based approach predicted the top5 players for the following period to be Marko Arnautovic, Bernardo Silva, Felipe Anderson, Raúl Jiménez and David Silva. Across these players, there were 12 returns (1, 3, 3, 1, 4 respectively) in gameweeks 9 to 12.
The “Total FPL Points” approach ranked the following in its top5: Bernardo Silva, Ryan Fraser, Raúl Jiménez, Gylfi Sigurdsson and Callum Wilson. Across these players, there were 10 returns (3, 3, 1, 1, 2 respectively) in gameweeks 9 to 12.
The “Form” approach ranked the following in its top5: Marko Arnautovic, Ryan Fraser, Raúl Jiménez, Gylfi Sigurdsson and Callum Wilson. Across these players, there were 8 returns (1, 3, 1, 1, 2 respectively) in gameweeks 9 to 12.
David Silva was only picked by the stats-based approach. In gameweeks 5 to 8 he scored 1 goal with no assists, so both his form and total FPL points were quite low. However, he ranked highly on xG+xA (total 2.02) and penalty area touches (19) in that period, amongst other stats, which is why the stats-based approach selected him. Silva scored 2 goals and made 2 assists in the next four gameweeks.
In other periods the difference was greater: e.g. based on GW21-24 data, the top5 from the stats-based approach delivered twice as many returns in total as the top5 from the other approaches.
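To make the arithmetic in the gameweek 9 to 12 example explicit, the short script below totals the returns for each approach and counts how many picks reached the mid-priced target of 2 returns. The figures are the ones quoted above; only the variable names are invented.

```python
TARGET = 2  # returns needed for a mid-priced pick to count as a success

# Returns in gameweeks 9 to 12 for each approach's top5 (from gameweek 5 to 8 data).
next_period_returns = {
    "Stats-based": {
        "Marko Arnautovic": 1, "Bernardo Silva": 3, "Felipe Anderson": 3,
        "Raúl Jiménez": 1, "David Silva": 4,
    },
    "Total FPL Points": {
        "Bernardo Silva": 3, "Ryan Fraser": 3, "Raúl Jiménez": 1,
        "Gylfi Sigurdsson": 1, "Callum Wilson": 2,
    },
    "Form": {
        "Marko Arnautovic": 1, "Ryan Fraser": 3, "Raúl Jiménez": 1,
        "Gylfi Sigurdsson": 1, "Callum Wilson": 2,
    },
}

summary = {}
for approach, picks in next_period_returns.items():
    total = sum(picks.values())
    hits = sum(1 for r in picks.values() if r >= TARGET)
    summary[approach] = (total, hits)
    print(f"{approach}: {total} returns, {hits} of {len(picks)} picks with at least {TARGET}")
```

This gives 12, 10 and 8 total returns for the stats-based, "Total FPL Points" and "Form" approaches, with 3, 3 and 2 successful picks respectively.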
Prediction for GW1 2019-20
What does this mean for GW1 this season? Well, probably not much, as the time elapsed may render last season’s data irrelevant – the model needs a few weeks of the new season to be useful. But let’s look at it anyway…
Based on last year’s prices, top defender picks for attacking returns based on the stats approach are Seamus Coleman, Andrew Robertson, Trent Alexander-Arnold and Kyle Walker.
Top-ranked budget mid and forward picks are Ilkay Gündogan, Jordan Henderson and N’Golo Kanté.
Top mid-priced mid and forward picks are Ayoze Pérez, Bernardo Silva, Gerard Deulofeu, Diogo Jota and Ryan Fraser.
I’ve omitted premium mids and forwards as they all look decent picks.
Conclusion
The stats-based approach was more of a differential for the budget players, mid-priced players and defenders; all approaches were successful for the premium players.
Arguably, the cheaper players and defenders are the harder group to predict returns for, so you may see some success in adding a stats-based approach to your armoury for these groups.
Unpredictable factors affecting a player’s performance (luck, weather, contract negotiations, personal life) add a randomness to the game that is hard to model and may lead to a stats-based approach returning fewer points than predicted. Having said that, there does seem to be some value in considering the right measures in your decision-making process.
The members area is a gold mine of useful statistics and tools. In particular, I’d highlight the comparison tool’s heatmap, which shows a player’s recent positioning and is useful if you’re uncertain about your final player selection.
Good luck all!
4 years, 8 months ago
Interesting stuff! I'd like to see how it works out this season - it was something I had looked at myself but was stumbling over how to form all these stats into one point predicting model whilst also acknowledging the various underlying variables.
Thank you for submitting!