Community Submissions

Predicting Goals and Assists

Based on some great work from @11tegen11 we know that expected goals are the best predictor of future performance, followed by shots on target. I also previously confirmed the high correlation (77%) of shots on target with goals scored over the whole season. So I was interested to see whether some of the aggregate statistics such as the Fantasy Football Scout Members Influence, Creativity and Threat (ICT) metrics can also help predict goals and assists. Due to the use of members area data this article is restricted to members only.

rakkhi Love my football, love my stats, hoping to improve each year. Go the gunners! @rakkhis on Twitter

57 Comments
  1. J0E
    • Fantasy Football Scout Member
    • Has Moderation Rights
    • 14 Years
    8 years, 4 months ago

    thanks for this. Great research...adds to what we already know about shots on target and big chances.

  2. Doosra - ☭DeclanMyGenius…
    • Fantasy Football Scout Member
    • 14 Years
    8 years, 4 months ago

    Thank you. 🙂

    Shots on Target. 😀

    1. Eden Hazardous
      • 9 Years
      8 years, 4 months ago

      No love for xG?

      1. Doosra - ☭DeclanMyGenius…
        • Fantasy Football Scout Member
        • 14 Years
        8 years, 4 months ago

        Nope. But I love the investigation. 🙂

      2. Woy of the Wovers
        • 13 Years
        8 years, 4 months ago

        xG is not even defined.

        1. rakkhi
          • Fantasy Football Scout Member
          • 13 Years
          8 years, 4 months ago

          Sorry it is Expected Goals (xG), a good explanation of what they are is here: http://cartilagefreecaptain.sbnation.com/2015/10/19/9295905/premier-league-projections-and-new-expected-goals

            • 10 Years
            8 years, 4 months ago

            That's a good explanation of how one model calculates expected goals. There are infinite methods, including infinite non-shot-based methods, of calculating expected goals.

            1. rakkhi
              • Fantasy Football Scout Member
              • 13 Years
              8 years, 4 months ago

              Infinite is hyperbolic, but yes, there are non-shot-based ones as well. Some links for more reading / watching if anyone is interested:

              http://www.optasportspro.com/about/optapro-blog/posts/2015/film-optapro-forum-beyond-shots/

              http://analyticsfc.co.uk/pep-a-non-shots-based-expg-model/

              As a side note, I'm pretty happy that, dipping into the FFS comments occasionally, there is some discussion of expected goals now, which is a great evolution.

                • 10 Years
                8 years, 4 months ago

                +1 I should have said infinite potential models. Expected goals is modelling and there's no right way or wrong way to model, just scales of better or worse.

    2. Woy of the Wovers
      • 13 Years
      8 years, 4 months ago

      Shots on Target are still the best correlation to Goals among the stats we have here.

  3. Kelton
    • 9 Years
    8 years, 4 months ago

    Great piece! Any chance of xG coming to Members stats?

    1. Guy Demel's SH
      • 11 Years
      8 years, 4 months ago

      +1! Would really like xG to become part of Member stats. I don't think it's something currently provided by Opta though so I guess the team would need their own analyst/agreed methodology.

      1. Kelton
        • 9 Years
        8 years, 4 months ago

        Right, based on my quick Googling it is a custom model, typically using shots on target but splicing location 20+ ways (versus just in box/out of box) and possibly other stats.

        Perhaps a better question: is Threat due for a fresh coat of paint, e.g. boosting the influence of shots in the box? 😉

        1. Kelton
          • 9 Years
          8 years, 4 months ago

          edit: it looks like Threat is better than SitB alone so disregard my suggestion above 🙂

        2. Guy Demel's SH
          • 11 Years
          8 years, 4 months ago

          Yeah there are quite a few varying xG methods out there. Some of them go into more depth than just location of the shot. Michael Caley's also uses things like whether the chance comes from a counter attack, from a cross, from a rebound off the woodwork, from a through ball etc to measure the quality of each chance.

          As far as I'm aware he's one of the few to publish his methodology, most of them keep them secret.
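To make the idea behind these models concrete: whatever features a shot-based model uses (location, counter attack, cross, through ball), it ends up assigning each shot a probability of being scored, and a player's xG is the sum of those probabilities. A minimal Python sketch, assuming the per-shot probabilities are already known; the function name and the numbers are invented for illustration and are not taken from Michael Caley's or anyone else's model:

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities into an expected goals total."""
    for p in shot_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return sum(shot_probabilities)


# Hypothetical chances for one player in one match (made-up values):
# a close-range shot from a through ball, a header from a cross,
# and a long-range effort.
shots = [0.35, 0.08, 0.03]
print(f"xG = {expected_goals(shots):.2f}")
```

The modelling work, and the part most analysts keep private, is in how each probability is estimated; the summing step is the same either way.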

          1. rakkhi
            • Fantasy Football Scout Member
            • 13 Years
            8 years, 4 months ago

            I would love FFS to develop its own xG model; they have all the data, with the Opta shot coordinates being shown in those graphs on player profiles that never seem to load for me. I use Paul Riley's data, which is the only one I have found that has player-level data published online: https://public.tableau.com/profile/paul.riley#!/vizhome/PremierLeague201516xGMap/PremierLeague201516ShotonTargetxGDashboard

            1. powderjunkie
              • Fantasy Football Scout Member
              • 9 Years
              8 years, 4 months ago

              One thing that confuses me about Paul Riley's methods is that he only uses shots on target when accuracy rates are shown to be very much influenced by luck (and regress to the mean). Also, it decreases an already very small sample size by over half. That's why imo Michael Caley's model makes more sense. It's just a shame he doesn't publish individual player data.

              1. rakkhi
                • Fantasy Football Scout Member
                • 13 Years
                8 years, 4 months ago

                Yeah, it would be great if Michael Caley published his data by player. Paul's data is fine though, I mean you need a shot on target to score and get an assist so it really has the majority of what we care about. They have also both compared each other's models and they have very similar rankings. Visualizing Paul's data for GW1-20: https://twitter.com/rakkhis/status/684326573286096900

  4. BabyB
    • Fantasy Football Scout Member
    • 10 Years
    8 years, 4 months ago

    Great rakkhi, thanks very much.

  5. Kelton
    • 9 Years
    8 years, 4 months ago

    @Rakkhi - Can you confirm we are comparing correlations of xG for teams versus Threat for individual players? If so, it might make sense that team-based stats are better predictors than player-based stats, no?

    1. rakkhi
      • Fantasy Football Scout Member
      • 13 Years
      8 years, 4 months ago

      Not sure if 11tegen11 looked at teams or players for xG (teams more likely). My previous work looked at players though, and xG is still better

  6. SANTACLAUS
    • 8 Years
    8 years, 4 months ago

    Great read

    1. SANTACLAUS
      • 8 Years
      8 years, 4 months ago

      so the takeaway from this is ICT Threat is the best thing to judge a player on?

      1. rakkhi
        • Fantasy Football Scout Member
        • 13 Years
        8 years, 4 months ago

        None of the ICT metrics have great predictive ability unfortunately, but yes, if you like what ICT is doing and maybe use it with other metrics such as fixtures, then Threat is the best one to use

    • 10 Years
    8 years, 4 months ago

    Sorry Rakkhi, but I'm going to have to go a bit RuthNZ vs YMA here. So I'm going to say three things now to give my views on the discussions this article brings up as encouragement to keep digging down this path before I cross-examine the article.

    1) I think it's great that this sort of investigation is being attempted.
    2) I love the thought behind xG and its attempt to quantify the game but current models are not the be all and end all and there is always room for improvement.
    3) I think the members area here is great but it's been stagnating for a long time and we need more tools to keep apace with modelling going on in the analytics community as the level of analysis available is way more valuable than the actual ownership of the data.

    Right now there are a lot of things I don't like in this analysis that smell like they fall into the category of apples and oranges or worse comparing weather forecasting for London with the weather outcomes in Greece.

    Firstly I have a question in the long range comparison

      • 10 Years
      8 years, 4 months ago

      Actually three questions on the ICT graph, are we looking at player level or team level? I'm going to assume player as thats more of interest for fantasy readers but is of key importance as it gives fewer actions per data point meaning player level is going to be more susceptible to variation than 11tegen11's player level data. Secondly did you have any minimum quantifier on minutes played? I'll give you the benefit of the doubt that you did but if you didn't that's a potentially large problem as its again reducing the actions behind each data point without minimum qualifiers. Thirdly what positions did you filter it for as if defenders are in there, well then the ICT influence comparison is a heap of junk anyway.

      Now the ICT metrics.
      ICT influence, is as the name suggests, heavily weighted towards actions that had bearing on the outcome, e.g. goals, assists and ALSO clean sheets as well as giving a minor weighting to other inputs such as shots.
      ICT Creativity, is a metric that looks at creativity and assist potential, it is not looking for goal potential.
      ICT Threat is looking for goal potential but not necessarily assist potential (although penalty box touches, which have weak predictive power for both goals and assists, may input some minor residual assist predictivity).
      ICT Index is an amalgamation of the three above and so, at least in defensive positions, also factors in clean sheets.
      So why are we trying to compare metrics trying to explain different things each to "goals + assists"?

      Now xG's predictive power. 11tegen11's graph is comparing the first x (in this case 20) weeks to the remainder of the season. You are looking at just weeks 21-30, and again he is looking at a team level while (I believe) this is looking at player level. He is using bigger sample sizes and exposes himself to less variance. It should not be a surprise that his correlation value is higher then.

      Second graph, comparing to "goals and assists" again. Why, just why? Nearly all of these things are better at looking at one of those two outcomes rather than both together. Why should a chance created be indicative of goal scoring potential beyond the weak explanation that it means he's further up the pitch? Similarly shots on target has links to being further up the pitch and also possibly getting rebound fantasy assists but no other compulsion to be a good explainer of assists.

      Using form to predict - this is actually a good point as the smaller sample size lends itself to variance and can be massively influenced by strength of schedule and other factors. That said with SoS weightings form can do better and I would like to see you work on this as there is definitely room for growth and more data points are always a good thing for getting a better picture. Also I think if you use a minimum minutes played in these comparisons (say 3 starts out of 4) they'll again boost your predictive abilities.

      1. Woy of the Wovers
        • 13 Years
        8 years, 4 months ago

        I think the big point here is the confusion of goals and assists (and Creativity/Threat). While we prefer our players to score goals, many will justify their selection based on their assist potential and since the relative points vary by position, we can't properly analyse the stats without separating forwards, mids and defenders.

          • 10 Years
          8 years, 4 months ago

          Yes but also don't underestimate the bin size problem. Each data point is a bin of events occurring in x player minutes which is varying massively amongst things being compared here.

      2. rakkhi
        • Fantasy Football Scout Member
        • 13 Years
        8 years, 4 months ago

        Thanks for taking the time to provide such detailed feedback. I will attempt to address all your questions:

        ICT stats for Gameweeks 1-20 correlation with GW21-30 points
        1) Actually three questions on the ICT graph, are we looking at player level or team level?
        Player level which as you say is more interesting for us. It may be more susceptible to variation but it is also a far greater sample size. Remember I also looked at the same stats as 11tegen11 and found similar correlations to his, with shots on target for example, using player data

        2) Secondly did you have any minimum quantifier on minutes played?
        I tested with setting this and it does not change the correlation (both at 950 mins which is half the time played and at 500, at 300 interestingly it reduces the correlation). This actually makes sense when you think about it, players that played low minutes would have a low index or threat score and also low goals and assists. I’m not using a per 90 metric in this work so leaving in low minutes does not matter and again provides a larger sample

        3) Thirdly what positions did you filter it for as if defenders are in there, well then the ICT influence comparison is a heap of junk anyway
        I did not filter defenders out mainly as there is no easy way to do that in the FFS custom stats tables without doubling the data collection effort (running it both for mids and forwards). It may make the influence comparison low as you said, however again it is the same as low time played both values should be lower for defenders so it does not adversely impact the correlation.

        4) So why are we trying to compare metrics trying to explain different things each to "goals + assists"?
        For two reasons. A) As fantasy players we are interested in a player scoring fantasy points so both goals and assists are relevant. Any metric that helps us predict either is valuable. B) When I looked at just goals, for example with Threat, the correlation is actually lower, so I presented the most favourable view as a judgement call.

        5) Differences between 11tegen11 and my comparisons
        That is a fair criticism, he is looking at teams and at the rest of the season. I was presenting his data as a point of comparison and looked at what would be most interesting to fantasy managers. If I also had a good historical repository of xG data I would provide that as a comparison. I have looked at shots on target across the whole season before; next time I run this I will also compare shots on target and other shot data against the ICT metrics from a predictive view over the next 10 games.

        Last 4 gameweek stats
        6) Second graph, comparing to "goals and assists" again
        This is my bad. I explained what I was doing in the first article and did not repeat again for this one. This is what each stat is being correlated against:
        Correlating GW17-20 stat to GW21-36 Goals + Assists (due to reasons explained above):
        ICT Index
        ICT Threat
        ICT Influence
        ICT Creativity

        Correlating GW17-20 stat to GW21-36 Goals:
        Shots on target
        Shots in the box
        Total shots

        Correlating GW17-20 stat to GW21-36 Assists
        Chances created

        7) that said with SoS weightings form can do better
        I have actually done a fair bit of work with fixtures, read some of my previous articles on this site, and will continue to use fixture strength in the future. One big conclusion from the data is that individual player stats, e.g. shots on target, have a far higher correlation than both opponent difficulty and team offensive strength

        8) Also I think if you use a minimum minutes played in these comparisons (say 3 starts out of 4) they'll again boost your predictive abilities.
        Again, I ran the data with both min 90 mins and 190 (half the mins) and the correlations are no different, for the same reasons I mentioned above
        Thanks again for the feedback, let me know if you have any further questions!
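The minutes-threshold checks in points 2 and 8 above could be reproduced along these lines. This is a minimal Python sketch, not the actual Tableau workflow: the rows are invented toy values, and `pearson_r` and `correlation_with_min_minutes` are hypothetical helper names.

```python
from math import sqrt

# Toy rows, invented purely for illustration (not FFS or Opta data):
# (minutes in GW1-20, Threat total in GW1-20, goals + assists in GW21-30)
PLAYERS = [
    (1750, 900, 7),
    (1600, 650, 5),
    (1200, 400, 2),
    (900, 350, 3),
    (450, 120, 1),
    (200, 40, 0),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_with_min_minutes(rows, min_minutes=0):
    """Correlate the early-period stat with later returns, keeping only
    players who reached min_minutes in the early period."""
    kept = [row for row in rows if row[0] >= min_minutes]
    stat = [row[1] for row in kept]
    returns = [row[2] for row in kept]
    return pearson_r(stat, returns)


for threshold in (0, 500, 950):
    r = correlation_with_min_minutes(PLAYERS, threshold)
    print(f"min {threshold} mins: r = {r:.2f}, r^2 = {r * r:.2f}")
```

Printing r next to r squared keeps the two apart: r is the correlation itself, while r squared is the share of variance explained, so the two should not be compared against each other across articles.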

          • 10 Years
          8 years, 4 months ago

          1) As I know you read my post on it the other day this is hugely about data integrity and falls into the category of comparing weather forecasting for London with the climate outcomes in Greece, which is more dangerous than apples and oranges because the average punter can spot that the apples and oranges are different but may miss the meteorological flaw.

          Player level stats are comparing 11 times as many data bins as team level stats but each with an eleventh of the amount of data. It is really important the nuance is spotted here as a player's stats are a bin of events while team stats are a bin of bins. This is part of why there may be much more variation in testing. Secondly comparing r2s from data sets of different sizes without acknowledging r2 decreases naturally in larger data sets is reckless reporting.

          2) Again. Data integrity, small bins and big bins all lumped together doesn’t necessarily make more bins better.

          3) Defenders get Influence points for clean sheets. That is a notable part of why it’s a heap of junk at predicting attacking returns when you leave defenders in.

          4) A okay I had the wrong end of the stick, thanks for the clarification. And B has me interested. DM me the graphs?

          5) -

          6) Again I had the wrong end of the stick here, thanks for clarification.

          7) Have you done work mixing the SoS of the opposition's team-level shot prevention with the attacking stats? They seem to work well on a ~70/30 split attack vs oppo defence.

          8) Again data integrity, homogeneity is good when you’re not sacrificing too much.
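The bin-size argument in point 1 can be illustrated with a simple model. The sketch below is Python and rests on assumptions that are not from the thread: each bin (a player or a team) has a fixed per-match event rate, counts in a window are Poisson around that rate, and the two windows are independent given the rate. The function name and all numbers are hypothetical.

```python
from math import sqrt


def expected_window_correlation(mean_rate, rate_sd, matches_a, matches_b):
    """Expected Pearson correlation between event counts in two windows.

    mean_rate: average events per match across bins
    rate_sd:   spread (standard deviation) of the true per-match rates
    Assumes Poisson counts, independent across windows given the rate.
    """
    signal = rate_sd ** 2
    # Law of total variance: real spread between bins plus Poisson noise.
    var_a = matches_a ** 2 * signal + matches_a * mean_rate
    var_b = matches_b ** 2 * signal + matches_b * mean_rate
    cov = matches_a * matches_b * signal
    return cov / sqrt(var_a * var_b)


# Made-up rates: a player-sized bin versus a team-sized bin with four
# times the events and the same relative spread, GW1-20 against GW21-30.
player = expected_window_correlation(1.0, 0.5, 20, 10)
team = expected_window_correlation(4.0, 2.0, 20, 10)
print(f"player-sized bins: {player:.2f}, team-sized bins: {team:.2f}")
```

Under these assumptions the larger bins correlate better from one window to the next simply because each data point averages over more events. Whether real teams have the same relative spread as real players is a separate question that this sketch does not answer.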

          1. rakkhi
            • Fantasy Football Scout Member
            • 13 Years
            8 years, 4 months ago

            1) Data integrity - yup will take this on board for future. I forget sometimes that most analytics guys like 11tegen11 look at team data. I'll run more detailed predictive correlations in the summer break when I should also have the xG data that Paul Riley is releasing gameweek by gameweek, so that will be an apples to apples player data comparison. I'll do something in week 24 looking at GW25-30 also and look at at least shots data correlations for players vs ICT and even fixture difficulty stats

            3) Noted, more annoying but will look to run the tables twice with just mids and forwards in future

            4) Sure, I think your twitter handle is above. Assuming you have Tableau desktop version, otherwise I can upload to Tableau public

            7) I have looked at the correlations of goals and assists vs opponent difficulty in both FFS rating as well as shots in the box conceded and they don't correlate well overall but are good for some players: http://www.fantasyfootballscout.co.uk/2015/11/27/fixture-difficulty-gameweeks-1-13/ . This is all the data I have had access to, do you have a good data source for shot prevention?
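One possible reading of the ~70/30 attack versus opposition-defence split raised in point 7 is a weighted average of two ratios, each expressed relative to the league average. The Python sketch below is only that reading: the function name, the ratio inputs and the example values are assumptions, since the thread does not say how the split is actually applied.

```python
def blended_attack_index(player_attack, opponent_defence, attack_weight=0.7):
    """Weighted blend of a player's attacking output and the opponent's
    defensive leakiness, both given relative to league average (1.0).

    attack_weight=0.7 reflects the ~70/30 split mentioned in the thread.
    """
    if not 0.0 <= attack_weight <= 1.0:
        raise ValueError("attack_weight must be between 0 and 1")
    defence_weight = 1.0 - attack_weight
    return attack_weight * player_attack + defence_weight * opponent_defence


# Hypothetical: a player shooting 20% above league average faces a
# defence that concedes 20% fewer shots than average.
print(f"{blended_attack_index(1.2, 0.8):.2f}")
```

Keeping the weight as a parameter makes it easy to test other splits against later returns rather than taking 70/30 as given.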

          • 10 Years
          8 years, 4 months ago

          Sorry forgot to put this at the top, thanks for taking the time to get back to me on these, especially points 4 and 6, I was just wrong there 🙂

          1. rakkhi
            • Fantasy Football Scout Member
            • 13 Years
            8 years, 4 months ago

            No worries good discussion 🙂 Actually I don't have your twitter handle so let me know that and twitter is not good for sending attachments that are not pictures so email me at rakkhi.s[at]gmail.com and I'll send you the Tableau files and data

    1. rakkhi
      • Fantasy Football Scout Member
      • 13 Years
      8 years, 4 months ago

      Thanks for the encouraging words. I figure only way to learn and improve is to keep working on this stuff and get feedback like yours below. I will provide a detailed answer to each of your points tonight (my time about 12 hours time)

        • 10 Years
        8 years, 4 months ago

        🙂 I look forward to it.

      1. Woy of the Wovers
        • 13 Years
        8 years, 4 months ago

        There are two problems to address. One is an historic analysis and one is more predictive. If you want to explain historic goals, SoT is the best explanation - remove all penalties and analyse forwards separately and variability is all explained by chance.

        Where this fails as a predictive tool is that SoT is subject to random variations so they don't really show how good the player is. Rather, and somewhat formulaically, they show PlayerThreat + Luckfactor. Over a 4 GW period, the random variation can easily contribute to over half of the 'Apparent threat'.
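The PlayerThreat + Luckfactor split can be sketched numerically. The Python below assumes, purely for illustration, that a player's SoT count over a window is Poisson around a fixed personal rate; the function name and the rates are made up, so it shows the mechanism rather than confirming the "over half" figure.

```python
def luck_share(mean_rate, rate_sd, matches):
    """Fraction of the variance in observed totals that comes from
    Poisson noise rather than real differences between players.

    mean_rate: average SoT per match across players
    rate_sd:   spread (standard deviation) of players' true rates
    """
    skill_var = (matches * rate_sd) ** 2  # spread of true totals
    luck_var = matches * mean_rate        # average Poisson variance
    return luck_var / (skill_var + luck_var)


# Hypothetical rates: 1.0 SoT per match on average, spread of 0.5.
for gameweeks in (4, 20):
    print(f"{gameweeks} GWs: luck share = {luck_share(1.0, 0.5, gameweeks):.2f}")
```

With these invented rates the luck share falls as the window grows, because the skill term scales with the square of the number of matches while the noise term scales linearly.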

        1. rakkhi
          • Fantasy Football Scout Member
          • 13 Years
          8 years, 4 months ago

          Well the correlation data shows that previous shots on target and expected goals (more precise shots on target) are actually a good predictive metric also. Still, it only explains 60% of the data by GW20 for example, so football is still a game with a lot of variance, especially as it is so low scoring / low event

    2. Guy Demel's SH
      • 11 Years
      8 years, 4 months ago

      Which analysts do you look at Balders? I've mainly been looking at Michael Caley and 11tegen11's work, wondering who else I should be looking out for?

      I've also had a look at Danny Page's long term simulator (http://dannypage.github.io/lookup.html) but it's a bit clunky and updated infrequently.

        • 10 Years
        8 years, 4 months ago

        I follow at least 100 on twitter so I'll point you towards Danny Page's list, which is pretty comprehensive, then give you my top 5 outside of the names you've already mentioned.
        List: https://twitter.com/DannyPage/lists/sport-quants-analysts/members

        1) Statsbomb [ @Statsbomb @jair1970 @benjaminpugsley ]
        2) Analytics FC [ @AnalyticsFC @Worville @GregorydSam @Torvaney @BobbyGardiner ]
        3) DeepXG [ @DeepXG ]
        4) North Yard Analytics [ @altmandaniel ]
        5) Paul Riley [ @footballfactman ]

        1. Guy Demel's SH
          • 11 Years
          8 years, 4 months ago

          Brilliant, thanks

        2. rakkhi
          • Fantasy Football Scout Member
          • 13 Years
          8 years, 4 months ago

          Yup that is a great list. I only follow about 20 so you can have a look at these guys also: https://twitter.com/rakkhis/following . Michael Caley (@MC_of_A), Mike Goodman (@TheM_L_G) and James York from Statsbomb (@jair1970) are three in particular I would add.

    3. tm245
      • 12 Years
      8 years, 4 months ago

      Very respectful way to disagree, and a great conversation on this post.

      Would be cool if whoever is manning the computer these days could chime in with a look under the RMT machine's hood as well.

      • 10 Years
      8 years, 4 months ago

      ugh sorry one edit:

      " I'm going to assume player as thats more of interest for fantasy readers but is of key importance as it gives fewer actions per data point meaning player level is going to be more susceptible to variation than 11tegen11's __team__ level data"

  7. KingOllie
    • 8 Years
    8 years, 4 months ago

    I need some help my friends. Gone from 21k - 180k over 8 weeks or so, with 29 last gw 🙁

    Myhill / mcarthy
    Bellerin / toby / ward / moreno / oxford
    Ozil / kdb / mahrez / alli / gosling
    Aguero / luk / kane

    3.4 iitb. 1 FT.

    Any ideas on what to fix up? Not sure if i should get payet or fix my crappy defence/keeper situation out. Any help is appreciated 🙂

  8. myboycharlie
    • 12 Years
    8 years, 4 months ago

    Best defender under 4.2 as fifth defender?

    A) smith (good form and decent fixtures but it's Bournemouth)
    B) Simpson (Leicester seem quite lucky to get CS lately)
    C) Davies ( played 3 of last 4 but will rose replace him)
    D) Ryan bennett (will be rarely played)
    E) martina (only played last few for saints)
    F) Rangel (already have Williams)

    All input greatly appreciated

  9. Torres76
    • 14 Years
    8 years, 4 months ago

    Hi all. What will Vardy's selling price be after tonight's drop?

    CP - 7.6
    SP - 7.0
    BP - 6.4

    Thanks all

    1. Woy of the Wovers
      • 13 Years
      8 years, 4 months ago

      42

      1. Fitzy.
        • Fantasy Football Scout Member
        • 12 Years
        8 years, 4 months ago

        Why not just ignore the post rather than trying to be smart? He was polite...

        1. Woy of the Wovers
          • 13 Years
          8 years, 4 months ago

          There's a time and a place...

    2. Fitzy.
      • Fantasy Football Scout Member
      • 12 Years
      8 years, 4 months ago

      You'll lose 0.1 SV on him, SP will be 6.9.

      1. Torres76
        • 14 Years
        8 years, 4 months ago

        Cheers Fitzy. Appreciate your help..

  10. Torres76
    • 14 Years
    8 years, 4 months ago

    Anyone think I should sell Arny for Payet?

    Arny | Mahrez | Ali | KDB | Ozil

    Price rise pending tonight for Payet

    1. Fitzy.
      • Fantasy Football Scout Member
      • 12 Years
      8 years, 4 months ago

      Arnie has Norwich at home this week, Payet has only just come back from injury and hasn't started a game yet.

  11. Dušan Citizen
    • 10 Years
    8 years, 4 months ago

    Great read, thanks. 😛

  12. EddieKL
    • 8 Years
    8 years, 4 months ago

    which is better return? Arnie vs NOR at home or Payet vs BOU away?

  13. Clump
    • 9 Years
    8 years, 4 months ago

    Bit late to the party but great stuff. Lots of effort here 🙂

  14. Somnus84
    • 8 Years
    8 years, 3 months ago

    Substitution conundrum:

    A) Barkley & Lukaku out for Alli and Costa
    B) Barkley & Bellerin out for Payet & Azpi
    C) Barkley & Fuchs out for Payet & Wollschield
    D) Barkley & Moreno out for Payet & Cathcart