⚽📊 A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.

szchelkowski/football_analytics

Edd Webster Football Analytics

A space for football analytics projects by Edd Webster, including a curated list of publicly available resources published by the football analytics community.


Edd Webster Analytics


-----------------------------------------------------

👋 About This Repository and Author

Edd Webster

The README of this repository is a concise list of learning resources, data sources, libraries, papers, blogs, podcasts, etc., created by all those that have made contributions to the football analytics community. This will be a constant work in progress so if you can think of any resources that I've missed, or you yourself have created something that you believe should be added and is currently not available, please feel free to create a pull request or send me a message.

Credits to the Soccer Analytics Handbook by Devin Pleuler, Awesome Soccer Analytics by Matias Mascioto, and Jan Van Haaren's Soccer Analytics 2021 Review, Soccer Analytics 2020 Review and soccer-analytics-resources Github repo, which were all used to plug gaps in the list once it was published. Credit also to Matias Singers for his awesome-readme repository used to restyle this README.

If you like the repo, please feel free to give it a ⭐ (top right). Cheers!

For a spreadsheet version of this README, to keep track of the parts you have already read or worked on, see the following Google Sheets spreadsheet kindly put together by Melanie Loeper [link].

For more information about this repository and the author, I am available through all the following channels:

CV Badge Personal Website Badge Email Badge Twitter Badge LinkedIn Badge About.me Badge GitHub Badge HackerRank Badge Coder Rank Badge Tableau Badge

-----------------------------------------------------

📖 Table of Contents

  1. About This Repository and Author
  2. Table of Contents
  3. Prerequisites
  4. Repository Structure
  5. Notebooks
  6. Data Visualisation and Tableau
  7. Resources
  8. Citations
  9. Contributing
  10. Star Tracker
  11. Acknowledgements

-----------------------------------------------------

🍴 Prerequisites

Python Badge Jupyter Badge

The only prerequisites for using this GitHub repo are a computer, an internet connection, and the desire to learn more about football analytics.

The open-source Python libraries listed below are some of the most commonly used in data science and feature in the notebooks in this repository. Most of these libraries can be obtained by downloading and installing Anaconda. Step-by-step guides to do this can be found for Windows here and Mac here, as well as in the Anaconda documentation itself here.

Back to Contents

-----------------------------------------------------

🌵 Repository Structure

The contents of this GitHub repository are organised as follows:

football analytics github repository
.
│
├── dashboards
│
├── data
│   │ 
│   ├── capology
│   │ 
│   ├── elo
│   │ 
│   ├── export
│   │ 
│   ├── fbref
│   │ 
│   ├── fifa
│   │ 
│   ├── guardian
│   │ 
│   ├── metrica-sports
│   │ 
│   ├── opta
│   │
│   ├── reference
│   │ 
│   ├── sb
│   │ 
│   ├── shots
│   │ 
│   ├── stats-perform
│   │ 
│   ├── stratabet
│   │ 
│   ├── tm
│   │ 
│   ├── touchline-analytics
│   │ 
│   ├── twenty-first-group
│   │ 
│   ├── understat
│   │ 
│   └── wyscout
│
├── docs
│   ├── centre-circle
│   ├── metrica-sports
│   ├── opta
│   ├── sb
│   ├── shots
│   ├── stratabet
│   └── wyscout
│
├── gif
│   └── fig
│
├── img
│   │  
│   ├── club_badges
│   │  
│   ├── eddwebster
│   │  
│   ├── fig
│   │  
│   ├── logos
│   │  
│   ├── pitches
│   │  
│   └── vizpiration
│
├── notebooks
│   │    
│   ├── 1_data_scraping
│   │   ├── Capology Player Salary Web Scraping.ipynb
│   │   ├── FBref Player Stats Web Scraping.ipynb
│   │   └── TransferMarkt Player Bio and Status Web Scraping.ipynb   
│   │
│   ├── 2_data_parsing
│   │   ├── ELO Team Ratings Data Parsing.ipynb
│   │   ├── StatsBomb Data Parsing.ipynb
│   │   └── Wyscout Data Parsing.ipynb   
│   │
│   ├── 3_data_engineering
│   │   ├── Capology Player Salary Data Engineering.ipynb
│   │   ├── Centre Circle Opta CPL Data Engineering.ipynb
│   │   ├── FBref Player Stats Data Engineering.ipynb
│   │   ├── Opta #mcfcanalytics PL 2011-2012.ipynb
│   │   ├── StatsBomb Data Engineering.ipynb
│   │   ├── StrataBet Data Engineering.ipynb
│   │   ├── The Guardian Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── TransferMarkt Historical Market Value Data Engineering.ipynb
│   │   ├── TransferMarkt Player Bio and Status Data Engineering.ipynb
│   │   ├── TransferMarkt Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── Understat Data Engineering.ipynb
│   │   └── Wyscout Data Engineering.ipynb
│   │
│   ├── 4_data_unification
│   │   └── Unification of Aggregated Seasonal Football Datasets.ipynb
│   │
│   ├── 5_data_analysis_and_projects
│   │   │   
│   │   ├── player_similarity_and_clustering
│   │   │   └── PCA and K-Means Clustering of 'Piqué-like' Defenders.ipynb 
│   │   │
│   │   ├── tracking_data
│   │   │   │   
│   │   │   ├── metrica_sports
│   │   │   │   └── Metrica Tracking Data EDA.ipynb
│   │   │   │   
│   │   │   └── signality
│   │   │       ├── Signality Tracking Data Engineering.ipynb
│   │   │       └── Signality Tracking Data EDA.ipynb
│   │   │ 
│   │   └── xg_modeling
│   │       │   
│   │       ├── shots_dataset
│   │       │   │   
│   │       │   ├── chance_quality_modelling
│   │       │   │   ├── 1) Logistic Regression Expected Goals Model.ipynb
│   │       │   │   ├── 2) XGBoost Expected Goals Model.ipynb
│   │       │   │   └── 3) CatBoost Expected Goals Model.ipynb
│   │       │   │   
│   │       │   └── metrica-sports
│   │       │       └── Metrica Sports.ipynb
│   │       │   
│   │       ├── statsbomb_dataset
│   │       │   └── Introduction to Building Expected Goals Models Using StatsBomb 360 Data.ipynb
│   │       │   
│   │       └── opta_dataset
│   │           └── Training of an Expected Goals Model Using Opta Event Data.ipynb
│   │ 
│   └── 6_data_visualisation
│
├── research
│   ├── papers
│   └── slides
│
├── scripts
│
├── spreadsheets
│
└── video 

Back to Contents

-----------------------------------------------------

📔 Notebooks

Nearly all code in this repository is in Jupyter notebooks, organised in the following workflow:

  1. Webscraping;
  2. Data Parsing;
  3. Data Engineering;
  4. Data Unification; and
  5. Data Analysis - projects include working with Tracking data, constructing VAEP models (as introduced by SciSports), building xG models using Logistic Regression, Random Forests and Gradient Boosted Decision Tree algorithms such as XGBoost and CatBoost, and analysing player similarity using PCA and K-Means clustering.
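
As a rough illustration of the simplest of the xG approaches listed above, the sketch below fits a logistic regression to synthetic shot data using only the Python standard library. The features (shot distance and angle), the simulated coefficients, and all function names are assumptions made for this example. They are not taken from the notebooks in this repository, which work with real shot datasets.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_shots(n, seed=42):
    """Simulate n shots as (distance in metres, angle in radians, goal flag)."""
    rng = random.Random(seed)
    shots = []
    for _ in range(n):
        distance = rng.uniform(2.0, 35.0)
        angle = rng.uniform(0.1, 1.4)
        # Assumed relationship, used only to simulate outcomes for this demo
        p_goal = sigmoid(0.5 - 0.15 * distance + 1.2 * angle)
        goal = 1 if rng.random() < p_goal else 0
        shots.append((distance, angle, goal))
    return shots


def features(distance, angle):
    """Intercept term, distance rescaled to tens of metres, and raw angle."""
    return (1.0, distance / 10.0, angle)


def fit_xg_model(shots, learning_rate=0.5, epochs=500):
    """Fit three weights by batch gradient descent on the mean log loss."""
    weights = [0.0, 0.0, 0.0]
    n = len(shots)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for distance, angle, goal in shots:
            x = features(distance, angle)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j in range(3):
                grad[j] += (p - goal) * x[j] / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


def predict_xg(weights, distance, angle):
    """Return the modelled probability that a shot results in a goal."""
    x = features(distance, angle)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


shots = make_synthetic_shots(1000)
weights = fit_xg_model(shots)
close_xg = predict_xg(weights, distance=6.0, angle=1.0)
far_xg = predict_xg(weights, distance=30.0, angle=0.3)
print(f"xG close-range: {close_xg:.2f}, xG long-range: {far_xg:.2f}")
```

In practice a library implementation with regularisation and proper train/test evaluation would be preferable; the hand-written gradient descent is only there to keep the example dependency-free.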

Back to Contents

-----------------------------------------------------

📊 Data Visualisation and Tableau Dashboards

For Tableau dashboards produced using the data engineered in the notebooks in this repository, please see my Tableau Public profile: public.tableau.com/profile/edd.webster.

Example Tableau dashboards:

Back to Contents

-----------------------------------------------------

📑 Resources

📑 Getting Started with Football Analytics

Good resources for those new to the use of data in football:

Back to Contents

-----------------------------------------------------

💾 Data

ℹ️ Data Sources

A list of publicly available data sources and datasets relating to football, including Tracking data, Event data, aggregated player performance data, detailed match statistics, injury records, transfer values, and more.

Data sources that have been used in the code and analysis in this repository can be found in the data subfolder of this repository or in Google Drive (due to GitHub's 100 MB file limit) [link]. However, all code in this repository should enable you to scrape, parse, and engineer the datasets to reproduce the output used for the analysis and visualisations featured.

To learn more about the different types of data available, such as Event and Tracking data, please see the "Where can I get data?" section of Devin Pleuler's soccer_analytics_handbook [link].

For a quick primer of the free football data resources available, see the following Twitter thread by James Nalton [link].

Event data
Tracking data
Aggregated Player/Team Performance data
Team Rating data
Physical data
Results and Matchsheet data
  • 2018 FIFA World Cup Rosters - goals, caps, club, and date of birth for players on 2018 FIFA World Cup rosters. Source: data.world
  • engsoccerdata - English and European soccer results 1871-2017
  • FIFA World Cup Match Results - matchups and results of FIFA World Cup matches from 1930 - 2014. Source: data.world
  • FotMob - dataset including team and player stats including xG and post-shot xG.
  • Football Lineups
  • international_results - repository of 42,452 results of international football matches, starting from the very first official match in 1872 up to 2019
  • smarterscout - scouting and player rating information
  • SofaScore - live scores, lineups, standings, heatmaps, and basic teams, coaches and player data
  • Soccerway - matchsheet data
Financial, Valuation, and Transfer data
Odds, Betting, and Predictions data
Plotting Tools

Also see Mark Wilkin's Twitter thread [link]:

Reference data
Miscellaneous Data

📄 Documentation

All documentation is saved locally in the docs subfolder, including:

Data Types and Companies

Data Providers
Tracking
Video / Performance Analysis
Consultancy / Service Providers

Back to Contents

-----------------------------------------------------

🧑‍🎓 Tutorials

Python

R

Tableau

Check out the Tableau for Sports Discord server organised by Ninad Barbadikar, to interact with a community of Tableau developers

For a YouTube playlist of Tableau-football videos and tutorials that I have collated from various sources including the Tableau Football User Group, Rob Carroll, Tom Goodall, and Ninad Barbadikar, see the following [link].

PowerBI

For a YouTube playlist of Power BI-football videos and tutorials that I have collated from various sources including Futbol AnalysR and PowerBI for Sports, see the following [link].

SQL

Excel

PowerPoint

Back to Contents

-----------------------------------------------------

🏛️ Libraries

Python

  • codeball - data driven tactical and video analysis of soccer games;
  • Football Packing - a Python package to calculate packing rate for a given pass in football by Samira Kumar. This is a variation of the metric created by Impect;
  • kloppy - a Python package providing (de)serializers for soccer tracking- and event data, standardized data models, filters, and transformers designed to make working with different tracking- and event data like a breeze. See the YouTube tutorial [link];
  • matplotsoccer - a Python library for visualising soccer event data by Tom Decroos;
  • mplsoccer - a Python library for drawing soccer/football pitches in Matplotlib and loading StatsBomb open-data by Andrew Rowlinson;
  • narya - an API that allows you to track soccer players from camera inputs and evaluate them with an Expected Discounted Goal (EDG) Agent. See the Evaluating Soccer Player paper by Paul Garnier and Théophane Gregoir;
  • northpitch - a Python football plotting library that sits on top of Matplotlib by Devin Pleuler;
  • PCA_Player_Finder by Parth Athale;
  • PySport including PySport Soccer - collection of open-source sport packages including many of those mentioned in this section, by Koen Vossen;
  • PyWaffle - an open source, MIT-licensed Python package for plotting waffle charts by Peter McKeever;
  • ScraperFC - a Python package by Owen Seymour to scrape FiveThirtyEight data, aggregated StatsBomb data from FBref, Understat shooting and player meta data including values for xG, xA, xGChain, xGBuildup, player salary data from Capology, and WhoScored? Opta Event data provided by Stats Perform;
  • Scrape-FBref-data - Python library to scrape aggregated StatsBomb data via FBref by Parth Athale, which in turn was updated from Christopher Martin's repository;
  • statsbombapi - a Python API wrapper and dataclasses for StatsBomb data;
  • statsbombpy - a Python library written by Francisco Goitia to access StatsBomb data;
  • statsbomb-parser - Python library to convert StatsBomb's JSON data into easy-to-use CSV format;
  • socceraction - a Python library for valuing the individual actions performed by soccer players. Includes an Expected Threat (xT) implementation by Tom Decroos et al.;
  • soccermix - a soft clustering technique based on mixture models that decomposes event stream data into a number of prototypical actions of a specific type, location, and direction by Tom Decroos and ML-KULeuven;
  • soccer_xg - a Python package for training and analyzing expected goals (xG) models in football;
  • soccerplots - a Python package that can be used for making visualizations for football analytics by Anmol Durgapal;
  • sync.soccer - a Python package to synchronise football datasets, so that an event in one dataset is matched to the corresponding event or snapshot in the other by Marek Kwiatkowski. This repository contains an implementation that aligns Opta's (now Stats Perform) F24 feeds to ChyronHego's Tracab files. More formats may be added in the future. See the following blog post for methodology [link];
  • tmscrape - a Python TransferMarkt webscraper by danzn1;
  • Tyrone Mings - a Python TransferMarkt webscraper by FCrSTATS;
  • understat - a Python webscraper by Amos Bastian to scrape Understat shooting and player meta data.
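
To give a feel for what an Expected Threat (xT) calculation involves, as mentioned for socceraction above, the following is a minimal sketch on a toy three-zone pitch. It does not use socceraction's API; the zones, probabilities, and function name are hypothetical values chosen for illustration. The idea is that the threat of a zone is the chance of scoring directly from it, plus the threat of the zones the ball is likely to be moved to.

```python
def expected_threat(shoot_prob, goal_prob, move_prob, transitions, iterations=200):
    """Iterate xT(z) = s_z * g_z + m_z * sum_k T[z][k] * xT(k) towards a fixed point."""
    n = len(shoot_prob)
    xt = [0.0] * n
    for _ in range(iterations):
        xt = [
            shoot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transitions[z][k] * xt[k] for k in range(n))
            for z in range(n)
        ]
    return xt


# Hypothetical toy pitch: defensive third, middle third, final third
zones = ["defensive third", "middle third", "final third"]
shoot_prob = [0.00, 0.02, 0.20]   # chance a player shoots from the zone
goal_prob = [0.00, 0.03, 0.12]    # chance a shot from the zone is scored
move_prob = [1.00, 0.98, 0.80]    # chance a player moves the ball instead
# transitions[z][k]: chance a move from zone z successfully reaches zone k.
# Rows sum to less than 1; the remainder represents losing possession.
transitions = [
    [0.60, 0.30, 0.02],
    [0.20, 0.50, 0.15],
    [0.05, 0.25, 0.45],
]

xt = expected_threat(shoot_prob, goal_prob, move_prob, transitions)
for name, value in zip(zones, xt):
    print(f"{name}: xT = {value:.4f}")
```

Real implementations estimate these probabilities from event data on a much finer grid, but the iteration itself has the same shape.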

R

Back to Contents

-----------------------------------------------------

GitHub Repositories

Python

R

Back to Contents

-----------------------------------------------------

Apps

Back to Contents

-----------------------------------------------------

📊 Data Visualisation Resources and Tools

Resources to aid data visualisation:

Back to Contents

-----------------------------------------------------

✒️ Written Pieces

Blogs

Many of these blog posts are recommended in Sam Gregory's Best Football Analytics Pieces piece and Tom Worville's “What’s the best Football Analytics piece you’ve ever read?”, both articles now a few years old. This section is very subjective so if I've missed anything obvious, apologies.

Blogs and Data Analytics Websites

The following list contains those blogs that are still maintained, as well as the original blogs from the OGs of football analytics.

For a Twitter thread of football analytics blogs from 2009 and earlier, see the following from Tiotal Football [link].

📃 Papers

Many of the papers included in this list have been included after reading Jan Van Haaren's Soccer Analytics 2021 Review and Soccer Analytics 2020 Review. Props to him for reading a paper a week and making his thoughts publicly available!

The papers included in this list have been

The following Shiny App from Lars Maurath is a great tool for looking up publications [link].

2021
2020
2019

2018

2017
2016
2015
2014
2011
2002
1997
1971

Newsletters

News Articles

📚 Books

See the Sports Analytics Reading List by Measurables (Brendan Kent), as part of his Sports Analytics 101 series.

The following use Amazon UK links where available.

Magazines

Back to Contents

-----------------------------------------------------

📼 Video

YouTube Playlists

Custom Playlists Curated by Myself

The following is a series of playlists that I originally collated for my own viewing, but they may be useful to you:

Public Playlists

Playlists created by others

YouTube Channels

Video Analysis

Webinars and Lectures

TED Talks

Documentaries

Match Highlights

Other

Back to Contents

-----------------------------------------------------

🔊 Podcasts

Below I've tried to include both dedicated Sports/Football Analytics podcasts and notable episodes of other podcasts that have analytical content or interviews. Spotify and YouTube links are used where available. All episodes mentioned below that are available on Spotify can be found in the following playlist (updated periodically): [link].

Football Analytics Podcasts

Notable Episodes (including non-football-data-specific podcasts)

Back to Contents

-----------------------------------------------------

👨‍💻 Notable Figures and Twitter Accounts

Back to Contents

-----------------------------------------------------

🗓️ Events and Conferences

Back to Contents

-----------------------------------------------------

Competitions

The following includes non-football competitions.

Back to Contents

-----------------------------------------------------

Courses

Back to Contents

-----------------------------------------------------

💼 Jobs

For live job postings tracked by the community, check the Jobs channel of the Football in Numbers Discord server

Back to Contents

-----------------------------------------------------

Discord/Slack groups

Back to Contents

-----------------------------------------------------

🔑 Key Concepts

A focus on some of the key topics in football analytics. Most of the following resources feature above but are reorganised here by topic. This section is still very much a work in progress and may be missing resources mentioned above.

History of Football Analytics

Expected Goals (xG) Modeling

Videos

For a playlist of Expected Goals related videos available on YouTube, see the following playlist I have created [link].

Webinars and Lectures
Tutorials
Notable Models
Written Pieces

For a list of Expected Goals literature collated by Keith Lyons, see the following [link].

Libraries
GitHub Repositories
Podcasts
Tweets

Web Scraping Football Data

Written Pieces
Videos
Libraries

Tracking Data

Pitch Control Modeling

Tutorials

Pitch Control modelling and Valuing Actions tutorials by Laurie Shaw as part of his Metrica Sports Tracking data series for Friends of Tracking. See the following for code [link];

GitHub Repositories
Written Pieces
Video
Podcasts

Passing Networks

Written Pieces
Blogs
Papers
Tutorials
Videos
Tweets

Possession Value (PV) Frameworks

General
Expected Threat (xT)
Valuing Actions by Estimating Probabilities (VAEP)
Goals Added (g+)
On-Ball Value (OBV)

Dixon Coles Modeling

Player Similarity and Style Analysis

Written Pieces
Videos
Tutorials
GitHub Repositories

Reinforcement Learning for Football Simulation

Team Playing Style Analysis

Written Pieces
Papers
Blogs
Videos
GitHub Repositories

Set Pieces

Section created after seeing the following tweets and threads by Ashwin Raman ([link]) and Stuart Reid ([link])

Radars

Recruitment Analysis

Quantifying Relative Club and League Strength

Models
Financial
Historical Match Results
Historical Statistical Player Performance
Articles
Papers
Videos
Miscellaneous
  • Tweets by AI Abacus [link] and [link]. They use a simple Dixon-Coles method, focusing on historic results going back 15 years, to build a hierarchy amongst teams in leagues that might never have played each other.
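
For readers unfamiliar with the Dixon-Coles approach mentioned above, the sketch below shows its core step: scoring two independent Poisson goal counts and then adjusting the four low-scoring results (0-0, 1-0, 0-1, 1-1) with a dependence parameter rho. The expected-goal rates and the value of rho are hypothetical inputs; in a full model they would be estimated from historic results via team attack and defence strengths, which is not shown here.

```python
import math


def poisson_pmf(goals, rate):
    """Probability of scoring exactly `goals` under a Poisson distribution."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)


def dixon_coles_tau(home_goals, away_goals, home_rate, away_rate, rho):
    """Low-score correction factor from the Dixon-Coles model."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(home_rate, away_rate, rho, max_goals=15):
    """matrix[h][a] is the probability of the scoreline h-a."""
    return [
        [
            dixon_coles_tau(h, a, home_rate, away_rate, rho)
            * poisson_pmf(h, home_rate)
            * poisson_pmf(a, away_rate)
            for a in range(max_goals + 1)
        ]
        for h in range(max_goals + 1)
    ]


# Hypothetical inputs: expected goals for each side and a small negative rho
matrix = score_matrix(home_rate=1.6, away_rate=1.1, rho=-0.1)
goals = range(len(matrix))
home_win = sum(matrix[h][a] for h in goals for a in goals if h > a)
draw = sum(matrix[h][h] for h in goals)
away_win = sum(matrix[h][a] for h in goals for a in goals if h < a)
print(f"home {home_win:.3f}, draw {draw:.3f}, away {away_win:.3f}")
```

The correction terms cancel out across the four adjusted scorelines, so the matrix still sums to 1 up to the truncation at `max_goals`.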

Tactics

Counter Attacking
Articles
Papers
Videos
Podcasts
Pressing
Articles
Videos
Counter Pressing
Articles
Papers
Videos

Player Valuation Modeling

Example Models
Example Methodologies
Written Pieces Regarding the Topic of Player Valuation
Articles
Blogs
Papers
Code/Notebooks
Slides
Tweets
Financial Data
Player Values
Recorded Transfers
Other
Relevant Packages/Repos
Miscellaneous

Game Win Probability Modeling

Goalkeeper Analysis

Back to Contents

-----------------------------------------------------

Citations

Thanks to all those that have kindly written about or promoted this GitHub repository. See:

Back to Contents

-----------------------------------------------------

Contributing

This GitHub repository and resources list will be a constant work in progress so if you can think of any resources that I've missed, feel free to create a pull request or send me a message @ [email protected] or @eddwebster.

If you're new to creating a pull request, please follow these steps (based on this)

  1. Create an account on GitHub if you do not already have one.

  2. Fork the project repository: click on the ‘Fork’ button near the top of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.

  3. Clone your fork of the football_analytics repo from your GitHub account to your local disk:

    git clone https://github.com/<github username>/football_analytics.git
    cd football_analytics

  4. Create an environment with:

    $ python3 -m venv my_env

    or:

    $ python -m venv my_env

    or with conda:

    $ conda create -n my_env python=3

  5. Activate the environment:
    $ source my_env/bin/activate
    or with conda:
    $ conda activate my_env

  6. Add the upstream remote. This saves a reference to the main football_analytics repository, which you can use to keep your repository synchronised with the latest changes:

    $ git remote add upstream https://github.com/eddwebster/football_analytics.git

    You should now have a copy of the football analytics repository, and your git repository properly configured. The next steps now describe the process of modifying code and submitting a pull request:

  7. Synchronize your master branch with the upstream master branch:

    git checkout master
    git pull upstream master

  8. Create a feature branch to hold your development changes:

    $ git checkout -b my_change

    and start making changes. Always use a feature branch. It’s good practice to never work on the master branch!

  9. Before you commit, ensure that Git hooks are activated (PyCharm, for example, has the option to skip them). This can be done using pre-commit, as follows:

    pre-commit install

  10. Develop the feature on your feature branch on your computer, using Git to do the version control. When you’re done editing, add changed files using git add and then git commit:

    git add modified_files
    git commit -m "my first football_analytics commit"

  11. Record your changes in Git, then push the changes to your GitHub account with:

    git push -u origin my_change

Back to Contents

-----------------------------------------------------

Star History

Star history for the football_analytics repository.

Football Analytics GitHub Stars History

Back to Contents

-----------------------------------------------------

Acknowledgements

Back to the Top


Languages

  • Jupyter Notebook 99.1%
  • Python 0.9%