KR3YGK/leagues-cup-2023

leagues-cup-2023

Estimates team strength parameters for all MLS and Liga MX teams in 2023 using all match data from 2023 through 8/15/23 (to be updated when Leagues Cup concludes). The model treats the goals scored by the home and away teams in each match as following a bivariate Poisson distribution, constructed as the product of two Poisson marginal distributions and a multiplicative factor. This construction allows the correlation between home and away goals to be positive, zero, or negative. For theoretical details, see

Lakshminarayana, J., Pandit, S., and Srinivasa Rao, K. (1999). On a Bivariate Poisson Distribution. Communications in Statistics - Theory and Methods, 28:267–276. https://www.tandfonline.com/doi/abs/10.1080/03610929908832297
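As a rough illustration of the distribution described above, here is a minimal Python sketch of a joint probability mass function of this type: two Poisson marginals multiplied by a factor of the form 1 + λ(e^(-y1) - e^(-d·μ1))(e^(-y2) - e^(-d·μ2)), with d = 1 - e^(-1). This is not the repository's code (the model itself is fit in Stata); the function names, the means 1.6 and 1.2, and the λ values are hypothetical and chosen only for demonstration.

```python
import math

# Constant d = 1 - e^{-1}; for Y ~ Poisson(mu), E[exp(-Y)] = exp(-d * mu).
D = 1.0 - math.exp(-1.0)


def poisson_pmf(k, mu):
    """Probability of k events under a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def joint_pmf(home_goals, away_goals, mu_home, mu_away, lam):
    """Product of two Poisson marginals times a multiplicative factor.

    lam > 0 gives positive correlation, lam = 0 independence,
    lam < 0 negative correlation. (Illustrative sketch, not the repo's code.)
    """
    factor = 1.0 + lam * (
        (math.exp(-home_goals) - math.exp(-D * mu_home))
        * (math.exp(-away_goals) - math.exp(-D * mu_away))
    )
    return poisson_pmf(home_goals, mu_home) * poisson_pmf(away_goals, mu_away) * factor


def covariance(mu_home, mu_away, lam, max_goals=40):
    """Covariance of home and away goals, summed over a truncated score grid."""
    total = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            total += (h - mu_home) * (a - mu_away) * joint_pmf(h, a, mu_home, mu_away, lam)
    return total


if __name__ == "__main__":
    # Hypothetical scoring rates; the sign of the covariance follows the sign of lam.
    for lam in (-0.5, 0.0, 0.5):
        print(lam, covariance(1.6, 1.2, lam))
```

Because the multiplicative factor has mean zero under each marginal, the marginals stay Poisson and the sign of λ alone determines the sign of the correlation.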

The model was estimated using Stata's bivcnto command with the pfamoye option. For details, see

Xu, X. and Hardin, J. W. (2016). Regression models for bivariate count outcomes. The Stata Journal, 16(2):301–315. https://journals.sagepub.com/doi/pdf/10.1177/1536867X1601600203

Organization of Files

data

All data is in the "data" folder, which contains two subfolders.

raw

The subfolder "raw" contains all raw data used as inputs to the model. These include

  1. MLS games.dta - game data from 2023 MLS Regular Season prior to Leagues Cup
  2. Liga MX.dta - game data from 2023 Liga MX Clausura
  3. Leagues Cup.dta - game data from all 2023 Leagues Cup games (updated 8.16.23)
  4. logos folder - logo image files for all teams
  5. logos.xlsx - bridge for team names and logo image files

output

The subfolder "output" is where all output (data and images) will be saved. The folder currently has all files that the code will create when all do-files and R files are run.

  1. game data.dta - full data set for analysis generated by data prep.do
  2. coefficients.xlsx - team strength coefficients generated by bivariate Poisson.do
  3. Monterrey at New England.jpg - outcome probability chart generated by bivariate Poisson.do
  4. New England at Monterrey.jpg - outcome probability chart generated by bivariate Poisson.do

do files

Contains both do-files needed to estimate team strength parameters and create match outcome probability charts for matches between the top teams in MLS and Liga MX.

  1. data prep.do - takes the raw data and generates the full data set for analysis, game data.dta.
  2. bivariate Poisson.do - estimates team strength parameters from game data.dta, generates coefficients.xlsx to be used in bar chart.R, and generates outcome probability charts for home/away games between the top team in each league (Monterrey at New England.jpg and New England at Monterrey.jpg).
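To give a sense of what an outcome probability chart summarizes, the following Python sketch adds up a bivariate Poisson joint distribution over a grid of scorelines to get home win, draw, and away win probabilities. It is an assumption-laden illustration, not a port of bivariate Poisson.do: the function names, the expected-goals values, and the dependence parameter `lam` are all hypothetical, and the do-file may compute or present these probabilities differently.

```python
import math

# Constant d = 1 - e^{-1} used in the multiplicative factor.
D = 1.0 - math.exp(-1.0)


def joint_pmf(h, a, mu_home, mu_away, lam):
    """P(home scores h, away scores a): Poisson marginals times a dependence factor."""
    base = (
        math.exp(-mu_home) * mu_home ** h / math.factorial(h)
        * math.exp(-mu_away) * mu_away ** a / math.factorial(a)
    )
    factor = 1.0 + lam * (
        (math.exp(-h) - math.exp(-D * mu_home))
        * (math.exp(-a) - math.exp(-D * mu_away))
    )
    return base * factor


def outcome_probabilities(mu_home, mu_away, lam, max_goals=25):
    """Sum scoreline probabilities into home win / draw / away win."""
    probs = {"home_win": 0.0, "draw": 0.0, "away_win": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = joint_pmf(h, a, mu_home, mu_away, lam)
            if h > a:
                probs["home_win"] += p
            elif h == a:
                probs["draw"] += p
            else:
                probs["away_win"] += p
    return probs


if __name__ == "__main__":
    # Hypothetical expected goals for a single match (not estimates from this repo).
    print(outcome_probabilities(mu_home=1.8, mu_away=1.1, lam=0.3))
```

Truncating the grid at 25 goals per side leaves a negligible amount of probability unaccounted for at typical soccer scoring rates.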

R files

Contains the R script that creates the rankings of teams as horizontal bar charts using team logos.

  1. bar chart.R - generates horizontal bar charts of team strengths from coefficients.xlsx. Image files must be saved manually.

Order of Operations

To reproduce the results of my Substack article "Comparable Rankings of MLS and Liga MX Teams" (still in draft as of this README.md draft), follow this order of operations:

  1. run data prep.do in Stata (change file paths to location of data folder on your computer)
  2. run bivariate Poisson.do in Stata (change file paths to location of data folder on your computer)
  3. run bar chart.R in R (change file paths to location of data folder on your computer)
  4. Let me know if you have any issues with steps 1-3

About

Analysis of Leagues Cup 2023 in Stata/R
