Using t-tests - Match Corner counts do not include matches where a team gets zero corners #3

Open: DMacGillivray wants to merge 2 commits into base: main

DMacGillivray

Bug

I think there may be a bug in how corners per team per match are counted. When a team gets no corners in a match, that match is dropped from the data instead of being counted as zero. Everton is an example of a team that played a match in which they got zero corners.
The problem originates in this line, which runs before the code that counts corners by team and by team per match:
corners = train.loc[train["subEventName"] == "Corner"]
Any team-match combination with no corners has no rows left after this filter, so it is dropped before the counts are calculated. As a result, a count of zero can never appear.

Impact

Most of the numbers calculated for Everton are incorrect: the number of corners per match, the standard error of corners per match, and the t-statistic and p-value from the two-sided two-sample t-test. However, the conclusion of the t-test is not affected.
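
To illustrate why a dropped zero matters, here is a minimal sketch using made-up per-match corner counts (these are not Everton's real numbers, and `mean_and_se` is a hypothetical helper, not code from the notebook). It compares the mean and standard error computed with and without the zero-corner match.

```python
import math
import statistics


def mean_and_se(counts):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.mean(counts)
    se = statistics.stdev(counts) / math.sqrt(len(counts))
    return mean, se


# Hypothetical per-match corner counts for one team (illustrative only).
true_counts = [0, 4, 6, 5, 5]

# Filtering on "Corner" events before grouping loses the zero-corner match,
# so the analysis only ever sees the non-zero counts.
observed_counts = [c for c in true_counts if c > 0]

mean_all, se_all = mean_and_se(true_counts)
mean_obs, se_obs = mean_and_se(observed_counts)

print(f"with zero match:    mean={mean_all:.2f}, se={se_all:.2f}")
print(f"without zero match: mean={mean_obs:.2f}, se={se_obs:.2f}")
```

With these illustrative numbers, dropping the zero raises the mean from 4 to 5 and shrinks the standard error, and both quantities feed into the t-statistic.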

Minimal Reproducing Example

import pandas as pd

# Simulate events data from 2 matches
# Match 1: A vs B - A has 0 corners, B has 1 corner
# Match 2: C vs D - C has 1 corner, D has 2 corners

train = pd.DataFrame({'matchId':[1, 1, 2, 2, 2, 2],
                      'teamId': [ 'A', 'B', 'C', 'C', 'D', 'D'],
                      'subEventName': ['Pass', 'Corner', 'Corner', 'Pass', 'Corner', 'Corner']})
corners = train.loc[train['subEventName']=='Corner']
corners_by_game = corners.groupby(['matchId', 'teamId']).size().reset_index(name='counts')
corners_by_game.sort_values(by=['matchId', 'teamId'])

Current Behaviour - corner counts: B 1, C 1, D 2 (A has no count)
Expected Behaviour - corner counts: A 0, B 1, C 1, D 2 (A has a count of 0)

Possible Fix

The corner filtering and counting operations should take place after grouping, instead of before. Unfortunately, this is a bit more complex than the original code, so the code comments may need to be amended.

corners_by_team = train.groupby(by=['teamId']).apply(lambda grp: (grp['subEventName']=='Corner').sum()).reset_index(name='counts')
corners_by_game = train.groupby(by=['matchId', 'teamId']).apply(lambda grp: (grp['subEventName']=='Corner').sum()).reset_index(name='counts')
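
As a quick check of the idea, the sketch below applies a group-then-count approach to the simulated data from the reproducing example. It is an alternative formulation rather than the code proposed in this PR: it marks corner events in a boolean column (`is_corner`, a name introduced here for illustration) and sums that column per group, which avoids `apply` while giving the same counts.

```python
import pandas as pd

# Simulated events data from the reproducing example:
# Match 1: A has 0 corners, B has 1 corner
# Match 2: C has 1 corner, D has 2 corners
train = pd.DataFrame({
    "matchId": [1, 1, 2, 2, 2, 2],
    "teamId": ["A", "B", "C", "C", "D", "D"],
    "subEventName": ["Pass", "Corner", "Corner", "Pass", "Corner", "Corner"],
})

# Flag corner events instead of filtering, so every (match, team) pair
# keeps its rows and can receive a count of zero.
flagged = train.assign(is_corner=train["subEventName"].eq("Corner"))

# Summing a boolean column counts the True values in each group.
corners_by_game = (
    flagged.groupby(["matchId", "teamId"])["is_corner"]
    .sum()
    .reset_index(name="counts")
)
corners_by_team = (
    flagged.groupby("teamId")["is_corner"]
    .sum()
    .reset_index(name="counts")
)

print(corners_by_game)
```

Team A now appears with a count of 0 for match 1, matching the expected behaviour described above.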

Also

Fixed a typo in the spelling of "statistic".
