Using t-tests - Match Corner counts do not include matches where a team gets zero corners #3
Bug
I think there may be a bug in the way corners per team per match are counted. In a match where a team gets no corners, the match is dropped from the data instead of being counted as zero. Everton is an example of a team that played a match in which they got zero corners.
The problem originates in this line (which is before the code that counts the corners by team, and by team by match):
corners = train.loc[train["subEventName"] == "Corner"]
This means that any team-match combination with no corners is dropped here, before the counts are calculated. As a result, a count of zero can never occur.
Impact
Most of the calculated numbers for Everton are incorrect, namely the number of corners per match, the standard error of corners per match, and the t-statistic and p-value from the two-sample two-sided t-test. However, the conclusion of the two-sample two-sided t-test is not affected.
Minimal Reproducing Example
Current Behaviour - corner counts: B 1, C 1, D 2 (A has no count)
Expected Behaviour - corner counts: A 0, B 1, C 1, D 2 (A has a count of 0)
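The current behaviour can be reproduced with a small sketch like the one below. The toy data and the `teamId` column name are assumptions made for illustration; only `subEventName` and the filtering line come from the original code.

```python
import pandas as pd

# Toy event data (hypothetical): team A has events but no corners.
train = pd.DataFrame({
    "teamId": ["A", "A", "B", "C", "D", "D"],
    "subEventName": ["Simple pass", "Cross", "Corner", "Corner", "Corner", "Corner"],
})

# Filtering first removes every row belonging to team A...
corners = train.loc[train["subEventName"] == "Corner"]

# ...so team A has no group left to count.
counts = corners.groupby("teamId").size()
print(counts)
```

Team A does not appear in `counts` at all, rather than appearing with a count of 0.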
Possible Fix
The corner filtering and counting operations should take place after grouping, instead of before. Unfortunately, this is a bit more complex than the original code, so the code comments may need to be amended.
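One way this could look is sketched below: flag corners with a boolean column, group first, then sum the flag so that groups with no corners yield 0. This is an illustrative sketch, not necessarily the exact change in this pull request; the toy data and the `teamId`, `matchId`, and `is_corner` names are assumptions.

```python
import pandas as pd

# Toy event data (hypothetical): team A has events in match 1 but no corners.
train = pd.DataFrame({
    "teamId": ["A", "A", "B", "C", "D", "D"],
    "matchId": [1, 1, 1, 2, 2, 2],
    "subEventName": ["Simple pass", "Cross", "Corner", "Corner", "Corner", "Corner"],
})

# Flag corners instead of filtering, so no team-match rows are lost.
flagged = train.assign(is_corner=train["subEventName"] == "Corner")

# Group first, then count: summing booleans gives the number of corners,
# and a group with no corners sums to 0.
per_team_match = flagged.groupby(["teamId", "matchId"])["is_corner"].sum()
per_team = flagged.groupby("teamId")["is_corner"].sum()

# Corners per match now includes the zero-corner matches in the average.
per_match_mean = per_team_match.groupby(level="teamId").mean()
print(per_team)
```

Note that this only yields a zero for a team-match combination that has at least one event of some kind in `train`; a team with no rows at all for a match would still be missing.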
Also
Fixed a typo in the spelling of "statistic".