Commit

Merge pull request #44 from statsbomb/cores
Allow configurable concurrency
scotty779 authored Mar 24, 2023
2 parents 874013f + 51d1984 commit 0a97792
Showing 4 changed files with 19 additions and 8 deletions.
8 changes: 7 additions & 1 deletion README.md
@@ -20,14 +20,20 @@ This repository is a Python package to easily stream StatsBomb data into Python
`nose2 -v --pretty-assert`


## Authentication
## Configuration

### Authentication

#### Environment Variables
Authentication can be done by setting environment variables named `SB_USERNAME` and `SB_PASSWORD` to your login credentials.

#### Manual Calls
Alternatively, if you don't want to use environment variables, all functions accept an argument `creds` to pass your login credentials in the format `{"user": "", "passwd": ""}`

### Concurrency
You can specify how many of your computer's cores to use when running the `sb.competition_events()` and `sb.competition_frames()` functions by setting the environment variable `SB_CORES` to the number you want to use. Allowing statsbombpy to use more cores will speed up those functions.

If you don't have the environment variable set, we try to detect the number of cores in your system and use 2 fewer than that number, with a minimum of 4. If we cannot detect the number of cores, we set the number to 4.

## Open Data
StatsBomb's open data can be accessed without the need of authentication.
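
Review note: the default described in the Concurrency section can be worked through by hand. The sketch below mirrors the expression `max(multiprocessing.cpu_count() - 2, 4)` from this commit's `statsbombpy/config.py`. The helper name `default_concurrency` is hypothetical and is not part of statsbombpy; it only makes the arithmetic visible.

```python
def default_concurrency(detected_cores):
    """Hypothetical helper mirroring the commit's default: cores - 2, never below 4."""
    return max(detected_cores - 2, 4)


for cores in (2, 4, 8, 16):
    print(cores, "->", default_concurrency(cores))
# 2 -> 4
# 4 -> 4
# 8 -> 6
# 16 -> 14
```

Because of the floor of 4, on machines with 4 or fewer cores the default equals or exceeds the detected core count, so "2 less than that number" only holds from 6 cores upward.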
4 changes: 1 addition & 3 deletions setup.py
@@ -7,7 +7,7 @@

setup(
    name="statsbombpy",
    version="1.7.0",
    version="1.8.0",
    description="easily stream StatsBomb data into Python",
    long_description=README,
    long_description_content_type="text/markdown",
@@ -17,8 +17,6 @@
    author_email="[email protected]",
    packages=["statsbombpy"],
    install_requires=[
        "joblib",
        "inflect",
        "nose2",
        "pandas",
        "requests",
9 changes: 8 additions & 1 deletion statsbombpy/config.py
@@ -1,4 +1,5 @@
import os
import multiprocessing

CACHED_CALLS_SECS = 600

@@ -17,7 +18,13 @@
    "frames": "https://raw.githubusercontent.com/statsbomb/open-data/master/data/three-sixty/{match_id}.json",
}

PARALLELL_CALLS_NUM = 4
if "SB_CORES" in os.environ:
    MAX_CONCURRENCY = int(os.environ["SB_CORES"])
else:
    try:
        MAX_CONCURRENCY = max(multiprocessing.cpu_count() - 2, 4)
    except NotImplementedError:
        MAX_CONCURRENCY = 4
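
Review note: this block runs at module level, so `SB_CORES` is read once when `statsbombpy.config` is first imported. Setting the variable after the import has no effect, and a non-numeric value makes `int()` raise `ValueError` at import time. A value of `0` would later be rejected by `Pool`, which requires at least one process. The following is a minimal sketch of a more defensive variant, not what the commit does: the helper name `resolve_max_concurrency` and the choice to silently fall back on invalid input are assumptions made for illustration.

```python
import multiprocessing
import os

FALLBACK_CONCURRENCY = 4  # same fallback value the commit uses


def resolve_max_concurrency(environ=None):
    """Hypothetical helper: return a validated worker count (always >= 1)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("SB_CORES")
    if raw is not None:
        try:
            requested = int(raw)
        except ValueError:
            requested = 0  # non-numeric value: treat it as if it were unset
        if requested >= 1:
            return requested
    # No usable SB_CORES value: same default as the commit.
    try:
        return max(multiprocessing.cpu_count() - 2, FALLBACK_CONCURRENCY)
    except NotImplementedError:
        return FALLBACK_CONCURRENCY


print(resolve_max_concurrency({"SB_CORES": "8"}))
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise in tests; whether invalid input should fall back quietly or raise a clearer error is a design choice for the maintainers.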

VERSIONS = {
"competitions": "v4",
6 changes: 3 additions & 3 deletions statsbombpy/sb.py
@@ -5,7 +5,7 @@
import pandas as pd

from statsbombpy import api_client, public
from statsbombpy.config import DEFAULT_CREDS, PARALLELL_CALLS_NUM
from statsbombpy.config import DEFAULT_CREDS, MAX_CONCURRENCY
from statsbombpy.helpers import (
    filter_and_group_events,
    merge_events_and_frames,
@@ -139,7 +139,7 @@ def competition_events(
        creds=creds,
        include_360_metrics=include_360_metrics,
    )
    with Pool(PARALLELL_CALLS_NUM) as p:
    with Pool(MAX_CONCURRENCY) as p:
        matches_events = p.map(
            events_call,
            matches(c["competition_id"], c["season_id"], fmt="dict", creds=creds),
@@ -213,7 +213,7 @@ def competition_frames(
        fmt="json",
        creds=creds,
    )
    with Pool(MAX_CONCURRENCY) as p:
    with Pool(MAX_CONCURRENCY) as p:
        competition_frames = p.map(
            frames_call,
            matches(c["competition_id"], c["season_id"], fmt="dict", creds=creds),
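
Review note: both hunks follow the same pattern, which is to bind the shared keyword arguments once, then map the bound call over the matches with a pool sized by `MAX_CONCURRENCY`. The sketch below illustrates that pattern in isolation. It is not the library's code: `fetch_match` and `fetch_all` are hypothetical stand-ins, the constant is hard-coded rather than imported from `statsbombpy.config`, and it uses `multiprocessing.pool.ThreadPool` (the `Pool` import is not shown in this diff) so that it runs without pickling concerns.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

MAX_CONCURRENCY = 4  # stand-in for statsbombpy.config.MAX_CONCURRENCY


def fetch_match(match_id, fmt="json"):
    """Hypothetical stand-in for a per-match API call."""
    return {"match_id": match_id, "fmt": fmt}


def fetch_all(match_ids, workers=MAX_CONCURRENCY):
    # Bind the shared keyword arguments once, as the diff does before Pool.map.
    call = partial(fetch_match, fmt="dict")
    with ThreadPool(workers) as pool:
        # map() returns results in the same order as the input iterable.
        return pool.map(call, match_ids)


results = fetch_all([101, 102, 103])
print([r["match_id"] for r in results])
```

Since the pool size is the only thing this commit changes, raising `SB_CORES` increases how many per-match calls are in flight at once rather than altering what each call does.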
