Merge pull request #393 from crocs-muni/refactor-spacy-download
Download spacy model with pip
adamjanovsky authored Feb 20, 2024
2 parents 78d2cfd + b545fab commit 7ee9d2f
Showing 12 changed files with 74 additions and 110 deletions.
1 change: 0 additions & 1 deletion .github/workflows/tests.yml
@@ -27,7 +27,6 @@ jobs:
- name: Install sec-certs
run: |
pip install -e .
python -m spacy download en_core_web_sm
- name: Run tests
run: pytest --cov=sec_certs tests
- name: Code coverage upload
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -1,12 +1,12 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.5
rev: v0.2.2
hooks:
- id: ruff
- id: ruff-format
args: ["--check"]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: "v1.6.1"
rev: "v1.8.0"
hooks:
- id: mypy
additional_dependencies:
9 changes: 6 additions & 3 deletions CONTRIBUTING.md
@@ -34,9 +34,12 @@ Note on single-sourcing the package version: More can be read [here](https://pac

### Currently, the release process is as follows

1. Update dependencies with `pre-commit autoupdate`, pin new versions of linters into `pyproject.toml` and run `cd requirements && ./compile.sh`.
2. Create a release from GitHub UI. Include release notes, add proper version tag and publish the release (or create it from scratch with new tag).
3. This will automatically update PyPi and DockerHub packages.
1. Update dependencies with `pre-commit autoupdate`, pin new versions of linters into `pyproject.toml`.
2. Run `cd requirements && ./compile.sh` to update dependencies.
3. Use `python -m spacy download en_core_web_sm` to find out the current version of `en_core_web_sm` dependency. Update pyproject.toml link of `en_core_web_sm` dependency with up-to-date link from [GitHub](https://github.com/explosion/spacy-models/releases).
4. Run `cd requirements && ./compile.sh` **again** to update dependencies.
5. Create a release from GitHub UI. Include release notes, add proper version tag and publish the release (or create it from scratch with new tag).
6. This will automatically update PyPi and DockerHub packages.
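
Step 3 above involves copying a model version and wheel link by hand. As a minimal sketch (not part of this commit), the helper below pulls the model name and version out of a spaCy model wheel URL and builds the matching dependency string for `pyproject.toml`. The function names `parse_model_wheel` and `direct_reference` are hypothetical; the URL is the one pinned in this commit.

```python
import re
from urllib.parse import urlparse

# URL pinned in pyproject.toml by this commit.
WHEEL_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl"
)


def parse_model_wheel(url: str) -> tuple[str, str]:
    """Return (package_name, version) parsed from a wheel URL (hypothetical helper)."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    # Wheel filenames look like: <name>-<version>-<python>-<abi>-<platform>.whl
    match = re.fullmatch(
        r"(?P<name>[A-Za-z0-9_]+)-(?P<version>[0-9][^-]*)-.+\.whl", filename
    )
    if match is None:
        raise ValueError(f"not a recognisable wheel filename: {filename!r}")
    return match["name"], match["version"]


def direct_reference(url: str) -> str:
    """Build a 'name @ url' dependency string (hypothetical helper)."""
    name, _version = parse_model_wheel(url)
    # The pyproject.toml entry in this commit uses dashes in the name.
    return f"{name.replace('_', '-')} @ {url}"


if __name__ == "__main__":
    print(parse_model_wheel(WHEEL_URL))
    print(direct_reference(WHEEL_URL))
```

The resulting string still has to be pasted into `pyproject.toml`, and `./compile.sh` still has to be re-run as step 4 describes.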


## Quality assurance
3 changes: 1 addition & 2 deletions Dockerfile
@@ -58,8 +58,7 @@ RUN \
pip3 install -U pip wheel pip-tools && \
pip-sync requirements/requirements.txt && \
pip3 install --no-cache notebook jupyterlab && \
pip3 install -e . && \
python3 -m spacy download en_core_web_sm
pip3 install -e .

# just to be sure that pdftotext is in $PATH
ENV PATH /usr/bin/pdftotext:${PATH}
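
With the explicit `spacy download` step removed, the model is expected to arrive as an ordinary dependency of `pip3 install -e .`. A minimal sketch for confirming that after a build, assuming the distribution name `en-core-web-sm` from `pyproject.toml` (the helper `installed_version` is hypothetical and not part of the repository):

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the model is installed under the name used in pyproject.toml.
    version = installed_version("en-core-web-sm")
    print(version if version else "en-core-web-sm is not installed")
```

This only inspects package metadata; it does not load the model, so it stays cheap enough to run as a smoke check inside the image.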
2 changes: 0 additions & 2 deletions docs/installation.md
@@ -7,7 +7,6 @@ The tool can be installed from PyPi with

```bash
pip install -U sec-certs
python -m spacy download en_core_web_sm
```

Note, that `Python>=3.10` is required.
@@ -32,7 +31,6 @@ git clone https://github.com/crocs-muni/sec-certs.git
python3 -m venv venv
source venv/bin/activate
pip install -e .
python -m spacy download en_core_web_sm
```

Alternatively, our Our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.
4 changes: 2 additions & 2 deletions docs/quickstart.md
@@ -3,7 +3,7 @@
::::{tab-set}

:::{tab-item} Common Criteria
1. Install the latest version with `pip install -U sec-certs && python -m spacy download en_core_web_sm` (see [installation](installation.md)).
1. Install the latest version with `pip install -U sec-certs` (see [installation](installation.md)).
2. In your Python interpreter, type
```python
from sec_certs.dataset.cc import CCDataset
@@ -16,7 +16,7 @@ to obtain to obtain freshly processed dataset from [seccerts.org](https://seccer
:::

:::{tab-item} FIPS 140
1. Install the latest version with `pip install -U sec-certs && python -m spacy download en_core_web_sm` (see [installation](installation.md)).
1. Install the latest version with `pip install -U sec-certs` (see [installation](installation.md)).
2. In your Python interpreter, type
```python
from sec_certs.dataset.fips import FIPSDataset
5 changes: 3 additions & 2 deletions pyproject.toml
@@ -50,6 +50,7 @@
"ipykernel",
"ipywidgets",
"spacy",
"en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl",
"pkgconfig",
"seaborn",
"pySankeyBeta",
@@ -63,8 +64,8 @@

[project.optional-dependencies]
dev = [
"ruff==0.1.5",
"mypy==1.6.1",
"ruff==0.2.2",
"mypy==1.8.0",
"types-PyYAML",
"types-python-dateutil",
"types-requests",
26 changes: 7 additions & 19 deletions requirements/all_requirements.txt
@@ -18,8 +18,6 @@ appnope==0.1.3
# ipython
asttokens==2.4.1
# via stack-data
async-timeout==4.0.3
# via aiohttp
attrs==23.1.0
# via
# aiohttp
@@ -132,12 +130,10 @@ docutils==0.19
# myst-parser
# pydata-sphinx-theme
# sphinx
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
# via sec-certs (./../pyproject.toml)
evaluate==0.4.1
# via setfit
exceptiongroup==1.2.0
# via
# ipython
# pytest
executing==2.0.1
# via stack-data
fastjsonschema==2.19.0
@@ -309,7 +305,7 @@ murmurhash==1.0.10
# preshed
# spacy
# thinc
mypy==1.6.1
mypy==1.8.0
# via sec-certs (./../pyproject.toml)
mypy-extensions==1.0.0
# via mypy
@@ -596,7 +592,7 @@ rpds-py==0.13.1
# via
# jsonschema
# referencing
ruff==0.1.5
ruff==0.2.2
# via sec-certs (./../pyproject.toml)
safetensors==0.4.0
# via transformers
@@ -648,7 +644,9 @@ snowballstemmer==2.2.0
soupsieve==2.5
# via beautifulsoup4
spacy==3.7.2
# via sec-certs (./../pyproject.toml)
# via
# en-core-web-sm
# sec-certs (./../pyproject.toml)
spacy-legacy==3.0.12
# via spacy
spacy-loggers==1.0.5
@@ -714,15 +712,6 @@ tifffile==2023.9.26
# via scikit-image
tokenizers==0.15.0
# via transformers
tomli==2.0.1
# via
# build
# coverage
# mypy
# pip-tools
# pyproject-hooks
# pytest
# setuptools-scm
toolz==0.12.0
# via
# dask
@@ -778,7 +767,6 @@ types-requests==2.31.0.10
typing-extensions==4.8.0
# via
# alembic
# cloudpathlib
# huggingface-hub
# mypy
# myst-nb
26 changes: 7 additions & 19 deletions requirements/dev_requirements.txt
@@ -16,8 +16,6 @@ appnope==0.1.3
# ipython
asttokens==2.4.1
# via stack-data
async-timeout==4.0.3
# via aiohttp
attrs==23.1.0
# via
# aiohttp
@@ -101,10 +99,8 @@ docutils==0.19
# myst-parser
# pydata-sphinx-theme
# sphinx
exceptiongroup==1.2.0
# via
# ipython
# pytest
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
# via sec-certs (./../pyproject.toml)
executing==2.0.1
# via stack-data
fastjsonschema==2.19.0
@@ -227,7 +223,7 @@ murmurhash==1.0.10
# preshed
# spacy
# thinc
mypy==1.6.1
mypy==1.8.0
# via sec-certs (./../pyproject.toml)
mypy-extensions==1.0.0
# via mypy
@@ -424,7 +420,7 @@ rpds-py==0.13.1
# via
# jsonschema
# referencing
ruff==0.1.5
ruff==0.2.2
# via sec-certs (./../pyproject.toml)
scikit-learn==1.3.2
# via sec-certs (./../pyproject.toml)
@@ -453,7 +449,9 @@ snowballstemmer==2.2.0
soupsieve==2.5
# via beautifulsoup4
spacy==3.7.2
# via sec-certs (./../pyproject.toml)
# via
# en-core-web-sm
# sec-certs (./../pyproject.toml)
spacy-legacy==3.0.12
# via spacy
spacy-loggers==1.0.5
@@ -508,15 +506,6 @@ thinc==8.2.1
# via spacy
threadpoolctl==3.2.0
# via scikit-learn
tomli==2.0.1
# via
# build
# coverage
# mypy
# pip-tools
# pyproject-hooks
# pytest
# setuptools-scm
tornado==6.3.3
# via
# ipykernel
@@ -550,7 +539,6 @@ types-requests==2.31.0.10
# via sec-certs (./../pyproject.toml)
typing-extensions==4.8.0
# via
# cloudpathlib
# huggingface-hub
# mypy
# myst-nb
13 changes: 5 additions & 8 deletions requirements/nlp_requirements.txt
@@ -14,8 +14,6 @@ appnope==0.1.3
# ipython
asttokens==2.4.1
# via stack-data
async-timeout==4.0.3
# via aiohttp
attrs==23.1.0
# via
# aiohttp
@@ -103,10 +101,10 @@ dill==0.3.7
# multiprocess
distro==1.8.0
# via tabula-py
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
# via sec-certs (./../pyproject.toml)
evaluate==0.4.1
# via setfit
exceptiongroup==1.2.0
# via ipython
executing==2.0.1
# via stack-data
filelock==3.13.1
@@ -520,7 +518,9 @@ smart-open==6.4.0
soupsieve==2.5
# via beautifulsoup4
spacy==3.7.2
# via sec-certs (./../pyproject.toml)
# via
# en-core-web-sm
# sec-certs (./../pyproject.toml)
spacy-legacy==3.0.12
# via spacy
spacy-loggers==1.0.5
@@ -551,8 +551,6 @@ tifffile==2023.9.26
# via scikit-image
tokenizers==0.15.0
# via transformers
tomli==2.0.1
# via setuptools-scm
toolz==0.12.0
# via
# dask
@@ -600,7 +598,6 @@ typer==0.9.0
typing-extensions==4.8.0
# via
# alembic
# cloudpathlib
# huggingface-hub
# panel
# pydantic