Update to latest model #205

Merged
merged 8 commits into from
Dec 8, 2023

@lizgzil lizgzil commented Sep 29, 2023

Fixes #197 and #212

  • Update references from the old model (20220825) to the new model (20230808)
  • Add the new model to a public zipped folder. It currently lives in ojd_daps_skills_data_new.zip, because replacing ojd_daps_skills_data.zip right now would break things
  • Add info to model cards
  • Optimize bertvectorizer

To do:

  • Rename the old public zipped file to ojd_daps_skills_data_old.zip and change ojd_daps_skills_data_new.zip to ojd_daps_skills_data.zip. DO THIS AFTER THIS PR IS MERGED?
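The rename in the to-do above could be sketched roughly as below. This is only a sketch under assumptions: the `run` and `swap_zips` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and it assumes both archives sit under the same `escoe_extension/` prefix as described in this PR. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the post-merge archive swap. Helper names and DRY_RUN are
# illustrative, not part of the repository.
set -eu

BASE="s3://open-jobs-indicators/escoe_extension"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

swap_zips() {
  # 1. Keep the old-model archive (20220825) under a new name.
  run aws s3 mv "$BASE/ojd_daps_skills_data.zip" "$BASE/ojd_daps_skills_data_old.zip"
  # 2. Promote the new-model archive (20230808) to the default name.
  run aws s3 mv "$BASE/ojd_daps_skills_data_new.zip" "$BASE/ojd_daps_skills_data.zip"
}

swap_zips
```

The order matters: the old archive is moved out of the way first so that the second move does not overwrite it. If public access to these files is granted through object ACLs rather than a bucket policy (not stated in this PR), the moved objects may need their ACL set again, for example with `--acl public-read`.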

Important. I have created a new public S3 zipped file (s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip) which has the new model (20230808) rather than the old one (20220825).

For the time being, I have kept s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data.zip as it is (i.e. with the old model). This is because, while this code is still in a PR, dev won't work with the new zipped file: it will look for the 20220825 model, which won't exist there.

This is how I updated the zipped file:

# Move over the new model
aws s3 cp --recursive s3://open-jobs-lake/escoe_extension/outputs/models/ner_model/20230808/ s3://open-jobs-indicators/escoe_extension/outputs/models/ner_model/20230808/

# Delete the old model
aws s3 rm s3://open-jobs-indicators/escoe_extension/outputs/models/ner_model/20220825/ --recursive

# Download the whole folder locally
aws s3 cp --recursive s3://open-jobs-indicators/escoe_extension/outputs/ ojd_daps_skills_data/outputs/

# zip
zip -r ojd_daps_skills_data.zip ojd_daps_skills_data/

# Upload the zipped file
aws s3 cp ojd_daps_skills_data.zip s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip
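
Before uploading, it may be worth checking that the archive built by the steps above contains the new model folder and no longer contains the old one. A minimal sketch follows, assuming the member paths follow the `ojd_daps_skills_data/outputs/models/ner_model/<model>/` layout implied by the download and zip commands. The function name `check_archive_listing` is hypothetical.

```shell
#!/bin/sh
# Sketch: validate a list of archive member paths read from stdin.
# The function name and messages are illustrative only.

NEW_MODEL="20230808"
OLD_MODEL="20220825"
MODEL_DIR="ojd_daps_skills_data/outputs/models/ner_model"

check_archive_listing() {
  # Read the whole listing (one member path per line) from stdin.
  listing="$(cat)"

  # The new model folder must be present.
  if ! printf '%s\n' "$listing" | grep -q "^${MODEL_DIR}/${NEW_MODEL}/"; then
    echo "missing new model ${NEW_MODEL}" >&2
    return 1
  fi

  # The old model folder must be gone.
  if printf '%s\n' "$listing" | grep -q "^${MODEL_DIR}/${OLD_MODEL}/"; then
    echo "old model ${OLD_MODEL} still present" >&2
    return 1
  fi

  echo "archive listing looks consistent"
}
```

One way to feed it, assuming Info-ZIP's `unzip` is installed, is `unzip -Z1 ojd_daps_skills_data.zip | check_archive_listing`. Reading the listing from stdin keeps the check independent of how the member list is produced.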

Thanks for contributing to Nesta's Skills Extractor Library 🙏!

If you have suggested changes to code anywhere outside of the ExtractSkills class, please consult the checklist below.

Checklist ✔️🐍:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained the feature in this PR or (better) in output/reports/
  • I have requested a code review

If you have suggested changes to documentation (and/or the ExtractSkills class), please ALSO consult the checklist below.

Documentation Checklist ✔️📚:

  • I have run make html in docs
  • I have manually reviewed the docs/build/*.html files locally to ensure they have formatted correctly
  • I have pushed both relevant files AND their corresponding docs/build/*.html files

@lizgzil lizgzil requested a review from india-kerle September 29, 2023 10:36
@lizgzil lizgzil marked this pull request as ready for review September 29, 2023 10:37

lizgzil commented Sep 29, 2023

@india-kerle - no rush on this - but I've tried to update everything with the new model details. Hope I haven't missed something?


Since multiple people labelled files from different locations, we merge the labelled data using the following command:
We labelled another batch of job adverts using [Prodigy](https://prodi.gy/). This was to avail of their active learning capabilities. Details of how we labelled job adverts this way are given in [the Prodigy labelling README](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/prodigy/README.md).

Suggested change
We labelled another batch of job adverts using [Prodigy](https://prodi.gy/). This was to avail of their active learning capabilities. Details of how we labelled job adverts this way are given in [the Prodigy labelling README](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/prodigy/README.md).
We labelled another batch of job adverts using [Prodigy](https://prodi.gy/). This was to make use of their active learning capabilities. Details of how we labelled job adverts this way are given in [the Prodigy labelling README](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/prodigy/README.md).

@@ -16,11 +16,11 @@ This process means we can extract skills from thousands of job adverts and analy

## Labelling data

To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).
To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/), we then did a second batch of labelled using [Prodigy](https://prodi.gy/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).

Suggested change
To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/), we then did a second batch of labelled using [Prodigy](https://prodi.gy/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).
To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/). We then did a second batch of labelled using [Prodigy](https://prodi.gy/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).

@india-kerle india-kerle left a comment


Looks great! thanks for these changes :)

@lizgzil lizgzil merged commit ade0887 into dev Dec 8, 2023
4 checks passed
@lizgzil lizgzil deleted the new-model-update branch December 8, 2023 14:37
Development

Successfully merging this pull request may close these issues.

Update to new model everywhere