Update to latest model #205
@india-kerle - no rush on this - but I've tried to update everything with the new model details. Hope I haven't missed something?
Since multiple people labelled files from different locations, we merge the labelled data using the following command:
We labelled another batch of job adverts using [Prodigy](https://prodi.gy/). This was to avail of their active learning capabilities. Details of how we labelled job adverts this way are given in [the Prodigy labelling README](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/prodigy/README.md).
We labelled another batch of job adverts using [Prodigy](https://prodi.gy/). This was to avail of their active learning capabilities. Details of how we labelled job adverts this way are given in [the Prodigy labelling README](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/prodigy/README.md).
We labelled another batch of job adverts using [Prodigy](https://prodi.gy/). This was to make use of their active learning capabilities. Details of how we labelled job adverts this way are given in [the Prodigy labelling README](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/prodigy/README.md).
@@ -16,11 +16,11 @@ This process means we can extract skills from thousands of job adverts and analy

## Labelling data

To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).
To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/), we then did a second batch of labelled using [Prodigy](https://prodi.gy/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).
To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/), we then did a second batch of labelled using [Prodigy](https://prodi.gy/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).
To train the NER model we needed labelled data. First we created a random sample of job adverts and got them into a form needed for labelling using [Label Studio](https://labelstud.io/). We then did a second batch of labelled using [Prodigy](https://prodi.gy/). More about this labelling process can be found in the `skill_ner` pipeline [README.md](./ojd_daps_skills/ojd_daps_skills/pipeline/skill_ner/README.md).
5635bc1 to 93a3336
Looks great! thanks for these changes :)
Fixes #197 and #212
- `ojd_daps_skills_data_new.zip` since it will break things if it replaces `ojd_daps_skills_data.zip` just yet

To do:
- `ojd_daps_skills_data_old.zip` and change `ojd_daps_skills_data_new.zip` to `ojd_daps_skills_data.zip`. DO THIS AFTER THIS PR IS MERGED?

Important. I have created a new public S3 zipped file (s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip) which has the new model (20230808) rather than the old one (20220825).

For the meantime, I have kept `s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data.zip` as it is (i.e. with the old model). This is because whilst this code is still in a PR, dev won't work with the new zipped file (it will try to look for the 20220825 model but it won't exist).

This is how I updated the zipped file:
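The exact commands are not shown in this conversation. Purely as an illustration, the kind of swap described above (keep everything in the existing archive, leave out the old model folder, add the new one) could be sketched in Python with the standard `zipfile` module. The function name `rebuild_zip` and all of its arguments are hypothetical and do not come from this repository, the archive layout is an assumption, and uploading the result to S3 is not covered.

```python
import zipfile
from pathlib import Path


def rebuild_zip(src_zip, dst_zip, drop_prefix, new_dir, new_prefix):
    """Hypothetical helper: copy src_zip to dst_zip, skipping entries whose
    name starts with drop_prefix and adding the files in new_dir under
    new_prefix."""
    new_dir = Path(new_dir)
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(
        dst_zip, "w", compression=zipfile.ZIP_DEFLATED
    ) as dst:
        # Copy every existing entry except the old model folder.
        for info in src.infolist():
            if info.filename.startswith(drop_prefix):
                continue
            dst.writestr(info, src.read(info.filename))
        # Add the new model files, preserving their relative paths.
        for path in sorted(new_dir.rglob("*")):
            if path.is_file():
                arcname = new_prefix + path.relative_to(new_dir).as_posix()
                dst.write(path, arcname)
    return dst_zip
```

Writing to a separate destination file mirrors the approach in this PR of keeping the old archive untouched until the code that expects the new model has been merged.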
Thanks for contributing to Nesta's Skills Extractor Library 🙏!
If you have suggested changes to code anywhere outside of the ExtractSkills class, please consult the checklist below.
Checklist ✔️🐍:
- `notebooks/`
- `pre-commit` and addressed any issues not automatically fixed
- `dev`
- `README`s
- `output/reports/`
If you have suggested changes to documentation (and/or the ExtractSkills class), please ALSO consult the checklist below.
Documentation Checklist ✔️📚:
- `make html` in `docs`
- `docs/build/*.html` files locally to ensure they have formatted correctly
- `docs/build/*.html` files