Skills extraction example #86
base: dev
…g there will be a -1 cluster, set default clustering parameters
Had a really quick look at this. If we get another request, then maybe we can flesh it out more (e.g. add a README, save the model to the repo, refactor a bit), but I think it's great for our call this morning!!
@@ -27,6 +27,10 @@ More details of the steps included in this project, and running instructions, ca
3. [skills_extraction](skills_taxonomy_v2/pipeline/skills_extraction/README.md) - Extracting skills from skill sentences.
4. [skills_taxonomy](skills_taxonomy_v2/pipeline/skills_taxonomy/README.md) - Building the skills taxonomy from extracted skills.

### Extract skills example

A simple example of extracting skills from a toy dataset of job adverts is given in [the examples folder](skills_taxonomy_v2/examples/extract_skills.py).
Nice!! Really cool. I'd rename the files so that it's clear which file does what. It looks like extract_skills.py has the functions needed to run Extract Skills, so maybe something like extract_skills_utils.py?
with open(job_adverts_file) as f:
    job_adverts = json.load(f)

# Run the pipeline to extract skills
We could add the elements of the pipeline to a simple extract_skills function in extract_skills_utils.py.
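One way the suggested wrapper might look is sketched below. This is only an illustration of the shape, not the code in this PR: the `steps` argument and the two stand-in step functions (`split_sentences`, `keep_skill_sentences`) are hypothetical, and it assumes each job advert is a plain string, which may not match the real toy dataset.

```python
import json
from typing import Any, Callable, List, Sequence


def load_job_adverts(job_adverts_file: str) -> List[Any]:
    """Read the list of job adverts from a JSON file."""
    with open(job_adverts_file) as f:
        return json.load(f)


def extract_skills(
    job_adverts: Sequence[Any], steps: Sequence[Callable[[Any], Any]]
) -> Any:
    """Pass the job adverts through each pipeline step in order."""
    output = job_adverts
    for step in steps:
        output = step(output)
    return output


# Hypothetical stand-in steps, only here to show the call pattern.
# The real pipeline steps live elsewhere in the repo and will differ.
def split_sentences(adverts: Sequence[str]) -> List[str]:
    """Naively split each advert into sentences on full stops."""
    return [s.strip() for advert in adverts for s in advert.split(".") if s.strip()]


def keep_skill_sentences(sentences: Sequence[str]) -> List[str]:
    """Keep sentences containing the word 'experience' (a toy filter)."""
    return [s for s in sentences if "experience" in s.lower()]
```

The example script would then only need to load the adverts and call `extract_skills(job_adverts, [split_sentences, keep_skill_sentences])`, with the real steps swapped in for the stand-ins.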
This pipeline creates and extracts skills from an input list of job adverts.

Prerequisites:
- You have access to our S3 bucket or have the skills classifier pkl file stored locally
Do we have the model in our repo? We could add the pkl model to outputs so people can have access to it.
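If the pkl file were stored in the repo, loading it locally could look roughly like the sketch below. The function name `load_local_classifier` is hypothetical and the file location is left to the caller, since the PR does not say where the model would live. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_local_classifier(path: Union[str, Path]) -> Any:
    """Load a pickled classifier from a local file.

    Raises FileNotFoundError with a hint if the file is missing,
    so users without S3 access know what they need to provide.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"No classifier found at {path}. Download the pkl file "
            "or check that you have access to the S3 bucket."
        )
    with path.open("rb") as f:
        return pickle.load(f)
```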
I created a file to extract skills from a toy dataset (also included). This might be helpful if people want to use our code but don't have access to TextKernel data.
Running this script with the default job_advert_examples.txt file will print out: