Skills extraction example #86

Open: wants to merge 3 commits into base: dev
@lizgzil lizgzil commented Aug 11, 2022

I created a file to extract skills from a toy dataset (also included). This might be helpful if people want to use our code but don't have access to TextKernel data.

Running this script with the default job_advert_examples.txt file will print out:

The job advert:
This is a sentence about the company and the salary. We require applicants to have skills in Microsoft Excel.
Has skills:
['microsoft-excel-require']

The job advert:
We want Microsoft Excel skills for this role. Communication skills are also essential.
Has skills:
['microsoft-excel-require', 'communication-important-essential']

The job advert:
This role has a very competitive starting salary. Skills for good communication are very important.
Has skills:
['communication-important-essential']
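
For readers who want to reproduce the printed format above, here is a minimal sketch. `format_advert_skills` is a hypothetical helper, not necessarily how the script in this PR prints its results, and the advert/skills pair is hard-coded from the output above rather than computed by the pipeline.

```python
def format_advert_skills(job_advert, skills):
    """Return the report for one advert in the format shown above (hypothetical helper)."""
    return f"The job advert:\n{job_advert}\nHas skills:\n{skills}\n"


# Hard-coded from the example output above; a real run would get
# the skills list from the extraction pipeline instead.
example_results = [
    (
        "We want Microsoft Excel skills for this role. "
        "Communication skills are also essential.",
        ["microsoft-excel-require", "communication-important-essential"],
    ),
]

for advert, skills in example_results:
    print(format_advert_skills(advert, skills))
```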

@lizgzil lizgzil requested a review from india-kerle August 11, 2022 15:41

@india-kerle india-kerle left a comment

Really quick eyes on this. If we get another request, then maybe we can flesh it out more, e.g. add a readme, save the model to the repo, and refactor a bit, but I think it's great for our call this morning!!

@@ -27,6 +27,10 @@ More details of the steps included in this project, and running instructions, ca
3. [skills_extraction](skills_taxonomy_v2/pipeline/skills_extraction/README.md) - Extracting skills from skill sentences.
4. [skills_taxonomy](skills_taxonomy_v2/pipeline/skills_taxonomy/README.md) - Building the skills taxonomy from extracted skills.

### Extract skills example

A simple example of extracting skills from a toy dataset of job adverts is given in [the examples folder](skills_taxonomy_v2/examples/extract_skills.py).

Nice!! Really cool. I'd rename the files so that it's clear which file does what. It looks like extract_skills.py has the functions needed to run Extract Skills, so maybe something like extract_skills_utils.py?

with open(job_adverts_file) as f:
    job_adverts = json.load(f)

# Run the pipeline to extract skills

We could add the elements of the pipeline to a simple extract_skills function in extract_skills_utils.py
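
One way this suggestion might look, as a rough sketch only: the three step callables (`split_sentences`, `is_skill_sentence`, `assign_skill`) are hypothetical stand-ins for the repo's real pipeline components, whose names and signatures are not shown in this PR.

```python
from typing import Callable, Iterable, List, Optional


def extract_skills(
    job_adverts: Iterable[str],
    split_sentences: Callable[[str], List[str]],
    is_skill_sentence: Callable[[str], bool],
    assign_skill: Callable[[str], Optional[str]],
) -> List[List[str]]:
    """Return one list of skill names per job advert.

    All three callables are assumptions standing in for the real
    pipeline steps (sentence splitting, skill-sentence classification,
    and mapping a skill sentence to a skill).
    """
    results = []
    for advert in job_adverts:
        skills: List[str] = []
        for sentence in split_sentences(advert):
            if not is_skill_sentence(sentence):
                continue
            skill = assign_skill(sentence)
            # Keep each skill once per advert, in order of first appearance
            if skill is not None and skill not in skills:
                skills.append(skill)
        results.append(skills)
    return results


# Toy stand-ins, only to show the call shape
toy_skills = extract_skills(
    ["Great salary. We need Excel skills."],
    split_sentences=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    is_skill_sentence=lambda s: "skills" in s.lower(),
    assign_skill=lambda s: "excel" if "excel" in s.lower() else None,
)
print(toy_skills)  # [['excel']]
```

Passing the steps in as arguments keeps the wrapper independent of how the classifier is loaded, so the example script would only need to call one function.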

This pipeline creates and extracts skills from an input list of job adverts.

Prerequisites:
- You have access to our S3 bucket or have the skills classifier pkl file stored locally

Do we have the model in our repo? We could add the pkl model to outputs so people can have access to it.
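
If the pkl file were stored in the repo, loading it locally could look roughly like the sketch below. `load_local_classifier` is a hypothetical helper, and the path a caller passes in is an assumption, since the PR does not say where the file would live. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def load_local_classifier(model_path):
    """Load a pickled classifier from a local path (hypothetical helper)."""
    path = Path(model_path)
    if not path.is_file():
        # Fail with a clear message instead of a bare open() error
        raise FileNotFoundError(
            f"No classifier found at {path}; download it from S3 or pass a local path."
        )
    with path.open("rb") as f:
        return pickle.load(f)
```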
