add skillner
baniasbaabe committed Oct 20, 2024
1 parent 9e32a28 commit b16164e
Showing 2 changed files with 99 additions and 0 deletions.
57 changes: 57 additions & 0 deletions book/cooltools/Chapter.ipynb
@@ -2268,6 +2268,63 @@
"\n",
"# Output: His name is Mr. <PERSON>. His phone number is <PHONE_NUMBER>."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract Skills from Job Postings with `skillner`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extracting skills from unstructured data can be difficult.\n",
"\n",
"With `skillner`, it doesn't have to be.\n",
"\n",
"`skillner` extracts skills and certifications from text by matching it against an open source skills database.\n",
"\n",
"It is built on spaCy and a set of simple matching rules, and it achieved good results in some tests I ran.\n",
"\n",
"Of course, you could also run an LLM on job ads, but do you need one?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install skillNer\n",
"!python -m spacy download en_core_web_lg"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"from spacy.matcher import PhraseMatcher\n",
"from skillNer.general_params import SKILL_DB\n",
"from skillNer.skill_extractor_class import SkillExtractor\n",
"\n",
"nlp = spacy.load(\"en_core_web_lg\")\n",
"skill_extractor = SkillExtractor(nlp, SKILL_DB, PhraseMatcher)\n",
"\n",
"job_description = \"\"\"\n",
"You are a Python developer with expertise in backend development\n",
"and can manage projects. You quickly adapt to new environments\n",
"and speak English and German fluently.\n",
"\"\"\"\n",
"\n",
"annotations = skill_extractor.annotate(job_description)\n",
"\n",
"skill_extractor.describe(annotations)"
]
}
],
"metadata": {
42 changes: 42 additions & 0 deletions book/llm/Chapter.ipynb
@@ -396,6 +396,48 @@
"score = evaluate(dataset,metrics=[faithfulness,answer_correctness])\n",
"score.to_pandas()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unified Reranker API with `rerankers`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reranking is an essential element of a RAG system.\n",
"\n",
"A reranking model scores each retrieved document against the user query, and the documents are then reordered by that score.\n",
"\n",
"The `rerankers` library gives you a unified API for popular vendors and models such as Cohere, Jina or T5.\n",
"\n",
"That makes it easy to test different reranking methods and swap one for another."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install rerankers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rerankers import Reranker\n",
"\n",
"ranker = Reranker(\"t5\")\n",
"\n",
"results = ranker.rank(query=\"I love you\", docs=[\"I hate you\", \"I really like you\"], doc_ids=[0,1])"
]
}
],
"metadata": {

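As a note on the reranking cell added in `book/llm/Chapter.ipynb`: the markdown describes a model that scores each retrieved document against the query. The sketch below illustrates only that score-then-sort step, using a toy word-overlap score as a stand-in for a real model. It does not use the `rerankers` library, and `ScoredDoc`, `overlap_score`, and `rerank` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: int
    text: str
    score: float


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase word sets.

    Assumption: this stands in for a learned reranking model.
    """
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words or not doc_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words | doc_words)


def rerank(query: str, docs: list[str], doc_ids: list[int]) -> list[ScoredDoc]:
    """Score every document against the query, then sort by descending score."""
    scored = [
        ScoredDoc(doc_id, text, overlap_score(query, text))
        for doc_id, text in zip(doc_ids, docs)
    ]
    return sorted(scored, key=lambda item: item.score, reverse=True)


ranked = rerank(
    query="I love you",
    docs=["I hate you", "I really like you"],
    doc_ids=[0, 1],
)
for item in ranked:
    print(item.doc_id, round(item.score, 2), item.text)
```

With this toy score, "I hate you" ranks above "I really like you" because it shares the same number of words with the query but is shorter. That is the kind of lexical mistake a learned reranker is meant to avoid.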