add skillner

baniasbaabe · Oct 20, 2024 · b16164e
1 parent 9e32a28
commit b16164e

Showing 2 changed files with 99 additions and 0 deletions.
diff --git a/book/cooltools/Chapter.ipynb b/book/cooltools/Chapter.ipynb
@@ -2268,6 +2268,63 @@
     "\n",
     "# Output: His name is Mr. <PERSON>. His phone number is <PHONE_NUMBER>."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Extract Skills from Job Postings with `skillner`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Extracting skills from unstructured data can be difficult.\n",
+    "\n",
+    "With `skillNer`, it doesn't have to be.\n",
+    "\n",
+    "`skillNer` extracts skills and certifications from text using an open-source skills database.\n",
+    "\n",
+    "Built on spaCy and a few simple matching rules, it gave good results in some tests I ran.\n",
+    "\n",
+    "Of course, you could also run an LLM on job ads, but do you need it?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install skillNer\n",
+    "!python -m spacy download en_core_web_lg"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import spacy\n",
+    "from spacy.matcher import PhraseMatcher\n",
+    "from skillNer.general_params import SKILL_DB\n",
+    "from skillNer.skill_extractor_class import SkillExtractor\n",
+    "\n",
+    "nlp = spacy.load(\"en_core_web_lg\")\n",
+    "skill_extractor = SkillExtractor(nlp, SKILL_DB, PhraseMatcher)\n",
+    "\n",
+    "job_description = \"\"\"\n",
+    "You are a Python developer with expertise in backend development\n",
+    "and can manage projects. You quickly adapt to new environments\n",
+    "and speak English and German fluently.\n",
+    "\"\"\"\n",
+    "\n",
+    "annotations = skill_extractor.annotate(job_description)\n",
+    "\n",
+    "skill_extractor.describe(annotations)"
+   ]
   }
  ],
  "metadata": {

diff --git a/book/llm/Chapter.ipynb b/book/llm/Chapter.ipynb
@@ -396,6 +396,48 @@
     "score = evaluate(dataset,metrics=[faithfulness,answer_correctness])\n",
     "score.to_pandas()"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Unified Reranker API with `rerankers`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Reranking is an essential part of a RAG system.\n",
+    "\n",
+    "A reranking model scores each retrieved document against the user query, and the documents are then reordered by that relevance score.\n",
+    "\n",
+    "The `rerankers` library gives you a unified API for popular vendors and models such as Cohere, Jina or T5.\n",
+    "\n",
+    "That makes it easy to test different reranking methods and swap one for another."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install rerankers"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from rerankers import Reranker\n",
+    "\n",
+    "ranker = Reranker(\"t5\")\n",
+    "\n",
+    "results = ranker.rank(query=\"I love you\", docs=[\"I hate you\", \"I really like you\"], doc_ids=[0,1])"
+   ]
   }
  ],
  "metadata": {