Improve docs

nestauk · Oct 28, 2024 · 59dcbc4
1 parent a85b766
commit 59dcbc4
Showing 4 changed files with 132 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -63,7 +63,7 @@ Docs for this repo are automatically published to gh-pages branch via. Github ac
 However, if you are editing the docs you can test them out locally by running
 
 ```
-cd guidelines
-pip install -r docs/requirements.txt
+cd docs
+# pip install -r docs/requirements.txt
 mkdocs serve
 ```
diff --git a/docs/README.md b/docs/README.md
@@ -1,5 +1,104 @@
-# nlp-link
+# 🖇️ NLP Link
 
-Documentation for NLP Link
+NLP Link finds the word (or words) in a reference list that is most similar to an input word. For example, given the input 'puppies' and the reference list `['cats', 'dogs', 'rats', 'birds']`, nlp-link will return 'dogs'.
 
-- [Page1](./page1.md)
+The package also uses this linking methodology to find the [SOC](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc) code most similar to an input job title. More on this [here](./page1.md).
+
+## 🔨 Usage
+
+Install the package using pip:
+
+```bash
+pip install nlp-link
+```
+
+### Basic usage
+
+Match two lists in Python:
+
+```python
+
+from nlp_link.linker import NLPLinker
+
+nlp_link = NLPLinker()
+
+# list inputs
+comparison_data = ['cats', 'dogs', 'rats', 'birds']
+input_data = ['owls', 'feline', 'doggies', 'dogs','chair']
+nlp_link.load(comparison_data)
+matches = nlp_link.link_dataset(input_data)
+# Top match output
+print(matches)
+
+```
+
+Which outputs:
+
+```
+   input_id input_text  link_id link_text  similarity
+0         0       owls        3     birds    0.613577
+1         1     feline        0      cats    0.669633
+2         2    doggies        1      dogs    0.757443
+3         3       dogs        1      dogs    1.000000
+4         4      chair        0      cats    0.331178
+
+```
+
+### Extended usage
+
+Match using dictionary inputs (where the key is a unique ID):
+
+```python
+
+from nlp_link.linker import NLPLinker
+
+nlp_link = NLPLinker()
+
+# dict inputs
+comparison_data = {'a': 'cats', 'b': 'dogs', 'd': 'rats', 'e': 'birds'}
+input_data = {'x': 'owls', 'y': 'feline', 'z': 'doggies', 'za': 'dogs', 'zb': 'chair'}
+nlp_link.load(comparison_data)
+matches = nlp_link.link_dataset(input_data)
+# Top match output
+print(matches)
+
+```
+
+Which outputs:
+
+```
+  input_id input_text link_id link_text  similarity
+0        x       owls       e     birds    0.613577
+1        y     feline       a      cats    0.669633
+2        z    doggies       b      dogs    0.757443
+3       za       dogs       b      dogs    1.000000
+4       zb      chair       a      cats    0.331178
+
+```
+
+Output several most similar matches using the `top_n` argument (`format_output` needs to be set to False for this):
+
+```python
+
+from nlp_link.linker import NLPLinker
+
+nlp_link = NLPLinker()
+
+comparison_data = {'a': 'cats', 'b': 'dogs', 'c': 'kittens', 'd': 'rats', 'e': 'birds'}
+input_data = {'x': 'pets', 'y': 'feline'}
+nlp_link.load(comparison_data)
+matches = nlp_link.link_dataset(input_data, top_n=2, format_output=False)
+# Raw output: the top 2 matches (ID and similarity) per input ID
+print(matches)
+# Format output for ease of reading
+print({input_data[k]: [comparison_data[r] for r, _ in v] for k,v in matches.items()})
+```
+
+Which will output:
+
+```
+{'x': [['b', 0.8171109], ['a', 0.7650396]], 'y': [['a', 0.6696329], ['c', 0.5778763]]}
+{'pets': ['dogs', 'cats'], 'feline': ['cats', 'kittens']}
+```
+
+The `drop_most_similar` argument can be set to `True` if you don't want to output the most similar match, which might be the case if you were comparing a list with itself. For this you would run `nlp_link.link_dataset(input_data, drop_most_similar=True)`.
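
Review note on the `top_n` example above: the one-line dict comprehension drops the similarity scores. A minimal sketch of an alternative, assuming the raw output keeps the shape shown in the example (`{input_id: [[link_id, similarity], ...]}`); `flatten_matches` is a hypothetical helper and not part of nlp-link:

```python
# Raw output copied from the `top_n=2, format_output=False` example in docs/README.md.
matches = {
    "x": [["b", 0.8171109], ["a", 0.7650396]],
    "y": [["a", 0.6696329], ["c", 0.5778763]],
}
comparison_data = {"a": "cats", "b": "dogs", "c": "kittens", "d": "rats", "e": "birds"}
input_data = {"x": "pets", "y": "feline"}


def flatten_matches(matches, input_data, comparison_data):
    """Hypothetical helper (not part of nlp-link): one row per candidate match."""
    rows = []
    for input_id, candidates in matches.items():
        # Candidates are assumed to be ordered from most to least similar,
        # as they are in the example output.
        for rank, (link_id, similarity) in enumerate(candidates, start=1):
            rows.append(
                {
                    "input_id": input_id,
                    "input_text": input_data[input_id],
                    "rank": rank,
                    "link_id": link_id,
                    "link_text": comparison_data[link_id],
                    "similarity": similarity,
                }
            )
    return rows


rows = flatten_matches(matches, input_data, comparison_data)
for row in rows:
    print(row)
```

Each row mirrors the columns of the formatted single-match output (`input_id`, `input_text`, `link_id`, `link_text`, `similarity`) with an added `rank`, so the list can be passed to a table library if one is available.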
diff --git a/docs/mkdocs.yaml b/docs/mkdocs.yaml
@@ -39,6 +39,6 @@ theme:
         name: Switch to light mode
 nav:
   - Home: README.md
-  - Page 1: page1.md
+  - SOCMapper: page1.md
 plugins:
   - same-dir
diff --git a/docs/page1.md b/docs/page1.md
@@ -1 +1,27 @@
-## Title
+# 🗺️ SOC Mapper
+
+The SOC mapper relies on the [SOC coding index](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2020/soc2020volume2codingrulesandconventions) released by the ONS. This dataset contains over 30,000 job titles, each with its SOC code.
+
+The `SOCMapper` class in `soc_map.py` maps job title(s) to SOC code(s).
+
+## 🔨 Core functionality
+
+```python
+from nlp_link.soc_mapper.soc_map import SOCMapper
+
+soc_mapper = SOCMapper()
+soc_mapper.load()
+job_titles=["data scientist", "Assistant nurse", "Senior financial consultant - London"]
+
+soc_mapper.get_soc(job_titles, return_soc_name=True)
+```
+
+Which will output:
+
+```
+[((('2433/04', 'Statistical data scientists'), ('2433', 'Actuaries, economists and statisticians'), '2425'), 'Data scientist'), ((('6131/99', 'Nursing auxiliaries and assistants n.e.c.'), ('6131', 'Nursing auxiliaries and assistants'), '6141'), 'Assistant nurse'), ((('2422/02', 'Financial advisers and planners'), ('2422', 'Finance and investment analysts and advisers'), '3534'), 'Financial consultant')]
+```
+
+## 📖 Read more
+
+Read more about the methods and evaluation of the SOCMapper [here](https://github.com/nestauk/nlp-link/soc_mapper/README.md).
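
Review note on docs/page1.md: the nested tuples in the `get_soc` example output are hard to read at a glance. A minimal sketch of how they could be unpacked, assuming every result has the shape shown in the example, `((extension pair, 4-digit pair, third code), matched title)`. The field names are assumptions, as the page does not label them; in particular the third code is not explained there, so it is kept under a neutral name. `unpack_soc_result` is a hypothetical helper and not part of nlp-link:

```python
# Output copied from the `get_soc(job_titles, return_soc_name=True)` example in docs/page1.md.
soc_output = [
    ((("2433/04", "Statistical data scientists"),
      ("2433", "Actuaries, economists and statisticians"),
      "2425"), "Data scientist"),
    ((("6131/99", "Nursing auxiliaries and assistants n.e.c."),
      ("6131", "Nursing auxiliaries and assistants"),
      "6141"), "Assistant nurse"),
    ((("2422/02", "Financial advisers and planners"),
      ("2422", "Finance and investment analysts and advisers"),
      "3534"), "Financial consultant"),
]


def unpack_soc_result(result):
    """Hypothetical helper: turn one nested result into a flat dict.

    Field names are assumptions; the docs page does not name them.
    """
    (extension, four_digit, third_code), matched_title = result
    extension_code, extension_name = extension
    four_digit_code, four_digit_name = four_digit
    return {
        "matched_title": matched_title,
        "extension_code": extension_code,
        "extension_name": extension_name,
        "four_digit_code": four_digit_code,
        "four_digit_name": four_digit_name,
        "third_code": third_code,  # unlabeled in the docs; meaning not assumed here
    }


records = [unpack_soc_result(result) for result in soc_output]
for record in records:
    print(record["matched_title"], "->", record["extension_code"], record["extension_name"])
```

The sketch does not handle job titles for which no match is returned, since the page does not show what `get_soc` yields in that case.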