Update main readme
lizgzil committed Oct 31, 2024
1 parent 178e451 commit 2b2560d
Showing 1 changed file with 72 additions and 28 deletions.
100 changes: 72 additions & 28 deletions README.md
@@ -1,62 +1,106 @@
# nlp-link
# 🖇️ NLP Link

A python package to semantically link two lists of texts.
NLP Link finds the most similar word (or words) in a reference list to an inputted word. For example, if you are trying to find which word is most similar to 'puppies' from a reference list of `['cats', 'dogs', 'rats', 'birds']`, nlp-link will return 'dogs'.
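
The matching idea can be sketched in a few lines: turn each text into a vector, score the input against every reference entry, and keep the highest-scoring one. The sketch below is not NLP Link's implementation. It swaps whatever semantic representation the package uses for character-bigram counts so that it runs with the standard library alone, which means it only catches spelling overlap ('doggies' to 'dogs') and would miss purely semantic matches such as 'puppies' to 'dogs'. The function names `bigram_counts`, `cosine_similarity` and `closest_match` are hypothetical.

```python
import math
from collections import Counter


def bigram_counts(text):
    """Count overlapping character bigrams, e.g. 'dogs' -> do, og, gs."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_match(input_text, reference_list):
    """Return (best_reference_text, similarity) for one input text."""
    input_vector = bigram_counts(input_text)
    scored = [
        (ref, cosine_similarity(input_vector, bigram_counts(ref)))
        for ref in reference_list
    ]
    return max(scored, key=lambda pair: pair[1])


reference = ['cats', 'dogs', 'rats', 'birds']
print(closest_match('doggies', reference))
```

Replacing `bigram_counts` with a function that returns sentence embeddings would keep the same "score everything, take the maximum" structure while capturing meaning instead of spelling.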

## Set-up
# 🗺️ SOC Mapper

In setting up this project we ran:
The package also uses this linking methodology to find the [Standard Occupation Classification (SOC)](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc) code most similar to an input job title. More on this [here](./page1.md).

```
conda create --name nlp-link pip python=3.9
conda activate nlp-link
pip install poetry
pip install pre-commit black
pre-commit install
```
## 🔨 Usage

```
poetry init
Install the package using pip:

```bash
pip install nlp-link
```

```
poetry install
### Basic usage

```
Match two lists in python:

## Usage
```python

```
from nlp_link.linker import NLPLinker

nlp_link = NLPLinker()

# dict inputs
comparison_data = {'a': 'cats', 'b': 'dogs', 'd': 'rats', 'e': 'birds'}
input_data = {'x': 'owls', 'y': 'feline', 'z': 'doggies', 'za': 'dogs', 'zb': 'chair'}
nlp_link.load(comparison_data)
matches = nlp_link.link_dataset(input_data)
# Top match output
print(matches)
# list inputs
comparison_data = ['cats', 'dogs', 'rats', 'birds']
input_data = ['owls', 'feline', 'doggies', 'dogs', 'chair']
nlp_link.load(comparison_data)
matches = nlp_link.link_dataset(input_data)
# Top match output
print(matches)

```

Which outputs:

```
input_id input_text link_id link_text similarity
0 0 owls 3 birds 0.613577
1 1 feline 0 cats 0.669633
2 2 doggies 1 dogs 0.757443
3 3 dogs 1 dogs 1.000000
4 4 chair 0 cats 0.331178
```
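
Every input receives a top match, even when nothing in the reference list is close: 'chair' is linked to 'cats' with a similarity of only 0.33. One way to handle this is to discard links below a cut-off. The sketch below works on the rows of the example output, copied into plain tuples. The 0.5 threshold and the `keep_confident_links` helper are illustrative assumptions, not part of the package.

```python
# Rows copied from the example output: (input_text, link_text, similarity)
matches = [
    ('owls', 'birds', 0.613577),
    ('feline', 'cats', 0.669633),
    ('doggies', 'dogs', 0.757443),
    ('dogs', 'dogs', 1.000000),
    ('chair', 'cats', 0.331178),
]


def keep_confident_links(rows, min_similarity=0.5):
    """Keep only the links whose similarity reaches the (assumed) cut-off."""
    return [row for row in rows if row[2] >= min_similarity]


confident = keep_confident_links(matches)
for input_text, link_text, similarity in confident:
    print(f"{input_text} -> {link_text} ({similarity:.2f})")
```

A suitable cut-off depends on the data, so it is worth inspecting a sample of low-scoring links before fixing a value.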

### SOC Mapping

Match a list of job titles to SOC codes:

```
from nlp_link.soc_mapper.soc_map import SOCMapper
soc_mapper = SOCMapper()
soc_mapper.load()
job_titles = ["data scientist", "Assistant nurse", "Senior financial consultant - London"]
soc_mapper.get_soc(job_titles, return_soc_name=True)
```

Which will output:

```
[((('2433/04', 'Statistical data scientists'), ('2433', 'Actuaries, economists and statisticians'), '2425'), 'Data scientist'), ((('6131/99', 'Nursing auxiliaries and assistants n.e.c.'), ('6131', 'Nursing auxiliaries and assistants'), '6141'), 'Assistant nurse'), ((('2422/02', 'Financial advisers and planners'), ('2422', 'Finance and investment analysts and advisers'), '3534'), 'Financial consultant')]
```
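
The nested tuples are easier to work with once unpacked. The sketch below takes the first entry of the example output and names its parts. The field names are assumptions read off the shape of the output (a detailed code with its label, a four-digit code with its label, a further four-digit code, and the job title that was matched), and `unpack_soc_result` is a hypothetical helper. Check the SOC mapper documentation for what each position represents.

```python
# First entry of the example output, copied verbatim
result = (
    (
        ('2433/04', 'Statistical data scientists'),
        ('2433', 'Actuaries, economists and statisticians'),
        '2425',
    ),
    'Data scientist',
)


def unpack_soc_result(result):
    """Flatten one nested result tuple into a dict with assumed field names."""
    (detailed, four_digit, other_code), matched_title = result
    return {
        'detailed_code': detailed[0],
        'detailed_name': detailed[1],
        'four_digit_code': four_digit[0],
        'four_digit_name': four_digit[1],
        'other_code': other_code,  # meaning not stated in this README
        'matched_title': matched_title,
    }


record = unpack_soc_result(result)
print(record['detailed_code'], record['matched_title'])
```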

## Contributing

The instructions here are for those contributing to the repo.

### Set-up

In setting up this project we ran:

```
conda create --name nlp-link pip python=3.9
conda activate nlp-link
pip install poetry
pip install pre-commit black
pre-commit install
```

```
poetry init
```

```
poetry install
```

## Tests
### Tests

To run tests:

```
poetry run pytest tests/
```

## Documentation
### Documentation

Docs for this repo are automatically published to the gh-pages branch via GitHub Actions after a PR is merged into main. We use Material for MkDocs for these. Nothing needs to be done to update these.
