Commit

better README
florian committed Jun 15, 2024
1 parent 6d5e099 commit 74018b0
Showing 1 changed file (README.md) with 2 additions and 55 deletions.
# ✨PyPI Scout

PyPI Scout helps you find PyPI packages using natural language prompts powered by Large Language Models (LLMs).

![Demo](./demo.gif)

The project works by collecting project summaries and descriptions for all packages on PyPI with more than 50 weekly downloads. These are then converted into vector representations using [Sentence Transformers](https://www.sbert.net/). When the user enters a query, it is converted into a vector representation, and the most similar package descriptions are fetched from the vector database. Additional weight is given to weekly downloads before presenting the results to the user in a dashboard.
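The retrieval and re-ranking steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PyPI Scout's actual implementation: toy vectors stand in for Sentence Transformers embeddings, a brute-force cosine-similarity scan stands in for the vector database, and the blend of similarity with log-scaled weekly downloads (controlled by the hypothetical `download_weight` parameter) is an assumed weighting scheme, since the exact formula is not given here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_packages(
    query_vec: list[float],
    packages: dict[str, tuple[list[float], int]],
    top_k: int = 50,
    download_weight: float = 0.2,  # hypothetical weighting parameter
) -> list[tuple[str, float]]:
    """Fetch the top_k most similar packages, then re-rank them by an assumed
    blend of similarity and normalized, log-scaled weekly downloads."""
    # Step 1: similarity search (brute-force stand-in for the vector database).
    sims = {name: cosine_similarity(query_vec, vec) for name, (vec, _) in packages.items()}
    candidates = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Step 2: log-scale weekly downloads and normalize them to [0, 1] among the candidates.
    log_downloads = {name: math.log1p(packages[name][1]) for name in candidates}
    max_log = max(log_downloads.values(), default=0.0) or 1.0

    # Step 3: combine both signals into a single score and sort descending.
    scores = {
        name: (1 - download_weight) * sims[name]
        + download_weight * log_downloads[name] / max_log
        for name in candidates
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy "embeddings" paired with weekly download counts.
    toy_packages = {
        "pkg-a": ([1.0, 0.0], 100),
        "pkg-b": ([0.9, 0.1], 1_000_000),
        "pkg-c": ([0.0, 1.0], 5_000),
    }
    for name, score in rank_packages([1.0, 0.0], toy_packages, top_k=2):
        print(f"{name}: {score:.3f}")
```

Log-scaling keeps a very popular package from completely drowning out a more relevant but less downloaded one. In the toy example, `pkg-b` still edges past the exact match `pkg-a` because its download count is four orders of magnitude higher, while the unrelated `pkg-c` never makes it past the similarity search.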

## Getting Started

## Data

The dataset for this project is created using the [PyPI dataset on Google BigQuery](https://console.cloud.google.com/marketplace/product/gcp-public-data-pypi/pypi?project=regal-net-412415). The SQL query used can be found in [pypi_bigquery.sql](./pypi_bigquery.sql). The resulting dataset is available as a CSV file on [Google Drive](https://drive.google.com/file/d/1huR7-VD3AieBRCcQyRX9MWbPLMb_czjq/view?usp=sharing).
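A sketch of how the exported CSV might be turned into texts ready for embedding. It assumes columns named `name`, `summary`, `description` and `weekly_downloads`; these names are hypothetical, so check the header of the actual export produced by `pypi_bigquery.sql`. The 50-download threshold comes from this README; the SQL query may already apply it, in which case the check in the loop is only a safeguard.

```python
import csv
import io
from typing import TextIO

MIN_WEEKLY_DOWNLOADS = 50  # "more than 50 weekly downloads", per this README


def load_packages(stream: TextIO) -> list[dict[str, object]]:
    """Read package rows from a CSV stream and build the text to embed for each.

    Column names (name, summary, description, weekly_downloads) are hypothetical.
    """
    packages = []
    for row in csv.DictReader(stream):
        downloads = int(row["weekly_downloads"] or 0)
        if downloads <= MIN_WEEKLY_DOWNLOADS:
            continue  # skip packages at or below the threshold
        # Join summary and description, skipping whichever one is empty.
        text = " - ".join(part for part in (row["summary"], row["description"]) if part)
        packages.append({"name": row["name"], "text": text, "weekly_downloads": downloads})
    return packages


if __name__ == "__main__":
    sample = io.StringIO(
        "name,summary,description,weekly_downloads\n"
        "requests,HTTP for Humans,Requests is an HTTP library.,1000000\n"
        "tiny-pkg,A tiny package,,10\n"
    )
    print(load_packages(sample))
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_packages`, as recommended for the `csv` module.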

## Running the setup script

You can run the setup script locally with Poetry:

```sh
poetry install
poetry run python pypi_scout/scripts/setup.py
```

You can also run it using the Docker image. Below are two options, depending on whether you have an NVIDIA GPU and the NVIDIA Container Toolkit installed.

### Option 1: With NVIDIA GPU and NVIDIA Container Toolkit

Build the Docker image with

```sh
docker build -t pypi-scout .
```

Then run:

```sh
docker run --rm \
--gpus all \
--env-file .env \
-v $(pwd)/data:/code/data \
pypi-scout \
python /code/pypi_scout/scripts/setup.py
```

### Option 2: Without NVIDIA GPU or NVIDIA Container Toolkit

If you do not have an NVIDIA GPU or the NVIDIA Container Toolkit installed, omit `--gpus all` in the command above:

```sh
docker run --rm \
--env-file .env \
-v $(pwd)/data:/code/data \
pypi-scout \
python /code/pypi_scout/scripts/setup.py
```
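The only difference between the two Docker options is the `--gpus all` flag. The hypothetical helper below assembles either command as an argument list. When `use_gpu` is left unset, it guesses by checking whether `nvidia-smi` is on the `PATH`; this is a rough heuristic and does not confirm that the NVIDIA Container Toolkit is installed.

```python
import os
import shlex
import shutil
from typing import Optional


def build_setup_command(use_gpu: Optional[bool] = None, image: str = "pypi-scout") -> list[str]:
    """Build the `docker run` argv for the setup script (Option 1 or Option 2).

    If use_gpu is None, assume a GPU is usable when `nvidia-smi` is on the PATH.
    This heuristic does NOT check for the NVIDIA Container Toolkit.
    """
    if use_gpu is None:
        use_gpu = shutil.which("nvidia-smi") is not None
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        cmd += ["--gpus", "all"]  # Option 1 only
    cmd += [
        "--env-file", ".env",
        # Equivalent to $(pwd)/data in the shell commands: run from the repository root.
        "-v", f"{os.getcwd()}/data:/code/data",
        image,
        "python", "/code/pypi_scout/scripts/setup.py",
    ]
    return cmd


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(build_setup_command()))
```

The returned list can be passed to `subprocess.run(command, check=True)` to launch the container; printing it first with `shlex.join` makes it easy to compare against the shell commands above.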

Alternatively, you can use Poetry to set up the environment and run the setup script directly:

```sh
poetry install
poetry run python pypi_scout/scripts/setup.py
```

---

By following these instructions, you'll have PyPI Scout up and running, enabling you to find the best PyPI packages with ease using natural language queries. Enjoy exploring!
