Commit

better readme, small fix
fpgmaas committed Jun 23, 2024
1 parent 95abfb8 commit 1656ad3
Showing 3 changed files with 4 additions and 3 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -52,7 +52,8 @@ There are three methods to run the setup script, dependent on if you have a NVID
- [Option 3: Using Docker without NVIDIA GPU and NVIDIA Container Toolkit](SETUP.md#option-3-using-docker-without-nvidia-gpu-and-nvidia-container-toolkit)

> [!NOTE]
- > Although the dataset contains all packages on PyPI with more than 50 weekly downloads, by default only the top 40% of this dataset (those with more than approximately 250 downloads per week) are added to the vector database. To include packages with less weekly downloads in the database, you can increase the value of `FRAC_DATA_TO_INCLUDE` in `pypi_scout/config.py`.
+ > The dataset contains approximately 100.000 packages on PyPI with more than 100 weekly downloads. To speed up local development,
+ > you can lower the amount of packages that is processed locally by lowering the value of `FRAC_DATA_TO_INCLUDE` in `pypi_scout/config.py`.
#### 3. **Run the Application**
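
The note in this hunk describes `FRAC_DATA_TO_INCLUDE` as a fraction that limits how many packages are processed. The diff does not show how `pypi_scout` applies it, so the following is only a minimal sketch under the assumption that the most-downloaded packages are kept first (as the removed note's "top 40%" wording suggests). The helper `keep_top_fraction` and the sample data are hypothetical and not part of the repository.

```python
import math


def keep_top_fraction(packages, frac):
    """Keep the `frac` most-downloaded packages (hypothetical helper).

    Assumption: the fraction selects from the top of a list ranked by
    weekly downloads; the real project may apply it differently.
    """
    if not 0 < frac <= 1:
        raise ValueError("frac must be in (0, 1]")
    # Rank by weekly downloads, most popular first.
    ranked = sorted(packages, key=lambda p: p["weekly_downloads"], reverse=True)
    # Round up so that a small fraction still keeps at least one package.
    n_keep = math.ceil(len(ranked) * frac)
    return ranked[:n_keep]


# Made-up sample data, for illustration only.
packages = [
    {"name": "a", "weekly_downloads": 5000},
    {"name": "b", "weekly_downloads": 120},
    {"name": "c", "weekly_downloads": 900},
    {"name": "d", "weekly_downloads": 300},
    {"name": "e", "weekly_downloads": 150},
]

subset = keep_top_fraction(packages, 0.4)
```

With five packages and a fraction of 0.4, the sketch keeps the two most-downloaded entries; a fraction of 1.0 keeps everything.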

2 changes: 1 addition & 1 deletion frontend/app/components/InfoBox.tsx
@@ -20,7 +20,7 @@ const InfoBox: React.FC<InfoBoxProps> = ({ infoBoxVisible }) => {
<br />
<p className="text-gray-100">
Once you click search, your query will be matched against the summary
- and the first part of the description of the ~50.000 most popular
+ and the first part of the description of the ~100.000 most popular
packages on PyPI, which includes all packages with at least ~100
downloads per week. The results are then scored based on their
similarity to the query and their number of weekly downloads, and the
2 changes: 1 addition & 1 deletion pypi_scout/data/raw_data_reader.py
@@ -21,7 +21,7 @@ def read(self):
DataFrame: The processed dataframe.
"""
df = pl.read_csv(self.raw_dataset)
- df = df.with_columns(weekly_downloads=pl.col("number_of_downloads").round().cast(pl.Int32))
+ df = df.with_columns(weekly_downloads=pl.col("number_of_downloads").cast(pl.Int32))
df = df.drop("number_of_downloads")
df = df.unique(subset="name")
df = df.filter(~(pl.col("description").is_null() & pl.col("summary").is_null()))
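
The change in this hunk drops `.round()` before the cast to `pl.Int32`. Whether that alters any values depends on the dtype of `number_of_downloads`, which the diff does not show. The sketch below uses plain Python rather than polars and assumes the column could hold fractional floats and that a float-to-integer cast discards the fractional part; if the column is already integral, the two versions give the same result. The sample values are made up.

```python
# Made-up download counts with fractional parts (assumption: the raw
# column may be parsed as floating-point).
values = [99.4, 99.6, 250.9]

# Casting without rounding: the fractional part is dropped.
truncated = [int(v) for v in values]

# Rounding first, then casting: values move to the nearest integer.
rounded = [int(round(v)) for v in values]

# Per-value difference between the two approaches.
differences = [r - t for r, t in zip(rounded, truncated)]
```

Under these assumptions the two approaches differ by at most one download per package, which is negligible for a popularity ranking.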
