Rerun Croissant Health reports for Hugging Face and OpenML (#660)
marcenacp authored Jun 5, 2024
1 parent d65b6ce commit 0f95e04
Showing 3 changed files with 51 additions and 805 deletions.
2 changes: 1 addition & 1 deletion health/README.md
@@ -17,7 +17,7 @@ pip install -r requirements.txt
 
 # Test the spider locally.
 # In huggingface.py you can uncomment the line in
-# `start_requests` to produce crawl fake data.
+# `list_datasets` to produce crawl fake data.
 scrapy crawl huggingface
 
 # When you're ready, the following commands launch a new job:
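The README now points to a line in `list_datasets` (rather than `start_requests`) as the switch for crawling fake data locally. As a rough, Scrapy-free sketch of that pattern, assume a spider exposes the two hooks seen in `openml.py`, `list_datasets()` and `get_url()`. Everything else here is hypothetical and not the repository's code: `BaseSpider`, `HuggingFaceSpider`, the `use_fake_data` flag (standing in for the commented-out line), the `crawl_urls` helper, the placeholder dataset IDs, and the assumed Hugging Face Croissant endpoint.

```python
from typing import Iterator, List


class BaseSpider:
    """Hypothetical base class: subclasses list dataset IDs and map each one to a URL."""

    def list_datasets(self) -> Iterator[str]:
        raise NotImplementedError

    def get_url(self, dataset_id: str) -> str:
        raise NotImplementedError

    def crawl_urls(self) -> List[str]:
        # Combine the two hooks: one Croissant URL per listed dataset.
        return [self.get_url(dataset_id) for dataset_id in self.list_datasets()]


class HuggingFaceSpider(BaseSpider):
    def __init__(self, use_fake_data: bool = False) -> None:
        # Hypothetical flag standing in for "uncomment the line in list_datasets".
        self.use_fake_data = use_fake_data

    def list_datasets(self) -> Iterator[str]:
        if self.use_fake_data:
            # A tiny fixed list keeps a local test crawl short (IDs are placeholders).
            yield from ["mnist", "glue"]
            return
        # A real spider would page through the Hub's dataset listing here.
        raise NotImplementedError("real dataset listing is out of scope for this sketch")

    def get_url(self, dataset_id: str) -> str:
        # Assumed Croissant endpoint on the Hugging Face Hub.
        return f"https://huggingface.co/api/datasets/{dataset_id}/croissant"


spider = HuggingFaceSpider(use_fake_data=True)
for url in spider.crawl_urls():
    print(url)
```

Keeping the fake-data branch inside `list_datasets` means `get_url` and any downstream parsing still run unchanged during a local test, and only the source of dataset IDs differs.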
4 changes: 3 additions & 1 deletion health/crawler/spiders/openml.py
@@ -20,4 +20,6 @@ def list_datasets(self):
 
     def get_url(self, dataset_id: str):
         """See base class."""
-        return f"https://openml1.win.tue.nl/dataset{dataset_id}/croissant.json"
+        return (
+            f"https://openml1.win.tue.nl/{dataset_id // 10000:04d}/{dataset_id:04d}/dataset_{dataset_id}_croissant.json"
+        )
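The new `get_url` places each file in a bucket directory chosen by `dataset_id // 10000`, then in a zero-padded per-dataset directory. A standalone sketch of the same f-string follows; the function name `openml_croissant_url` is hypothetical. One caveat visible in the diff itself: `dataset_id` is annotated as `str`, yet `//` and the `:04d` format spec only work on integers. This sketch therefore converts the ID with `int()` first. Whether the real spider passes ints at runtime is not shown in this commit.

```python
from typing import Union


def openml_croissant_url(dataset_id: Union[int, str]) -> str:
    """Rebuild the URL layout used by the new get_url in openml.py (sketch)."""
    # `//` and the `:04d` format spec require an int, so coerce string IDs first.
    n = int(dataset_id)
    # n // 10000 selects a zero-padded bucket directory (0000, 0001, ...);
    # n:04d zero-pads the ID itself to at least four digits.
    return f"https://openml1.win.tue.nl/{n // 10000:04d}/{n:04d}/dataset_{n}_croissant.json"


print(openml_croissant_url(61))
# https://openml1.win.tue.nl/0000/0061/dataset_61_croissant.json
print(openml_croissant_url("44123"))
# https://openml1.win.tue.nl/0004/44123/dataset_44123_croissant.json
```

Passing a plain string such as `"61"` straight into the original f-string would raise `TypeError` at the `//` step, which is why the conversion comes first here.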
850 changes: 47 additions & 803 deletions health/visualizer/report_huggingface.ipynb

Large diffs are not rendered by default.