scraping-script-llm

These are the scripts for scraping the IITBBS website. The workflow of the repo is as follows:

Run links_scraper_dfs.py to extract the links from the website. This stores the links in crawled_urls.csv.
Run scraping_script.py to extract content from the links collected in crawled_urls.csv and save it to scraped_data_dfs.json.
Run preprocessed_data.py to remove unwanted characters from the collected data and store the result in cleaned_json.py.
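
The crawl and clean-up steps above can be sketched roughly as follows. This is a minimal illustration and not the code in the repo's scripts: the names `LinkParser`, `crawl_dfs`, `clean_text`, and `FAKE_SITE` are hypothetical, the `fetch` callable stands in for whatever HTTP client the real scripts use, and treating "unwanted characters" as control characters plus extra whitespace is an assumption. The example runs against a small in-memory site so that it works without network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_dfs(start_url, fetch, max_pages=100):
    """Depth-first crawl that stays on the start URL's host.

    `fetch` is an assumed callable mapping a URL to its HTML text.
    """
    host = urlparse(start_url).netloc
    visited, order, stack = set(), [], [start_url]
    while stack and len(order) < max_pages:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        # Push in reverse so the first link on a page is explored first.
        for href in reversed(parser.links):
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == host and link not in visited:
                stack.append(link)
    return order


def clean_text(text):
    """Replace control characters with spaces and collapse whitespace (assumed rule)."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# In-memory stand-in for a website, used only for this illustration.
FAKE_SITE = {
    "https://example.org/": '<a href="/a">A</a><a href="/b">B</a>',
    "https://example.org/a": '<a href="/c">C</a><a href="https://other.org/x">X</a>',
    "https://example.org/b": '<a href="/">home</a>',
    "https://example.org/c": "",
}

urls = crawl_dfs("https://example.org/", lambda u: FAKE_SITE.get(u, ""))
cleaned = clean_text("  Hello\x00\n\n  world\t")
```

Here `urls` lists the pages in depth-first order (`/`, `/a`, `/c`, `/b`), with the off-site link skipped, and `cleaned` is `"Hello world"`. In a real run, `fetch` would download each page, and the resulting list would be written to crawled_urls.csv for the next step.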

