fix: Augment the information getting fetched from a webpage #203

Merged: 6 commits, May 10, 2024

@mayurdb (Contributor) commented May 10, 2024

These are follow-up changes from discussion #187.

We are now adding a mechanism to fetch the contents of a webpage using BeautifulSoup. Apart from the header and body, we now also fetch all the URLs on the page.
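As a rough illustration of the idea (not the code in this PR), a minimal sketch with BeautifulSoup could look like the following. The function name `parse_page`, the returned dictionary keys, and the sample HTML are all hypothetical and chosen only for this example.

```python
from bs4 import BeautifulSoup


def parse_page(html: str) -> dict:
    """Return the title, body markup, and raw href values found in `html`.

    Hypothetical helper for illustration; not the function added by this PR.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    body = str(soup.body) if soup.body else ""
    # Collect every href exactly as written; relative links are NOT resolved here.
    urls = [a["href"] for a in soup.find_all("a", href=True)]
    return {"title": title, "body": body, "urls": urls}


# Made-up sample page, loosely modelled on the parsed output in this PR.
sample = """<html><head><title> Projects | Marco Perini </title></head>
<body><a href="/">Home</a><a href="/projects/">Projects</a>
<a href="https://pages.github.com/">GitHub Pages</a><a>no link</a></body></html>"""

parsed = parse_page(sample)
print(parsed["title"])
print(parsed["urls"])
```

Anchors without an `href` attribute are skipped, and the hrefs are returned unmodified, which is why relative entries such as `/projects/` show up in the list.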

Some more work is needed to turn the current URLs into navigable ones, since many of them are relative links that point to sub-pages within the same website (see the example below).

Building the navigable URLs and cleaning up the relevant ones will be taken up in a separate change.

Parsed content: Title: 
   Projects | Marco Perini
  , Body: <body class=fixed-top-nav><header><nav class="navbar navbar-light navbar-expand-sm fixed-top"id=navbar><div class=container><a class="navbar-brand title font-weight-lighter"href=/> <span class=font-weight-bold> Marco </span> Perini </a><button aria-label="Toggle navigation"class="navbar-toggler collapsed ml-auto"aria-
.
.
.
Hosted by <a rel="external nofollow noopener"href=https://pages.github.com/ target=_blank> GitHub Pages </a> .</div></footer>, 
URLs in page: ['/', '/', '#', '/projects/', '/competitions/', '/cv/', '/projects/rotary-pendulum-rl/', 'https://github.com/PeriniM/DQN-SwingUp', 'https://github.com/PeriniM/Multi-Agents-HAED', '/projects/wireless-esc-drone/', 'https://jekyllrb.com/', 'https://github.com/alshedivat/al-folio', 'https://pages.github.com/']
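
For the follow-up mentioned above, one possible way to turn such hrefs into navigable URLs is to resolve them against the page URL with the standard library's `urljoin`. This is only a sketch under assumptions: the helper name `to_navigable` and the base URL `https://example.com/projects/` are hypothetical, and the actual follow-up change may take a different approach.

```python
from urllib.parse import urldefrag, urljoin


def to_navigable(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against base_url, dropping fragment-only links and duplicates.

    Hypothetical helper for illustration; not part of this PR.
    """
    seen = set()
    result = []
    for href in hrefs:
        if href.startswith("#"):
            # Same-page anchors such as '#' do not lead to a new page.
            continue
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


BASE = "https://example.com/projects/"  # assumed page URL, not from the PR
hrefs = ["/", "/", "#", "/projects/", "/cv/", "https://jekyllrb.com/"]
navigable = to_navigable(BASE, hrefs)
print(navigable)
```

Absolute links pass through `urljoin` unchanged, so only the site-relative entries are rewritten.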

kahwoo and others added 5 commits May 8, 2024 20:39
fix formatting and add other needed models
New release, many new features and bug-fix
## [0.10.0](ScrapeGraphAI/Scrapegraph-ai@v0.9.0...v0.10.0) (2024-05-08)

### Features

* add claude documentation ([5bdee55](ScrapeGraphAI@5bdee55))
* add gemini embeddings ([79daa4c](ScrapeGraphAI@79daa4c))
* add llava integration ([019b722](ScrapeGraphAI@019b722))
* add new hugging_face models ([d5547a4](ScrapeGraphAI@d5547a4))
* Fix bug for gemini case when embeddings config not passed ([726de28](ScrapeGraphAI@726de28))
* fixed custom_graphs example and robots_node ([84fcb44](ScrapeGraphAI@84fcb44))
* multiple graph instances ([dbb614a](ScrapeGraphAI@dbb614a))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](ScrapeGraphAI@930adb3))
* refactoring search function ([aeb1acb](ScrapeGraphAI@aeb1acb))

### Bug Fixes

* bug on .toml ([f7d66f5](ScrapeGraphAI@f7d66f5))
* **llm:** fixed gemini api_key ([fd01b73](ScrapeGraphAI@fd01b73))
* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](ScrapeGraphAI@6b71ec1))
* **examples:** openai std examples ([186c0d0](ScrapeGraphAI@186c0d0))
* removed .lock file for deployment ([d4c7d4e](ScrapeGraphAI@d4c7d4e))

### Docs

* update README.md ([17ec992](ScrapeGraphAI@17ec992))

### CI

* **release:** 0.10.0-beta.1 [skip ci] ([c47a505](ScrapeGraphAI@c47a505))
* **release:** 0.10.0-beta.2 [skip ci] ([3f0e069](ScrapeGraphAI@3f0e069))
* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](ScrapeGraphAI@5aa600c))
* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](ScrapeGraphAI@da8c72c))
* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](ScrapeGraphAI@8c5397f))
* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](ScrapeGraphAI@532adb6))
* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](ScrapeGraphAI@8c0b46e))
* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](ScrapeGraphAI@6911e21))
* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](ScrapeGraphAI@739aaa3))
@VinciGit00 VinciGit00 requested review from VinciGit00 and removed request for VinciGit00 May 10, 2024 08:08
@VinciGit00 VinciGit00 changed the base branch from main to pre/beta May 10, 2024 09:09
@VinciGit00 VinciGit00 merged commit 4e62689 into ScrapeGraphAI:pre/beta May 10, 2024
🎉 This PR is included in version 0.11.0-beta.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

🎉 This PR is included in version 0.11.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

5 participants