update #170

Merged
merged 66 commits into from
May 8, 2024
8b94fe8
Merge pull request #136 from VinciGit00/pre/beta
VinciGit00 May 3, 2024
39f64e5
add claude model
VinciGit00 May 3, 2024
5bdee55
feat: add claude documentation
VinciGit00 May 3, 2024
aeb1acb
feat: refactoring search function
VinciGit00 May 3, 2024
f7d66f5
fix: bug on .toml
VinciGit00 May 3, 2024
8c6dda6
Merge pull request #138 from VinciGit00/new-search-function
PeriniM May 5, 2024
5aa600c
ci(release): 0.9.0-beta.2 [skip ci]
semantic-release-bot May 5, 2024
0ab7272
Merge branch 'pre/beta' into 133-support-claude3-haiku-and-others-usi…
PeriniM May 5, 2024
c06e1e9
Merge pull request #137 from VinciGit00/133-support-claude3
PeriniM May 5, 2024
da8c72c
ci(release): 0.9.0-beta.3 [skip ci]
semantic-release-bot May 5, 2024
b7539d4
Added update_config function to base_node.py
epage480 May 5, 2024
c2c6162
Corrected logic of update_config function in base_node.py
epage480 May 5, 2024
729d5d7
Changed node_config["llm"] to node_config["llm_model"]
epage480 May 5, 2024
2178485
Adjusted graphs to reflect node_config change
epage480 May 5, 2024
444a13a
Created set_common_params function
epage480 May 5, 2024
4dc6049
Simplified create graph functions using common params
epage480 May 5, 2024
79daa4c
feat: add gemini embeddings
VinciGit00 May 5, 2024
8d0e109
Added overwrite keyword to set_common_params
epage480 May 5, 2024
a53e95c
Corrected graphs to use common params
epage480 May 5, 2024
3ae2ea1
Miscellaneous "llm" -> "llm_model" refactors
epage480 May 5, 2024
f10a44a
Resolved key error "llm" -> "llm_model"
epage480 May 5, 2024
cc27b21
Merge branch 'pre/beta' into pass-common-params-graph
epage480 May 5, 2024
3bef9bb
Merge pull request #154 from epage480/pass-common-params-graph
VinciGit00 May 5, 2024
36a1522
Merge pull request #153 from VinciGit00/google_embeddings
PeriniM May 5, 2024
8c5397f
ci(release): 0.9.0-beta.4 [skip ci]
semantic-release-bot May 5, 2024
84fcb44
feat: fixed custom_graphs example and robots_node
PeriniM May 5, 2024
16f53c5
add example custom search graph
PeriniM May 5, 2024
1c4ba91
exposed abstract_graph allowing the user to create new graphs
PeriniM May 5, 2024
dbb614a
feat: multiple graph instances
PeriniM May 5, 2024
930adb3
feat(node): multiple url search in SearchGraph + fixes
PeriniM May 5, 2024
88d999e
add website content
VinciGit00 May 6, 2024
d9a4ab2
Delete custom_search_graph.py
VinciGit00 May 6, 2024
e6387d7
Merge pull request #155 from VinciGit00/graphs-iterator-node
VinciGit00 May 6, 2024
532adb6
ci(release): 0.9.0-beta.5 [skip ci]
semantic-release-bot May 6, 2024
389b52a
removed examples
VinciGit00 May 6, 2024
80053a2
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 May 6, 2024
5e1d5db
Update search_internet_node.py
VinciGit00 May 6, 2024
89a1f99
add lava integration for ollama
VinciGit00 May 6, 2024
019b722
feat: add llava integration
VinciGit00 May 6, 2024
726de28
feat: Fix bug for gemini case when embeddings config not passed
shkamboj1 May 6, 2024
77505aa
Merge pull request #3 from shkamboj1/pre/beta
shkamboj1 May 6, 2024
b0573a2
Merge pull request #158 from shorthills-ai/pre/beta
VinciGit00 May 6, 2024
8c0b46e
ci(release): 0.9.0-beta.6 [skip ci]
semantic-release-bot May 6, 2024
fd01b73
fix(llm): fixed gemini api_key
PeriniM May 6, 2024
b053953
Merge pull request #159 from VinciGit00/fix-gemini-apikey
PeriniM May 6, 2024
6911e21
ci(release): 0.9.0-beta.7 [skip ci]
semantic-release-bot May 6, 2024
ac0a2e5
Update models_tokens.py
VinciGit00 May 6, 2024
8c7c3e3
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 May 6, 2024
e264e92
Added support for Claude 3 models from Anthropic
cemkod May 6, 2024
2ac9e16
Fixed accidental reformatting.
cemkod May 6, 2024
d5547a4
feat: add new hugging_face models
f-aguzzi May 6, 2024
f6442cc
Merge pull request #157 from VinciGit00/llava_integration
VinciGit00 May 6, 2024
739aaa3
ci(release): 0.9.0-beta.8 [skip ci]
semantic-release-bot May 6, 2024
97c3fff
Merge pull request #162 from f-aguzzi/patch-1
VinciGit00 May 6, 2024
cbd77df
removed claude
VinciGit00 May 6, 2024
ac6d200
Merge branch 'pre/beta' of https://github.com/VinciGit00/Scrapegraph-…
VinciGit00 May 6, 2024
5a67bca
Merge branch 'pre/beta' into pr/161
VinciGit00 May 6, 2024
88f04bf
Merge pull request #161 from cemkod/main
VinciGit00 May 6, 2024
c47a505
ci(release): 0.10.0-beta.1 [skip ci]
semantic-release-bot May 6, 2024
2258fe5
add new search graph examples
VinciGit00 May 6, 2024
8632c0a
Merge pull request #169 from VinciGit00/main
VinciGit00 May 7, 2024
186c0d0
fix(examples): openai std examples
PeriniM May 8, 2024
6b71ec1
fix(examples): local, mixed models and fixed SearchGraph embeddings p…
PeriniM May 8, 2024
71fcdfa
Merge pull request #177 from VinciGit00/fix-bugs
PeriniM May 8, 2024
d4c7d4e
fix: removed .lock file for deployment
PeriniM May 8, 2024
3f0e069
ci(release): 0.10.0-beta.2 [skip ci]
semantic-release-bot May 8, 2024
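
Several commits above rename the node configuration key `llm` to `llm_model` and add an `overwrite` keyword to `set_common_params`. The diff for those commits is not shown on this page, so the following is only a hypothetical sketch of how a parameter-merging helper with an `overwrite` flag could behave. The function name `merge_common_params`, its signature, and the `headless` key are assumptions made for illustration, not the project's actual code.

```python
from typing import Any, Dict


def merge_common_params(node_config: Dict[str, Any],
                        params: Dict[str, Any],
                        overwrite: bool = False) -> Dict[str, Any]:
    """Return a copy of node_config with graph-wide params merged in.

    Hypothetical helper: keys already present in node_config are kept
    unless overwrite=True, in which case the shared value wins.
    """
    merged = dict(node_config)  # copy so the caller's dict is not mutated
    for key, value in params.items():
        if overwrite or key not in merged:
            merged[key] = value
    return merged


# Illustrative values only; "headless" is a placeholder key.
node_config = {"llm_model": "node-specific-model", "verbose": False}
common = {"llm_model": "graph-wide-model", "verbose": True, "headless": True}

kept = merge_common_params(node_config, common)
replaced = merge_common_params(node_config, common, overwrite=True)
```

With `overwrite=False` only missing keys are filled in, so a node keeps its own `llm_model`; with `overwrite=True` the graph-wide values replace the node's.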
6 changes: 1 addition & 5 deletions .gitignore
@@ -31,8 +31,4 @@ examples/graph_examples/ScrapeGraphAI_generated_graph
examples/**/result.csv
examples/**/result.json
main.py
poetry.lock

# lock files
*.lock
poetry.lock

97 changes: 87 additions & 10 deletions CHANGELOG.md
@@ -1,27 +1,104 @@
## [0.9.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.8.0...v0.9.0) (2024-05-04)
## [0.10.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.1...v0.10.0-beta.2) (2024-05-08)


### Bug Fixes

* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](https://github.com/VinciGit00/Scrapegraph-ai/commit/6b71ec1d2be953220b6767bc429f4cf6529803fd))
* **examples:** openai std examples ([186c0d0](https://github.com/VinciGit00/Scrapegraph-ai/commit/186c0d035d1d211aff33c38c449f2263d9716a07))
* removed .lock file for deployment ([d4c7d4e](https://github.com/VinciGit00/Scrapegraph-ai/commit/d4c7d4e7fcc2110beadcb2fc91efc657ec6a485c))


### Docs

* update README.md ([17ec992](https://github.com/VinciGit00/Scrapegraph-ai/commit/17ec992b498839e001277e7bc3f0ebea49fbd00d))

## [0.10.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0...v0.10.0-beta.1) (2024-05-06)


### Features

* Enable end users to pass model instances of HuggingFaceHub ([7599234](https://github.com/VinciGit00/Scrapegraph-ai/commit/7599234ab9563ca4ee9b7f5b2d0267daac621ecf))
* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
* add new hugging_face models ([d5547a4](https://github.com/VinciGit00/Scrapegraph-ai/commit/d5547a450ccd8908f1cf73707142b3481fbc6baa))
* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))
* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))


### Bug Fixes

* trailing whitespace ([2878695](https://github.com/VinciGit00/Scrapegraph-ai/commit/2878695d5f35cc9d81f24e4844fdc1988d10cb26))
* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))
* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))


### Build
### CI

* **deps:** bump tqdm from 4.66.1 to 4.66.3 ([0a17c74](https://github.com/VinciGit00/Scrapegraph-ai/commit/0a17c74e50d0457aec289e81183e9c779c735842))
* **deps:** bump tqdm from 4.66.1 to 4.66.3 ([aff6f98](https://github.com/VinciGit00/Scrapegraph-ai/commit/aff6f983b02a37ced21826847a6ace5fb15ecf3d))
* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](https://github.com/VinciGit00/Scrapegraph-ai/commit/5aa600cb0a85d320ad8dc786af26ffa46dd4d097))
* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](https://github.com/VinciGit00/Scrapegraph-ai/commit/da8c72ce138bcfe2627924d25a67afcd22cfafd5))
* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c5397f67a9f05e0c00f631dd297b5527263a888))
* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](https://github.com/VinciGit00/Scrapegraph-ai/commit/532adb639d58640bc89e8b162903b2ed97be9853))
* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c0b46eb40b446b270c665c11b2c6508f4d5f4be))
* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](https://github.com/VinciGit00/Scrapegraph-ai/commit/6911e21584767460c59c5a563c3fd010857cbb67))
* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](https://github.com/VinciGit00/Scrapegraph-ai/commit/739aaa33c39c12e7ab7df8a0656cad140b35c9db))

## [0.9.0-beta.8](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.7...v0.9.0-beta.8) (2024-05-06)

### CI

* **release:** 0.8.0-beta.1 [skip ci] ([d277b34](https://github.com/VinciGit00/Scrapegraph-ai/commit/d277b349a98848749a7e38ea3c511271bced3b71))
* **release:** 0.8.0-beta.2 [skip ci] ([892500a](https://github.com/VinciGit00/Scrapegraph-ai/commit/892500afe93c4d96dcffe897b382977a22079b83))
* **release:** 0.9.0-beta.1 [skip ci] ([14615a7](https://github.com/VinciGit00/Scrapegraph-ai/commit/14615a73c71bb5250772a75c415c57cb153660f8))
### Features

* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))

## [0.9.0-beta.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.6...v0.9.0-beta.7) (2024-05-06)


### Bug Fixes

* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))

## [0.9.0-beta.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.5...v0.9.0-beta.6) (2024-05-06)


### Features

* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))

## [0.9.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.4...v0.9.0-beta.5) (2024-05-06)


### Features

* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))

## [0.9.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.3...v0.9.0-beta.4) (2024-05-05)


### Features

* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))

## [0.9.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.2...v0.9.0-beta.3) (2024-05-05)


### Features

* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))

## [0.9.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.1...v0.9.0-beta.2) (2024-05-05)


### Features

* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))


### Bug Fixes

* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))

## [0.9.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.8.0...v0.9.0-beta.1) (2024-05-04)

1 change: 1 addition & 0 deletions SECURITY.md
@@ -3,3 +3,4 @@
## Reporting a Vulnerability

For reporting a vulnerability contact directly [email protected]

59 changes: 59 additions & 0 deletions examples/anthropic/smart_scraper_haiku.py
@@ -0,0 +1,59 @@
"""
Basic example of a scraping pipeline using SmartScraperGraph with Anthropic Claude 3 Haiku
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings


# required environment variables in .env
# HUGGINGFACEHUB_API_TOKEN
# ANTHROPIC_API_KEY
load_dotenv()

HUGGINGFACEHUB_API_TOKEN = os.getenv('HUGGINGFACEHUB_API_TOKEN')
# ************************************************
# Initialize the model instances
# ************************************************


embedder_model_instance = HuggingFaceInferenceAPIEmbeddings(
    api_key=HUGGINGFACEHUB_API_TOKEN, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

graph_config = {
    "llm": {
        "api_key": os.getenv("ANTHROPIC_API_KEY"),
        "model": "claude-3-haiku-20240307",
        "max_tokens": 4000},
    "embeddings": {"model_instance": embedder_model_instance}
}

smart_scraper_graph = SmartScraperGraph(
    prompt="""Don't say anything else. Output JSON only. List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
event_end_date, event_end_time, location, event_mode, event_category,
third_party_redirect, no_of_days,
time_in_hours, hosted_or_attending, refreshments_type,
registration_available, registration_link""",
    # also accepts a string with the already downloaded HTML code
    source="https://www.hmhco.com/event",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
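
The example above passes `os.getenv("ANTHROPIC_API_KEY")` straight into the config, and `os.getenv` returns `None` when the variable is unset, so a missing key would only surface later, inside the graph run. A minimal sketch of failing early instead, assuming two hypothetical helpers, `require_env` and `build_llm_config`, that are not part of scrapegraphai:

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error."""
    source = os.environ if env is None else env
    value = source.get(name)
    if not value:  # covers both a missing variable and an empty string
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def build_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the "llm" section from the example, validating the key first."""
    return {
        "api_key": require_env("ANTHROPIC_API_KEY", env),
        "model": "claude-3-haiku-20240307",
        "max_tokens": 4000,
    }
```

The optional `env` mapping is there only so the helper can be exercised without touching the real process environment; in the example script it would be called as `build_llm_config()` after `load_dotenv()`.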
Original file line number Diff line number Diff line change
@@ -1,17 +1,15 @@
"""
Basic example of scraping pipeline using JSONScraperGraph from JSON documents
Example of Search Graph
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import JSONScraperGraph
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import SearchGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
load_dotenv()

# ************************************************
# Read the JSON file
# ************************************************

FILE_NAME = "inputs/example.json"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)
@@ -20,42 +18,47 @@
text = file.read()

# ************************************************
# Define the configuration for the graph
# Initialize the model instances
# ************************************************

llm_model_instance = AzureChatOpenAI(
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"]
)

embedder_model_instance = AzureOpenAIEmbeddings(
    azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

# ************************************************
# Create the JSONScraperGraph instance and run it
# ************************************************

graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json", # Ollama needs the format to be specified explicitly
        # "model_tokens": 2000, # set context length arbitrarily
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
    }
    "llm": {"model_instance": llm_model_instance},
    "embeddings": {"model_instance": embedder_model_instance}
}

# ************************************************
# Create the JSONScraperGraph instance and run it
# Create the SearchGraph instance and run it
# ************************************************

json_scraper_graph = JSONScraperGraph(
    prompt="List me all the authors, title and genres of the books",
    source=text, # Pass the content of the file, not the file object
search_graph = SearchGraph(
    prompt="List me the best excursions near Trento",
    config=graph_config
)

result = json_scraper_graph.run()
result = search_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = json_scraper_graph.get_execution_info()
graph_exec_info = search_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
# Save to json and csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
2 changes: 2 additions & 0 deletions examples/gemini/search_graph_gemini.py
@@ -21,6 +21,8 @@
        "temperature": 0,
        "streaming": True
    },
    "max_results": 5,
    "verbose": True,
}

# ************************************************
54 changes: 0 additions & 54 deletions examples/local_models/Docker/csv_scraper_docker.py

This file was deleted.
