Title: Implement Graph-Based Scraping Logic with SmartScraper
Description:
This pull request introduces a graph-based approach to web scraping, centered on a new class, SmartScraper. SmartScraper serves as a base for constructing scraping workflows as a directed graph of nodes, each representing a distinct step in the scraping process, such as fetching HTML, extracting probable tags, or generating answers to user queries.
Key components added in this PR include:
- BaseNode and its subclasses (FetchHTMLNode, GetProbableTagsNode, ParseHTMLNode, GenerateAnswerNode, ConditionalNode) for creating versatile and reusable scraping operations.
- BaseGraph for managing the execution flow among nodes.
- SmartScraper class, which encapsulates the graph logic and simplifies the creation of scraping tasks.
Example Usage:
Below is a brief example demonstrating how to use the SmartScraper to extract information from a webpage:
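As a rough, self-contained illustration of the node-and-graph idea only, here is a minimal sketch. It is not the PR's code and does not use the real SmartScraper, BaseGraph, or BaseNode signatures, which are not shown in this description. Every name in it (Node, Graph, fetch_html, parse_html, generate_answer, SAMPLE_HTML) is hypothetical, the fetch step returns canned HTML instead of making an HTTP request, and the answer step is a plain string lookup standing in for whatever model call the real node makes.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional

# Shared state passed from node to node (hypothetical shape, not the PR's).
State = Dict[str, object]

# Canned page so the sketch runs offline; a real fetch node would do HTTP.
SAMPLE_HTML = "<html><body><h1>Example Domain</h1><p>Illustrative text.</p></body></html>"


@dataclass
class Node:
    """One step in the workflow: takes the state, returns an updated copy."""
    name: str
    run: Callable[[State], State]


class Graph:
    """Runs nodes in order by following a name -> next-name edge map."""

    def __init__(self, nodes: List[Node], edges: Dict[str, str], entry: str):
        self.nodes = {node.name: node for node in nodes}
        self.edges = edges
        self.entry = entry

    def execute(self, state: State) -> State:
        current: Optional[str] = self.entry
        while current is not None:
            state = self.nodes[current].run(state)
            current = self.edges.get(current)  # None ends the walk
        return state


class _HeadingParser(HTMLParser):
    """Collects the text of every <h1> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self.headings: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and data.strip():
            self.headings.append(data.strip())


def fetch_html(state: State) -> State:
    # Stand-in for a fetch step: ignores state["url"] and returns canned HTML.
    return {**state, "html": SAMPLE_HTML}


def parse_html(state: State) -> State:
    parser = _HeadingParser()
    parser.feed(str(state["html"]))
    return {**state, "headings": parser.headings}


def generate_answer(state: State) -> State:
    # Stand-in for an answer step: returns the first heading, no model involved.
    headings = state["headings"]
    answer = headings[0] if headings else "no heading found"
    return {**state, "answer": answer}


graph = Graph(
    nodes=[
        Node("fetch", fetch_html),
        Node("parse", parse_html),
        Node("answer", generate_answer),
    ],
    edges={"fetch": "parse", "parse": "answer"},
    entry="fetch",
)

result = graph.execute({"url": "https://example.com", "prompt": "What is the page heading?"})
print(result["answer"])
```

Keeping each step as a function over a shared state dictionary means steps can be reordered or swapped by changing only the edge map, which is the property the graph-based design is aiming for.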
Future Work