feat: add support for nested fields #1
Merged
+159
−28
Pull Request Type
Summary
This pull request introduces nested field support for HTML data extraction, enhancing the `createScraper` function to handle nested structures within the data schema. It also improves the README's readability and documents the new nested field support.
Changes Made
README Enhancements:
Nested Field Support in Types: The `FieldDefinition` type now supports nested field definitions through a new `NestedFieldDefinition` type. This allows fields to be defined with sub-fields, enabling the extraction of complex HTML structures.
Helper Function `extractData`: `extractData`, within `createScraper.ts`, manages nested field extraction recursively. It processes each field in the schema, distinguishes between simple and nested fields, and extracts values accordingly.
Update to `createScraper`: `createScraper` now incorporates `extractData`, enhancing its capability to parse nested structures while maintaining backward compatibility.
How to Test
`og:image`, `og:image:width`, and `og:image:height`.
Possible Regressions
Screenshots/Logs
Additional Notes
These changes allow for greater flexibility and extensibility in parsing structured HTML content with nested attributes, broadening the scraper’s utility for more complex data extraction tasks.
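The description names `FieldDefinition`, `NestedFieldDefinition`, and `extractData` but does not show their definitions. The following is a minimal sketch of how such recursive extraction could look, not the code from this PR. It assumes that a nested definition carries a `fields` map and that a simple definition carries a `selector`. The `Lookup` callback is a hypothetical stand-in for whatever HTML parser `createScraper` actually uses.

```typescript
// Assumed shapes: the PR names these types but does not show them.
type SimpleFieldDefinition = { selector: string };
type NestedFieldDefinition = { fields: Record<string, FieldDefinition> };
type FieldDefinition = SimpleFieldDefinition | NestedFieldDefinition;
type Schema = Record<string, FieldDefinition>;

// Hypothetical stand-in for the parsed document: selector in, value out.
type Lookup = (selector: string) => string | undefined;

// Result mirrors the schema: strings at the leaves, objects for nested fields.
type Extracted = { [key: string]: string | undefined | Extracted };

// Type guard that distinguishes nested definitions from simple ones.
function isNested(def: FieldDefinition): def is NestedFieldDefinition {
  return "fields" in def;
}

// Walk the schema, recursing into nested fields and resolving simple ones.
function extractData(schema: Schema, lookup: Lookup): Extracted {
  const result: Extracted = {};
  for (const name of Object.keys(schema)) {
    const def = schema[name];
    result[name] = isNested(def)
      ? extractData(def.fields, lookup)
      : lookup(def.selector);
  }
  return result;
}

// Example input: Open Graph image metadata, keyed by illustrative selectors.
const page: Record<string, string> = {
  "title": "Example",
  'meta[property="og:image"]': "https://example.com/cover.png",
  'meta[property="og:image:width"]': "1200",
  'meta[property="og:image:height"]': "630",
};

const schema: Schema = {
  title: { selector: "title" }, // flat field, as before nested support
  image: {
    fields: {
      url: { selector: 'meta[property="og:image"]' },
      width: { selector: 'meta[property="og:image:width"]' },
      height: { selector: 'meta[property="og:image:height"]' },
    },
  },
};

const data = extractData(schema, (selector) => page[selector]);
console.log(JSON.stringify(data, null, 2));
```

In this sketch a flat schema passes through the same function without ever taking the recursive branch, which is one way the backward compatibility mentioned above could be preserved. The three `og:image` properties end up grouped under a single `image` object with `url`, `width`, and `height` keys.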