Make it possible to pass some information from the parent page #31

Open
nietaki opened this issue May 16, 2017 · 0 comments

Owner

nietaki commented May 16, 2017

Currently each page is considered on its own. This model isn't flexible enough when a page's data needs context from its parent page. For example, you're crawling a website to download all the images it links to, but also want to know the anchor text for each of them.

The solution would be to have ParserLogic.extract_uris() return {tag :: term, uri :: URI.t} | URI.t, where a bare URI.t would be converted to {nil, uri}, and to pass the tag as an additional argument to ParserLogic.parse(). The user could then choose whether or not to include the parent tag in the parse_result, for their convenience.
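A minimal sketch of the normalisation step described above, to make the proposed shape concrete. The module name `TaggedUriSketch`, the `normalize/1` helper, and the two-argument `parse/2` are all hypothetical and only illustrate the idea; they are not the library's current API.

```elixir
defmodule TaggedUriSketch do
  # Hypothetical type for the proposal: a URI paired with a tag
  # supplied by the parent page (e.g. the anchor text).
  @type tagged_uri :: {tag :: term(), uri :: URI.t()}

  # Normalise whatever extract_uris() returns for a single link:
  # a bare URI gets a nil tag, a tagged URI is passed through.
  @spec normalize(tagged_uri | URI.t()) :: tagged_uri
  def normalize(%URI{} = uri), do: {nil, uri}
  def normalize({tag, %URI{} = uri}), do: {tag, uri}

  # Hypothetical parse callback taking the parent tag as an extra
  # argument. Here the user chooses to keep the tag in the result.
  def parse(body, parent_tag) do
    %{body: body, parent_tag: parent_tag}
  end
end

uri = URI.parse("http://example.com/cat.png")

# An untagged URI is converted to {nil, uri}.
{nil, ^uri} = TaggedUriSketch.normalize(uri)

# A tagged URI keeps its tag, which can later be handed to parse/2.
{tag, ^uri} = TaggedUriSketch.normalize({"a cute cat", uri})
TaggedUriSketch.parse("<image bytes>", tag)
```

Accepting both shapes would keep existing extract_uris() implementations that return plain URIs working unchanged, while letting new ones opt in to tagging.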

This will require some manual labour, but shouldn't be too bad ;)
