Make it possible to pass some information from the parent page #31

nietaki · 2017-05-16T21:29:39Z

Currently each page is considered on its own. This model isn't flexible enough when information from the parent page matters for the pages it links to. For example, you might be crawling a website to download all the images it links to, but also want to know the anchor text for each of them.

The solution would be to have ParserLogic.extract_uris() return {tag :: term, uri :: URI.t} | URI.t, where a bare URI.t would be converted to {nil, uri}, and to have the tag passed as an additional argument to ParserLogic.parse(). The user could then choose whether or not to include the parent tag in the parse_result, for their convenience.
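A minimal sketch of how that normalization might look, only to make the proposal concrete. The module name ParentTagSketch, the shape of the links input, and the argument order of parse/3 are all assumptions made for illustration; none of them are the library's actual API.

```elixir
defmodule ParentTagSketch do
  # Hypothetical illustration of the proposal, not the library's real code.

  @type tagged_uri :: {tag :: term(), uri :: URI.t()}

  # Convert either return shape into the uniform {tag, uri} form.
  @spec normalize(tagged_uri() | URI.t()) :: tagged_uri()
  def normalize(%URI{} = uri), do: {nil, uri}
  def normalize({tag, %URI{} = uri}), do: {tag, uri}

  # Assumed input: a list whose items are either {anchor_text, href}
  # tuples or bare href strings. Tagged links keep their anchor text.
  @spec extract_uris([{String.t(), String.t()} | String.t()]) ::
          [tagged_uri() | URI.t()]
  def extract_uris(links) do
    Enum.map(links, fn
      {anchor_text, href} -> {anchor_text, URI.parse(href)}
      href when is_binary(href) -> URI.parse(href)
    end)
  end

  # Assumed parse signature with the parent tag as an extra argument.
  # The user decides whether the tag ends up in the result.
  @spec parse(term(), URI.t(), binary()) :: map()
  def parse(parent_tag, %URI{} = uri, body) do
    %{uri: URI.to_string(uri), parent_tag: parent_tag, size: byte_size(body)}
  end
end
```

With this sketch, mapping ParentTagSketch.normalize/1 over the output of extract_uris/1 gives every link a {tag, uri} pair, so untagged links simply arrive in parse/3 with a nil tag and existing callers would not need to return tags at all.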

This will require some manual labour, but shouldn't be too bad ;)

nietaki added the enhancement label May 16, 2017