Currently each page is considered on its own. This model isn't flexible enough when a page's data only matters in the context of its parent page. For example, you might be crawling a website to download all the images it links to, but also want to know the anchor text for each of them.
The solution would be to have `ParserLogic.extract_uris()` return `{tag :: term, uri :: URI.t} | URI.t`, where a bare `URI.t` would be converted to `{nil, uri}`, and to have the tag passed as an additional argument to `ParserLogic.parse()`. The user could then choose whether or not to include the parent tag in the `parse_result`.
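A minimal sketch of the normalization step, to make the proposal concrete. The module name `LinkTagSketch`, the `normalize/1` helper, and the two-argument `parse/2` stand-in are all hypothetical; the real `ParserLogic` callbacks may take different arguments and return a different result shape.

```elixir
defmodule LinkTagSketch do
  # Hypothetical helper module for illustration; not part of the library.

  @type tagged_uri :: {term(), URI.t()}

  # Turn either return shape of extract_uris() into a {tag, uri} tuple.
  # A bare URI gets a nil tag, as proposed above.
  @spec normalize(tagged_uri() | URI.t()) :: tagged_uri()
  def normalize(%URI{} = uri), do: {nil, uri}
  def normalize({_tag, %URI{}} = tagged), do: tagged

  # Stand-in for a parse callback that receives the parent tag as an
  # additional argument (assumed signature). Here it simply keeps the tag.
  @spec parse(String.t(), term()) :: map()
  def parse(body, parent_tag), do: %{body: body, parent_tag: parent_tag}
end

image = URI.parse("http://example.com/cat.png")
about = URI.parse("http://example.com/about")

# One tagged link (anchor text as the tag) and one bare URI.
normalized = Enum.map([{"a cat", image}, about], &LinkTagSketch.normalize/1)
# normalized is [{"a cat", image}, {nil, about}]
```

With this shape, a parser that doesn't care about the parent context can ignore the tag argument, so existing behaviour stays easy to keep.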
This will require some manual labour, but shouldn't be too bad ;)