Let crawler check url is reachable. #51

Open
wants to merge 1 commit into
base: master

Conversation

tribela
Member

@tribela tribela commented May 29, 2014

Check whether a URL is reachable using the crawler module.

@tribela tribela changed the title from "Let crawler check url is reachable. refs earthreader/web#44" to "Let crawler check url is reachable." May 29, 2014
@tribela
Member Author

tribela commented May 29, 2014

refs earthreader/web#44

@dahlia
Contributor

dahlia commented May 29, 2014

I think it would be better to make autodiscovery()/crawl() raise an exception when the given URL is unreachable, rather than adding an independent predicate function for the reachability check. Most library users who write client code simply won’t be aware of such a predicate when they use autodiscovery() or crawl(), even if it’s documented. So this kind of check should be handled by the functions that actually depend on the constraint, not by a separate function.
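
To illustrate the design being suggested here, a minimal sketch follows. It is not libearth's actual code: `UnreachableError`, `crawl_one`, and the injected `opener` callable are hypothetical names chosen for this example, and the network call is passed in so the sketch runs without network access.

```python
class UnreachableError(Exception):
    """Hypothetical exception raised when a URL cannot be fetched."""

    def __init__(self, url, reason):
        super().__init__('{0} is unreachable: {1}'.format(url, reason))
        self.url = url
        self.reason = reason


def crawl_one(url, opener):
    """Fetch ``url`` with ``opener`` and return whatever it returns.

    ``opener`` is an assumed callable that takes a URL and raises OSError
    on network failure (urllib.error.URLError is a subclass of OSError).
    """
    try:
        return opener(url)
    except OSError as e:
        # The check lives in the function that depends on reachability,
        # so callers cannot forget to perform it.
        raise UnreachableError(url, e) from e


def broken_opener(url):
    # Stand-in for a real network call that fails.
    raise OSError('connection refused')


try:
    crawl_one('http://example.com/', broken_opener)
except UnreachableError as e:
    print('could not reach', e.url)
```

With a separate predicate, a caller who skips the check gets an unrelated low-level error; here the failure always surfaces as one documented exception type.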

@tribela
Member Author

tribela commented May 29, 2014

https://github.com/earthreader/libearth/blob/master/libearth/parser/autodiscovery.py#L50
parser.autodiscovery takes two parameters to find out the document format.
We could remove the document parameter and have it fetch the document itself using the crawler, raising another exception when the URL is unreachable.
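
A rough sketch of what that single-parameter variant might look like is below. It is an assumption-laden simplification, not libearth's implementation: `autodiscover`, `AutodiscoveryFetchError`, and the injected `opener` callable are hypothetical, and the link detection only looks at `<link rel="alternate">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {'application/atom+xml', 'application/rss+xml'}


class AutodiscoveryFetchError(Exception):
    """Hypothetical exception: the page to inspect could not be fetched."""


class _FeedLinkCollector(HTMLParser):
    """Collect href values of <link rel="alternate"> tags with a feed type."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == 'link' and attrs.get('rel') == 'alternate'
                and attrs.get('type') in FEED_TYPES and attrs.get('href')):
            self.links.append(attrs['href'])


def autodiscover(url, opener):
    """Fetch ``url`` and return absolute feed URLs advertised by the page.

    ``opener`` is an assumed callable returning the page text and raising
    OSError on network failure.
    """
    try:
        document = opener(url)
    except OSError as e:
        raise AutodiscoveryFetchError(
            'cannot fetch {0}: {1}'.format(url, e)) from e
    collector = _FeedLinkCollector()
    collector.feed(document)
    collector.close()
    # Resolve relative hrefs against the page URL.
    return [urljoin(url, href) for href in collector.links]
```

The trade-off is that the function now performs I/O itself, so the pure two-parameter form can no longer be used on a document the caller already has.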

@dahlia
Contributor

dahlia commented May 29, 2014

How about adding one more, higher-level function instead?


2 participants