-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implementation of additional requests #22
Conversation
Codecov Report
@@ Coverage Diff @@
## master #22 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 11 +5
Lines 224 285 +61
=========================================
+ Hits 224 285 +61
|
6507672
to
391a449
Compare
web_poet/requests.py
Outdated
url: str | ||
method: str = "GET" | ||
headers: Optional[mapping] = None | ||
body: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be bytes, not str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be split into raw_data: bytes
and html: str
into a separate HttpResponseBody
as per our recent discussions.
I've opened up a separate PR in #24 to open up the deep discussion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We addressed it for Response, not we need to fix it for the Request in some way (which could actually be different).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this in 3a76bc3
2e973aa
to
c5537ce
Compare
326c9ed
to
74e5c89
Compare
b02b2dd
to
9025660
Compare
…nd key requirements
implementation** from a given framework is injected to it. | ||
|
||
|
||
Exception Handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should describe error handling before describing backends - just give some examples, of how to handle errors for
- individual requests (e.g. get or post)
- batch requests: what're the semantics, is it possible to get partial data, etc.
There is no need to move the whole section above. I think our docs could have 2 main parts, aimed to a different type of users:
- People who write Page Objects, and need additional requests
- People who're implementing web-poet intergations. Information about backends, etc. is mostly for this group.
I think it'd be good to make it clear what to do for (1) first, and then provide information for (2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, that makes sense. Thanks for raising it!
I moved the Exception Handling section right after discussing how to make additional requests so it's a smoother transition to users developing POs (as you've pointed out). Also written down more examples regarding execption handling in 7947d58
Regarding batch_requests()
, thanks for pointing out about retrieving partial data as I haven't considered it before. The behavior is now updated in 5d2a751
web_poet/requests.py
Outdated
""" | ||
|
||
coroutines = [self.request_downloader(r) for r in requests] | ||
responses = await asyncio.gather(*coroutines, return_exceptions=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about exposing return_exceptions option? It could make sense to keep the interface similar to asyncio.gather. I also wonder if we should keep False as a default; it may be more intuitive. Otherwise, user would always need to add some code to the PO itself to handle the exceptions; with return_exceptions=False an error would be just raised & propagated, which may be a right behavior in many cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposing return_exceptions
would be a good idea, as it could prevent some additional effort from the the user to manually tweak the behavior. 👍 Updated this on bc4baea
Regarding on having its default value set to False
: I do agree that it would be consistent with the underlying asyncio.gather()
API default expectations.
However, I'm not quite sure if we're encouraging the right behavior to the user. For our use case, executing HTTP requests incurs more cost than other coroutine operations that asyncio could have. As such, we'd be interested in encouraging the behavior of salvaging any usable HTTP Responses that the user can. Having the exceptions err up when return_exceptions=False
would lose track of usable responses altogether.
Moreover, it doesn't promote future possibilities like retrying a specific failed request from a batch_request()
since the developer doesn't know which request had the issue.
Let me know what you think. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kmike , as discussed offline last week, we'll be having the default value of return_exceptions
set to False
.
This leads to a more meaningful exception message that explicitly logs HttpRequestError
as opposed to something like "css method not found in ..." since the developer has not automatically handled the returned exception. Raising them by default makes the developer more aware on how to handle them.
Updated in 80f95a8.
from .page_inputs import HttpResponse, HttpResponseBody, HttpResponseHeaders | ||
from .requests import ( | ||
request_backend_var, | ||
HttpClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's interesting is that HttpClient also can be a PO dependency :) I wonder if we should part from having page_inputs module, and just split it. E.g. web_poet.http, or something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good thought to ponder. I think we have 3 main choices for this which are:
- Move
HttpClient
from requests.py to page_inputs.py - Create a page_inputs/ subpackage which houses page_inputs/meta.py and page_inputs/http.py (
HttpClient
is here) - Disband the page_inputs.py and simply have modules like meta.py and http.py (
HttpClient
is here)
I'm torn between 1 and 2 since having some sort of subpackage/module named page_inputs helps portray the idea that anything within it could be a PO dependency.
What do you think? Perhaps there might be other good options aside from these that I might've missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure :) Let's handle it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I've filed a draft in #37 which explores option 2 to see how it may look like.
Hello @kmike, Thanks for reviewing this PR again last week. 🙌 Aside from these large changes:
Small doc-related changes are included in 753e6ad which includes (but not limited to):
Lastly, I've created another PR in #38 to explore the default behavior of raising exceptions when receiving 400-5xx responses. |
Thanks so much @BurnzZ! I've checked the PR again, it looks good 💪 We might think about re-organizing docs a bit, but the content is there; this can be done later. |
Thanks for your patience in reviewing this one @kmike ! 🙏 |
Progress