Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation of additional requests #22

Merged
merged 49 commits into from
Apr 27, 2022
Merged

implementation of additional requests #22

merged 49 commits into from
Apr 27, 2022

Conversation

BurnzZ
Copy link
Contributor

@BurnzZ BurnzZ commented Feb 8, 2022

Progress

  • Testing across a wide-variety of scenarios/cases
    • contextvars and Twisted
    • scrapy-poet and scrapy integration
  • Docs
  • Tests
  • Changelog

@codecov
Copy link

codecov bot commented Feb 8, 2022

Codecov Report

Merging #22 (753e6ad) into master (a29d86d) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #22   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6        11    +5     
  Lines          224       285   +61     
=========================================
+ Hits           224       285   +61     
Impacted Files Coverage Δ
web_poet/__init__.py 100.00% <100.00%> (ø)
web_poet/_base.py 100.00% <100.00%> (ø)
web_poet/exceptions/__init__.py 100.00% <100.00%> (ø)
web_poet/exceptions/core.py 100.00% <100.00%> (ø)
web_poet/exceptions/http.py 100.00% <100.00%> (ø)
web_poet/page_inputs.py 100.00% <100.00%> (ø)
web_poet/requests.py 100.00% <100.00%> (ø)

url: str
method: str = "GET"
headers: Optional[mapping] = None
body: Optional[str] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be bytes, not str

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be split into raw_data: bytes and html: str into a separate HttpResponseBody as per our recent discussions.

I've opened up a separate PR in #24 to open up the deep discussion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We addressed it for Response, not we need to fix it for the Request in some way (which could actually be different).

Copy link
Contributor Author

@BurnzZ BurnzZ Apr 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this in 3a76bc3

web_poet/requests.py Outdated Show resolved Hide resolved
@BurnzZ BurnzZ mentioned this pull request Feb 25, 2022
3 tasks
@BurnzZ BurnzZ marked this pull request as ready for review March 14, 2022 12:08
@BurnzZ BurnzZ requested a review from kmike April 11, 2022 08:53
implementation** from a given framework is injected to it.


Exception Handling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should describe error handling before describing backends - just give some examples, of how to handle errors for

  1. individual requests (e.g. get or post)
  2. batch requests: what're the semantics, is it possible to get partial data, etc.

There is no need to move the whole section above. I think our docs could have 2 main parts, aimed to a different type of users:

  1. People who write Page Objects, and need additional requests
  2. People who're implementing web-poet intergations. Information about backends, etc. is mostly for this group.

I think it'd be good to make it clear what to do for (1) first, and then provide information for (2).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, that makes sense. Thanks for raising it!

I moved the Exception Handling section right after discussing how to make additional requests so it's a smoother transition to users developing POs (as you've pointed out). Also written down more examples regarding execption handling in 7947d58

Regarding batch_requests(), thanks for pointing out about retrieving partial data as I haven't considered it before. The behavior is now updated in 5d2a751

@BurnzZ BurnzZ requested a review from kmike April 13, 2022 08:54
"""

coroutines = [self.request_downloader(r) for r in requests]
responses = await asyncio.gather(*coroutines, return_exceptions=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about exposing return_exceptions option? It could make sense to keep the interface similar to asyncio.gather. I also wonder if we should keep False as a default; it may be more intuitive. Otherwise, user would always need to add some code to the PO itself to handle the exceptions; with return_exceptions=False an error would be just raised & propagated, which may be a right behavior in many cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing return_exceptions would be a good idea, as it could prevent some additional effort from the the user to manually tweak the behavior. 👍 Updated this on bc4baea


Regarding on having its default value set to False: I do agree that it would be consistent with the underlying asyncio.gather() API default expectations.

However, I'm not quite sure if we're encouraging the right behavior to the user. For our use case, executing HTTP requests incurs more cost than other coroutine operations that asyncio could have. As such, we'd be interested in encouraging the behavior of salvaging any usable HTTP Responses that the user can. Having the exceptions err up when return_exceptions=False would lose track of usable responses altogether.

Moreover, it doesn't promote future possibilities like retrying a specific failed request from a batch_request() since the developer doesn't know which request had the issue.

Let me know what you think. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kmike , as discussed offline last week, we'll be having the default value of return_exceptions set to False.

This leads to a more meaningful exception message that explicitly logs HttpRequestError as opposed to something like "css method not found in ..." since the developer has not automatically handled the returned exception. Raising them by default makes the developer more aware on how to handle them.

Updated in 80f95a8.

web_poet/requests.py Outdated Show resolved Hide resolved
web_poet/requests.py Outdated Show resolved Hide resolved
web_poet/page_inputs.py Outdated Show resolved Hide resolved
from .page_inputs import HttpResponse, HttpResponseBody, HttpResponseHeaders
from .requests import (
request_backend_var,
HttpClient,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's interesting is that HttpClient also can be a PO dependency :) I wonder if we should part from having page_inputs module, and just split it. E.g. web_poet.http, or something like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good thought to ponder. I think we have 3 main choices for this which are:

  1. Move HttpClient from requests.py to page_inputs.py
  2. Create a page_inputs/ subpackage which houses page_inputs/meta.py and page_inputs/http.py (HttpClient is here)
  3. Disband the page_inputs.py and simply have modules like meta.py and http.py (HttpClient is here)

I'm torn between 1 and 2 since having some sort of subpackage/module named page_inputs helps portray the idea that anything within it could be a PO dependency.

What do you think? Perhaps there might be other good options aside from these that I might've missed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure :) Let's handle it separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I've filed a draft in #37 which explores option 2 to see how it may look like.

@BurnzZ
Copy link
Contributor Author

BurnzZ commented Apr 25, 2022

Hello @kmike, Thanks for reviewing this PR again last week. 🙌

Aside from these large changes:

  • update return_exceptions default from True to False 80f95a8
  • expose execute() for single requests and rename batch_requests() into batch_execute() 5d82a37

Small doc-related changes are included in 753e6ad which includes (but not limited to):

  • fixing some incorrect code samples (they can't run due to syntax errors)
  • emphasize that order is preserved for batch_execute()
  • mention that batch_execute() is merely a shortcut for asyncio.gather() and other asyncio functions like asyncio.as_completed() can be used depending on the need
  • mention what are the expectations for redirections

Lastly, I've created another PR in #38 to explore the default behavior of raising exceptions when receiving 400-5xx responses.

@BurnzZ BurnzZ requested a review from kmike April 25, 2022 14:19
@kmike
Copy link
Member

kmike commented Apr 27, 2022

Thanks so much @BurnzZ! I've checked the PR again, it looks good 💪

We might think about re-organizing docs a bit, but the content is there; this can be done later.

@kmike kmike merged commit 025f5b1 into master Apr 27, 2022
@BurnzZ BurnzZ deleted the additional-requests branch April 28, 2022 04:03
@BurnzZ
Copy link
Contributor Author

BurnzZ commented Apr 28, 2022

Thanks for your patience in reviewing this one @kmike ! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants