ZyteApiProvider could make an unneeded API request #91
Findings so far:
|
Yeah, the problem AFAIK is that ItemProvider calls build_instances itself. scrapinghub/scrapy-poet#151 is actually about a third request done in this or similar use case. |
We also thought the solution may involve the caching feature in ItemProvider but didn't investigate further. |
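The caching idea above can be sketched as a request-scoped cache. Both the provider and a second `build_instances` call would go through it, and both would get one shared Zyte API response. This is a minimal, synchronous sketch. `RequestScopedCache`, `fetch`, and the `send` callable are hypothetical stand-ins. They are not part of scrapy-poet or scrapy-zyte-api, and the real code path is asynchronous.

```python
import json
from typing import Any, Callable, Dict


class RequestScopedCache:
    """Hypothetical cache of Zyte API responses for a single Scrapy request."""

    def __init__(self, send: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._send = send  # hypothetical function that performs the real API call
        self._cache: Dict[str, Dict[str, Any]] = {}
        self.sent = 0  # how many real API calls were made

    @staticmethod
    def _key(body: Dict[str, Any]) -> str:
        # Canonical JSON, so {"a": 1, "b": 2} and {"b": 2, "a": 1} share a key.
        return json.dumps(body, sort_keys=True)

    def fetch(self, body: Dict[str, Any]) -> Dict[str, Any]:
        key = self._key(body)
        if key not in self._cache:
            self.sent += 1
            self._cache[key] = self._send(body)
        return self._cache[key]


def fake_send(body: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the network call; echoes the URL back.
    return {"url": body["url"]}


cache = RequestScopedCache(fake_send)
cache.fetch({"url": "https://books.toscrape.com", "productNavigation": True})  # e.g. from the provider
cache.fetch({"productNavigation": True, "url": "https://books.toscrape.com"})  # e.g. from build_instances
print(cache.sent)  # 1
```

This cache only deduplicates identical bodies. A `browserHtml` request and a `productNavigation` request for the same URL would still be sent separately.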
New finding: Switching |
I looked into this further and it still occurs without any Page Objects involved. The sent Zyte API requests were determined by setting

Given the following spider:

```python
import scrapy
from scrapy_poet import DummyResponse  # used by the parse_nav variants below
from zyte_common_items import ProductNavigation


class BooksSpider(scrapy.Spider):
    name = "books"

    def start_requests(self):
        yield scrapy.Request(
            url="https://books.toscrape.com",
            callback=self.parse_nav,
            meta={"zyte_api": {"browserHtml": True}},
        )
```

Case 1 ✅

The following callback setup is correct since it sends only 1 request:

```python
# {"productNavigation": true, "url": "https://books.toscrape.com"}
def parse_nav(self, response: DummyResponse, navigation: ProductNavigation):
    ...
```

Case 2 ❌

However, the following sends 2 separate requests:

```python
# {"browserHtml": true, "url": "https://books.toscrape.com"}
# {"productNavigation": true, "url": "https://books.toscrape.com"}
def parse_nav(self, response, navigation: ProductNavigation):
    ...
```

This case should not happen since

Case 3

However, if we introduce a Page Object to the same spider:

```python
import attrs
from web_poet import BrowserResponse, ItemPage, field, handle_urls
from zyte_common_items import ProductNavigation


@handle_urls("")
@attrs.define
class ProductNavigationPage(ItemPage[ProductNavigation]):
    response: BrowserResponse
    nav_item: ProductNavigation

    @field
    def url(self):
        return self.nav_item.url

    @field
    def categoryName(self) -> str:
        return f"(modified) {self.nav_item.categoryName}"
```

❌ Then the following callback setup sends 3 separate Zyte API requests:

```python
# {"browserHtml": true, "url": "https://books.toscrape.com"}
# {"productNavigation": true, "url": "https://books.toscrape.com"}
# {"browserHtml": true, "url": "https://books.toscrape.com"}
def parse_nav(self, response: DummyResponse, navigation: ProductNavigation):
    ...
```

Note that the same series of 3 separate requests still occurs with:

```python
def parse_nav(self, response, navigation: ProductNavigation):
    ...
```
|
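One way to avoid the extra request in Case 2 would be to merge two sets of parameters into a single request body: those from `meta["zyte_api"]` and those the callback's dependencies require. This is a sketch under two assumptions. First, Zyte API is assumed to accept `browserHtml` and `productNavigation` in the same request. Second, `merge_zyte_api_params` is a hypothetical helper, not an existing scrapy-zyte-api function.

```python
from typing import Any, Dict, Iterable


def merge_zyte_api_params(
    meta_params: Dict[str, Any],
    provider_params: Iterable[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: fold every parameter set into one request body.

    Raises ValueError if two sources disagree on the same key.
    """
    merged: Dict[str, Any] = dict(meta_params)
    for params in provider_params:
        for key, value in params.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"conflicting values for {key!r}: {merged[key]!r} vs {value!r}"
                )
            merged[key] = value
    return merged


# Case 2: the request meta asks for browserHtml, while the ProductNavigation
# dependency asks for productNavigation on the same URL.
body = merge_zyte_api_params(
    {"url": "https://books.toscrape.com", "browserHtml": True},
    [{"url": "https://books.toscrape.com", "productNavigation": True}],
)
print(body)
# {'url': 'https://books.toscrape.com', 'browserHtml': True, 'productNavigation': True}
```

The helper raises on conflicting values instead of letting one source win. Silently picking a value could make the merged request return something the callback did not ask for.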
I wonder if some of the unexpected requests are related to #135. |
Re-opening this since Case 2 is still occurring. Case 3 has been fixed though. |
@BurnzZ, after your latest analysis, do you think Case 2 still happens or not? |
@wRAR I can still reproduce Case 2. 👍 |
OK, so the difference between this use case and ones that we already test is having |
OTOH, even if we handle this in the provider, won't the request itself still be sent? |
@wRAR Let's try to focus on how Case 2 (or any of these cases) affects https://github.com/zytedata/zyte-spider-templates, not on the case itself. The priority of supporting `meta` is not clear to me now; it may turn out not to be necessary, or it may be.
In the example below, ZyteApiProvider makes 2 API requests instead of 1: