Handle item overrides #164
@@ -34,7 +34,7 @@
     PageParamsProvider: 700,
     RequestUrlProvider: 800,
     ResponseUrlProvider: 900,
-    ItemProvider: 1000,
+    ItemProvider: 2000,
Increasing this value allows ZyteApiProvider (as well as any other item provider in the future) to run before scrapy-poet's ItemProvider. This ensures that any item dependency in page objects is available.
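The effect of the priority bump can be sketched as follows, assuming providers run in ascending order of their priority value, as the comment above implies. The `provider_order` helper and the 1100 value used for ZyteApiProvider are assumptions for illustration only; neither comes from this PR.

```python
from typing import Dict, List


def provider_order(priorities: Dict[str, int]) -> List[str]:
    """Return provider names sorted so that lower priority values come first."""
    return [name for name, _ in sorted(priorities.items(), key=lambda kv: kv[1])]


# Assumed priority for an external item provider; not taken from this PR.
ZYTE_API_PROVIDER_PRIORITY = 1100

# Old value: ItemProvider (1000) sorts ahead of the external provider.
before = provider_order(
    {"ItemProvider": 1000, "ZyteApiProvider": ZYTE_API_PROVIDER_PRIORITY}
)

# New value: ItemProvider (2000) sorts after it.
after = provider_order(
    {"ItemProvider": 2000, "ZyteApiProvider": ZYTE_API_PROVIDER_PRIORITY}
)

print(before)
print(after)
```

Under that assumption, any external item provider registered below 2000 gets a chance to produce the item before ItemProvider is consulted.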
 ):
     """Build dependencies handled by registered providers"""
-    instances: Dict[Callable, Any] = {}
+    instances: Dict[Callable, Any] = prev_instances or {}
Passing prev_instances allows the PO to access the item produced by a provider (as long as that provider was called earlier).
@attrs.define
class ProductPage(WebPage[Product]):
product: Product
In the above example, the product: Product instance must be created before the ProductPage instance can be built. This is where the new prev_instances parameter is useful.
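A minimal, self-contained sketch of the idea, not scrapy-poet's actual injector: the toy `Product` and `ProductPage` classes, the `product_provider` and `page_provider` functions, and `build_instances` are all hypothetical names made up for illustration. It only shows how seeding the instance dict with earlier results lets a later step reuse an item that was already built.

```python
from typing import Any, Callable, Dict, List, Optional


class Product:
    """Toy item; stands in for the real Product item class."""

    def __init__(self, name: str) -> None:
        self.name = name


class ProductPage:
    """Toy page object that needs a Product before it can be built."""

    def __init__(self, product: Product) -> None:
        self.product = product


Provider = Callable[[Dict[Callable, Any]], Dict[Callable, Any]]


def product_provider(instances: Dict[Callable, Any]) -> Dict[Callable, Any]:
    # Hypothetical item provider: creates the Product on its own.
    return {Product: Product(name="from provider")}


def page_provider(instances: Dict[Callable, Any]) -> Dict[Callable, Any]:
    # Hypothetical page-object provider: reuses the Product built earlier.
    return {ProductPage: ProductPage(product=instances[Product])}


def build_instances(
    providers: List[Provider],
    prev_instances: Optional[Dict[Callable, Any]] = None,
) -> Dict[Callable, Any]:
    # Mirrors the diff: start from prev_instances rather than an empty dict.
    instances: Dict[Callable, Any] = prev_instances or {}
    for provider in providers:
        instances.update(provider(instances))
    return instances


instances = build_instances([product_provider, page_provider])
page = instances[ProductPage]
print(page.product.name)
```

If the providers ran in the opposite order, `page_provider` would fail with a `KeyError`, which is why the "called earlier" condition in the comment matters.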
1169e79 to 5e93669
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #164      +/-   ##
==========================================
+ Coverage   85.64%   85.78%   +0.14%
==========================================
  Files          14       14
  Lines         801      809       +8
==========================================
+ Hits          686      694       +8
  Misses        115      115
scrapy_poet/injection.py
Outdated
@@ -414,6 +441,7 @@ class MySpider(Spider):
     spider = MySpider()
     spider.settings = settings
     crawler.spider = spider
+    crawler.stats = load_object(crawler.settings["STATS_CLASS"])(crawler)
Changes in Scrapy 2.11 broke this because crawler.stats was missing.
Ouch, I've already fixed it in master before looking here.
But it's good to see we did it in (kinda) the same way!
    }
    # item from 'to_return'
    item, deps = yield crawl_item_and_deps(Kangaroo, override_settings=settings)
    assert item == Kangaroo(name="(modified by Joey) data from KangarooProvider")
Notice that only JoeyPage touched the data and KangarooPage didn't, because the earlier PO had higher priority.
Conflicts resolved, but john-kurkowski/tldextract#305 breaks tests through url-matcher’s get_domain, will fix that now. To-do:
Never mind, it was caused by an outdated test expectation.
Thanks @BurnzZ and @Gallaecio!
Fixes the case where a page object like the ProductPage example above results in a ProviderDependencyDeadlockError. The issue stems from the declared product: Product dependency: scrapy-poet sees ProductPage as the one that provides Product, which causes the deadlock error.
The use case for this is when items like Product are produced by a provider and not by a page object, for example scrapy-zyte-api's ZyteApiProvider (reference). In this scenario, ProductPage tries to override the item produced by ZyteApiProvider.
TODO: