-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Avoid slowdowns by caching Domain.__eq__ #6764
Conversation
591f299
to
c0b0139
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6764 +/- ##
=======================================
Coverage 88.26% 88.27%
=======================================
Files 326 326
Lines 71142 71178 +36
=======================================
+ Hits 62797 62830 +33
- Misses 8345 8348 +3
|
88ec566
to
ea23d7f
Compare
975fa5e
to
b87741e
Compare
This is now ready for review. |
b87741e
to
eee8618
Compare
This PR is ready for review. :) |
Your modifications look OK (I like the dict with max 10 items!), and tests pass, so I merged the PR. But I had my fingers tightly crossed: with a change so deep down, there may be problems we haven't thought about. On the other hand, if something was wrong in such important part of the code, it would for sure trip some test (beyond those where the test itself was incorrect). |
Issue
Per Quasars/orange-spectroscopy#713 domain comparison can be so slow that it seems that Orange crashed. They are slow especially when multiple domain transformations are chained and if there are many attributes.
A demo to reproduce the issue follows. This
__eq__
implementation follows how we implemented__eq__
for PCA and PLS, but there, fortunately, because scikit-learn does not properly implement__eq__
s we do not see this issue.What follows is the simplest
SharedComputeValue
with__eq__
that ensures that the input domain is the same (as all our domain transformations).On this branch, I see:
While on master I get (each increase in depth makes it 100 times slower):
Description of changes
I implemented caching in
Domain.__eq__
. We deem domains read-only and already cache__hash__
, so why not this one too?The first commit refactors id-based caching used in
table.py
for use elsewhere.Includes