-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add CombinatorialGapKFold #41
base: master
Are you sure you want to change the base?
Conversation
Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-02-11 12:19:56 UTC |
Hi @aldder , please try to add some test cases in the |
Codecov Report
@@ Coverage Diff @@
## master #41 +/- ##
==========================================
+ Coverage 97.51% 97.88% +0.37%
==========================================
Files 3 3
Lines 643 756 +113
==========================================
+ Hits 627 740 +113
Misses 16 16
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the comments and make changes accordingly.
tscv/_split.py
Outdated
self.n_groups = N | ||
self.test_splits = k | ||
|
||
def split(self, X, y=None, groups=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The canonical way of doing this is to redefine _iter_test_indices(self, X, y=None, groups=None)
from the base class and generate only the test indices. The base class will take care of the rest. Please refer to the other derived classes and make modification accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that in order to check if the training set size is > 0 we need to compute the complement of test indices after their generation.
And if we already do this, there is no point in discarding this information to recalculate it after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check of the training set size (and the test set size in some cases) can be delayed and implemented in GapCrossValidator._iter_train_indices()
and its 3 siblings. You can also implement it in GapCrossValidator.split()
if you find it a hustle to implement it four times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I don't find it necessary to check the non-emptiness of the training set. Some models/algorithms can output a default estimator/strategy given an empty training set (e.g., equal weight portfolio). If your model/algorithm requires a non-empty training set, it's probably a good idea to check it in the model/algorithm rather than in the cross-validator. That's why I didn't implement this check when I released the package.
That said, I probably forgot to check the non-emptiness of the test set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then, I will replace the implementation of the split method with _iter_test_indices
About the non-emptiness of the test set I think you prefer to implement it in the base class with a specific PR, so I won't touch it, ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your judgement is correct.
tscv/_split.py
Outdated
n_splits : int | ||
Returns the number of splitting iterations in the cross-validator. | ||
""" | ||
return len(list(combinations(range(self.n_groups), self.test_splits))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the combination number to generate the result directly rather than instantiating all combinations.
from inspect import signature | ||
from scipy.special import comb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add scipy
in
Line 55 in 2abbc3d
install_requires=['numpy>=1.13.3', 'scikit-learn>=0.22'] |
From "Advances in Financial Machine Learning" book by Marcos López de Prado
the implemented version of Combinatorial Cross Validation with Purging and Embargoing
explaining video: https://www.youtube.com/watch?v=hDQssGntmFA