add CombinatorialGapKFold #41

Open
aldder wants to merge 9 commits into master

Conversation

@aldder commented Feb 1, 2022

From "Advances in Financial Machine Learning" book by Marcos López de Prado
the implemented version of Combinatorial Cross Validation with Purging and Embargoing

image

explaining video: https://www.youtube.com/watch?v=hDQssGntmFA
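
Below is a minimal usage sketch of the proposed splitter (not taken from the PR itself). The constructor arguments N and k follow the attributes shown in the review below; the gap_before/gap_after purging parameters are an assumption and may differ from the merged code.

    import numpy as np
    from tscv import CombinatorialGapKFold  # the splitter added by this PR

    X = np.arange(100).reshape(50, 2)
    y = np.arange(50)

    # Split the 50 samples into N=5 contiguous groups and use every
    # combination of k=2 groups as the test set (C(5, 2) = 10 splits);
    # samples adjacent to the test groups are purged via the gap arguments.
    cv = CombinatorialGapKFold(N=5, k=2, gap_before=1, gap_after=1)

    for train_index, test_index in cv.split(X, y):
        print(train_index.shape, test_index.shape)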

@pep8speaks commented Feb 1, 2022

Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 451:9: F841 local variable 'gap_before' is assigned to but never used
Line 451:21: F841 local variable 'gap_after' is assigned to but never used

Line 672:80: E501 line too long (102 > 79 characters)
Line 678:80: E501 line too long (103 > 79 characters)
Line 684:80: E501 line too long (115 > 79 characters)
Line 690:80: E501 line too long (119 > 79 characters)
Line 698:80: E501 line too long (106 > 79 characters)

Comment last updated at 2022-02-11 12:19:56 UTC

@WenjieZ (Owner) commented Feb 5, 2022

Hi @aldder, please try to add some test cases in the test_split.py file.

@codecov bot commented Feb 7, 2022

Codecov Report

Merging #41 (c7b2bed) into master (c05265a) will increase coverage by 0.37%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
+ Coverage   97.51%   97.88%   +0.37%     
==========================================
  Files           3        3              
  Lines         643      756     +113     
==========================================
+ Hits          627      740     +113     
  Misses         16       16              
Impacted Files             Coverage Δ
tscv/__init__.py           100.00% <100.00%> (ø)
tscv/_split.py             94.50% <100.00%> (+0.72%) ⬆️
tscv/tests/test_split.py   99.78% <100.00%> (+0.04%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c05265a...c7b2bed.

@WenjieZ (Owner) left a comment

Please see the comments and make changes accordingly.

tscv/_split.py (Outdated):

        self.n_groups = N
        self.test_splits = k

    def split(self, X, y=None, groups=None):
@WenjieZ (Owner):
The canonical way of doing this is to redefine _iter_test_indices(self, X, y=None, groups=None) from the base class and generate only the test indices. The base class will take care of the rest. Please refer to the other derived classes and make modifications accordingly.
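
For reference, here is a rough sketch of the structure being suggested (not the PR's actual code; purging and embargoing are omitted for brevity, and the import path of the GapCrossValidator base class is assumed):

    from itertools import combinations

    import numpy as np
    from sklearn.utils.validation import _num_samples

    from tscv._split import GapCrossValidator  # base class location assumed

    class CombinatorialGapKFold(GapCrossValidator):
        ...  # __init__ sets self.n_groups and self.test_splits

        def _iter_test_indices(self, X, y=None, groups=None):
            # Yield only the test indices for each combination of groups;
            # the base class derives the training indices as the complement.
            n_samples = _num_samples(X)
            group_indices = np.array_split(np.arange(n_samples), self.n_groups)
            for combo in combinations(range(self.n_groups), self.test_splits):
                yield np.concatenate([group_indices[i] for i in combo])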

@aldder (Author):

The problem here is that, in order to check whether the training set size is > 0, we need to compute the complement of the test indices after generating them. And if we already do this, there is no point in discarding that information only to recalculate it afterwards.
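
Concretely, the complement computation in question amounts to a set difference over the sample indices; a tiny illustration (not the PR's code):

    import numpy as np

    n_samples = 100
    test_indices = np.arange(80, 100)

    # The training indices are the complement of the test indices; the
    # non-emptiness check under discussion is then `train_indices.size > 0`.
    train_indices = np.setdiff1d(np.arange(n_samples), test_indices)
    assert train_indices.size > 0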

@WenjieZ (Owner) commented Feb 11, 2022:

The check of the training set size (and the test set size in some cases) can be delayed and implemented in GapCrossValidator._iter_train_indices() and its 3 siblings. You can also implement it in GapCrossValidator.split() if you find it a hassle to implement it four times.

@WenjieZ (Owner) commented Feb 11, 2022:

On second thought, I don't find it necessary to check the non-emptiness of the training set. Some models/algorithms can output a default estimator/strategy given an empty training set (e.g., equal weight portfolio). If your model/algorithm requires a non-empty training set, it's probably a good idea to check it in the model/algorithm rather than in the cross-validator. That's why I didn't implement this check when I released the package.

That said, I probably forgot to check the non-emptiness of the test set.
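
If such a check were added later, one possible shape for it is a thin wrapper around any cross-validator's split() (an illustrative sketch only, not part of this PR):

    def split_with_test_check(cv, X, y=None, groups=None):
        # Re-yield the folds of an existing cross-validator, rejecting any
        # fold whose test set turns out to be empty.
        for train_index, test_index in cv.split(X, y, groups):
            if len(test_index) == 0:
                raise ValueError("Encountered a fold with an empty test set.")
            yield train_index, test_index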

@aldder (Author):

OK then, I will replace the implementation of the split method with _iter_test_indices.

As for the non-emptiness of the test set, I think you would prefer to implement it in the base class in a dedicated PR, so I won't touch it, OK?

@WenjieZ (Owner):

Your judgement is correct.

tscv/_split.py (Outdated):

        n_splits : int
            Returns the number of splitting iterations in the cross-validator.
        """
        return len(list(combinations(range(self.n_groups), self.test_splits)))
@WenjieZ (Owner):

Use the combination number to compute the result directly rather than instantiating all combinations.
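
One way to follow this suggestion is scipy.special.comb with exact=True (a sketch of the revised method, not necessarily the PR's final wording):

    from scipy.special import comb

    class CombinatorialGapKFold(GapCrossValidator):
        ...

        def get_n_splits(self, X=None, y=None, groups=None):
            # C(n_groups, test_splits) computed directly, without
            # materializing every combination just to count it.
            return comb(self.n_groups, self.test_splits, exact=True)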

from inspect import signature
from scipy.special import comb
@WenjieZ (Owner):

Please add scipy in TSCV/setup.py (line 55 in 2abbc3d):

    install_requires=['numpy>=1.13.3', 'scikit-learn>=0.22']
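
For instance (the scipy version floor below is an assumption; the review does not specify one):

    install_requires=['numpy>=1.13.3', 'scikit-learn>=0.22', 'scipy>=1.0.0']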
