Issue/120/check estimator columns #164
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   98.35%   98.38%   +0.02%
==========================================
  Files          45       45
  Lines        2497     2536      +39
==========================================
+ Hits         2456     2495      +39
  Misses         41       41
Some notes with the PR (and my confusions):
Some minor comments; a more senior member should review the bigger picture.
    @classmethod
    def _check_data_columns(cls, path, columns_to_check, parent_groupname=None, **kwargs):
        raise NotImplementedError  # pragma: no cover
Some documentation of what the parameters are meant to be would be useful (what is cls, a path to what, etc.).
Added explanation of these parameters.
@@ -481,3 +481,43 @@ def _finalize_tag(self, tag):
        final_name = PipelineStage._finalize_tag(self, tag)
        handle.path = final_name
        return final_name

    def _check_column_names(self, data, columns_to_check, **kwargs):
        try:
Why do we need the `try` here? If `get` fails, it will always default to `self.config.hdf5_groupname`, right?
I think maybe the try is in case self.config.hdf5_groupname is not defined
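The point in this exchange can be sketched in plain Python. A default passed to `get` is evaluated before `get` runs, so if the attribute supplying the default is undefined, the call raises regardless of whether the key is present; a `try` guards that case. `FakeConfig`, `resolve_groupname`, and the `"groupname"` key are hypothetical names chosen for illustration, not the rail API, and the fallback to `None` is an assumption.

```python
class FakeConfig:
    """Hypothetical stand-in for a stage config; not the rail class."""

    def __init__(self, **options):
        for name, value in options.items():
            setattr(self, name, value)


def resolve_groupname(config, kwargs):
    """Return the group name from kwargs, else from the config, else None."""
    try:
        # The default argument is evaluated eagerly, so a missing
        # `hdf5_groupname` attribute raises AttributeError here even
        # when the key is present in kwargs.
        return kwargs.get("groupname", config.hdf5_groupname)
    except AttributeError:
        # Assumed fallback for illustration: no group name at all.
        return None
```

If the config option is always defined, the `try` is indeed redundant and `get` with a default is enough.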
        else:
            col_list = list(data[groupname].keys())
        # check columns
        intersection = set(columns_to_check).intersection(col_list)
nice!
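For readers skimming the thread, a minimal sketch of how a set intersection can drive the column check. `find_missing_columns` and the column names are hypothetical; how the PR reports missing columns is not shown in this hunk, so returning a sorted list is an assumption.

```python
def find_missing_columns(col_list, columns_to_check):
    """Return the requested columns that are absent from col_list, sorted."""
    # Columns that are both requested and available.
    intersection = set(columns_to_check).intersection(col_list)
    # Whatever was requested but is not in the intersection is missing.
    return sorted(set(columns_to_check) - intersection)


available = ["mag_g", "mag_r", "redshift"]
print(find_missing_columns(available, ["mag_g", "mag_i"]))
```

Using sets keeps the check independent of column order and of duplicates in either list.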
        self._check_column_names(data, self.stage_columns)

    def _get_stage_columns(self):
        self.stage_columns=[]
why not just:
self.stage_columns = [self.config.column_name]
Also, this might lead to confusion between the names and the columns themselves.
Changed to above. Indeed the names are confusing...
Looks fine from my end; the `validate()` calls look sensible. I would suggest adding more comments on the functions, which will make the ideas implemented easier to understand in the future.
@@ -71,3 +71,16 @@ def _process_chunk(self, start, end, data, first):
        )
        qp_d.set_ancil(dict(zmode=zmode))
        self._do_chunk_output(qp_d, start, end, first)


    def validate(self):
A bit more description of what you intend to validate with this function would be good for future code readers.
Added a description.
        self.set_data("input", training_data)
        self.validate()
might be worth adding to line 102/104 that validate also needs to be defined in the sub-class
Now added.
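One way to picture the contract discussed here, where the base class calls `validate()` and each sub-class must define it. This is only a sketch: `BaseStage`, `ColumnStage`, and `run` are hypothetical names, and the actual rail base classes may structure this differently.

```python
class BaseStage:
    """Hypothetical base class: run() always validates before proceeding."""

    def validate(self):
        # Sub-classes must override this; the base class has nothing to check.
        raise NotImplementedError("validate() must be defined in the sub-class")

    def run(self, data):
        self.data = data
        self.validate()
        return "ok"


class ColumnStage(BaseStage):
    """Hypothetical sub-class that validates a single required column."""

    def __init__(self, column_name):
        self.stage_columns = [column_name]

    def validate(self):
        missing = [c for c in self.stage_columns if c not in self.data]
        if missing:
            raise KeyError(f"Missing columns: {missing}")


print(ColumnStage("redshift").run({"redshift": [0.1, 0.2]}))
```

Raising `NotImplementedError` in the base class makes a forgotten override fail loudly at the first call instead of silently skipping the check.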
@@ -20,7 +20,8 @@ dependencies = [
     "pyyaml",
     "numpy<2.0.0",
     "click",
-    "ceci>=2.0.1",
+    #"ceci>=2.0.1",
+    "ceci@git+https://github.com/LSSTDESC/ceci",
I think that this will prevent you from pushing this to PyPI. I think you will need to tag and release a version of ceci to be able to do that.
Yes, I will change this once ceci has rolled a new release (sometime soon maybe? @empEvil)
Changes look good, but I think you will need to fix the pyproject.toml file to include a version of ceci before you can push this to pypi
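The concern above is about a direct reference (`name@url`) in the dependency list. A rough way to spot such entries before a release is sketched below; `direct_references` is a hypothetical helper and the regular expression is a simplified heuristic, not a full PEP 508 parser.

```python
import re

# Heuristic: a package name, optional extras in brackets, then "@" and a URL.
_DIRECT_REF = re.compile(
    r"^\s*[A-Za-z0-9][A-Za-z0-9._-]*\s*(\[[^\]]*\])?\s*@\s*\S+"
)


def direct_references(dependencies):
    """Return the dependency strings that look like `name@url` references."""
    return [dep for dep in dependencies if _DIRECT_REF.match(dep)]


deps = [
    "pyyaml",
    "numpy<2.0.0",
    "click",
    "ceci@git+https://github.com/LSSTDESC/ceci",
]
print(direct_references(deps))
```

Once ceci has a tagged release, the flagged entry can go back to an ordinary version specifier and the list returned by this check should be empty.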
Problem & Solution Description (including issue #)
This PR addresses #120 and integrates the check_columns function from tables_io into the rail base classes.
Code Quality
#pragma: no cover; in the case of a bugfix, a new test that breaks as a result of the bug has been added