Add helper function to container base class to replace NaNs with NoneType to accommodate JSON outputs #3

esherman-credo · 2022-11-17T14:50:49Z

JSON does not support NaN values. All pandas DataFrames need to be recast with None in place of NaN before they are converted to evidence.

This change does NOT check for NaNs in non-DataFrame evidence (e.g. deepchecks), which may become an issue down the road. It is not an issue at the moment, because we hand-craft the DataFrame returned for deepchecks results.
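To illustrate the underlying problem, here is a minimal standard-library sketch (not code from this PR). The `record` dictionary and its keys are made up for the example. It shows that Python's default encoder emits a bare `NaN` token, which is not valid JSON, and that replacing NaN with None yields `null` instead.

```python
import json
import math

# Hypothetical evidence record containing a NaN value.
record = {"metric": "accuracy", "value": float("nan")}

# By default Python writes the bare token NaN, which strict JSON parsers reject.
print(json.dumps(record))

# With allow_nan=False the encoder refuses to serialize NaN at all.
try:
    json.dumps(record, allow_nan=False)
except ValueError as err:
    print("strict encoder rejected the record:", err)

# Replacing NaN with None first produces valid JSON (null).
clean = {
    key: None if isinstance(value, float) and math.isnan(value) else value
    for key, value in record.items()
}
print(json.dumps(clean, allow_nan=False))
```

The same idea applies to DataFrames: the NaN cells have to become None before export.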

Change description

Added a helper function to the container base class to replace NaNs.

Modified each EvidenceContainer class to call the helper at the start of its get_evidence() call.

Type of change

[x] Bug fix (fixes an issue)


IanAtCredo · 2022-11-18T14:07:23Z

connect/evidence/lens_evidence/containers.py

@@ -14,6 +14,7 @@ def __init__(self, data, labels: dict = None, metadata: dict = None):
        super().__init__(DataProfilerEvidence, data, labels, metadata)

    def to_evidence(self, **metadata):
+        self.remove_NaNs()


DataProfiler data is not a pandas DataFrame; it's a pandas profiler object, and get_description returns a dictionary.

You can do something like:
scrubbed_data = self.remove_NaNs(self._data.get_description())

then pass the scrubbed_data.

IanAtCredo · 2022-11-18T14:17:50Z

connect/evidence/containers.py

@@ -61,6 +62,15 @@ def _validate_inputs(self, data):
    def _validate(self, data):
        pass

+    def remove_NaNs(self):


I think this may need to be more robust. What happens when the data is a dictionary? Or a dictionary of dictionaries? Or a list of dictionaries? For those non-pandas cases, I'd probably use a recursive function. I'm not sure it's the best way, but this is how I coded up the check_subset function.

Also, given the way you use it (always calling it in to_evidence), you don't need to change _data in place. Just have it transform the data and return the cleaned data for export.
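A minimal sketch of the recursive, non-mutating approach suggested above. This is an illustration only, not the function that was merged; the name `scrub_nans` is hypothetical, and it only handles dicts, lists, tuples, and plain floats (DataFrames or Series nested inside would need their own branch).

```python
import math


def scrub_nans(obj):
    """Return a copy of obj with float NaNs replaced by None (hypothetical helper)."""
    if isinstance(obj, dict):
        # Rebuild the dictionary, scrubbing each value recursively.
        return {key: scrub_nans(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples come back as lists, which is also how JSON would encode them.
        return [scrub_nans(item) for item in obj]
    if isinstance(obj, float) and math.isnan(obj):
        return None
    return obj


nested = {"a": {"b": float("nan")}, "rows": [{"x": 1.0}, {"x": float("nan")}]}
cleaned = scrub_nans(nested)
print(cleaned)
```

Because every branch builds a new container, the input is left untouched and the cleaned copy can be passed straight to the export step.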

Still working on the data profiler sanitization. It is complicated because it contains nested dictionaries, with some DataFrames/Series inside those dictionaries. It's probably safe to assume those DataFrames aren't further nested (i.e. they contain only elementary types), but if we don't want to assume that, we'll need a very complicated sanitizer.

On further thought, I changed the paradigm of the call. This should be a forced sanitization, so I've moved the call to the base class EvidenceContainer's init function. The function is now abstract, which forces subclasses to implement it. This will help prevent a future developer from implementing a new evidence type and forgetting to sanitize it for JSON.

I don't really see the harm in having self._data reflect a de-NaNified structure. As currently written, Evidence always gets converted to JSON. The prior implementation wasn't really "in place": I was passing the _data object on the RHS, and the return value was a copy assigned to the same variable name. The way I'm doing it now avoids that by sanitizing at the start (sanitizing a copy, so we don't have to worry about deep copy issues).
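The forced-sanitization pattern described here can be sketched roughly as follows. This is an assumption-laden illustration rather than the repository's code: the constructor signature is simplified, and `DictContainer` and `ForgetfulContainer` are hypothetical subclasses that exist only for the example.

```python
import math
from abc import ABC, abstractmethod


class EvidenceContainer(ABC):
    def __init__(self, data):
        # Sanitization runs for every container as soon as it is built.
        self._data = self.remove_NaNs(data)

    @abstractmethod
    def remove_NaNs(self, data):
        """Return data with NaNs replaced by None."""


class DictContainer(EvidenceContainer):
    # Hypothetical subclass holding a flat dictionary.
    def remove_NaNs(self, data):
        return {
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in data.items()
        }


class ForgetfulContainer(EvidenceContainer):
    # Hypothetical subclass that does not implement remove_NaNs.
    pass


print(DictContainer({"a": float("nan"), "b": 1.0})._data)

try:
    ForgetfulContainer({})
except TypeError as err:
    print("cannot instantiate:", err)
```

Python refuses to instantiate a subclass that leaves an abstract method unimplemented, which is what makes the sanitization hard to forget.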

IanAtCredo

See the inline comments for details, but in summary:

- I believe this function can be more robust.
- It is not applied properly to PandasProfiler.
- It doesn't need to modify the data in place.


esherman-credo · 2022-11-18T15:46:34Z

Not sure why the Lint rule is failing. Some Node.js error?


esherman-credo · 2022-11-18T18:31:34Z

The new solution for data_profiler uses a dictionary helper in the containers base file. It's... not pretty.

The similarity between the dictionary helper and the list helper is unfortunate. It's not clear whether there's a workaround, though, since one relies on a method of the object (data.items()) and the other relies on a function applied to a list (enumerate(data)). I tried thinking about ways to store or pass those as a function object func, but the latter requires a call when it's declared, and either way it seems you'd end up with some stupid-large if-statement.
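One possible way to fold the two helpers together, offered only as a sketch and not as what the PR ended up doing: normalize both container types to (key, value) pairs in a small function, so the traversal itself is written once. The names `_pairs` and `scrub_nested` are hypothetical.

```python
import copy
import math


def _pairs(container):
    """Return (key, value) pairs for a dict, or (index, value) pairs for a list."""
    if isinstance(container, dict):
        return list(container.items())
    return list(enumerate(container))


def scrub_nested(container):
    """Replace float NaNs with None inside a nested dict/list, mutating it."""
    for key, value in _pairs(container):
        if isinstance(value, (dict, list)):
            scrub_nested(value)
        elif isinstance(value, float) and math.isnan(value):
            # container[key] works for both dict keys and list indices.
            container[key] = None
    return container


raw = {"stats": [1.0, float("nan")], "meta": {"skew": float("nan")}}
clean = scrub_nested(copy.deepcopy(raw))  # scrub a copy, keep raw untouched
print(clean)
```

The type check still exists, but it is confined to `_pairs`, and item assignment by key or index covers both cases in the loop body.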

esherman-credo · 2022-11-21T15:56:20Z

Encountered a bug. Converting to draft.


esherman-credo · 2022-11-21T17:35:55Z

Now fixes #4
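According to the corresponding commit message, issue 4 concerned datetimes (used in the pandas profiler) not being serializable. The thread does not show how that was resolved, so the following is purely an assumption about one common approach: passing a `default` hook to `json.dumps` that converts datetimes to ISO 8601 strings. The function name `json_default` and the `analysis_started` key are hypothetical.

```python
import json
from datetime import date, datetime


def json_default(value):
    """Fallback for json.dumps: encode dates and datetimes as ISO 8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    # Mirror the standard error for anything else we cannot encode.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


payload = {"analysis_started": datetime(2022, 11, 21, 12, 19)}
print(json.dumps(payload, default=json_default))
```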


IanAtCredo

Looks good to me. Please review my changes and see if they still work and I didn't miss anything. I've tested on your integration notebook.

esherman-credo

Looks good to me.

~~It won't let me approve since I'm the original PR author.~~ Never mind, once you approved it was good.

Add helper function to container base class to replace NaNs with None…

86d0491

…Type to accommodate JSON outputs. JSON cannot support NaN type

esherman-credo requested a review from IanAtCredo November 17, 2022 14:50

IanAtCredo reviewed Nov 18, 2022


IanAtCredo suggested changes Nov 18, 2022


Modify sanitization to be forced. Occurs in setup function for super …

161ca6f

…class. Each class must implement its own NaN sanitization function to ensure future evidences don't forget to do so.

esherman-credo added 4 commits November 18, 2022 11:02

Modify removeNans to operate on input data structure (not in place) r…

b0dbc4d

…ather than internal _data object

Black not working locally. Add line at end of file

2cadc61

Changed local python distro. Black should work now?

ba856b5

Recursive helper for converting NaNs in dictionary to NoneType

68d8b9a

esherman-credo requested a review from IanAtCredo November 18, 2022 18:26

esherman-credo marked this pull request as draft November 21, 2022 15:56

esherman-credo added 2 commits November 21, 2022 12:19

Fix bug with calling remove nans for pandas profiler. Move call to ge…

974ef2f

…t_description

Fixes issue 4. Datetimes (used in pandas profiler) not serializable

6135092

esherman-credo marked this pull request as ready for review November 21, 2022 17:36

remove_NaNs implemented for deepchecks. previously returned nothing, …

2121742

…which overwrites data

esherman-credo mentioned this pull request Nov 23, 2022

Tests/export json credo-ai/credoai_lens#257

Merged

1 task

esherman-credo requested review from IanAtCredo and removed request for IanAtCredo November 28, 2022 15:15

IanAtCredo added 7 commits November 28, 2022 12:15

reformatted, added tests and black

39466da

fixed bugs in tests

9e91014

added adapters

391097e

added validation

98381b5

updated tests:

039474b

updated adapter with adapter class and allowed labeling of evidence w…

ec17275

…ith model and datasets

udpated tests and fixed bug

3a8490a

IanAtCredo and others added 9 commits November 28, 2022 12:15

updated installation

6a663b8

Trying to rebase to develop

eb153ba

Modify removeNans to operate on input data structure (not in place) r…

a5956d8

…ather than internal _data object

Recursive helper for converting NaNs in dictionary to NoneType

9e08c8d

Merge branch 'develop' into bugfix/json_nans

20328bd

Run black locally

2b7351a

remove merge metadata and reformat with black

028ab24

Fix some issues with TableContainer that went away during the merge p…

699696e

…rocess

refactored using a Scrubber class and 'scrubbed_data' property

7508ae3

IanAtCredo approved these changes Nov 28, 2022


IanAtCredo added 2 commits November 28, 2022 12:55

updated version

24b91fd

Merge branch 'develop' into bugfix/json_nans

a344080

IanAtCredo force-pushed the bugfix/json_nans branch from 1bdaa02 to a344080 Compare November 28, 2022 20:56

esherman-credo commented Nov 28, 2022


esherman-credo merged commit c816b2c into develop Nov 28, 2022
