-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add include_empty keyword argument #250
base: master
Are you sure you want to change the base?
Conversation
When using get-like functions (edges.pathway_edges, nodes.get) it is common to concatenate the outputs of the functions to mimic the old bluepysnap interface. However, if no nodes/edges satisfy the query, instead of resulting in an empty dataframe, this approach will result in an error, because no dataframes at all are yielded by the generator. The include_empty keyword argument allows the user to specify that they still want to see a dataframe if the query turns up nothing. One example where this is useful for me is when I use the length of a connection dataframe to determine connection probability.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #250 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 32 32
Lines 2449 2449
=========================================
Hits 2449 2449
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Thanks for the PR.
I'm generally a bit leary to add more arguments to functions, since it's a bit a of a slipperly slope (but I'm not ruling this out).
|
The current behavior does not yield anything if the query comes up empty. So |
If I understood correctly, the use case is to def _concat(iterator):
dfs = [df for _,df in iterator]
return pd.concat(dfs) if dfs else pd.DataFrame([]) # or if you just want to check the length, you might just have "else []"
df = _concat(c.nodes.get(query)) or, you can use def _concat(iterator):
dfs = dict(iterator).values()
return pd.concat(dfs) if dfs else pd.DataFrame([]) # or "else []" if just for length You can even one-line it, albeit the readability takes a hit: def _concat(iterator):
return pd.concat(d.values()) if (d := dict(iterator)) else pd.DataFrame([]) |
The issue with this is that downstream code which expects the dataframe to have columns will break because it has none, and so I will need to add a if df.empty then do something else clause to everything. |
No worries, you can set any columns to empty dataframes, too: columns = list(circuit.nodes.property_names)
pd.DataFrame(columns=columns) I'm afraid I don't understand why do you need to process empty dataframes. If you can point me to your code and especially those parts in which the empty dataframe would cause issues, I can take a look and see what can we do so those needn't be processed. |
I often only retrieve a subset of the columns, so then I'll need to pass this subset to the function as well. In this case I may be best off wrapping the .get method call as a whole
So strictly speaking it isn't necessary to have the empty dataframes, just simpler and more concise. |
If wrapping the
Yeah, I get it. But, to be fair, it depends on the point of view: adding/passing/handling extra keyword arguments for covering one particular use case is not simpler, nor more concise for us. I'd rather always return empty dataframes than add an extra behaviour handled by an extra argument if this is absolutely necessary. If not, I wouldn't break the current behavior. Anyway, I still don't understand why do you need to keep processing the empty dataframes and, again, if you'd like me to take a look at the code to possibly not process them, let me know, I'm happy to help. |
When using get-like functions (edges.pathway_edges, nodes.get) it is common to concatenate the outputs of the functions to mimic the old bluepysnap interface. However, if no nodes/edges satisfy the query, instead of resulting in an empty dataframe, this approach will result in an error, because no dataframes at all are yielded by the generator.
The include_empty keyword argument allows the user to specify that they still want to see a dataframe if the query turns up nothing.
One example where this is useful for me is when I use the length of a connection dataframe to determine connection probability.