Add audeer.unique() #149
Codecov Report: All modified and coverable lines are covered by tests ✅
I wonder whether the implicit type conversion to a list is always desired.
import pandas as pd
import audeer
strlist = audeer.unique("aabbaa")
result = audeer.unique(strlist)
print(result)
>>> ['a', 'b']
ser = pd.Series([1,2,1,2,4,1])
result = audeer.unique(ser)
print(result)
>>> [1, 2, 4]
In order to return the original type, one would have to call "".join(strlist) on the string input and pd.Series on the series. In both cases unique() works, but with an implicit type conversion. If that is desired, this is OK.
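To make the round trip concrete, here is a minimal sketch of restoring the input type after the call. `unique_sketch` is a hypothetical stand-in that is only assumed to behave like `audeer.unique()` (list output, order of first appearance kept), so the snippet runs without audeer or pandas; a tuple is used in place of the pd.Series from the example above.

```python
def unique_sketch(sequence):
    """Hypothetical stand-in for audeer.unique(): always returns a list."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(sequence))


text = "aabbaa"
values = (1, 2, 1, 2, 4, 1)

unique_chars = unique_sketch(text)      # a list, not a str
unique_values = unique_sketch(values)   # a list, not a tuple

# The caller has to restore the original type explicitly.
text_again = "".join(unique_chars)
values_again = tuple(unique_values)

print(type(unique_chars).__name__, text_again)
print(type(unique_values).__name__, values_again)
```

With a fixed list output, the conversion back stays a one-liner on the caller's side and the function itself does not need to know about every input type.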
A second point is the typing:
But in my superficial understanding, typing.Sequence covers everything that implements __iter__. So for example a pandas df (I have not checked it though) would fall in that category as well.
So would it be more precise if one would have a kind of union type for the input type hint, covering (at least) str, tuple, list and pd.Series?
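A rough sketch of what such a union type hint could look like is given below. The alias `UniqueInput` and the function `unique_typed` are hypothetical names used for illustration only, not the signature in this pull request; pd.Series is left out so the snippet has no pandas dependency, but it could be added to the union in the same way.

```python
from typing import Hashable, List, Tuple, Union

# Hypothetical alias: the explicitly supported input types.
# pd.Series could be appended here if pandas were a dependency.
UniqueInput = Union[str, Tuple[Hashable, ...], List[Hashable]]


def unique_typed(sequence: UniqueInput) -> List[Hashable]:
    """Hypothetical signature: explicit input union, fixed list output."""
    # Assumed behaviour: unique values in order of first appearance
    return list(dict.fromkeys(sequence))


print(unique_typed("aabbaa"))
print(unique_typed((1, 2, 1, 2, 4, 1)))
```

Note that type hints are not enforced at runtime, so the union documents the intended inputs rather than restricting them.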
But to support that, it is usually easier to convert everything to a fixed output type, instead of returning the original type. I also prefer to have a fixed output type in basic functions instead of defining the output type by the input type.
I see: a
import pandas as pd
import collections.abc
df = pd.DataFrame.from_dict(
    {
        'v1': [1, 2, 1, 2, 4, 1],
        'v2': [2, 4, 4, 2, 3, 1],
    }
)
typecheck = isinstance(df, collections.abc.Sequence)
print(typecheck)
print(df.__iter__)
results in
In other words:
ser = pd.Series([1, 2, 1, 2, 4, 1])
isinstance(ser, collections.abc.Sequence)
>>> False
But I am happy with it and will approve it, as this is becoming too academic.
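The False result above fits how the abstract base classes work: implementing __iter__ is enough to pass an isinstance check against collections.abc.Iterable, but not against collections.abc.Sequence, which requires inheriting from it or being registered. The following standard-library sketch illustrates this; `OnlyIter` is a made-up class for demonstration.

```python
import collections.abc


class OnlyIter:
    """Implements __iter__ but neither __getitem__ nor __len__."""

    def __iter__(self):
        return iter([1, 2, 1])


obj = OnlyIter()
# Iterable is detected structurally via __iter__ ...
is_iterable = isinstance(obj, collections.abc.Iterable)
# ... whereas Sequence needs inheritance or explicit registration.
is_sequence = isinstance(obj, collections.abc.Sequence)

# Built-in types: str, tuple and list are Sequences, set is not.
builtin_checks = {
    type(x).__name__: isinstance(x, collections.abc.Sequence)
    for x in ("ab", (1, 2), [1, 2], {1, 2})
}

print(is_iterable, is_sequence)
print(builtin_checks)
```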
I accidentally pressed the "close" button next to the comment button. Reopening. Apologies.
All questions have been discussed.
Possibly adding a string to the test case would be nice in order to detail the behavior on strings.
I will not be fussy about it and approve though.
Yes. A |
It might be that a user wants to get the unique values of a sequence in the order in which they appear in that sequence.
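As a small illustration of why order of appearance matters, the sketch below compares an order-preserving approach with going through a set. `ordered_unique` is a hypothetical helper, assumed here to mirror the behavior described above; it is not taken from the pull request.

```python
def ordered_unique(sequence):
    """Hypothetical helper: unique values in order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(sequence))


values = ["b", "a", "b", "c", "a"]

in_order = ordered_unique(values)

# A set removes duplicates but does not guarantee any order;
# sorting gives a deterministic result, yet not the original order.
sorted_unique = sorted(set(values))

print(in_order)
print(sorted_unique)
```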