-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Feature Constructor: Easier categorical features, allow creation of date/time, easier use of string data #3936
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3936 +/- ##
=========================================
- Coverage 84.95% 84.7% -0.25%
=========================================
Files 376 371 -5
Lines 65907 64957 -950
=========================================
- Hits 55988 55022 -966
- Misses 9919 9935 +16 |
The idea is the right one, but it doesn't seem to work. When I create a datetime variable from string, which returns '2019-01-01' for example, the widget says |
That's why I asked whether it's useful - because I suspected it's not. :) The function is expected to return a float. So we should wrap the expression into a function that checks whether the result is a string and in this case converts it into a float? I'll see. |
9f99cb4
to
f0516f9
Compare
Testing briefly, this is really cool. 👍 |
) | ||
placeholderText="Expression...", | ||
toolTip="Result must be a number for numeric variables, " | ||
"and strings for text variables.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and a string for text and date/time variables.\n"
?
EDIT: Actually, I see that for datetime vars both are options as well - number or string. Although currently, the number must be a float and not an int, which is a bit awkward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be even better if subclasses would replace this tooltip with just the information specific to that variable type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed, implemented it, and no longer agree. It looks visually as the same line edit and the user who will get a tooltip once (and the default is that for numeric attributes, which tells the user to enter a numeric expression) will not expect to get a different tooltip the next time.
Please check and tell me that I can/should remove ac97c90. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is not perfect, but I think removing ac97c90 is worse... The best thing about the tooltip for datetime is that you were able to put in an example, so people actually know what they have to produce in the expression. If those 3 lines are added to the original 4 of the tooltip it becomes a mess.
And anyway, I think the problem would be minimal if when no variable has yet been selected it showed something generic or even no tooltip.
But until the next round of improvements, I would say let's just keep this commit. People not expecting a different tooltip is probably not that big a problem, because you usually need to check it when you have an error in the expression and then you see the relevant information.
self.valuesedit = QLineEdit() | ||
tooltip = \ | ||
"If values are given, expression must return integer indices.\n" \ | ||
"Otherwise, expression must return strings." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the third option (using names of values instead of indices or strings) is intentionally not mentioned and is just a bonus hack for power users?
Since it is basically the same as using strings (minus the quotes), we could consider dropping the unmentioned 3rd option and make this tooltip true...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gladly. I suppose this was so exotic that we don't care about backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can ignore it in this case (not sure it was documented at any time anyway).
parse = Orange.data.TimeVariable("_").parse | ||
|
||
def cast(e): # pylint: disable=function-redefined | ||
if isinstance(e, float): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is a niche case, if we are supporting numbers I don't see a reason why 1000.
(float) would be accepted and 1000
(int) not...
Maybe we should write the documentation together with these changes, so that we are sure what should work and what should not. It adds a nice incentive to make it simple enough that explaining it is not too hard :)
try: | ||
return parse(e) | ||
except: # pylint: disable=bare-except | ||
return nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows user mistakes to pass too silently in my opinion. I would expect an error in the widget to be shown if I return a date "9/11/2001", "2019-01-33" or "2019-21-12" etc. Currently it could be that some dates work, but the incorrect just become missing values without any notice.
The documentation should also definitely link to https://en.wikipedia.org/wiki/ISO_8601 since most people don't actually use the iso standard for dates and probably don't know what they should look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as a part of this PR, but it would be soooo nice if Orange worked with more than one datetime format. Especially when importing from Excel, which pretends to be smart and formats dates in its own way, this would make it a lot easier for people to import datetimes. I assume we can make a simple function that checks the most likely format and applies it to the datetime column... Or actually converts it into ISO, because I suppose nothing would work in the Timeseries add-on otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget why I wanted to catch exceptions here. Removed.
Accepting different date formats indeed belongs to a separate issue.
Codecov Report
@@ Coverage Diff @@
## master #3936 +/- ##
==========================================
+ Coverage 85.12% 85.13% +<.01%
==========================================
Files 378 378
Lines 67117 67194 +77
==========================================
+ Hits 57134 57204 +70
- Misses 9983 9990 +7 |
f0516f9
to
09bfee3
Compare
09bfee3
to
ac97c90
Compare
Fixes #2900. Fixes #3056.
Lets the expression for discrete variables return strings instead of indices.
If the user defines values, the function must return integer indices. If function returns strings (or instances of Value), the user mustn't define values. It is contrived, but you can't have both. We will have to mention this in documentation. :) If you have a better solution, feel free to change the code.
Casts string values into strings, so name[0] is now a valid expression for zoo. Before, you'd have to write str(name)[0] -- and, of course, list all possible values.
Allows creation of Date / Time variables. If expression returns a string, it is parsed as date/time.
Includes