Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Widget: Can't switch column type from categorical to datetime #2974

Closed
tbuttle opened this issue Mar 26, 2018 · 18 comments
Closed

File Widget: Can't switch column type from categorical to datetime #2974

tbuttle opened this issue Mar 26, 2018 · 18 comments

Comments

@tbuttle
Copy link

tbuttle commented Mar 26, 2018

When loading my CSV or Excel file with a Date field Orange does not recognize or provide the ability to change the field type to DateTime. All date fields come in as Categorical data. I'm trying to do a time series prediction but without a the dates being recognized I'm unable to build out my prediction model. I've tried date in several different formats to no avail.

Please note that the format per Orange's documentation matches what is in my data set. YYYY-MM-DD is the required format and is one of the format's I've tried to get Orange to recognize. Please see documentation & screenshots below.

date time specs - orange
time series data - orange

@kernc
Copy link
Contributor

kernc commented Mar 26, 2018

The following minimal example file works for me: date-test.csv.zip (.xls or .xlsx work the same).

Can you attach a small example of your data?

@tbuttle
Copy link
Author

tbuttle commented Mar 26, 2018

File Attached -

Orange isn't recognizing any of the date formats attached. I'm able to change the "Numeric" values to date/time but unable to do so when the dates register as "Categorical" values.

time series sample - orange

Date Sample.xlsx

@kernc
Copy link
Contributor

kernc commented Mar 26, 2018

Right. The columns are heuristically marked as categorical due to only some 60 unique values for some 2.5k rows. The real issue here imho is that the widget doesn't allow switching from categorical to datetime.

You can force Orange to interpret a column as datetime by prefixing the name with "T#", e.g.

T#Date

See also: http://orange-visual-programming.readthedocs.io/loading-your-data/index.html#header-with-attribute-type-information

@kernc kernc changed the title Orange is not recognizing a Date/Time Field when loading a CSV or Excel File despite format matching specified documentation File Widget: Can't switch column type from categorical to datetime Mar 26, 2018
@ajdapretnar
Copy link
Contributor

ajdapretnar commented Mar 28, 2018

The real issue here imho is that the widget doesn't allow switching from categorical to datetime.

+1 on this. An issue was opened a while ago with similar concerns: #1520.

@nemontemi
Copy link

It seems that the column containing the datetime data defaults to "categorical" (without possibility to change) if the column is non-unique, i.e., if there are duplicate datetimes within that column.

@Ejmric Ejmric self-assigned this Mar 8, 2019
@drgooo
Copy link

drgooo commented Mar 11, 2019

I had the same issue. The values didn't follow the required datetime format for Orange. Adding a small python script solved the issue. Might not be the cleanest but did the job for me.

It takes the input file, creates a new column and sets the value to the correct string format.

import datetime
from Orange.data import Domain, Table, TimeVariable

#Make a new domain based on existing columns and adding the extra 'newDate' one
new_domain = Domain(["Fiscal Date", "Column2", TimeVariable.make("newDate")], new_data.domain.class_vars, source=new_data.domain)
#Construct a new table based on the new domain, using inputdata
new_data = Table(new_domain, in_data)

#format the date to align to Orange's requirements
for inst in new_data:
    inst[2] = str(datetime.datetime.strptime(str(inst[0]), "%Y-%m-%d"))`

Looking at your table, you'll need lines of code to set your column 1, 3 and 7 into new columns. I included your first column in the domain definition as an example, but you'd need to add the rest if you wanted them in the final output

@bhavin83012
Copy link

Hello,

Software: Orange V 3.20

I loaded excel file, which include date column. I matched the date format to ISO. Orange file widget could not recognize the date and loaded as Categorical. On the data table, there are strange numbers loaded on Date column. I tried with different files and same is repeating. (Please see below screenshot.)
2019-03-14 13_59_10-_

Then, I tried with T# to enforce the date format to the column. Orange did recognize that as date, but, this time the date range was completely different. The dates started in 1970-01-01 hh:mm:ss. Below screenshot.
image

It will be very helpful if someone can help me to get this issue fixed. I am not a programmer, so I can not see the solution in this matter.
Below is the screenshot of the file.
image

The file is attached here with.
date-test - Copy.xlsx

@ajdapretnar
Copy link
Contributor

@bhavin83012 This is not an issue of Orange, but an issue of Excel. Excel tries to be smart and once it recognizes this is a datetime variable, it reformats is behind the scenes. You need to set the number format to Text to force Excel not to mess with your data.

@janezd janezd closed this as completed Mar 19, 2019
@ajdapretnar
Copy link
Contributor

This is not solved. File widget's domain editor should enable changing categorical to datetime if possible.

@ajdapretnar ajdapretnar reopened this Mar 19, 2019
@bhavin83012
Copy link

Hi Ajda,
Your advise is useful - I worked out by below way.
The original excel files opened with Open Office software - (Say Libero Office). Then change the date format to ISO - Then you can use the file as it is or convert it to CSV and that should work in Orange.

Thank you for your support.

@Ejmric Ejmric removed their assignment Mar 29, 2019
@janezd janezd added the snack This will take an hour or two label Jun 2, 2019
@NejcDebevec NejcDebevec self-assigned this Jul 1, 2019
@janezd
Copy link
Contributor

janezd commented Jul 2, 2019

@NejcDebevec, I looked into it: I suppose the best solution is to change ExcelReader.read. Instead of reading all cells from Excel and pass pure (untyped) data to the inherited data_table, reimplement ExcelReader to deduce variable types from Excel table, construct domain and read the data without relying on inherited methods.

You can do this without any refactoring of FileFormat class.

@JonB451
Copy link

JonB451 commented Aug 6, 2019

Orange 3.19.0

'YYYY-MM-DD' was not seen by Orange3 as a datetime for me, despite understanding this is ISO 8601 compliant, if i have understood https://en.wikipedia.org/wiki/ISO_8601

'YYYY-MM-DD HH:MM' did work for me.

If Orange3's required datetime format is not ISO 8601, is there any chance please that the requried format be specified precisely and comprehensively at https://orange3-timeseries.readthedocs.io/en/latest/widgets/as_timeseries.html?

@ajdapretnar
Copy link
Contributor

Orange most certainly should and does recognize 'YYYY-MM-DD' as datetime. Please check your entries for mistakes.

Timeseries is no longer maintained. The reference for using datetime values can be found at: https://orange-visual-programming.readthedocs.io/loading-your-data/index.html#datetime-format

@NejcDebevec NejcDebevec removed their assignment Sep 6, 2019
@janezd janezd removed snack This will take an hour or two good first issue labels Dec 6, 2019
@PrimozGodec PrimozGodec added bug report Bug is reported by user, not yet confirmed by the core team and removed bug report Bug is reported by user, not yet confirmed by the core team labels Dec 24, 2019
@PrimozGodec
Copy link
Contributor

PrimozGodec commented Dec 24, 2019

The issue when it is not possible to switch to time was solved in #4226. Now all variables that contain string in ISO date-time format are automatically recognized as time variables. So they will become a time variable and not a categorical variable. This way all of them have the option to be switched to categorical and back to time.

I discovered a new potential issue here. When I opened data provided by @tbuttle, few variables that before showed the data itself are now strings with excel formulas. Is it intentional? @VesnaT
It probably started to happen after #4279

Screenshot 2019-12-24 13 07 06

@PrimozGodec
Copy link
Contributor

Previously described bug fixed in #4299 and initial bug with time-variable fixed in #4226.

@jpstarpaper
Copy link

jpstarpaper commented Dec 9, 2020

Ajda, regarding the recommendation that we should read the doc at:
https://orange-visual-programming.readthedocs.io/loading-your-data/index.html#datetime-format

"SORRY /
\ /
\ This page does /
] not exist yet."

It may have existed at one time, but it's gone now.

I'm dealing with the same datetime error, for a csv with the date formated as 05-30-2020. Orange outputs: "TypeError: dtype bool cannot be converted to datetime64[ns]"

The doc for the file widget (as an overview, not a drill down) is here (but it deals with datetime in a cursory manner and doesn't offer enough info to overcome the TypeError):

https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/widgets/data/file.html

Error/The Solution:

I imported a set of Tweets, then exported to csv. The csv had a date format of MM/DD/YEAR, but the ISO 8601 standard (as others noted above) is YEAR-MM-DD. That's the reason for the error when attempting to import the csv into Orange.

My solution was to open the csv in Numbers (on Mac) and follow the instructions here for reformatting the date column (the time column doesn't need reformatted if it's in Hours-Minutes-Seconds xx-xx-xx): https://support.apple.com/en-in/guide/numbers/tan23393f3a/mac

On that link you'd open the "Date and time" dropdown. Obviously I'm pretty new to Orange and coding as a whole if I have to look up info on changing the format of a date variable!

If you're not on Mac, you can edit the spreadsheet by uploading it to GoogleDrive and following the date time reformat instructions here:

https://support.google.com/docs/answer/56470?co=GENIE.Platform%3DDesktop&hl=en

This might also be a solution (but a far more complicated one): "Python Datetime Tutorial: Manipulate Times, Dates, and Time Spans"

https://www.dataquest.io/blog/python-datetime-tutorial/

@sjscotti
Copy link

sjscotti commented Apr 2, 2022

I had an issue with importing clippings from a newspaper archive site when I wanted to get a histogram of the number of articles binned by year. The site exported publication dates as strings (which is a meta) in a format like this example, 13 Dec 1899, Wed. So I ended up selecting the publication date column (using the Select Column widget), and piping that into a Python Script widget that looked like this...

import datetime
from Orange.data import Domain, Table, TimeVariable, StringVariable

#Make a new domain having a TimeVariable column, 'newDate' 
new_domain = Domain([TimeVariable("newDate")])

#Construct a new table based on the new domain, using inputdata
new_data = Table.from_domain(new_domain, n_rows = len(in_data))

#reformat the date to align to Orange's TimeVariable requirement
for idx, inst in enumerate(new_data):

    temp = in_data.metas[idx][0].split(',')[0].split(' ')   

    year = str(temp[2])
    month_name = str(temp[1])
    month = datetime.datetime.strptime(month_name, '%b').month
    formatted_month = f"{month:02}"   # need to have months be an ordinal two characters long
    day = str(temp[0])                # ordinal for day in month exported by archive site is already 2 characters long
    
    print(year + '-' + formatted_month + '-' +day)
    inst[0] = year + '-' + formatted_month + '-' +day # date reformatted as TimeVariable
    
out_data = new_data

The Distributions widget accepted the output from the Python Script and gave me the histogram of articles per year that I wanted.

@ajdapretnar
Copy link
Contributor

This was further fixed in #5819.
Also, the above link works for me (in different browsers, too).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests