Description of problem and/or code sample that reproduces the issue
I noticed that if I save a DataFrame whose timestamps roll over to the next day when converted to UTC, most ChunkStore functions (reverse_iterator, get_chunk_ranges, get_info, ...) don't return the chunk for the new date. The following example illustrates this (the Jupyter notebook is attached in the zip file):
Set Up
import pandas as pd
from arctic import Arctic, CHUNK_STORE
store = Arctic("localhost")
store.initialize_library("scratch_lib", lib_type=CHUNK_STORE)
lib = store["scratch_lib"]
@bmoscon, #384 is probably related to this issue. Aside from the simple example in this report, I am saving 1-minute-frequency data with a chunk size of 'D', similar to #384, and noticed that I could not get the data for the last day, where the UTC date had rolled over to the next day; that chunk was also missing from reverse_iterator.
Arctic Version
Arctic Store
ChunkStore
Platform and version
Python 3.8.5
Create an Index with some times that will change dates when converted to UTC
Output:
DatetimeIndex(['2012-12-08 16:00:00-05:00', '2012-12-08 18:00:00-05:00', '2012-12-08 20:00:00-05:00', '2012-12-08 22:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', name='date', freq=None)
print(ind.tz_convert("UTC"))
Output
DatetimeIndex(['2012-12-08 21:00:00+00:00', '2012-12-08 23:00:00+00:00', '2012-12-09 01:00:00+00:00', '2012-12-09 03:00:00+00:00'], dtype='datetime64[ns, UTC]', name='date', freq=None)
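The rollover can be reproduced with the standard library alone. This is an illustrative sketch, not part of the attached notebook; it assumes US/Eastern is on standard time (UTC-05:00) on 2012-12-08, which holds in December, and uses a fixed offset instead of a time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-05:00 offset. Assumption: US/Eastern observes EST (no DST) in December,
# so no time-zone database is needed for this illustration.
EST = timezone(timedelta(hours=-5))

# The four local timestamps from the example index (2012-12-08, every two hours from 16:00).
local_times = [datetime(2012, 12, 8, hour, tzinfo=EST) for hour in (16, 18, 20, 22)]

# Convert each timestamp to UTC, analogous to ind.tz_convert("UTC").
utc_times = [t.astimezone(timezone.utc) for t in local_times]

for local, utc in zip(local_times, utc_times):
    print(local.isoformat(), "->", utc.isoformat())

# Calendar dates covered before and after conversion.
local_dates = sorted({t.date() for t in local_times})
utc_dates = sorted({t.date() for t in utc_times})
print("local dates:", local_dates)  # a single day
print("UTC dates:  ", utc_dates)    # two days
```

If chunks are keyed on the UTC date, as the expected outputs in this report assume, rows 3 and 4 belong to a separate 2012-12-09 chunk.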
Create a DataFrame, write it to the library, and read it back:
Output
date col
2012-12-08 21:00:00 1
2012-12-08 23:00:00 2
2012-12-09 01:00:00 3
2012-12-09 03:00:00 4
This is different from what I expected. Is this the intended behavior?
lib.get_info("example_df")
Output
{'chunk_count': 1,
'len': 4,
'appended_rows': 0,
'metadata': {'columns': ['date', 'col']},
'chunker': 'date',
'chunk_size': 'D',
'serializer': 'FrameToArray'}
>> expected chunk_count = 2, not 1
list(lib.get_chunk_ranges("example_df"))
Output
[(b'2012-12-08 00:00:00', b'2012-12-08 23:59:59.999000')]
>> expected [(b'2012-12-08 00:00:00', b'2012-12-08 23:59:59.999000'), (b'2012-12-09 00:00:00', b'2012-12-09 23:59:59.999000')]
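To make the expected chunk_count and chunk ranges concrete, here is a small model of a daily chunker that keys chunks on the UTC calendar date. The function daily_chunk_ranges is hypothetical and is not Arctic's implementation; it only reproduces the bounds format seen in the get_chunk_ranges output, under the assumption that each 'D' chunk spans one UTC day.

```python
from datetime import datetime, timedelta


def daily_chunk_ranges(utc_timestamps):
    """Return one (start, end) pair per UTC calendar day that has data.

    Hypothetical model of chunk_size='D'; not Arctic's actual chunker code.
    """
    days = sorted({ts.date() for ts in utc_timestamps})
    ranges = []
    for day in days:
        start = datetime(day.year, day.month, day.day)
        # Last millisecond of the day, matching the '23:59:59.999000' upper bound
        # in the get_chunk_ranges output.
        end = start + timedelta(days=1) - timedelta(milliseconds=1)
        ranges.append((str(start).encode(), str(end).encode()))
    return ranges


# Naive UTC timestamps, as returned by reading "example_df" back from the library.
utc_index = [
    datetime(2012, 12, 8, 21),
    datetime(2012, 12, 8, 23),
    datetime(2012, 12, 9, 1),
    datetime(2012, 12, 9, 3),
]

ranges = daily_chunk_ranges(utc_index)
print(len(ranges))  # the chunk_count one would expect from get_info
for r in ranges:
    print(r)
```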
Output of iterating over lib.reverse_iterator("example_df"):
date col
2012-12-08 21:00:00 1
2012-12-08 23:00:00 2
>> expected the following:
date col
2012-12-09 01:00:00 3
2012-12-09 03:00:00 4
date col
2012-12-08 21:00:00 1
2012-12-08 23:00:00 2
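The expected reverse iteration can be modeled the same way: group rows by UTC date, then yield the groups newest day first while keeping rows within each chunk in ascending order. reverse_daily_chunks is a hypothetical helper for illustration only, not a ChunkStore method.

```python
from datetime import datetime
from itertools import groupby


def reverse_daily_chunks(rows):
    """Yield lists of (timestamp, value) rows, one list per UTC day, newest day first.

    Hypothetical model of what reverse_iterator is expected to yield with chunk_size='D'.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    # groupby needs sorted input; each group holds the rows of one UTC calendar day.
    chunks = [list(group) for _, group in groupby(ordered, key=lambda row: row[0].date())]
    yield from reversed(chunks)


rows = [
    (datetime(2012, 12, 8, 21), 1),
    (datetime(2012, 12, 8, 23), 2),
    (datetime(2012, 12, 9, 1), 3),
    (datetime(2012, 12, 9, 3), 4),
]

for chunk in reverse_daily_chunks(rows):
    for ts, col in chunk:
        print(ts, col)
    print()
```

Running it prints the 2012-12-09 rows (3, 4) first and the 2012-12-08 rows (1, 2) second, which is the order the report expects from reverse_iterator.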
arctic_issue_example.zip