
zipline.run_algorithm start/end tz naive dates are incompatible with zipline.api.set_symbol_lookup_date and symbol #265

Open
jimwhite opened this issue Oct 1, 2024 · 0 comments

jimwhite commented Oct 1, 2024

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

This issue is environment-independent. I see it with Python 3.11.9 on Darwin and with Python 3.10.12 on Google Colab Linux, both 64-bit.

A generous, hand-selected subset of packages:

Package                          Version
-------------------------------- -------------------
bcolz-zipline                    1.2.12
cffi                             1.17.1
cudf-cu12                        24.4.1
Cython                           3.0.11
empyrical                        0.5.5
empyrical-reloaded               0.5.11
exchange_calendars               4.5.6
geopandas                        1.0.1
glob2                            0.7
google                           2.0.3
holidays                         0.57
intervaltree                     3.1.0
Mako                             1.3.5
numpy                            1.26.4
pandas                           2.2.3
pandas-datareader                0.10.0
pandas-gbq                       0.23.1
pandas-stubs                     2.1.4.231227
pip                              24.1.2
pip-tools                        7.4.1
pyarrow                          17.0.0
pyarrow-hotfix                   0.6
pyfolio-reloaded                 0.9.8
pyparsing                        3.1.4
pytensor                         2.25.4
pytest                           7.4.4
python-apt                       2.4.0
python-box                       7.2.0
python-dateutil                  2.8.2
python-utils                     3.8.2
pytz                             2024.2
regex                            2024.9.11
requests                         2.32.3
requests-oauthlib                1.3.1
requirements-parser              0.9.0
seaborn                          0.13.1
simple-parsing                   0.1.6
six                              1.16.0
sympy                            1.13.3
ta-lib                           0.4.26
ta-lib-bin                       0.4.26
tables                           3.8.0
tabulate                         0.9.0
tokenizers                       0.19.1
toml                             0.10.2
tomli                            2.0.1
toolz                            0.12.1
types-pytz                       2024.2.0.20240913
typing_extensions                4.12.2
tzdata                           2024.1
tzlocal                          5.2
wheel                            0.44.0
xarray                           2024.9.0
zipline-polygon-bundle           0.1.4
zipline-reloaded                 3.1

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

I'm trying to make Clenow's S&P 500 momentum portfolio example work, both to demonstrate my WIP Polygon.io Zipline data bundle and to make it as easy as possible to get up and running with Zipline Reloaded on Colab. This is the (rather large) complete notebook: https://github.com/fovi-llc/trading_evolved/blob/main/Chapter%2012%20-%20Momentum/S%26P_500_Momentum_Model_using_Polygon.ipynb

As noted in #227 (and as demonstrated by the original version of the code in Clenow's book), zipline.run_algorithm at some point in the past accepted tz-aware dates for start and end, but that later stopped working. As I recently commented there, it appears to me that the change probably came with Zipline's upgrade to exchange_calendars 4.x.

This issue is about how zipline.run_algorithm can't work correctly when a tz-naive date is used for end (and therefore for start as well), because it then isn't possible to use zipline.api.symbol as intended (i.e., for date-sensitive symbol lookups).

As you can see in its definition,

        def set_symbol_lookup_date(self, dt):

the zipline.api.set_symbol_lookup_date method always stores self._symbol_lookup_date as a tz-aware UTC timestamp.
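To make that UTC normalization concrete, here is a minimal pandas sketch. It assumes the method localizes a naive argument to UTC and converts an aware one to UTC; the function name to_utc_lookup_date is hypothetical, and this is an illustration rather than Zipline's own code.

```python
import pandas as pd


def to_utc_lookup_date(dt) -> pd.Timestamp:
    """Hypothetical sketch of the UTC normalization described for set_symbol_lookup_date."""
    ts = pd.Timestamp(dt)
    if ts.tz is None:
        # Naive input: treat the wall time as UTC and attach the UTC zone.
        return ts.tz_localize("UTC")
    # Aware input: express the same instant in UTC.
    return ts.tz_convert("UTC")


naive = to_utc_lookup_date("2020-08-03")
aware = to_utc_lookup_date(pd.Timestamp("2020-08-03 09:30", tz="America/New_York"))
print(naive)  # 2020-08-03 00:00:00+00:00
print(aware)  # 2020-08-03 13:30:00+00:00
```

Either way the stored value ends up tz-aware, which matters as soon as it is compared against tz-naive bounds.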

zipline.api.symbol, defined at

        def symbol(self, symbol_str, country_code=None):

then does this:

        _lookup_date = (
            self._symbol_lookup_date
            if self._symbol_lookup_date is not None
            else self.sim_params.end_session
        )

        return self.asset_finder.lookup_symbol(
            symbol_str,
            as_of_date=_lookup_date,
            country_code=country_code,
        )

Note that using self.sim_params.end_session as the default lookup date defeats the very purpose of date-sensitive symbol lookup. That is how I got here: the S&P 500 portfolio has symbols that don't denote the intended asset at the end of the simulation, so the lookup needs to be done as of the simulation's current date.

I think this default for _lookup_date should be changed to self.get_datetime().
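The effect of that default can be seen in a toy, pure-pandas model of the lookup. This is a sketch under stated assumptions: OWNERSHIP, lookup_symbol and default_lookup_date are hypothetical stand-ins (not Zipline APIs), all timestamps are tz-naive to isolate the default-date question, the real lookup raises SymbolNotFound where this toy returns None, and the CTL to LUMN cut-over date of 2020-09-18 is an illustrative assumption rather than bundle data. The simulation dates are the ones from the reproduction steps.

```python
import pandas as pd

# Hypothetical toy ownership map: symbol -> list of (start, end, sid) intervals.
# The cut-over date is an illustrative assumption, not data from a bundle.
OWNERSHIP = {
    "CTL": [(pd.Timestamp("2000-01-01"), pd.Timestamp("2020-09-18"), 1)],
    "LUMN": [(pd.Timestamp("2020-09-18"), pd.Timestamp("2100-01-01"), 1)],
}


def lookup_symbol(symbol, as_of_date):
    """Hypothetical: return the sid that held `symbol` on `as_of_date`, else None."""
    for start, end, sid in OWNERSHIP.get(symbol, []):
        # Same half-open interval test as in _lookup_symbol_strict.
        if start <= as_of_date < end:
            return sid
    return None


def default_lookup_date(symbol_lookup_date, current_dt, end_session, proposed=True):
    """Hypothetical: explicit date wins; otherwise current time (proposed) or end_session (current)."""
    if symbol_lookup_date is not None:
        return symbol_lookup_date
    return current_dt if proposed else end_session


end_session = pd.Timestamp("2024-08-30")
today = pd.Timestamp("2020-08-03")  # the simulation's current date

# Current behaviour: lookup as of end_session misses CTL entirely.
print(lookup_symbol("CTL", default_lookup_date(None, today, end_session, proposed=False)))  # None
# Proposed behaviour: lookup as of the simulation's current date finds it.
print(lookup_symbol("CTL", default_lookup_date(None, today, end_session)))  # 1
```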

lookup_symbol then takes us to

        def _lookup_symbol_strict(self, ownership_map, multi_country, symbol, as_of_date):

which does this check:

        if start <= as_of_date < end:

That check always fails if set_symbol_lookup_date was called, because as_of_date is then tz-aware while start and end are tz-naive, and the two can't be compared. If set_symbol_lookup_date is not called, as_of_date is sim_params.end_session, which is now tz-naive (unlike before the change to run_algorithm), so the comparison does succeed. But, as noted above, the lookup is then done as of the very end of the simulation, which makes a date-sensitive symbol lookup unhelpful.
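The failure itself is ordinary pandas behaviour and can be reproduced without Zipline. A minimal sketch, assuming (as described above) tz-naive interval bounds and a tz-aware UTC as_of_date; the dates are illustrative and in_interval is a hypothetical helper that repeats the half-open test from _lookup_symbol_strict.

```python
import pandas as pd

# Interval bounds as described in the issue: tz-naive (illustrative dates).
start = pd.Timestamp("2000-01-01")
end = pd.Timestamp("2020-09-18")

# A tz-aware UTC as-of date, like the one set_symbol_lookup_date stores.
as_of_aware = pd.Timestamp("2020-08-03", tz="UTC")


def in_interval(start, as_of_date, end):
    """Hypothetical helper: the half-open test used in _lookup_symbol_strict."""
    return start <= as_of_date < end


try:
    in_interval(start, as_of_aware, end)
except TypeError as exc:
    # pandas refuses to order tz-naive against tz-aware timestamps.
    print("TypeError:", exc)

# Diagnostic only: drop the zone (the wall time is already UTC) and retry.
as_of_naive = as_of_aware.tz_localize(None)
print(in_interval(start, as_of_naive, end))  # True
```

Stripping the zone here only confirms that the tz mismatch is the cause; it is not either of the changes proposed in this issue.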

  • What did you expect to happen?
  1. zipline.api.symbol returns the sid for a symbol as of the current simulation time (self.get_datetime()).

  2. Calling zipline.api.set_symbol_lookup_date should work with zipline.api.symbol so that the lookup is as of that date.

  • What happened instead?
  1. By default zipline.api.symbol returns the sid, if any, of a symbol as of the end of the simulation, regardless of the current simulation time it is called at.

  2. Calling zipline.api.set_symbol_lookup_date causes zipline.api.symbol to fail with a "Can't compare tz naive with tz aware" error.

Here is how you can reproduce this issue on your machine:

Reproduction Steps

For the problem with using self.sim_params.end_session as the default value for zipline.api.symbol:

  1. Set up run_algorithm simulation that starts before September 2020 (mine happened to be 2020-08-03) and ends after that (mine happened to be 2024-08-30).
  2. Call zipline.api.symbol on "CTL" (CenturyLink, an S&P 500 constituent whose symbol changed to LUMN).

For the problem that you can't work around this by setting the as-of date:

  1. Set up a run_algorithm simulation.
  2. Schedule a function that calls zipline.api.set_symbol_lookup_date then zipline.api.symbol.

What steps have you taken to resolve this already?

A lot of digging and hacking.

Anything else?

I can prepare a PR if there is agreement on the solution. I propose two changes:

  1. Change the default lookup date for zipline.api.symbol to self.get_datetime() (instead of self.sim_params.end_session). AFAIK self.get_datetime() is always tz-aware, but I don't know for sure.

  2. Change zipline.run_algorithm back to using tz-aware start and end so that the comparisons done by zipline.api.symbol work correctly. (Strictly, the requirement is less about the arguments themselves than that the values passed to SimulationParameters be tz-aware.)

Sincerely,
Jim White
