Optimize Chunkstore mongo read query #851

Open: BaiBaiHi wants to merge 1 commit into base: master from optimize_chunkstore_read_query

BaiBaiHi
Contributor

@BaiBaiHi BaiBaiHi commented Apr 2, 2020

Optimize the chunkstore read query by joining the metadata collection instead of issuing a separate metadata query for each individual date; those per-date queries accounted for about 40% of the processing time. The profile results below were run on data with 365 chunks.

Original Profile
image

New Profile
image
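
For readers following along, the join described above might look roughly like the sketch below. It builds a `$lookup` stage with `let` and a sub-pipeline, so each chunk document picks up its matching metadata document inside one aggregation rather than through one extra query per chunk. The field names (`sy`, `s`, `e`), the helper `build_chunk_read_pipeline`, the collection name, and the output field `metadata` are assumptions for illustration, not the code in this PR.

```python
SYMBOL = 'sy'  # assumed field name for the symbol
START = 's'    # assumed field name for the chunk start
END = 'e'      # assumed field name for the chunk end


def build_chunk_read_pipeline(spec, metadata_collection_name):
    """Return an aggregation pipeline that joins each chunk to its metadata.

    Hypothetical helper: `spec` is the usual find-style filter for chunks.
    """
    return [
        {'$match': spec},
        {'$lookup': {
            'from': metadata_collection_name,
            # Expose fields of the chunk document to the sub-pipeline.
            'let': {'symbol': '$' + SYMBOL,
                    'start': '$' + START,
                    'end': '$' + END},
            'pipeline': [
                # Keep only the metadata document for the same symbol and range.
                {'$match': {'$expr': {'$and': [
                    {'$eq': ['$' + SYMBOL, '$$symbol']},
                    {'$eq': ['$' + START, '$$start']},
                    {'$eq': ['$' + END, '$$end']},
                ]}}},
            ],
            'as': 'metadata',
        }},
        {'$sort': {START: 1}},
    ]


pipeline = build_chunk_read_pipeline({SYMBOL: 'AAPL'}, 'chunkstore.metadata')
```

With a real pymongo collection, the resulting list would be passed to `collection.aggregate(pipeline)`.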

@BaiBaiHi BaiBaiHi force-pushed the optimize_chunkstore_read_query branch from f382365 to 38a138c Compare April 2, 2020 22:22
@BaiBaiHi
Contributor Author

BaiBaiHi commented Apr 3, 2020

Looks like the tests are failing because the build runs against MongoDB 3.4.10.
The $lookup syntax with let and a sub-pipeline is only available from MongoDB 3.6 onward...

@bmoscon
Collaborator

bmoscon commented Apr 3, 2020

@BaiBaiHi there are other deprecated things breaking the build as well. I'm working on fixing them. @jamesblackburn what's the minimum MongoDB version we need to support? 3.4 is quite old; can we bump it to 3.6 or 4.0+?

@jamesblackburn
Contributor

Yep I think we can bump to 4.0.x (CC @rob256 )

@rob256
Contributor

rob256 commented Apr 6, 2020

> Yep I think we can bump to 4.0.x (CC @rob256 )

Yep, let's go to the latest 4.0 (4.0.17).

@bmoscon
Collaborator

bmoscon commented Apr 6, 2020

I've already updated the version of mongo on my branch, which also addresses a lot of the deprecations and whatnot, so once that's merged in, this branch can be rebased off it.

@bmoscon
Collaborator

bmoscon commented Apr 7, 2020

@BaiBaiHi can you rebase your fork off mainline? I updated many things to fix the build, so assuming the tests pass on this PR after that, we can get it merged.

@BaiBaiHi BaiBaiHi force-pushed the optimize_chunkstore_read_query branch from 38a138c to ed84b68 Compare April 7, 2020 15:56
@BaiBaiHi
Contributor Author

BaiBaiHi commented Apr 7, 2020

@bmoscon
Looks like the Python 3 build is failing for tests/integration/tickstore/test_ts_write.py: test_ts_write_pandas due to column order.

The change I made should have no effect on tickstore and it looks like it's only an issue in the python 3 build.

@bmoscon
Collaborator

bmoscon commented Apr 7, 2020

@BaiBaiHi sure but can you fix the style issues?

arctic/chunkstore/chunkstore.py:278:55: W291 trailing whitespace
arctic/chunkstore/chunkstore.py:282:27: E261 at least two spaces before inline comment
arctic/chunkstore/chunkstore.py:282:134: W291 trailing whitespace

@BaiBaiHi
Contributor Author

BaiBaiHi commented Apr 7, 2020

Yeah of course. My bad. Just set up my IDE on this machine. Didn't realize that my auto-formatter/checker wasn't turned on.

@BaiBaiHi BaiBaiHi force-pushed the optimize_chunkstore_read_query branch from ed84b68 to 896b95e Compare April 7, 2020 17:18
@bmoscon
Collaborator

bmoscon commented Apr 7, 2020

@jamesblackburn it seems reasonable and it passes all the tests. I'm not sure how it will perform if the aggregation actually needs to spill out onto disk.
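
On the spill-to-disk concern: pymongo's `Collection.aggregate` accepts `allowDiskUse=True`, which lets memory-limited aggregation stages write temporary files instead of failing. Whether the join in this PR would ever reach that limit was not measured in this thread. A minimal sketch, where `run_pipeline` is a hypothetical helper and not part of arctic:

```python
def run_pipeline(collection, pipeline, allow_disk_use=True):
    """Run an aggregation, optionally letting MongoDB spill to temporary files.

    `collection` is expected to be a pymongo Collection, or any object that
    exposes a compatible `aggregate(pipeline, **kwargs)` method.
    """
    cursor = collection.aggregate(pipeline, allowDiskUse=allow_disk_use)
    return list(cursor)
```

Profiling the same read with `allow_disk_use` on and off against a large symbol would be one way to check the performance question raised above.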

@bmoscon
Collaborator

bmoscon commented May 2, 2020

@BaiBaiHi @jamesblackburn what do you want to do with this?

@bmoscon
Collaborator

bmoscon commented Sep 4, 2020

@jamesblackburn - just another ping - I think this seems reasonable from my testing

@shashank88
Contributor

Should we merge this?

@crazy25000

If it's good to go we should merge this. If not, are there outstanding issues that I could help with? Would love to have consistent performance.

@TomTaylorLondon
Contributor

Can we hold off merging?

Your diagnosis of the issue is correct, but the root cause is a missing index for the query. I have a fix that simply adds the correct index and should be more efficient. Let me put out a PR so you can test locally. Let me know if this works!

Tom
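
The index-based fix itself is not shown in this thread. Purely to illustrate the idea, a compound index covering the fields that a per-chunk metadata query filters on could be declared as below. The field names, the choice of keys, and the helper `metadata_index_keys` are all assumptions, not the actual fix.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def metadata_index_keys(symbol_field='sy', start_field='s', end_field='e'):
    """Return a key specification for a compound index (hypothetical fields)."""
    return [
        (symbol_field, ASCENDING),
        (start_field, ASCENDING),
        (end_field, ASCENDING),
    ]


keys = metadata_index_keys()
# With a real pymongo collection this would be applied as:
#   metadata_collection.create_index(keys, background=True)
```

With such an index in place, each per-chunk metadata lookup can be served from the index rather than a collection scan, which is the alternative to the join proposed in this PR.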

Contributor

@TomTaylorLondon TomTaylorLondon left a comment

Test new fix

{'$match': spec},
{'$lookup': {
'from': self._mdata.name,
'let': {'symbol': '${}'.format(SYMBOL),
Collaborator

I forget, are we still supporting python 2.7? If not, can we use f-strings here?
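
For reference, the two spellings produce the same string; the f-string form needs Python 3.6 or newer. The value of `SYMBOL` here is an assumption for illustration only.

```python
SYMBOL = 'sy'  # assumed value, for illustration only

with_format = '${}'.format(SYMBOL)  # works on Python 2.7 and 3.x
with_fstring = f'${SYMBOL}'         # Python 3.6+ only

# Both yield a MongoDB field path: a '$' followed by the field name.
assert with_format == with_fstring
```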

Collaborator

@TomTaylorLondon - assuming you're using something like this in your code, the comment above is for you

@TomTaylorLondon
Contributor

See #902

@vietlq

vietlq commented Apr 10, 2022

@BaiBaiHi this is an exciting PR. Do you want to carry on and test against fix #902?

8 participants