Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make LDAResults._expElogbeta() method constant #54

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ApproximateIdentity
Copy link
Contributor

The method was being computed each time it was called, but all data used
to do the computation is constant throughout the life of the class.

The method was being computed each time it was called, but all data used
to do the computation is constant throughout the life of the class.
@ApproximateIdentity
Copy link
Contributor Author

By the way this isn't a minor issue. This speeds up my calls to the predict method https://github.com/ApproximateIdentity/rosetta/blob/4aa21a62719be5bb32f17ea1e7454742365257c2/rosetta/text/vw_helpers.py#L471 by ~15 times on real world data.

@dkrasner
Copy link
Contributor

@ApproximateIdentity this looks fine

did you run an integration test just for the sake of sanity?

@ApproximateIdentity
Copy link
Contributor Author

ApproximateIdentity commented Sep 22, 2017

First regarding your question:

So actually I didn't and there is a new failure, but I think it has more to do with how the test is written than anything. This test fails:

https://github.com/columbia-applied-data-science/rosetta/blob/master/rosetta/tests/test_text.py#L326-L334

But really what I'm wondering is, should the following even be allowed: https://github.com/columbia-applied-data-science/rosetta/blob/master/rosetta/tests/test_text.py#L330-L331

I.e. should there ever be a situation when the user sets the _lambda_word_sums attribute directly? That attribute is dependent on the topic data passed in: https://github.com/ApproximateIdentity/rosetta/blob/constExpElogbeta/rosetta/text/vw_helpers.py#L321

So basically I feel like the test here itself is using an internal API in a way that's inconsistent with the class. Personally I would just strip out the test entirely, but I'm not sure if that would be a popular way to go.

Secondly I pushed a bunch more commits because there are different tests that aren't working right. My fixes don't fix all tests...there are still issues with a couple tests, but I haven't quite sorted out why they're failing yet.

edit: Actually the more I mess with this it seems like it's almost easiest to just pin pandas<=0.16 or whatever in requirements.

@ApproximateIdentity ApproximateIdentity force-pushed the constExpElogbeta branch 2 times, most recently from fd8d7c8 to a574a0e Compare October 2, 2017 10:34
@ApproximateIdentity
Copy link
Contributor Author

I removed my commits that were updating tests for the newer versions of pandas and instead pinned the pandas version to 0.16.2 in the requirements file. I've verified that the tests all pass with that version (except for the test that I removed).

Personally I think this is now ready to be merged.

There have been backwards-incompatible changes in later versions of
pandas for which rosetta has not been updated. Until rosetta is updated,
the version should be pinned to reflect this.
This test was using unexposed internal class methods in a way which was
incompatible with legitimate usage. It worked before due to how the
internals of the class were desiged, but no longer due to the fact that
the LDAResults._expElogbeta() method is const.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants