Random number weirdness #12

jmschrei · 2014-05-07T19:46:48Z

I've been having this "bug" for a little while and wanted to see if anyone else knew about it.

When I write test code, I call random.seed(0). I then randomly generate a sequence to test, assuming the sequence will be the same each time since I set the seed.

Occasionally, the first time I run a program I get sequence A, and then on every subsequent run I get sequence B, just by rerunning the code. All yahmm operations work correctly; it's just that the random sequence is different. If I modify yahmm.pyx in any way (even just adding comments), I get sequence A again on the next run, then sequence B on every run after that.

Any thoughts?
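A minimal sketch of the seeding pattern described above, for anyone trying to reproduce it. Within one interpreter, reseeding the module-level generator and drawing again should give the same values; the surprise reported here is that separate runs of the same script disagree. `seeded_sequence` is a hypothetical helper for illustration, not part of yahmm.

```python
import random

def seeded_sequence(seed, length, alphabet="ACGT"):
    """Seed the module-level generator, then draw a sequence from it (hypothetical helper)."""
    random.seed(seed)
    return "".join(random.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    first = seeded_sequence(0, 20)
    second = seeded_sequence(0, 20)
    # Reseeding inside one process reproduces the draw exactly.
    assert first == second
    # Print it so the output of separate runs can be compared by hand.
    print(first)
```

Running this script several times and diffing the printed line is one way to check whether the cross-run difference appears without yahmm involved at all.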

The text was updated successfully, but these errors were encountered:

nipunbatra · 2014-05-08T07:02:45Z

Would you try setting the seed using numpy and see if you get the same behavior?

jmschrei · 2014-05-08T07:03:47Z

I do set the seed using random. If the seed were being changed on every run, I wouldn't get B consistently after the first trial.

adamnovak · 2014-07-21T22:54:15Z

OK, I've been looking at this issue today. I was trying to add support for running the proposed nose tests with python setup.py test as well as directly with nosetests. Depending on which way I ran the tests, I got different results for anything that depends on random numbers. The two approaches built the Cython module slightly differently and produced slightly different .so files, but the differences in the C code were all in obscure macro arguments and didn't appear to have much to do with randomness.

My conclusion is that the global state of the random module is the problem, and that it somehow isn't properly shared between Python and Cython, maybe in a way that depends on import order. I put a seed call in the actual Cython model sample function, and that alleviated the first-run-after-deleting-the-built-library vs. later-runs problem for at least one of the test execution methods. But the two methods still gave different results.
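One way shared global state can shift a "seeded" sequence is sketched below, assuming that some other code draws from the same module-level generator between the seed call and the sampling. Every extra draw advances the shared state, so the same seed yields different values. This is an illustration of the hazard, not a confirmed diagnosis of the yahmm behavior; `helper_that_uses_random` is hypothetical and stands in for any import-time or library code that calls `random`.

```python
import random

def helper_that_uses_random():
    # Hypothetical stand-in for any code that quietly draws from the global RNG.
    return random.random()

def draw(n):
    return [random.random() for _ in range(n)]

random.seed(0)
clean = draw(3)

random.seed(0)
helper_that_uses_random()  # one extra draw from the shared generator
shifted = draw(3)

# The shifted run is the clean run offset by one value.
print(clean)
print(shifted)
assert clean != shifted
assert clean[1:] == shifted[:2]
```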

I think if we want this to work properly, we need to move away from the Python random module. It might be best to use something that doesn't use global state for the RNG, for that matter.
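A minimal sketch of the "no global state" option, assuming the sampling code were changed to accept a generator argument. Each caller owns a `random.Random` instance, so nothing else in the process can advance it. The `sample_sequence` function and its `rng` parameter are hypothetical, not yahmm's existing API.

```python
import random

def sample_sequence(length, rng, alphabet="ACGT"):
    """Draw a sequence using only the generator passed in (hypothetical API)."""
    return "".join(rng.choice(alphabet) for _ in range(length))

# Each test builds its own seeded generator.
rng_a = random.Random(0)
rng_b = random.Random(0)

# Draws from the global module state do not touch the private generators.
random.seed(12345)
random.random()

seq_a = sample_sequence(20, rng_a)
seq_b = sample_sequence(20, rng_b)
assert seq_a == seq_b
print(seq_a)
```

Because the generator is an explicit argument, the result should no longer depend on import order or on whichever test runner (python setup.py test or nosetests) loaded the module first.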

We could also try making sure that all the functions called in the course of sampling are pure Python, for which we'd probably have to move them outside the .pyx file. This would probably make sampling super slow.

tlnagy · 2014-07-30T14:06:15Z

Would it be possible to stick to rand from stdlib for all of yahmm's random number usage and just add a convenience function to seed this from python (using srand)?

adamnovak · 2014-07-30T15:52:02Z

It would be possible, but we'd have to rework some of the distribution implementations. We rely on the Python random library's implementations for sampling from standard things like normal distributions.
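If the distributions moved off the global module, they would not have to give up the standard samplers: `random.Random` instances expose the same methods as the module, including `normalvariate`, `gauss`, `expovariate`, and `uniform`. The `NormalDistribution` class below is a hypothetical sketch, not yahmm's actual class.

```python
import random

class NormalDistribution:
    """Hypothetical distribution that samples from a caller-supplied generator."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def sample(self, rng):
        # Random.normalvariate is the instance-level counterpart of random.normalvariate.
        return rng.normalvariate(self.mean, self.std)

dist = NormalDistribution(5.0, 2.0)
rng = random.Random(7)
draws = [dist.sample(rng) for _ in range(5)]

# Re-creating the generator with the same seed reproduces the draws.
rng = random.Random(7)
assert draws == [dist.sample(rng) for _ in range(5)]
print(draws)
```

This keeps the existing sampling code paths while removing the dependency on shared module state; it does not address the speed concerns of calling Python methods from Cython.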

On Wed, Jul 30, 2014 at 7:06 AM, Tamas Nagy [email protected]
wrote:

Would it be possible to stick to rand from stdlib for all of yahmm's
random number usage and just add a convenience function to seed this from
python (using srand)?

—
Reply to this email directly or view it on GitHub
#12 (comment).

jmschrei added bug labels Jul 19, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random number weirdness #12

Random number weirdness #12

jmschrei commented May 7, 2014

nipunbatra commented May 8, 2014

jmschrei commented May 8, 2014

adamnovak commented Jul 21, 2014

tlnagy commented Jul 30, 2014

adamnovak commented Jul 30, 2014

Random number weirdness #12

Random number weirdness #12

Comments

jmschrei commented May 7, 2014

nipunbatra commented May 8, 2014

jmschrei commented May 8, 2014

adamnovak commented Jul 21, 2014

tlnagy commented Jul 30, 2014

adamnovak commented Jul 30, 2014