Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit TimeConstrained to one per eval. #1202

Closed
wants to merge 11 commits into from
Closed

Conversation

rocky
Copy link
Member

@rocky rocky commented Dec 2, 2024

Allowing a TimeConstrained evaluation to contain another TimeConstrained evalution is tricky. We would have on the second Timeconstrained while one is already pending would force us to not blindly set SIGALRM but instead figure out which ALARM comes earlier and when that is hit handle the TimeConstrained and queue another SIGLARM the amout of time of the second evaluation.

So for now, punt and just don't allow a second TimeConstrained to run but instead return failure.

@rocky rocky requested a review from mmatera December 2, 2024 16:14
@rocky rocky force-pushed the at-most-one-TimeConstrained branch from 0cd97b0 to af3fc4b Compare December 2, 2024 17:22
done = False
if self.is_running_TimeConstrained:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way to check if we have already set a time constraint is to check the length of evaluation.timeout_queue. I remember that with the previous implementation, this kind of nested constraint used to work.

@mmatera
Copy link
Contributor

mmatera commented Dec 2, 2024

At least locally, with the version in the master branch,

TimeConstrained[TimeConstrained[Integrate[Sin[x] ^ 1000, {x,0,Pi}],.1],.1],
TimeConstrained[TimeConstrained[Integrate[Sin[x] ^ 1000, {x,0,Pi}],10],.1],

TimeConstrained[TimeConstrained[Integrate[Sin[x] ^ 1000, {x,0,Pi}],0.1],10]
and
TimeConstrained[TimeConstrained[Integrate[Sin[x] ^ 1000, {x,0,Pi}],10],10]

work as expected. The issue seems to happen when both wall times are similar, and coincides with the time that
the evaluation actually takes. For example, I see in my system that

In[2]:= TimeConstrained[TimeConstrained[Integrate[Sin[x] ^ 1000, {x,0,Pi}],.1],.1]
Exception ignored in: <function WeakSet.__init__.<locals>._remove at 0x760efa204dc0>
Traceback (most recent call last):
  File "/home/mauricio/.conda/envs/pystonmathics/lib/python3.8-pyston2.3/_weakrefset.py", line 38, in _remove
stopit.utils.TimeoutException: 
Out[2]= 4223253764772446398681479587905863679627375131977316984490513673541022323523784279665820061320349533833790720078461657887534527344541873711717097182804976178494773191621120833690038501245904489755696471267492150071422577902441998133040640534678543005733060259424013478163819189207764154121872206505 Pi / 167423219872854268898191413915625282900219501828989626163085998182867351738271269139562246689952477832436667643367679191435491450889424069312259024604665231311477621481628609147204290704099549091843034096141351171618467832303105743111961624157454108040174944963852221369694216119572256044331338563584

produces the exception one-third of the time. Probably, the way to handle this is just by capturing that exception.

@rocky
Copy link
Member Author

rocky commented Dec 2, 2024

At least locally, with the version in the master branch,

"At least locally" isn't good enough. Recall that we have also been seeing failures in CI tests with TimeConstraint.

Right now, my goal is to have something that unblocks @aravindh-krishnamoorthy. For this, there does not have to be a long-term solution. If this branch works, it can stay a branch and not get merge.

If checking evaluation.timeout_queue instead of the variable is_running_TimeConstrained works better, great! let's use that then. It is probably more reliable.

produces the exception one-third of the time. Probably, the way to handle this is just by capturing that exception.

Looking for specific exceptions is hacky and fragile. A solution like this is likely to break on different Python versions, Python implementations, and Operating Systems. This kind of thinking (it seems to work here - I just need to hack some special cases) leads us down a rabbit hole that I don't think we'll be ever able to climb back out of.

What I'd like to see is something general and simple, and that can work with Rubi. If we have to sacrifice powerfulness, and Rubi searching quality that's okay. Once we have something working we can try to improve things.

@rocky rocky marked this pull request as draft December 2, 2024 21:40
@rocky rocky force-pushed the at-most-one-TimeConstrained branch from af3fc4b to 390f7ea Compare December 2, 2024 21:47
@mmatera
Copy link
Contributor

mmatera commented Dec 2, 2024

@rocky , what I was trying to understand is what is the actual problem with nested TimeConstraint expressions. Is an issue of the stopit module, or is about how we use it? Is it related with this random failures in the tests?

@rocky
Copy link
Member Author

rocky commented Dec 2, 2024

@rocky , what I was trying to understand is what is the actual problem with nested TimeConstraint expressions. Is an issue of the stopit module, or is about how we use it? Is it related with this random failures in the tests?

glenfant/stopit#17 suggests that others have had a problem when nesting in the same thread.

@aravindh-krishnamoorthy
Copy link
Collaborator

Hello @rocky. Firstly, thank you very much for identifying the underlying cause and unblocking me. Indeed, with this fix, Rubi 1 Algebraic functions tests run within a reasonable timeframe. However, unfortunately, there's still a problem.

This fix breaks the function ValidAntiderivative in Test.m, which validates "suboptimal" results that have a higher leafcount. (Presently, we generate many "suboptimal" antiderivatives and I see optimising them as Step #2. So a fix will take some time). These "suboptimal" results are currently marked as invalid.

I'll try to get ValidAntiderivative working based on your fix in this PR. Once I find a solution, I'll push my changes to this PR for review.

@rocky
Copy link
Member Author

rocky commented Dec 3, 2024

This fix breaks the function ValidAntiderivative in Test.m, which validates "suboptimal" results that have a higher leafcount. (Presently, we generate many "suboptimal" antiderivatives and I see optimising them as Step #2. So a fix will take some time). These "suboptimal" results are currently marked as invalid.

I suspect we can get a better TimeConstrained function by having it create a new thread (up to some limit) for each expression that is to be time constrained. However, please let's not do this right now but, as you suggest, leave this for later as a second step.

We have lots out-right bugs in the code and missing functionality in certain built-in functions. If we could remove more of those first and get some sort of Rubi subset going, this would be great.

I'll try to get ValidAntiderivative working based on your fix in this PR. Once I find a solution, I'll push my changes to this PR for review.

Thanks. But also please, look for how we can break this large task into smaller well-defined pieces. Maybe just the first two or three sections of 1 Algebraic functions. The list in
Mathics3/Mathics3-Rubi#2 is already enough to start working on.

@mmatera
Copy link
Contributor

mmatera commented Dec 3, 2024

@rocky , what I was trying to understand is what is the actual problem with nested TimeConstraint expressions. Is an issue of the stopit module, or is about how we use it? Is it related with this random failures in the tests?

glenfant/stopit#17 suggests that others have had a problem when nesting in the same thread.

Yep. The old code I wrote handle this by keeping a queue of calls, and using different threads on each call. Maybe during my holidays I can try to propose a fix for stopit, but I need to study better how is currently implemented.

@aravindh-krishnamoorthy
Copy link
Collaborator

aravindh-krishnamoorthy commented Dec 3, 2024

In the above commits (16de5ec, be4fe46), during a recursive call, instead of returning failexpr when a TimeConstrained call is already underway, the time constraint is ignored and expr is evaluated as usual. Hence, only the outermost time constraint is honoured.

With this change, a mini-test with $76$ cases runs to completion as expected.

I'll now run the full 1 Algebraic functions suit (will take a day or two to finish) and continue with the rest of the items on the Rubi PR.

@rocky
Copy link
Member Author

rocky commented Dec 3, 2024

Yep. The old code I wrote handle this by keeping a queue of calls, and using different threads on each call. Maybe during my holidays I can try to propose a fix for stopit, but I need to study better how is currently implemented.

Getting a fix/PR for in the stopit repository would be awesome. That code hasn't significantly changed in about 6 years or when Python 3.6 was around. Python has probably changed threading since then and different kinds of threads used in Python is on the horizon.

@aravindh-krishnamoorthy
Copy link
Collaborator

aravindh-krishnamoorthy commented Dec 4, 2024

Unfortunately, there still seems to be an issue. The issue is not immediately apparent.

When running long Rubi tests, TimeConstrained randomly? stops timing out, even if it's the top level call. I am not sure if this is related to missing the exception due to the "unraisable exception" issue or due to self.is_running_TimeConstrained = False being missed in some rare branch of execution...

@rocky
Copy link
Member Author

rocky commented Dec 4, 2024

Unfortunately, there still seems to be an issue. The issue is not immediately apparent.

When running long Rubi tests, TraceEvaluation randomly? stops timing out, even if it's the top level call. I am not sure if this is related to missing the exception due to the "unraisable exception" issue or due to self.is_running_TimeConstrained = False being missed in some rare branch of execution...

I need more information here. If you have logs that show both what what you invoked and what you got, that would be helpful.

Why is TraceEvaluation needed? Finally, I should mention that signal handling was added to the Mathics debugger.

Here again is how to use this:

image

Without the debugger, but with trepan3k installed, you can use Breakpoint[], and issue the handle command. You won't get as nice of a traceback, but it should still work.

If you want to go into the debugger and look at is_running_TimeConstained change no-stop to stop.

@aravindh-krishnamoorthy
Copy link
Collaborator

Why is TraceEvaluation needed? Finally, I should mention that signal handling was added to the Mathics debugger.

Very sorry for this, @rocky. I actually meant TimeConstrained[] and not TraceEvaluation[]. I mixed them up. So, the right paragraph is...

When running long Rubi tests, TimeConstrained randomly? stops timing out, even if it's the top level call. I am not sure if this is related to missing the exception due to the "unraisable exception" issue or due to self.is_running_TimeConstrained = False being missed in some rare branch of execution...

I'm working on finding a small reproducible example. But this seems to happen randomly, which makes debugging a bit difficult.

@axkr
Copy link

axkr commented Dec 4, 2024

I'm working on finding a small reproducible example. But this seems to happen randomly, which makes debugging a bit difficult.

An idea which I haven't tested yet. Maybe you can implement TimeRemaining[] and test nested TimeConstrained like this:

@rocky
Copy link
Member Author

rocky commented Jan 17, 2025

Currently using another hacky solution.

@rocky rocky closed this Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants