Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement combinator that forbids descheduling #180

Open
1 task done
jasagredo opened this issue Oct 8, 2024 · 29 comments
Open
1 task done

Implement combinator that forbids descheduling #180

jasagredo opened this issue Oct 8, 2024 · 29 comments
Labels
enhancement New feature or request

Comments

@jasagredo
Copy link
Contributor

jasagredo commented Oct 8, 2024

Is your feature request related to a problem? Please describe.

Some actions are expected to be performed immediately after others, with prompt execution. I can think of two specific cases:

  • When running an async action, the TVar with the result is written, then the thread finishes. If this step is not promptly executed, the thread can remain alive leading to SloppyShutdowns.
  • When observing values from an IOSim action, it is often done via a TChan or TQueue to which we push messages when some step completed. If these are delayed, the values sent to the channel can be inconsistent with what we would like to observe from the other side of the channel.

Describe the solution you'd like

I would like to have a combinator that specifies an action cannot be descheduled, i.e. that there can't be a Yield or a Reschedule in between two actions. Something like the following:

foo :: IOSim s a
bar :: IOSim s b

promptly :: IOSim s a -> IOSim s b -> IOSim s (a, b)

baz :: IOSim s (a, b)
baz = do 
  foo `promptly` bar 

or

foo :: IOSim s a
bar :: IOSim s ()

promptly :: IOSim s a -> IOSim s () -> IOSim s a

baz :: IOSim s a
baz = do 
  foo `promptly` bar 

When evaluating foo we could look ahead and if the next action was marked promptly then forbid a reschedule.

Describe alternatives you've considered

Additional context

Could be used in the QSM PR #179. Now I am doing a hack which is checking that before sending the message to the channel that same thread id has done some action that resembles some "work" (see hasDoneWork), which was just an ad-hoc solution.

Are you willing to implement it?

  • Are you? 😃 Yes with some mentoring 😄
@jasagredo jasagredo added the enhancement New feature or request label Oct 8, 2024
@bolt12
Copy link
Contributor

bolt12 commented Oct 8, 2024

From my small understanding of IOSimPOR internals this should be doable without much complexity. IIRC there's a function that deals with the race reversal logic and that would be the place to start looking

@coot
Copy link
Collaborator

coot commented Oct 8, 2024

If these are delayed

Would it help if you don't mark the thread which observes the value as racy? Then IOSimPOR will not take it into account and reorder its steps relative to other threads.

I wonder if you can use traceTVar for your purpose. It has the semantics you're looking for: it is executed always in the same step as the stm transaction.

As for implementation. You'd need to add a constructor to SimA type with the right type, e.g.

promptly :: m a -> m b -> m b

(that's slightly more general what you suggest).
Then we will need Promplty :: SimA s a -> SimA s b -> SimA s b constructor of SimA or something similar (I am using GADT notation).

@jasagredo
Copy link
Contributor Author

The problem is not the thread that observes the value but the one that emits it. I would like to make sure that the emission is promptly executed.

@coot
Copy link
Collaborator

coot commented Oct 9, 2024

And what about traceTVar or something similar? That's what allows to observe committed values to TVars (we could also add it for other data structures, since all shared variables are now modeled by TVars, it should be doable - we already do that 😁).

@coot
Copy link
Collaborator

coot commented Oct 9, 2024

I think I got what you want to do in the QSM PR, basically:

  cresp <- semantics sut ccmd
           `promptly`
           atomically (writeTQueue tq (ccmd, cresp))

That's why the type signature you want is IOSim s a -> IOSim s () -> IOSim s a. To answer myself: that's not possible with traceTVar.

If we do:

promptly :: IOSim s a -> IOSim s b -> IOSim s (a,b)

Then it could be used to also to readTVar inside the action (an alternative for traceTVar, but only available for IOSim).

It's implicitly included in your request just to add it to IOSim without a type class in io-classes - which I think is fine.

It might be simpler to implement:

promptlySTM :: IOSim s a -> STM s b -> IOSim s (a,b)

You'd want the second argument to be executed atomically. In the case of IOSim s b, this will require heavily modifying the scheduler (which could be error prone concerning POR), while STM is already atomic. This would work for both async and your PR.

@coot
Copy link
Collaborator

coot commented Oct 10, 2024

@jasagredo we came up with an idea to add an operator:

withRaces :: Sim s a -> Sim s a

which enables race discovery for thread that runs the computation (and all thread that are forked from it), but only limited to the Sim s a passed to it.

@coot
Copy link
Collaborator

coot commented Oct 10, 2024

Hmm, I actually need to look at the default IOSimPOR scheduler if it will allow us to solve your issue.

@coot
Copy link
Collaborator

coot commented Oct 10, 2024

I looked at my presentation. Each thread is rescheduled in these conditions (regardless if a thread is racy or not)

  • forking a new thread
  • thread termination
  • setting the masking state to interruptible
  • popping masking frame (which resets masking state)
  • thread delays
  • execution of an STM transaction
  • blocking throwTo

It seems the only option is to add promptlySTM.

@bolt12
Copy link
Contributor

bolt12 commented Oct 10, 2024

Why do you say that? If you don't call exploreRaces IOSimPOR won't find races right?

@coot
Copy link
Collaborator

coot commented Oct 11, 2024

But it will deschedule the current thread and it might pick another one, isn't it?

@coot
Copy link
Collaborator

coot commented Oct 11, 2024

IOSim scheduler runs a thread until it blocks, IOSimPOR uses different scheduler.

@bolt12
Copy link
Contributor

bolt12 commented Oct 11, 2024

Yes and that's okay, if the thread blocks we can't run it anyway, it would be a problem if this combinator would cause deadlocks.

@coot
Copy link
Collaborator

coot commented Oct 14, 2024

If we have promptly :: IOSim s a -> IOSim s b -> IOSim s (a, b) then the second argument, e.g. IOSim s b, could be non-atomic (e.g. run multiple steps). Which would be surprising from the user perspective. We couldn't guarantee that its all steps are promptly executed.

If we have promptlySTM :: IOSim s a -> STM s b -> IOSim s (a, b) then the stm action is atomic, we only need to make sure the stm action is promptly executed. Btw, promptness has nothing to do with race discovery; we need to make sure that if the next step is executed through it, we don't reschedule the thread when we start to run it. We can leave the racy flag on. Still, if the STM s b action is blocking, then it wouldn't be executed promptly. We could include a warning in the trace for such a case, since this is a user error.

@bolt12
Copy link
Contributor

bolt12 commented Oct 14, 2024

Does IOSimPOr only chooses to delay a racy thread, or does that happen for non-racy ones? I think if main will run until it blocks/terminates if it doesn't fork any thread, for example. If main threads forks a thread then that child thread is scheduled ahead of main from the point it is forked. This is a problem for promptly and also as you say has nothing to do with promptness this is a scheduling issue.

We could have a combinator priorise that would make sure that thread has priority over others, in essence a combinator that in some way exposes the scheduling algorithm to the user. What we want is that the thread runs a given code block until completion (or it blocks) without any other thread being scheduled ahead of it during those action steps.

@coot
Copy link
Collaborator

coot commented Oct 14, 2024

IOSimPOR is using:

type RunQueue   = OrdPSQ (Down IOSimThreadId) (Down IOSimThreadId) ()

to pick the next thread to run. If a thread was forked in the previous step, we'll run that thread instead.

@bolt12
Copy link
Contributor

bolt12 commented Oct 14, 2024

I know, I am just saying it is easy to change that

@rjmh
Copy link
Contributor

rjmh commented Oct 16, 2024

I know, I am just saying it is easy to change that

Careful! It's easy to break the good shrinking behaviour of IOSimPOR too; the default scheduling strategy is a key part of that.

@rjmh
Copy link
Contributor

rjmh commented Oct 16, 2024

One risk I see with promptly is that it's unclear how to implement it in the normal IO monad. And I suspect its purpose is to perform logging actions "promptly" in system code; that is, it might end up being used in production code. The risk is future developers might use promptly to write production code that works in simulation, but not in reality.

@rjmh
Copy link
Contributor

rjmh commented Oct 16, 2024

It might be tempting to try to give a thread highest priority temporarily, to ensure that logging actions immediately follow whatever they are logging. That's a bit difficult in IOSimPOR as is, because a thread's priority is its thread Id, and that can't change. One could add a new kind of thread Id, of course, with priority higher than all existing threads, and then spawn a new thread of this type to perform the action-to-be-logged and the logging. The question is what to do if such a thread blocks? Then another thread might run, and create another highest-priority thread, and then both of them might become runnable at the same time. Would we then have two threads with the same thread Id? Not possible, with the existing IOSim(POR) data structures. Would one 'highest priority' thread have priority over the other? In that case the immediacy 'guarantee' would be broken. I have the feeling that it should be an error for such a thread to block AT ALL.

Hmm, perhaps one could add a boolean flag unyielding to the state of a thread. Then deschedule Yield would simply reschedule an unyielding thread. There could be a combinator
unyielding :: m a -> m a
which sets and unsets the unyeilding flag around an action. It would need to catch exceptions and unset the unyeilding flag in the handler. Arguably any other call of deschedule should result in an error if the unyielding flag is set--that is, unyeilding code would be forbidden to block. It could be implemented in the IO monad by taking a global lock. Nasty and inefficient, but you can't have atomicity by optimistic concurrency and non-transactional effects at the same time.

@rjmh
Copy link
Contributor

rjmh commented Oct 17, 2024

Does IOSimPOr only chooses to delay a racy thread, or does that happen for non-racy ones?

It only delays racy threads. That's the difference between them. The idea is that non-racy threads are part of the test code, not the system under test, so races between them are uninteresting. The schedule exploration that IOSimPOR does is exponential in the number of races to reverse, so getting rid of races with the test threads makes things much, much faster.

@dcoutts
Copy link
Contributor

dcoutts commented Oct 17, 2024

Would it not solve 90% of our problems if we could do simulation tracing from within an STM transaction?

That way we can ensure that a sim event log message is atomic with respect to other actions in a STM transaction.

It would be straightforward to provide a sim-only primitive for doing tracing from within STM rather than jut from within IO.

@bolt12
Copy link
Contributor

bolt12 commented Oct 17, 2024

For tracing yes, but this issue is not about logging but rather about too much granularity on the observable events. @jasagredo wants to have more control on the granularity of observable events so that he can avoid IOSimPOR finding races that he doesn't care about

@jasagredo
Copy link
Contributor Author

To be precise, the exact use case I wanted this for (logging exactly the actions in the order they occurred) is no longer necessary. For my purposes I can just search for the existence of an interleaving in which the logs would make sense, so it is fine by me to not add this combinator.

@rjmh
Copy link
Contributor

rjmh commented Oct 17, 2024

Hmm. I wonder whether we really want to tell IOSimPOR that there are some effects we don't care about. For example, if logging updates a TVar then the order of logging operations may affect the order of entries in the log--and IOSimPOR will try to reverse logging operations to the same log, because they race with each other. That could lead to logging operations being moved away from the events they are logging. But if we don't care about the order of entries in the log, then we could say those operations DON'T race, and so IOSimPOR would not try to reverse them. I take logging just as a concrete example here, but there could be other operations such as inserting elements into a Set in a TVar; IOSimPOR will treat two separate insertions as racing, but in fact the order doesn't matter (as long as no other thread observes the intermediate states). It's a bit subtle to get right, but it might be that providing some user control over which higher-level operations are considered to race would be helpful. If it reduced the number of interleavings to be considered, then that could really be useful.

@bolt12
Copy link
Contributor

bolt12 commented Oct 17, 2024

Hmm. I wonder whether we really want to tell IOSimPOR that there are some effects we don't care about. For example, if logging updates a TVar then the order of logging operations may affect the order of entries in the log--and IOSimPOR will try to reverse logging operations to the same log, because they race with each other. That could lead to logging operations being moved away from the events they are logging. But if we don't care about the order of entries in the log, then we could say those operations DON'T race, and so IOSimPOR would not try to reverse them. I take logging just as a concrete example here, but there could be other operations such as inserting elements into a Set in a TVar; IOSimPOR will treat two separate insertions as racing, but in fact the order doesn't matter (as long as no other thread observes the intermediate states). It's a bit subtle to get right, but it might be that providing some user control over which higher-level operations are considered to race would be helpful. If it reduced the number of interleavings to be considered, then that could really be useful.

Yes I agree with this reasoning and think it would be very nice to debug large pieces of software as well if we could turn off/or enable race discovery only for particular places in the code.

@coot
Copy link
Collaborator

coot commented Oct 18, 2024

There are two subtly different things:

  1. forcing the scheduler to run something right after an action (which can be seen as extending the duration of a step)
  2. not reversing races between some of the steps (IOSimPOR 'thinks' in steps rather than in IO actions)

We have a form of the first one, for observing committed values in TVars. But the API we provide isn't enough for @jasagredo's use case.

So far I thought that 1. is stronger than 2. but now I think that under the current IOSimPOR scheduler it's not the case. This is because IOSimPOR schedulers threads based on Down ThreadId order (so steps from the same thread are rescheduled until the thread blocks). But this won't be the case if we add different scheduler policies (e.g. a random one, which has some advantages). For this reason, a future-proof solution should be based on 1 rather than 2.

@coot
Copy link
Collaborator

coot commented Oct 18, 2024

@dcoutts we already have a form of tracing in IOSim's STM monad: traceSTM.

@coot
Copy link
Collaborator

coot commented Oct 18, 2024

We also have MonadSay instance for STMSim monad.

Unfortunately, it's not discoverable in haddocks - at least I cannot find a list of instances for associated types.

@rjmh
Copy link
Contributor

rjmh commented Oct 18, 2024

IOSimPOR schedulers threads based on Down ThreadId order (so steps from the same thread are rescheduled until the thread blocks). But this won't be the case if we add different scheduler policies (e.g. a random one, which has some advantages).

Not quite. IOSimPOR is careful to yield after every step that might unblock another thread, because that thread might have a higher priority. If one thread unblocks a higher priority one, then IOSimPOR switches to the latter, even though the former has not blocked.

Note that a random scheduler plays extremely badly with shrinking. Been there, done that. The counterexamples you get are pretty much useless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants