TASK: Prevent multiple catchup runs #4751

kitsunet · 2023-11-14T21:22:15Z

Uses caches to avoid even starting an async catchup for a projection if one is still running for it.
Also adds a simple single-slot queue with incremental backoff to ensure a requested catchup definitely happens.

bwaidelich

This is great already, thanks a lot for your efforts!
I find it a bit hard to understand the queueing logic (even though we came up with it together *g)
Maybe that can be formulated a bit more explicitly such that it is easier to find potential issues.
I added some first suggestions inline

...toryRegistry/Classes/Factory/ProjectionCatchUpTrigger/SubprocessProjectionCatchUpTrigger.php

bwaidelich

This looks so much cleaner and easier to comprehend, thank you!
Two more comments on things I wasn't aware of before.
Let me know if you could use a rubber duck :)

Neos.ContentRepositoryRegistry/Classes/Command/SubprocessProjectionCatchUpCommandController.php

Neos.ContentRepositoryRegistry/Classes/Service/AsynchronousCatchUpRunnerState.php

bwaidelich

@kitsunet I thought about this one again and realized that CATCHUPTRIGGER_ENABLE_SYNCHRONOUS_OPTION is a poor naming choice, since this is not so much about sync vs. async but about whether we expect multiple parallel threads or not.
For example: I would like to be able to simply switch the catch up trigger to the synchronous one for individual CRs for performance reasons – but they would still have to be queued in order to prevent concurrent processes from running into the "Failed to acquire checkpoint" exception.

I have a, somewhat radical, suggestion:
What if we replaced AsynchronousCatchUpRunnerState by some central authority that can be used to queue catchups independently from their implementation?

final readonly class CatchUpQueue {

    public function __construct(
        private ContentRepositoryId $contentRepositoryId,
        private FrontendInterface $catchUpLock,
    ) {}

    public function queueCatchUp(ProjectionInterface $projection, \Closure $catchUpTrigger): void {
        // here goes your logic from SubprocessProjectionCatchUpTrigger and  AsynchronousCatchUpRunnerState
    }

    public function releaseCatchUpLock(string $projectionClassName): void {
        // to be called from the part that invokes ContentRepository::catchUp()
    }
}

I think that the change is smaller than it might first appear (since you already solved all of the logical issues) but it would allow us to use this for all implementations (for testing context we could consider replacing the cache with a NullBackend).

E.g. in SubprocessProjectionCatchUpTrigger::triggerCatchUp():

$catchUpQueue = $this->catchUpQueueFactory->build($contentRepositoryId);
foreach ($projections as $projection) {
    $catchUpQueue->queueCatchUp($projection, $this->startCatchUp(...));
}

Sorry for the back and forth, but I think we're getting there and I'm happy to help or take over!

bwaidelich · 2023-11-18T12:33:04Z

The example code above is not correct yet because it would block catchups in the foreach, so maybe it has to be adjusted to:

final readonly class CatchUpQueue {
     // ...
    public function queueCatchUp(array $projections, \Closure $catchUpTrigger): void {
        // keep looping over $projections until there are no more running remaining
    }
}
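The adjusted signature above can be sketched roughly as follows. This is only an in-memory illustration, not the code from this PR: `InMemoryCatchUpQueue`, its constructor parameters, and the use of plain projection names instead of projection objects are assumptions made for the example, and a real implementation would need a cache or lock shared between processes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for the proposed CatchUpQueue.
 * A single process only; real deduplication needs shared state.
 */
final class InMemoryCatchUpQueue
{
    /** @var array<string, true> projections whose catch-up is currently running */
    private array $running = [];

    public function __construct(
        private readonly int $maxAttempts = 5,
        private readonly int $baseDelayMicroseconds = 1000,
    ) {}

    /**
     * Starts a catch-up for every projection that is not running yet and
     * retries the busy ones with an increasing delay.
     *
     * @param list<string> $projectionNames
     * @param \Closure(string): void $catchUpTrigger
     * @return list<string> projections that were still busy when we gave up
     */
    public function queueCatchUp(array $projectionNames, \Closure $catchUpTrigger): array
    {
        $pending = $projectionNames;
        for ($attempt = 0; $attempt < $this->maxAttempts && $pending !== []; $attempt++) {
            if ($attempt > 0) {
                // incremental back-off before looking at the busy projections again
                usleep($this->baseDelayMicroseconds * $attempt);
            }
            $stillBusy = [];
            foreach ($pending as $name) {
                if (isset($this->running[$name])) {
                    $stillBusy[] = $name;
                    continue;
                }
                $this->running[$name] = true;
                $catchUpTrigger($name);
            }
            $pending = $stillBusy;
        }
        return $pending;
    }

    /** To be called by whatever actually ran the catch-up once it has finished. */
    public function releaseCatchUpLock(string $projectionName): void
    {
        unset($this->running[$projectionName]);
    }
}
```

Because the loop handles all projections in one pass, a busy projection no longer blocks the others; it is simply retried on the next pass.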

kitsunet · 2023-11-18T16:53:27Z

Oh, that's a very cool idea, and I think this all makes sense. I'll see how far I get tomorrow, and then we can see if you want to take over. But at least I get your idea and already know this part of the code, so it makes sense for me to go on.

kitsunet · 2023-11-26T20:00:07Z

This is a tough nut. I thought it would be nicer to wrap the debouncer (as I have named it now) around the ProjectionCatchUpTriggerInterface, with the debouncer also implementing that interface. This is nice because it only needs slight adjustments in \Neos\ContentRepositoryRegistry\ContentRepositoryRegistry::buildProjectionCatchUpTrigger, and it seemed cleanest. But regardless of how we do it, we somehow need to "unlock" after a catchup is finished, even in async. The clean way IMHO would be to use a CatchUpHook that triggers on onAfterCatchUp, BUT that gets no arguments, so I know neither which repository nor which projection was caught up :sadpanda: (I figured the repository is known to the factory creating the hook, so that is fine).
I guess I will extend the interface, but that will make this a breaking change if someone implemented the interface...
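The wrapping idea described here is essentially a decorator. A minimal sketch, assuming a simplified stand-in interface: `CatchUpTrigger`, `DebouncingCatchUpTrigger`, and `markFinished()` are hypothetical names for this example, and the real ProjectionCatchUpTriggerInterface takes projection objects rather than names.

```php
<?php
declare(strict_types=1);

/** Hypothetical, simplified stand-in for ProjectionCatchUpTriggerInterface. */
interface CatchUpTrigger
{
    /** @param list<string> $projectionNames */
    public function triggerCatchUp(array $projectionNames): void;
}

/** Decorator that drops requests for projections whose catch-up is still in flight. */
final class DebouncingCatchUpTrigger implements CatchUpTrigger
{
    /** @var array<string, true> */
    private array $inFlight = [];

    public function __construct(private readonly CatchUpTrigger $inner) {}

    public function triggerCatchUp(array $projectionNames): void
    {
        $toStart = [];
        foreach ($projectionNames as $name) {
            if (!isset($this->inFlight[$name])) {
                $this->inFlight[$name] = true;
                $toStart[] = $name;
            }
        }
        if ($toStart !== []) {
            // only the projections that are not already running reach the real trigger
            $this->inner->triggerCatchUp($toStart);
        }
    }

    /**
     * The "unlock" step: something has to call this once a catch-up has
     * finished, and it needs to know which projection that was.
     */
    public function markFinished(string $projectionName): void
    {
        unset($this->inFlight[$projectionName]);
    }
}
```

The sketch makes the open question visible: whoever calls `markFinished()` must know the projection, which is exactly the information an argument-less `onAfterCatchUp` hook does not provide.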

kitsunet · 2023-11-26T21:06:15Z

No, this also seems like a bad idea, because the hooks are per projection and I can't know from the outside which projection has said hook active. It feels unwise to have a global debouncer but a per projection (and repository) unlock of it (via the hook).

SignalSlot might be an option but then it becomes messy to get hold of the debouncer to unlock the state.

kitsunet · 2023-11-27T12:26:16Z

I guess I would also like a rubber duck here :)

bwaidelich · 2023-11-27T13:46:50Z

Using the CatchUpHook seems like a nice idea, although I did not fully understand why this is needed if we have a custom CLI handler to trigger the catch up.

I guess I will extend the interface, but that will make this a breaking change if someone implemented the interface...

I think that's fine. It would only break for users that actually created a custom hook implementation – and it's really easy to fix

SignalSlot might be an option

I would like the implementation to be independent from Flow as much as possible because something like this will be needed in other places, too..

I guess I would also like a rubber duck here :)

I'm up for it – tomorrow?

bwaidelich

@kitsunet This looks awesome already, thanks for holding on!!

Beware: Most if not all of my inline comments are rather nitpicky – that's because I couldn't find any bigger issues ;)

Except for two potential ones:

1. Global Queue concept

With the CR registry we tried to avoid our "global service" thinking, but now all CRs would share the same CatchUpDeduplicationQueue, conceptually, i.e. sharing the same cache backend (my point: they could internally, but now they have to)
That might not be an issue in practice, but if we were to add a factory like

class CatchUpDeduplicationQueueFactory {

    // ...

    public function build(ContentRepositoryId $contentRepositoryId): CatchUpDeduplicationQueue
    {
        // ...
    }
}

then each CR could get its own queue instance. And with that, maybe it makes sense to take the one extra step and introduce an interface for it.

To be honest: While writing it down, I'm not sure what issue there could be.. Maybe we can just postpone this part

2. Tests

Annoying one, but since this is a crucial part for the system to work, I would suggest we add some tests.
We could at least add a functional test, creating two instances and see how they interact?
But the best would be a test that covers this with multiple threads. I've had great experiences with paratestphp/paratest (we use that for the consistency tests for neos/eventstore-doctrineadapter, for example).

I'm happy to help with this, and we can – of course – add those in a separate PR as well (to get this one in asap)

Neos.ContentRepository.Core/Classes/Projection/Projections.php

Neos.ContentRepositoryRegistry/Classes/Command/SubprocessProjectionCatchUpCommandController.php

bwaidelich · 2023-11-29T10:11:14Z

Neos.ContentRepositoryRegistry/Classes/Service/CatchUpDeduplicationQueue.php

+    {
+        $queuedProjections = $this->triggerCatchUpAndReturnQueued($projections);
+        $attempts = 0;
+        /** @phpstan-ignore-next-line */


Just out of curiosity: What was PHPStan complaining about?

The new Projections::empty is in an internal class. If you have suggestions I am happy; we could make Projections not internal, or I could not use it, but it seems very convenient.

Neos.ContentRepositoryRegistry/Classes/Service/CatchUpDeduplicationQueue.php

bwaidelich · 2023-12-04T12:09:52Z

A few more potential issues/considerations (in addition to the potential race condition mentioned above):

* Currently the queue is responsible for multiple subscribers which makes the code a bit hard to follow. One Queue per subscriber might simplify the implementation a lot (and provide more flexibility for future features like projection priorities etc)
* ContentRepository::catchUpProjection() currently ignores the lock (by definition because it is meant as the low-level trigger). That could be error prone if people would use this instead of the coordinated catchUp. Maybe we can turn things around to only allow for queueing the catch up via CR API and invoke the actual catchup via some internal API
* The lock is currently not released upon exceptions AFAIS

kitsunet · 2023-12-04T13:20:26Z

* Currently the queue is responsible for multiple subscribers which makes the code a bit hard to follow. One Queue per subscriber might simplify the implementation a lot (and provide more flexibility for future features like projection priorities etc)

I should have tried this as I thought the same, but I think that we might end up in a concurrent lock situation if we do not tackle the whole package at once (as then you end up in multiple while loops waiting for catchups to finish)

* `ContentRepository::catchUpProjection()` currently ignores the lock (by definition because it is meant as the low-level trigger). That could be error prone if people would use this instead of the coordinated catchUp. Maybe we can turn things around to only allow for queueing the catch up via CR API and invoke the actual catchup via some internal API

We could check if the lock is acquired in there and otherwise throw an exception (or just do nothing)?

* The lock is currently not released upon exceptions AFAIS

Right, that needs to be fixed for sure!
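The last two points (guarding the low-level call and releasing the lock on exceptions) could be combined roughly like this. This is a sketch under assumptions, not the PR's code: `CatchUpCoordinator` and `isLocked()` are hypothetical, and the lock is a plain in-memory array rather than something shared between processes.

```php
<?php
declare(strict_types=1);

/** Hypothetical coordinator; the lock is an in-memory array for illustration only. */
final class CatchUpCoordinator
{
    /** @var array<string, true> */
    private array $locked = [];

    /** @param \Closure(string): void $lowLevelCatchUp the actual projection catch-up */
    public function __construct(private readonly \Closure $lowLevelCatchUp) {}

    /** Coordinated entry point: takes the lock and always releases it again. */
    public function catchUp(string $projectionName): void
    {
        if (isset($this->locked[$projectionName])) {
            return; // a catch-up for this projection is already running
        }
        $this->locked[$projectionName] = true;
        try {
            $this->catchUpProjection($projectionName);
        } finally {
            // runs on success and on exceptions alike
            unset($this->locked[$projectionName]);
        }
    }

    /** Low-level trigger: refuses to run unless the lock has been taken. */
    public function catchUpProjection(string $projectionName): void
    {
        if (!isset($this->locked[$projectionName])) {
            throw new \LogicException(sprintf('No catch-up lock held for "%s"', $projectionName));
        }
        ($this->lowLevelCatchUp)($projectionName);
    }

    public function isLocked(string $projectionName): bool
    {
        return isset($this->locked[$projectionName]);
    }
}
```

The `finally` block is what keeps a failing projection from staying locked forever.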

Using symfony/lock ensures atomic locking, which should really prevent duplication even under load.

kitsunet · 2023-12-06T08:38:24Z

Step one: replacing the cache with a lock. The overall logic is still the same.

kitsunet · 2023-12-20T17:33:27Z

We now have two separate lock steps: a "run" lock around the catchUp of a specific projection, which prevents that projection's catchUp from running multiple times in parallel, and the deduplication "queue" lock. The latter checks whether a catchUp is already running and, if so, takes a queue lock so that multiple processes do not all wait for the running catchUp and then each potentially spawn a background process. This second queue lock is not atomic, as it depends on the "run" lock. That should be fine: it is there to prevent runaway spawning of parallel subrequests in busy installations, and for that it should work fine with its randomized backoff.
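A rough illustration of the two-lock idea, assuming PHP's built-in `flock()` on lock files instead of symfony/lock so that the example has no dependencies. All function names, the lock file layout, and the back-off values are made up for this sketch. Unlike the PR, the waiter here runs the catch-up itself instead of spawning a subprocess.

```php
<?php
declare(strict_types=1);

/** Builds the lock file path for a projection; $kind is "run" or "queue". */
function lockPath(string $lockDir, string $projection, string $kind): string
{
    return sprintf('%s/%s.%s.lock', $lockDir, sha1($projection), $kind);
}

/** Non-blocking exclusive lock: returns the file handle on success, null if it is taken. */
function tryLock(string $path)
{
    $handle = fopen($path, 'c');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open lock file $path");
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

function unlock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

/** @return string 'ran', 'skipped' (someone else already waits) or 'timed-out' */
function deduplicatedCatchUp(string $lockDir, string $projection, \Closure $catchUp, int $maxAttempts = 5): string
{
    $run = tryLock(lockPath($lockDir, $projection, 'run'));
    if ($run === null) {
        // A catch-up is running. Only one process may wait for it to finish.
        $queue = tryLock(lockPath($lockDir, $projection, 'queue'));
        if ($queue === null) {
            return 'skipped';
        }
        try {
            for ($attempt = 1; $run === null && $attempt <= $maxAttempts; $attempt++) {
                // randomized, growing back-off (microseconds)
                usleep(random_int(500, 1500) * $attempt);
                $run = tryLock(lockPath($lockDir, $projection, 'run'));
            }
        } finally {
            unlock($queue);
        }
        if ($run === null) {
            return 'timed-out';
        }
    }
    try {
        $catchUp($projection);
    } finally {
        unlock($run); // released even if the catch-up throws
    }
    return 'ran';
}
```

The sketch relies on `flock()` locks belonging to the open file handle, which is the behaviour on Linux; other platforms may differ.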

mhsdesign · 2024-02-22T11:38:03Z

Neos.ContentRepositoryRegistry/Tests/Unit/Service/CatchUpDeduplicationQueueTest.php

+/**
+ * This is not a regular unit test, it won't run with the rest of the testsuite.
+ * the following two commands after another would run the parallel tests and then the validation of results:
+ * requires "brianium/paratest": "^6.11"


I see, because the latest paratest requires PHPUnit 10, but Flow blocks that.

mhsdesign · 2024-12-17T08:18:57Z

I think this is now superseded by #5321

github-actions bot added Task 9.0 labels Nov 14, 2023

kitsunet requested a review from bwaidelich November 14, 2023 21:25

bwaidelich reviewed Nov 15, 2023

kitsunet marked this pull request as ready for review November 15, 2023 17:52

bwaidelich reviewed Nov 17, 2023

bwaidelich reviewed Nov 18, 2023

kitsunet and others added 3 commits November 28, 2023 20:32

TASK: Prevent multiple catchup runs

f52910f

Will use caches to try avoiding to even start async catchups if for that projection one is still running. Also adds a simple single slot queue with incremental backup to ensure a requested catchup definitely happens.

Update Neos.ContentRepositoryRegistry/Classes/Factory/ProjectionCatch…

5843e5e

…UpTrigger/SubprocessProjectionCatchUpTrigger.php Co-authored-by: Bastian Waidelich <[email protected]>

TASK: Code cleanup and refactoring

1a7990e

kitsunet force-pushed the task/prevent-duplicate-catchup-runs branch 2 times, most recently from 060e8ac to 242b5f6 Compare November 29, 2023 09:48

Rewrite to central deduplication queue

387f409

kitsunet force-pushed the task/prevent-duplicate-catchup-runs branch from 242b5f6 to 387f409 Compare November 29, 2023 10:19

bwaidelich reviewed Nov 29, 2023

kitsunet added 2 commits November 29, 2023 14:29

Fix style issues

1f33d3c

Stop waiting for queued catchups after 20 seconds

5e31584

Implement CatchUpDeduplication with locks

5299fcc

Using symfony/lock ensures atomic locking, which should really prevent duplication even under load.

Split catchUp queue and run lock

20ccab1

Fix style issues

e731e31

mhsdesign reviewed Feb 22, 2024

bwaidelich mentioned this pull request Apr 19, 2024

WIP: FEATURE: Rework CR CatchUp mechanism #4988

Closed

4 tasks

mhsdesign mentioned this pull request May 16, 2024

META: Current PR Dependency graph #5057

Closed

10 tasks

mhsdesign mentioned this pull request Oct 13, 2024

Split-up "Rework CR CatchUp mechanism" #5285

Closed

3 tasks
