!!! FEATURE: Subscription Engine #5321

Merged
merged 161 commits into from
Dec 16, 2024

bwaidelich
Member

@bwaidelich bwaidelich commented Oct 24, 2024

Related: #4746
Resolves: #4152
Resolves: #4908

migration to migrate the checkpoints to subscribers

./flow migrateevents:migrateCheckpointsToSubscriptions

TODO

  • SubscriptionEngine::reset() (i.e. projection replay)
  • Tests
    • spying testing catchup is correctly applied for example
    • parallel tests to ensure subscription locking works
  • Fine tuning
  • inline docs
    • subscription status docs
  • Credit patchlevel (are inline doc comments for all classes underneath Neos\ContentRepository\Core\Subscription enough?)
  • Optional: Allow to target specific groups/subscriptions via CLI commands (the SubscriptionEngine already supports this)
  • Postgres support for the engine -> remove platform options, simple string fields.
  • use DateTimeImmutable::ATOM for date formatting instead? But it clashes with datetime_immutable (use Types::DATETIMETZ_IMMUTABLE in schema) -> not possible as DATETIMETZ is not supported by mysql
  • use Status->value instead of name field!!!
  • wrap projection->setUp in transaction to ensure errors (like in a migration) will be fully rolled back -> Cannot be rolled back
  • discuss if the setup code is multi-thread safe? Because it doesn't use a transaction? What if there is a catchup ongoing?
    • discoverNewSubscriptions doesn't have a transaction?
  • ensure the NEW subscriptions are never caught up! (don't reset NEW subscriptions, and don't reset detached because that fails)
  • todo will the CatchupHadErrors exception disturb the codeflow from dbal repos and doctrine orm things that are rolled back?
  • refine logging exceptions to only first throwable
    • context aware log files? Development ...
    • todo add sequence number and event type to catchup error?
    • overhaul getIteratorAggregate for errors collection? Ensure errors are logged correctly by the throwable storage!
  • if a replay fails with an error, denote that it's only a partial replay because of batching
  • Document deprecations / changes:
    • Interface of catchup hook different
    • projectionReplayAll deprecated (use subscription commands instead)
    • content repository does not contain setup and status, use content repository maintainer instead
    • catchup --until for debugging was removed
  • questions
    • should a detachment of a subscriber be noticed in the ProcessResult as error? Should the cr throw?
    • TODO pass the error subscription status to onAfterCatchUp, so that in case of an error it can be prevented that e.g. mails are sent?
    • introduce custom exception if catchup failed in the cr, maybe use the CatchUpFailed ... but rather rename that and also SubscriptionEngineAlreadyProcessingException to CatchUpError and then have a ErrorDuringCatchUp?
    • rename booting to booted? -> we agree that the ING is not a state, as it rather needs to be "was".. but we don't have a better name
    • rename subscription status to state?
    • Make subscription API now and return it in the status collection, as it's not mutable?
    • do we need to prepare the option to deactivate catchup hooks at runtime?
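
The "use Status->value instead of name field" item above can be illustrated with a plain PHP backed enum. This is only a minimal sketch: `ExampleStatus`, its cases and its string values are hypothetical stand-ins, not the actual subscription status class of this PR. The point is that `->value` is the explicitly declared backing string, while `->name` is the PHP case identifier and changes whenever a case is renamed.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a subscription status enum; the case names
// and backing values are illustrative assumptions only.
enum ExampleStatus: string
{
    case NEW = 'new';
    case BOOTING = 'booting';
    case ACTIVE = 'active';
    case ERROR = 'error';

    /** Serialize via the backing value, which stays stable if a case is renamed. */
    public function toStorage(): string
    {
        return $this->value;
    }

    /** Restore from the stored string; from() throws a \ValueError on unknown input. */
    public static function fromStorage(string $stored): self
    {
        return self::from($stored);
    }
}

$stored = ExampleStatus::ACTIVE->toStorage();
$restored = ExampleStatus::fromStorage($stored);
```

Persisting `->value` also means the stored string can be turned back into a case with the built-in `from()` / `tryFrom()`, which look up backing values and not case names.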

@github-actions github-actions bot added the 9.0 label Oct 24, 2024
@bwaidelich bwaidelich changed the title from "WIP" to "CatchUp Subscription Engine" Oct 24, 2024
@bwaidelich bwaidelich changed the title from "CatchUp Subscription Engine" to "WIP: CatchUp Subscription Engine" Oct 24, 2024
@mhsdesign
Member

I had a quick look and I'd love to help with the wiring and sticking it all together :) We should probably get a simple draft running first, with the catchups just getting their projection state :) That alone will already require some work.

mhsdesign added a commit to mhsdesign/neos-development-collection that referenced this pull request Oct 27, 2024
…n they are registered for

A catchup doesn't have access to the full content repository, as that would allow full recursion via handle. Accessing other projections'
state is not safe either, as the other projection might not be behind - the order is undefined.

This will make it possible to catchup projections from outside of the cr instance as proposed here: neos#5321
mhsdesign added a commit to mhsdesign/neos-development-collection that referenced this pull request Nov 2, 2024
@mhsdesign
Member

Okay, I thought further about our little content graph projection vs. projection states vs. event handlers dilemma, and I think the solution is not necessarily exposing just the $projection on ProjectionEventHandler ... as that would also undo the explicit graph projection wiring from #5272, but instead passing a "bigger" object than "just" the Subscribers into the content repository factory.

In my eyes this object, being built by the content repository registry in some way, must reflect

  • what is guaranteed to be the ContentGraphProjectionInterface and its state ContentGraphReadModelInterface, not part of some collection but distinct
  • what are the additional projection states
  • what are all the subscribers (with their catchup-hooks) that are just directly passed to the subscription engine without doing anything further with it. (e.g. no public accessors on the ProjectionEventHandler for anything)

that could look like:

final readonly class ContentRepositoryGraphProjectionAndSubscribers
{
    public function __construct(
        public ContentGraphProjectionInterface $contentGraphProjection,
        public Subscribers $subscribers, // must contain a subscriber for the $contentGraphProjection
        public ProjectionStates $additionalProjectionStates, // must not contain the $contentGraphProjection state
    ) {
    }
}

or maybe a little more explicit, so the factories don't have to deal with all the logic and we have control over the subscription ids:

final readonly class ProjectionsAndCatchupHooksBetterVersionZwo
{
    public function __construct(
        public ContentGraphProjectionInterface $contentGraphProjection,
        private Projections $additionalProjections,
        private Subscribers $additionalSubscriber,
        private array $catchUpHooksByProjectionClass, // map: projection class name => CatchUpHookInterface
    ) {
    }

    public function getSubscribers(): Subscribers
    {
        $subscribers = iterator_to_array($this->additionalSubscriber);

        $subscribers[] = new Subscriber(
            SubscriptionId::fromString('contentGraphProjection'),
            SubscriptionGroup::fromString('default'),
            RunMode::FROM_BEGINNING,
            new ProjectionEventHandler(
                $this->contentGraphProjection,
                $this->getCatchUpHooksForProjectionClass(ContentGraphProjectionInterface::class),
            ),
        );
        
        foreach ($this->additionalProjections as $projection) {
            $subscribers[] = new Subscriber(
                SubscriptionId::fromString(substr(strrchr($projection::class, '\\'), 1)),
                SubscriptionGroup::fromString('default'),
                RunMode::FROM_BEGINNING,
                new ProjectionEventHandler(
                    $projection,
                    $this->getCatchUpHooksForProjectionClass($projection::class),
                ),
            );
        }
        
        return Subscribers::fromArray($subscribers);
    }
    
    public function getAdditionalProjectionStates(): ProjectionStates
    {
        return ProjectionStates::fromArray(array_map(
            fn ($projection) => $projection->getState(),
            iterator_to_array($this->additionalProjections)
        ));
    }

    private function getCatchUpHooksForProjectionClass(string $projectionClass): ?CatchUpHookInterface
    {
        return $this->catchUpHooksByProjectionClass[$projectionClass] ?? null;
    }
}
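
The subscription id in the sketch above is derived from the short class name via `substr(strrchr($projection::class, '\\'), 1)`. A small standalone illustration of that derivation follows; `shortClassName()` is a hypothetical helper, not part of the proposed class. It also covers the case where `strrchr()` returns `false` because the class name contains no namespace separator.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive a short name from a fully qualified class name,
 * using the same substr(strrchr(...)) idea as the sketch above.
 */
function shortClassName(string $className): string
{
    $tail = strrchr($className, '\\');
    // strrchr() returns false when there is no backslash (class in the global namespace)
    return $tail === false ? $className : substr($tail, 1);
}
```

Note that two projections with the same short class name in different namespaces would end up with the same id under this scheme, so explicit ids may be preferable where that can happen.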

but for things that will belong to the future ProjectionService, like replayProjection, replayAllProjections, resetAllProjections, we might still need to expose all projections here, unless the subscription engine will learn that itself: $this->subscriptionEngine->reset()

@bwaidelich
Member Author

@mhsdesign thanks for your input!
Ands in a class name always make me suspicious..

That's my current draft of the ContentRepositoryFactory constructor:

public function __construct(
    private readonly ContentRepositoryId $contentRepositoryId,
    EventStoreInterface $eventStore,
    NodeTypeManager $nodeTypeManager,
    ContentDimensionSourceInterface $contentDimensionSource,
    Serializer $propertySerializer,
    private readonly UserIdProviderInterface $userIdProvider,
    private readonly ClockInterface $clock,
    SubscriptionStoreInterface $subscriptionStore,
    ContentGraphProjectionFactoryInterface $contentGraphProjectionFactory,
    CatchUpHookFactoryInterface $contentGraphCatchUpHookFactory,
    private readonly ContentRepositorySubscribersFactoryInterface $additionalSubscribersFactory,
) {
// ...
}

@mhsdesign
Member

As discussed, that looks good ❤️ My idea had a flaw because I assumed the projection instance could be built by the registry, which it CANNOT because we need factory dependencies.... And the thing with iterating over the event handlers to fetch their state via getState or something is weird but okay, as that projectionState is now a little tunnel through space as well :) So definitely okay to do that little quirk.

neos-bot pushed a commit to neos/contentrepository-core that referenced this pull request Nov 4, 2024
…n they are registered for

A catchup doesn't have access to the full content repository, as it would allow full recursion via handle and accessing other projections
state is not safe as the other projection might not be behind - the order is undefined.

This will make it possible to catchup projections from outside of the cr instance as proposed here: neos/neos-development-collection#5321
neos-bot pushed a commit to neos/contentrepositoryregistry that referenced this pull request Nov 4, 2024
neos-bot pushed a commit to neos/neos that referenced this pull request Nov 4, 2024
# Conflicts:
#	Neos.ContentRepository.Core/Classes/ContentRepository.php
#	Neos.ContentRepository.Core/Classes/Factory/ContentRepositoryFactory.php
#	Neos.ContentRepositoryRegistry/Classes/ContentRepositoryRegistry.php
#	Neos.ContentRepositoryRegistry/Classes/Service/ProjectionService.php
#	Neos.ContentRepositoryRegistry/Classes/Service/ProjectionServiceFactory.php
#	phpstan-baseline.neon
Member

@mhsdesign mhsdesign left a comment

LGTM

@kitsunet
Member

kitsunet commented Dec 15, 2024

Funny, the failing functional test shows just the exactly-once problem I am still concerned about...

I debugged into it locally, as the error output is not quite ideal. So what happens:

We have two projections, one is "nice" and would just work, the other has a "saboteur" that throws right after the event was written to the projection DB.
This error is caught in the sub engine and the subscription is set to error with sequence number 0, because the error happened while catching up the first event, BUT note that the projection actually ran a query against its state for the first event.
Now we check the status and it's correctly in error state, we kill the saboteur, ensuring the projection can continue exception-free....
But wait, the projection table contains data for the first event while the state says it's on 0, and this leads to a DB constraint error when trying to catch up this projection after removing the error state. tl;dr: this projection is not safe for "not exactly once" delivery. The question is what do we do... Always replay from the start when a subscription was in error state (that seems safest), or do we just change the test so that the projection doesn't fail when getting the same event twice?

I have given this a good look now and I think we need to solve this question but otherwise seems ready to me.
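
The failure described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the actual test or engine code: `ToyProjection`, `catchUp()` and the in-memory array acting as a table with a unique constraint are all hypothetical. It shows both outcomes that the question is about: a strict projection that breaks when it receives the same event twice, and an idempotent one that tolerates the redelivery.

```php
<?php
declare(strict_types=1);

/** Toy projection: one "row" per sequence number, guarded like a unique constraint. */
final class ToyProjection
{
    /** @var array<int, string> */
    public array $rows = [];
    public bool $saboteurActive = true;

    public function __construct(private readonly bool $idempotent)
    {
    }

    public function apply(int $sequenceNumber, string $payload): void
    {
        if (array_key_exists($sequenceNumber, $this->rows)) {
            if ($this->idempotent) {
                return; // event was already projected, so ignore the redelivery
            }
            throw new RuntimeException('Duplicate row for sequence number ' . $sequenceNumber);
        }
        $this->rows[$sequenceNumber] = $payload;
        if ($this->saboteurActive) {
            // fails AFTER the write, like the saboteur in the functional test
            throw new RuntimeException('Saboteur: failing after the write');
        }
    }
}

/**
 * Toy catch-up: applies events after $position and returns [position, errorMessage|null].
 * The position only advances after an event was applied without an exception.
 *
 * @param array<int, string> $events keyed by sequence number
 * @return array{0: int, 1: ?string}
 */
function catchUp(ToyProjection $projection, array $events, int $position): array
{
    foreach ($events as $sequenceNumber => $payload) {
        if ($sequenceNumber <= $position) {
            continue;
        }
        try {
            $projection->apply($sequenceNumber, $payload);
        } catch (RuntimeException $e) {
            return [$position, $e->getMessage()];
        }
        $position = $sequenceNumber;
    }
    return [$position, null];
}

$events = [1 => 'first', 2 => 'second'];

// Strict projection: the row for event 1 exists although the position is still 0.
$strict = new ToyProjection(idempotent: false);
[$position, $error] = catchUp($strict, $events, 0);
$strict->saboteurActive = false;
[$retryPosition, $retryError] = catchUp($strict, $events, $position);

// Idempotent projection: the redelivered event 1 is skipped and the catch-up completes.
$tolerant = new ToyProjection(idempotent: true);
[$tolerantPosition] = catchUp($tolerant, $events, 0);
$tolerant->saboteurActive = false;
[$tolerantRetryPosition, $tolerantRetryError] = catchUp($tolerant, $events, $tolerantPosition);
```

In this toy model the strict projection stays stuck at position 0 with a duplicate-row error on retry, while the idempotent one reaches position 2.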

The concept of reactivation in case of errors was rather experimental. For core projections we would rather recommend a full replay, and it's unlikely that other third-party projections will profit from this now.
Instead, if deemed necessary, the reactivation API can be reintroduced later with little effort.

Also, the case for detached projections is considered an edge case and a simple replay should suffice.

The change was done to minimize confusion and simplify the API.
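
Why a full replay is the safer repair can be sketched with a toy model: the replay discards whatever a failed catch-up left behind and rebuilds the state from the first event, so half-applied rows cannot collide with redelivered events. `ReplayableToyProjection` and `replay()` are hypothetical names for illustration and not the engine's actual API.

```php
<?php
declare(strict_types=1);

/** Toy projection whose state can be wiped, standing in for a real read model. */
final class ReplayableToyProjection
{
    /** @var array<int, string> */
    public array $rows = [];

    public function apply(int $sequenceNumber, string $payload): void
    {
        $this->rows[$sequenceNumber] = $payload;
    }

    public function resetState(): void
    {
        $this->rows = [];
    }
}

/**
 * Hypothetical replay: discard all projected state, then re-apply every event from the start.
 *
 * @param array<int, string> $events keyed by sequence number
 * @return int the position after the replay
 */
function replay(ReplayableToyProjection $projection, array $events): int
{
    $projection->resetState();
    $position = 0;
    foreach ($events as $sequenceNumber => $payload) {
        $projection->apply($sequenceNumber, $payload);
        $position = $sequenceNumber;
    }
    return $position;
}

// State left behind by a failed catch-up: a row exists although the position is still 0.
$projection = new ReplayableToyProjection();
$projection->rows = [1 => 'half-applied'];

$position = replay($projection, [1 => 'first', 2 => 'second']);
```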
@kitsunet
Member

;) Alright that works I guess

@mhsdesign
Member

Ah yes, thanks, I forgot that I had to adjust this test as well. The test shows what is probably very unlikely. Still, as we don't roll back on ERROR now, reactivation of a projection is less likely to succeed than before. Instead - as we want to promote replay for our core projections on errors - I removed my fancy subscription:reactivate again in 525fcc4

Maybe we will fancy something like this as a debug tool later after all: Replay until X and then apply the next event again and again while debugging why it fails - but that is clearly an internal tool then and we don't need to confuse anyone with it yet ^^

@kitsunet kitsunet merged commit f0c6111 into 9.0 Dec 16, 2024
9 checks passed
@kitsunet
Member

🎉 Not quite sure I am in love with all this, but more so than the previous iteration of things.

mhsdesign added a commit to neos/neos-setup that referenced this pull request Dec 16, 2024
@mhsdesign mhsdesign changed the title from "FEATURE: Subscription Engine" to "!!! FEATURE: Subscription Engine" Dec 16, 2024
neos-bot pushed a commit to neos/contentgraph-doctrinedbaladapter that referenced this pull request Dec 16, 2024
…to the subscription store

... which technically only coincidentally uses the same connection and dbal instance

see neos/neos-development-collection#5321 (comment)
neos-bot pushed a commit to neos/contentgraph-postgresqladapter that referenced this pull request Dec 16, 2024
neos-bot pushed a commit to neos/contentrepository-core that referenced this pull request Dec 16, 2024
neos/neos-development-collection#5321 (comment)

> Anyways, I think with the removed retry strategy we should just get rid of the automatic retry altogether right now.
  It's quite unlikely that a retry suddenly works without other changes. So I'd be fully OK if it was only possible to manually retry failed subscriptions for 9.0
neos-bot pushed a commit to neos/contentrepository-core that referenced this pull request Dec 16, 2024
neos-bot pushed a commit to neos/contentrepository-core that referenced this pull request Dec 16, 2024
see neos/neos-development-collection#5321 (comment)

the save-point will only be used for REAL projection errors now and never rolled back if catchup errors occur.

With that change in code the save-points are less important, because a real projection error should better be thrown at the start before any statements, and even if some statements were issued and a full rollback is done, it's unlikely that a reactivateSubscription helps in that case.

Instead, to repair projections you should replay
neos-bot pushed a commit to neos/contentrepositoryregistry that referenced this pull request Dec 16, 2024
neos-bot pushed a commit to neos/contentrepositoryregistry that referenced this pull request Dec 16, 2024
neos-bot pushed a commit to neos/contentrepositoryregistry that referenced this pull request Dec 16, 2024
neos-bot pushed a commit to neos/neos that referenced this pull request Dec 16, 2024
Successfully merging this pull request may close these issues.

Ensure exceptions during CatchUpHook are properly handled Consider using a single table for checkpoints
3 participants