Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: Invalidate caches correctly after node move changes have been discarded #4291

Conversation

grebaldi
Copy link
Contributor

@grebaldi grebaldi commented May 24, 2023

NOTE: This PR needs to be tested in combination with its companion PR over at neos-ui: neos/neos-ui#3503

The overall problem

This one's a bit complex and it stretches over two repositories. The basic scenario is:

  1. You move some (document) node(s) to a position below a different parent node (so, just sorting them within their own hierarchical level doesn't count)
  2. You discard those changes
  3. Chaos ensues

neos/neos-ui#3184 describes the issues that happen only partially. During my investigation I found several problems that I need to disect piece by piece.

(I) The UI does not (entirely) recognize that nodes have been moved to a position below a different parent node

The problem

Peek.2023-05-21.15-14.-.3184.what.happens.on.discard.mp4

In the video you can see that after the two document nodes have been moved, the change is not made visible by the usual orange change indicator on the left hand side of the tree nodes. Also, on the attempt to discard the current set of changes, an error occurs that reads:

Call to a member function isRemoved() on null - Check logs for details

Both phenomena are related, because - as it turns out - the UpdateWorkspaceInfo feedback object, that is supposed to inform the UI about pending changes, delivers the wrong node context paths for the nodes that have just been moved (the context paths are still the old ones).

Because the context paths are now out of sync, the UI is unable to associate the pending change with the respective tree node. It then also uses the stale workspace information as payload to the discard command, which leads to the above error at the following line:

https://github.com/neos/neos-ui/blob/5c52e08b8a1effc985911390a7124579c4018c25/Classes/Controller/BackendServiceController.php#L272

How come the context paths are incorrect after the nodes have been moved?

After some investigation I found that the Neos\ContentRepository\Domain\Factory\NodeFactory class memoizes stale data - as opposed to e.g. ContentContext which gets its in-memory cache flushed when nodes were moved:

if (!isset($this->nodes[$internalNodeIdentifier])) {

When the PublishingService is asked for unpublished nodes via getUnpublishedNodes, it receives cache hits within NodeFactory->createFromNodeData for the nodes that have just been moved - thus the old context paths.

EDIT: Not at all true 😅! I investigated further after the functional tests over here failed (hooray for functional tests!). The actual reason is as follows:

The Neos UI API uses a slightly different content context configuration than getUnpublishedNodes. So, NodeFactory actually keeps track of two variants of the moved nodes. The ones that getUnpublishedNodes receives are not the ones that the move operation has been performed on.

The solution

The solution for this is implemented over here: neos/neos-ui#3503

(II) Discarded move changes are not properly reflected in the document tree

The problem

Problem (I) can be circumvented by hard-reloading the UI (after that, the workspace info will be correct again). But, there's still some strangeness going on...

Peek.2023-05-21.15-15.-.3184.what.happens.on.discard.after.reload.mp4

In the video you can see that the tree actually reflects the discarded changes correctly for a brief moment there. It then quickly jumps to a broken state in which the nodes that should be at their original positions just disappear.

This problem persists even if you hard-reload the UI after discarding:

Peek.2023-05-23.17-37.-.3184.what.happens.on.hard.reload.mp4

Now, if you use the reload button of the document tree to manually reload the tree, the nodes reappear:

Peek.2023-05-23.17-38.-.3184.what.happens.on.tree.reload.mp4

(But also: If you hard-reload the UI again, the nodes will once again flash briefly and then disappear)

How does this happen?

In those videos, the parent document that originally contained the moved nodes is focused and open in the guest frame. After discard, it should contain those nodes again. The UI reloads the guest frame, but the document is now rendered with stale node metadata. After the guest frame finished loading, the stale metadata (which still thinks the nodes have been moved elsewhere) overwrites the node data in the UI redux store.

This is why the correct state shows up for a brief moment. It gets overwritten after a short delay when the guest frame is loaded. (Also: The nodes do not disappear if you focus a different document and hard-reload the UI).

Looking at the cache configuration for the node metadata:

https://github.com/neos/neos-ui/blob/5c52e08b8a1effc985911390a7124579c4018c25/Resources/Private/Fusion/Prototypes/Page.fusion#L26-L46

... one should actually assume that the data shouldn't be stale (Neos.Caching.descendantOfTag(documentNode) should do the trick). It turns out though, that the Neos\Neos\Fusion\Cache\ContentCacheFlusher class - in case of discard - will only flush tags related to a node's current workspace. We actually need to have all tags flushed in the base workspace as well to cover the DescendantOf_*-tag of the original parent node.

The solution

I'm not entirely sure about this. I did two things:

  1. I modified the nodeDiscarded signal so that it carries the base workspace of each discarded node

The reason behind this is that the ContentCacheFlusher listens to both the nodeDiscarded and the nodePublished signal with its method registerNodeChange. nodePublished carries the target workspace of the publishing operation, which is accepted as a second argument by registerNodeChange. nodeDiscarded used to not carry such "target workspace" information, so that the ContentCacheFlusher had no chance to actually flush tags for this workspace.

It feels to me like a bit of a stretch on semantics to just add the base workspace to the nodeDiscarded signal in such fashion, but I found it to be the least invasive yet most effective solution.

  1. I made sure that the registerAllTagsToFlushForNodeInWorkspace method of the ContentCacheFlusher actually deals with the given node in the given workspace.

registerAllTagsToFlushForNodeInWorkspace takes a node and a workspace as arguments. It used to just assume that the given node is actually its variant in the given workspace, which seems to be working in a lot of cases, but in the case of discarded node move changes, the method makes false assumptions. It goes up the parent chain of the given node to find all DescendantOf_*-tags that need to be flushed. The given node however has the wrong parent.

I therefore added some code at the start of the method, that replaces the $node variable with the given node's variant in the given workspace, if the respective workspace names differ.

Related Commit(s): f97268a, 184692d

(III) Discarding a node move while having a moved document node open in the guest frame results in an error page

The problem

Peek.2023-05-22.14-30.-.3184.Discard.Page.Move.While.On.Page.mp4

The video shows that when you're currently editing a moved document and then discard the move change, the guest frame reloads and shows a misleading fusion error. This is because the guest frame tries to render a document node that doesn't exist anymore.

A similar situation would be a document that has just been created. If you stay on that document and then discard it, the UI behaves correctly and redirects you to the next-higher document:

Peek.2023-05-23.17-46.-.3184.discard.new.node.mp4

Here, the problem lies within the discardAction of the Neos\Neos\Ui\Controller\BackendServiceController which does not recognize discarded move changes and thus misses to inform the UI that it needs to remove the nodes at their former positions and re-insert them at their original positions.

The solution

The solution for this is implemented over here: neos/neos-ui#3503

All solutions combined

Here's what it looks like when the PRs in neos-ui and neos-development-collection are combined:

Peek.2023-05-23.17-48.-.3184.all.changes.combined.mp4

@grebaldi grebaldi force-pushed the bugfix/ui-3184/proper-cache-invalidation-on-discard branch from 47e4a35 to 184692d Compare May 24, 2023 18:42
@grebaldi grebaldi marked this pull request as ready for review May 24, 2023 19:32
@@ -146,6 +146,20 @@ public function registerNodeChange(NodeInterface $node, Workspace $targetWorkspa
*/
protected function registerAllTagsToFlushForNodeInWorkspace(NodeInterface $node, Workspace $workspace): void
{
// Ensure that we're dealing with the variant of the given node that actually
// lives in the given workspace
if ($node->getWorkspace()->getName() !== $workspace->getName()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grebaldi do you think this could be a performance problem when working with a large number >1000 nodes?
Like publishing a whole workspace and for some reason creating a large number of contexts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question :)

Context creation itself doesn't strike me as very expensive, so I guess that would not be a problem. $workspaceContext->getNodeByIdentifier however will be quite expensive given an n=1000 iteration. It cannot be avoided though (IIRC).

Copy link
Member

@mhsdesign mhsdesign Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fyi i was curious when this Workspace $workspace parameter will actually different.

And i followed a bit the path an it seems its only set when publishing (and then its likely the base workspace of course):

public function emitNodePublished(NodeInterface $node, Workspace $targetWorkspace = null)

The ContextFactory::create seems to be cached but i agree the getNodeByIdentifier in 158 below might be more expensive.
... But then again isnt this cached? Yes. The firstLevelNodeCache actually caches the nodes by identifier ... huh .. only identifier not workspace? So my thesis is that the node returned is actually the same as passed (as that one likely exists in the firstLevelNodeCache.
The only thing to avoid this cache layer is to use the nodeDataRepository directly.

Maybe we should even consider extending the test flushingANodeWithAnAdditionalTargetWorkspaceWillAlsoResolveThatWorkspace to test this and be sure about it. ^^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah i see each Context has its own firstLevelNodeCache, that makes the caching of course workspace ware, which makes sense. And as we create the context just at that very second, the cache is completely empty so indeed we will query the content repository.

Copy link
Member

@mhsdesign mhsdesign Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No wait please ignore the following stuff i think its none sense :D

I dont fully understand this yet, but if we only want to ensure that $node->getWorkspace()->getName() === $workspace->getName()
we can cheat by building our own node:

$workspaceContext = $this->contextFactory->create(
    array_merge(
        $node->getContext()->getProperties(),
        ['workspaceName' => $workspace->getName()]
    )
);
$correctNode = $this->nodeFactory->createFromNodeData($node->getNodeData(), $workspaceContext);

but in case that is not our sole goal but we want to really ask the database, so the if ($node === null) return; will be used, that cheating will of course not work.

@mhsdesign mhsdesign changed the base branch from 7.3 to 8.3 February 11, 2024 18:22
@grebaldi grebaldi force-pushed the bugfix/ui-3184/proper-cache-invalidation-on-discard branch from 184692d to c99cd8b Compare February 15, 2024 11:15
@grebaldi
Copy link
Contributor Author

Rebased on 8.3

grebaldi added a commit to grebaldi/PackageFactory.Guevara that referenced this pull request Feb 15, 2024
Copy link
Member

@kitsunet kitsunet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So expensive but I also don't see another way to do this, thanks!

* @return void
* @Flow\Signal
* @api
*/
public function emitNodeDiscarded(NodeInterface $node)
public function emitNodeDiscarded(NodeInterface $node, ?Workspace $baseWorkspace = null)
Copy link
Member

@mhsdesign mhsdesign Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi due to this this change needs a proxy cache flush as the Neos Neos PublishingService must be rebuild.

Declaration of Neos\Neos\Service\PublishingService::emitNodeDiscarded(Neos\ContentRepository\Domain\Model\NodeInterface $node) must be compatible with Neos\ContentRepository\Domain\Service\PublishingService::emitNodeDiscarded(Neos\ContentRepository\Domain\Model\NodeInterface $node, ?Neos\ContentRepository\Domain\Model\Workspace $baseWorkspace = null)

Copy link
Member

@mhsdesign mhsdesign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About how this depends on the Neos Ui part i wrote a conclusion here neos/neos-ui#3503 (comment) - so its not a problem if a project updates only the ui or neos and not both though to fix all bugs both should be updated for sure.

Tested in a big project as patch and could not find hint about a performance impact.

Copy document into other (932 changes)

Neos 8.3.8

copy and discard
| 25s | 2.1 |
| 13s | 2.0 |
| 19s | 2.1 |

copy and publish
| 23.9s | 7.6 |
| 20.5s | 7.5 |

delete copy and publish
| 9.9s | 9.7 |
| 8.9s | 7.8 |

Neos 8.3.8 + #4291

copy and discard
| 13.4 | 2.2 |
| 23.7 | 2.2 |
| 21.7 | 2.2 |

copy and publish
| 17.5 | 6.7 |
| 23.5 | 8.5 |

delete copy and publish
| 8.3 | 9.0 |
| 10.6 | 9.9 |


to move two document with children to other place - 1043 change - and discard

Neos 8.3.8

| 3.0s | 31.0 |

Neos 8.3.8 + #4291

| 5.3s | 30.5 |
| 2.3 | 15 |
| 3.5 | 28.8 |

@mhsdesign mhsdesign merged commit 4c3568a into neos:8.3 May 14, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants