Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: make tethered node ids deterministic #4313

Closed
wants to merge 3 commits into from

Conversation

mhsdesign
Copy link
Member

@mhsdesign mhsdesign commented Jun 6, 2023

This was @bwaidelich idea as he saw me fooling around with pre-filing them randomly - you dont need to prefil random ids if you can determine them.

The logic was copied from Neos 8.3's Neos\ContentRepository\Utility::buildAutoCreatedChildNodeIdentifier

/**
* Generate a stable identifier for auto-created child nodes
*
* This is needed if multiple node variants are created through "createNode" with different dimension values. If
* child nodes with the same path and different identifiers exist, bad things can happen.
*
* @param string $childNodeName
* @param string $identifier Identifier of the node where the child node should be created
* @return string The generated UUID like identifier
*/
public static function buildAutoCreatedChildNodeIdentifier($childNodeName, $identifier)
{
$hex = md5($identifier . '-' . $childNodeName);
return substr($hex, 0, 8) . '-' . substr($hex, 8, 4) . '-' . substr($hex, 12, 4) . '-' . substr($hex, 16, 4) . '-' . substr($hex, 20, 12);
}

Related: #4343
Related: #4375

Comment on lines 205 to 207
// Write the auto-created descendant node aggregate ids back to the command;
// so that when rebasing the command, it stays fully deterministic.
$command = $command->withTetheredDescendantNodeAggregateIds($descendantNodeAggregateIds);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can remove this step now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.. yes, I think so

);

// handle multiple levels of autocreated child nodes
foreach ($childNodeType->tetheredNodeAggregateIds($childNodeAggregateId)->getNodeAggregateIds() as $nestedChildNodeName => $nodeAggregateId) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

howdy there, my name is re cursion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

howdy there, I think there should be some kind of endless-recursion-check :)
I mean.. If you have something like

'Some:NodeType':
  children:
    'foo':
      type: 'Some:NodeType'

it probably won't work anyways. But maybe we should detect those cases anyways to provide a better error message than "memory limit exceeded..."

@mhsdesign mhsdesign changed the title Task: make tethered node ids deterministic 9.0 Task: make tethered node ids deterministic Jun 6, 2023
@crydotsnake crydotsnake changed the title 9.0 Task: make tethered node ids deterministic 9.0 TASK: make tethered node ids deterministic Jun 10, 2023
@bwaidelich bwaidelich changed the title 9.0 TASK: make tethered node ids deterministic BUGFIX: make tethered node ids deterministic Jun 13, 2023
@github-actions github-actions bot added the Bug label Jun 13, 2023
Copy link
Member

@bwaidelich bwaidelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks.
Just a couple of mini nitpicks and some comment re endless recursions

@@ -58,6 +58,9 @@ final class CreateNodeAggregateWithNode implements CommandInterface
* using this assignment registry.
* Since tethered child nodes may have tethered child nodes themselves,
* this registry is indexed using relative node paths to the node to create in the first place.
*
* Specifying them is optional, and otherwise they are calculated deterministic based on the $nodeAggregateId and the tethered nodeName.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition! Just a mini nitpick

Suggested change
* Specifying them is optional, and otherwise they are calculated deterministic based on the $nodeAggregateId and the tethered nodeName.
* Specifying them is optional, and otherwise they are calculated deterministically based on the $nodeAggregateId and the tethered nodeName.

Comment on lines 205 to 207
// Write the auto-created descendant node aggregate ids back to the command;
// so that when rebasing the command, it stays fully deterministic.
$command = $command->withTetheredDescendantNodeAggregateIds($descendantNodeAggregateIds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.. yes, I think so

@@ -490,11 +493,34 @@ public function hasAutoCreatedChildNode(NodeName $nodeName): bool
*/
public function getTypeOfAutoCreatedChildNode(NodeName $nodeName): ?NodeType
{
$this->initialize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha actually not the only missing 😂
#4333

will revert this and must be fixed correctly in another pr

);

// handle multiple levels of autocreated child nodes
foreach ($childNodeType->tetheredNodeAggregateIds($childNodeAggregateId)->getNodeAggregateIds() as $nestedChildNodeName => $nodeAggregateId) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

howdy there, I think there should be some kind of endless-recursion-check :)
I mean.. If you have something like

'Some:NodeType':
  children:
    'foo':
      type: 'Some:NodeType'

it probably won't work anyways. But maybe we should detect those cases anyways to provide a better error message than "memory limit exceeded..."

@mhsdesign
Copy link
Member Author

should we then remove (as a follow up) tetheredDescendantNodeAggregateIds?

$hex = md5($parentNodeAggregateId->value . '-' . $tetheredNodeName->value);
return new self(
substr($hex, 0, 8) . '-' . substr($hex, 8, 4) . '-' . substr($hex, 12, 4) . '-' . substr($hex, 16, 4) . '-' . substr($hex, 20, 12)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to discuss: does this have to be a UUID pattern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to follow up on this:
node ids used to be UUIDs and that is the reason for this algorithm.
But now they can be anything basically (as long as they match the regex /^([a-z0-9\-]{1,64})$/).

I would suggest to just do $parentNodeAggregateId->value . '-' . $tetheredNodeName->value really (with some deterministic changes to satisfy the regex)

@@ -50,6 +50,23 @@ public static function fromString(string $value): self
return new self($value);
}

/**
* Calculates a deterministic nodeAggregateId.
*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used for new nodes, but previously created tethered node ids were random

comment

* NodeAggregateId::fromParentNodeAggregateIdAndNodeName($parentNodeAggregateId, NodeName::fromString('main'))
*
*/
public static function fromParentNodeAggregateIdAndNodeName(NodeAggregateId $parentNodeAggregateId, NodeName $tetheredNodeName): self
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to NodeName

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this should be moved because it contains a concept that does not belong to the Node Aggregate ID itself.. It's not too bad though IMO and I'm not sure if NodeName is a better place.. @skurfuerst do you have an idea where to put this (and please don't say "utility class" ;))

Copy link
Member

@bwaidelich bwaidelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments after discussing this in our weekly.

And some more thoughts:

tetheredDescendantNodeAggregateIds should be removed from the CreateNodeAggregateWithNode command, but this has some implications and can be done in a separate changeset (see #4375)

In the long(!) run we would like to get rid of Node::nodeName and all the path related complexity. As a result, a tethered node would not know its own name. To retrieve a tethered node we could use ContentSubgraphInterface::findChildNodeConnectedThroughEdgeName() like today.

*
* Used for determining the nodeAggregateIds of tethered nodes.
*
* e.g. in case you want to get the main content collection node aggregate id:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* e.g. in case you want to get the main content collection node aggregate id:
* e.g. in case you want to get the main content collection node aggregate id:
*
* NOTE: Previously ids of tethered nodes were _not_ deterministic, so this method should not be used to look up tethered nodes
* Instead {@see \Neos\ContentRepository\Core\Projection\ContentGraph\ContentSubgraphInterface::findChildNodeConnectedThroughEdgeName()} can be used

* NodeAggregateId::fromParentNodeAggregateIdAndNodeName($parentNodeAggregateId, NodeName::fromString('main'))
*
*/
public static function fromParentNodeAggregateIdAndNodeName(NodeAggregateId $parentNodeAggregateId, NodeName $tetheredNodeName): self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this should be moved because it contains a concept that does not belong to the Node Aggregate ID itself.. It's not too bad though IMO and I'm not sure if NodeName is a better place.. @skurfuerst do you have an idea where to put this (and please don't say "utility class" ;))

$hex = md5($parentNodeAggregateId->value . '-' . $tetheredNodeName->value);
return new self(
substr($hex, 0, 8) . '-' . substr($hex, 8, 4) . '-' . substr($hex, 12, 4) . '-' . substr($hex, 16, 4) . '-' . substr($hex, 20, 12)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to follow up on this:
node ids used to be UUIDs and that is the reason for this algorithm.
But now they can be anything basically (as long as they match the regex /^([a-z0-9\-]{1,64})$/).

I would suggest to just do $parentNodeAggregateId->value . '-' . $tetheredNodeName->value really (with some deterministic changes to satisfy the regex)

@bwaidelich
Copy link
Member

@mhsdesign I'm going through open 9.0 PRs right now and was wondering: Are you still working on this one?

@mhsdesign
Copy link
Member Author

Not actively since June ^^ as you had some legit points, they need to be implemented and #4313 (comment) also needs to be discussed (its currently a bit over my head ^^)

So feel free to continue my work - otherwise ill pick it up in half a month or something ;)

@bwaidelich
Copy link
Member

@mhsdesign half a month is over now :)
just kidding, but we want to discuss this topic on Friday – would be cool if you could join us <3

…eIdByParent($parentId)`

instead of `NodeAggregateId::fromParentNodeAggregateIdAndNodeName`
@mhsdesign
Copy link
Member Author

mhsdesign commented Sep 1, 2023

Important points of our (extensive ^^) discussions from the last weeklies:

A future without node.nodeName? -> no!

... at least not by leveraging this algorithm ... (which would be possible at first)

The deterministic node ids will be mostly a good to have perk but we wont allow it to replace the concept of the nodeNames for tethered nodes, as relying on the algorithm for the id for finding the childNode of a certain name (main) will give us horrible trouble once we change the algorithm.

To better understand what we thought let me demonstrate:

We have the parent node lady-eleonode with the tethered childNode sir-david-nodenborough connected through the edge main.

Fetching the childNode of name main is possible via the subgraph:

$childNode = $subgraph->findChildNodeConnectedThroughEdgeName($parent, NodeName::fromString("main"));

But our - now discarded - idea was that at one point, we could remove the nodeName from the node model and altogether from the database and only leverage the algorithm to find the node of name main.

$childNodeIdGuessedByAlgorithm = NodeName::fromString('main')->tetheredNodeAggregateIdByParent($parentNode->nodeAggregateId);
$childNode = $subgraph->findNodeById($childNodeIdGuessedByAlgorithm);

The code will obviously only work, if the childNode doesnt have the id sir-david-nodenborough but something like a hash of "lady-eleonode" + "main". This gives us the first problem: What about node structures which have been migrated or were created before the deterministic node ids? Also another point is: what if the nodeName main changes (due to some structure adjustment or nodeMigration) - now the id must be recalculated too.

All in all we decided against this approach as this would make the algorithm too important and the cr structure too fragile.

Further points ...

  • To guarantee a successful replay - even if we change the algorithm - we should still save the ids in the events.

  • Remove the tetheredDescendantNodeAggregateIds from the constructor of CreateNodeAggregateWithNode but still allow setting it as less important option with a with method. See issue -> Remove tetheredDescendantNodeAggregateIds from CreateNodeAggregateWithNode constructor #4375

  • Adjust the algorithm to something we prefer more like sha1 or even simply parentId + name?

@skurfuerst
Copy link
Member

Thanks so much ❤️❤️

@mhsdesign
Copy link
Member Author

Hmm upsi - i just thought of a case were these deterministic id's wont be fun:

What should happen when the tethered node "main" is renamed to "foo"? The algorithm will encode that the name is still "main" - but what if we actually now want to later reintroduce a "main" childNode? The "correct" id is already in use now :/

@bwaidelich
Copy link
Member

Good point. What happens today in that case?

@kitsunet
Copy link
Member

kitsunet commented Sep 2, 2023

I guess strictly speaking it's not a use case we ever cared about? But the repair command at least should reassign new ids based on the new name AFAIK? So currently it could be solved that way.

@bwaidelich
Copy link
Member

the repair command at least should reassign new ids based on the new name AFAIK?

That would defeat the purpose of allowing custom ids to be specified via command.

@mhsdesign
Copy link
Member Author

hmm maybe keeping it totally random then is still better than trying to bring to much structure into the ids?

@mhsdesign
Copy link
Member Author

because of the fact that a migration to rename auto created nodes will give us trouble with the node id: #4313 (comment)

we will close this pr and keep the node ids random to avoid any conflicts.

but to make things like node templates easier, we want a “helper“ to create all random node ids before the command is issued

@mhsdesign mhsdesign closed this Sep 5, 2023
mhsdesign added a commit to Flowpack/Flowpack.NodeTemplates that referenced this pull request Oct 18, 2023
… tethered node ids

The pr neos/neos-development-collection#4313 was declined and tethered node ids cannot be calculated deterministically but must be generated and specified in the command beforehand.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants