Allow organizing documents in a tree structure #516

sampaccoud · 2024-12-18T10:48:35Z

Purpose

We want to allow nesting documents in a tree structure.

Proposal

This pull request introduces a hierarchical organization for documents by implementing a tree structure. It integrates the django-treebeard library to manage the tree hierarchy, allowing documents to be nested within parent documents. This enhancement will turn Docs into a knowledge management platform.

Key Changes:

- Add the django-treebeard library to the dependencies to support tree structures for documents.
- Update document models to include parent-child relationships.
- Modify and optimize the document retrieval logic to take the tree structure into account, ensuring that only the first visible parent document for a user is shown in list views.
- Add soft delete to allow restoring from the trashbin (important when deleting complete subtrees).
- Improve the admin interface and activate treebeard's nested views.
- Modify the test suites to take the tree structure of documents into account.

⚠️ Note: Existing documents are transformed to root nodes in the new tree structure.

AntoLC

That is nice!!

It is a bit hard to review from the frontend side without more context, but it looks like what we need to build our tree structure.
For me, if you put the abilities back, we could merge it: apart from that, it does not seem to affect the frontend negatively. We can then iterate when we start integrating the tree on the frontend.

src/backend/core/tests/documents/test_api_documents_list.py

src/backend/core/models.py

src/backend/core/api/serializers.py

AntoLC · 2024-12-19T09:48:08Z

src/backend/demo/management/commands/create_demo.py

            queue.push(
                models.Document(
+                    depth=1,


Is it possible to add some child docs as well, at different depths?

src/backend/core/api/serializers.py

src/backend/core/tests/documents/test_api_documents_children_list.py

Including the content field in the list view is not efficient as we need to query the object storage to retrieve it. We want to display an excerpt of the content on the list view, so we should store it in the database. We let the frontend compute it and save it for us in the new "excerpt" field because we are not supposed to have access to the content (E2EE feature coming).

virgile-dev · 2025-01-02T13:09:54Z

Linked: #435

User roles were already computed as an annotation on the query for performance, as we must look at all the document's ancestors to determine the roles that apply recursively. We can easily expose them as read-only via the serializer.

AntoLC

Except for the collation problem, everything seems ok.

The pg_stat_statements extension exists; it could be nice to add it, as it would help us track slow queries. It can be done in another PR though.

src/backend/core/tests/documents/test_api_documents_retrieve.py

AntoLC · 2025-01-03T11:50:24Z

src/backend/core/migrations/0014_set_path_on_existing_documents.py

+        migrations.AlterField(
+            model_name='document',
+            name='path',
+            field=models.CharField(max_length=255, unique=True),


During my tests I had a problem when creating a doc, related to path uniqueness. The alphabet that we set uses lowercase values, so it seems the collation needs to be updated accordingly:

Suggested change

-            field=models.CharField(max_length=255, unique=True),
+            field=models.CharField(max_length=255, unique=True, db_collation="C"),

You need a certain number of documents to see the error: the algorithm has to reach the lowercase values.
If you add this test, it will showcase the problem:

def test_api_documents_create_authenticated_success_check_path():
    """
    Check that the path of the document is correctly set when creating a document.
    """
    user = factories.UserFactory()

    client = APIClient()
    client.force_login(user)

    # Create 50 docs
    for i in range(50):
        response = client.post(
            "/api/v1.0/documents/",
            {
                "title": f"my document {i}",
            },
            format="json",
        )
        assert response.json() != ['Document with this Path already exists.']
        assert response.status_code == 201

Yep, had the same behavior on numerique-gouv/people#560

Warning, the alphabet must also be fixed, because uppercase letters come "before" lowercase letters when sorting.

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

see

@classmethod
def get_root_nodes(cls):
    """:returns: A queryset containing the root nodes in the tree."""
    return get_result_class(cls).objects.filter(depth=1).order_by('path')
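To make the failure mode concrete, here is a minimal plain-Python sketch. It is not treebeard's actual code. It assumes that a new sibling path is derived by incrementing the "last" path found by sorting (which the order_by('path') above suggests), uses a step length of 2 for brevity, and emulates a locale-aware collation with a rough, hypothetical sort key (`locale_like`) in which a lowercase letter sorts before its uppercase counterpart.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
STEPLEN = 2  # shortened for the demo; the PR uses 7


def next_path(path):
    """Increment a fixed-width string written in base len(ALPHABET)."""
    value = 0
    for char in path:
        value = value * len(ALPHABET) + ALPHABET.index(char)
    value += 1
    digits = []
    for _ in range(len(path)):
        value, remainder = divmod(value, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    if value:
        raise OverflowError("no sibling path left")
    return "".join(reversed(digits))


def byte_order(path):
    """Sort key matching the "C" collation for ASCII paths."""
    return path


def locale_like(path):
    """Rough stand-in for a locale-aware collation (assumption)."""
    return [(char.lower(), char.isupper()) for char in path]


def create_roots(count, sort_key):
    """Create `count` root paths, each derived from the last one by sort order."""
    paths = []
    for _ in range(count):
        if paths:
            new_path = next_path(max(paths, key=sort_key))
        else:
            new_path = next_path(ALPHABET[0] * STEPLEN)
        if new_path in paths:
            raise ValueError(f"path {new_path!r} already exists")
        paths.append(new_path)
    return paths
```

In this sketch, `create_roots(50, byte_order)` succeeds, while `create_roots(50, locale_like)` raises once the lowercase range is reached: the sort still reports `0Z` as the last path, so `0a` is computed a second time. This is why both the alphabet order and the collation have to agree.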

@sampaccoud I suggest adding a test like

def test_models_documents_tree_alphabet():
    """Test the creation of documents with treebeard methods."""
    models.Document.load_bulk(
        [
            {
                "data": {
                    "title": f"document-{i}",
                }
            }
            for i in range(len(models.Document.alphabet) * 2)
        ]
    )

which will assert that the alphabet is good and that the collation is properly configured.

src/backend/core/api/viewsets.py

Now that we have introduced a document tree structure, we can no longer allow hard-deleting documents: a deletion impacts the whole subtree below the deleted document and the consequences are too big. We introduce soft delete in order to give a second thought to the document's owner (who is the only one allowed to delete a document). After a document is soft deleted, the owner can still see it in the trashbin (/api/v1.0/documents/?is_deleted=true). After a grace period (30 days by default) the document disappears from the trashbin and can't be restored anymore. Note that even then it is still kept in the database. Cleaning the database to erase deleted documents after the grace period can be done as a maintenance script.
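The soft-delete lifecycle described above can be sketched as a small state function. This is only an illustration under stated assumptions: the names `deleted_at`, `GRACE_PERIOD` and `deletion_state` are hypothetical and not taken from the PR, and treating the exact 30-day boundary as already expired is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone

# Assumed default, matching the "30 days by default" in the description.
GRACE_PERIOD = timedelta(days=30)


def deletion_state(deleted_at, now):
    """Classify a document from its (hypothetical) soft-deletion timestamp."""
    if deleted_at is None:
        return "active"  # never deleted
    if now - deleted_at < GRACE_PERIOD:
        return "trashbin"  # still visible to and restorable by the owner
    return "hard_deleted"  # hidden from the API, row still in the database


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
state = deletion_state(now - timedelta(days=3), now)
```

A maintenance script could then select the rows whose state is `hard_deleted` and erase them.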

Only administrators or owners of a document can move it to a target document for which they are also administrator or owner. We allow different moving modes:

- first-child: move the document as the first child of the target
- last-child: move the document as the last child of the target
- first-sibling: move the document as the first sibling of the target
- last-sibling: move the document as the last sibling of the target
- left: move the document as sibling ordered just before the target
- right: move the document as sibling ordered just after the target

The whole subtree below the document that is being moved moves as well and remains below the document after it is moved.
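The six moving modes can be illustrated with a toy in-memory tree. This is a sketch, not the PR's implementation (which relies on treebeard): the tree is a plain dict mapping each node to its ordered list of children, with `None` as a virtual root, and the function names are hypothetical. It assumes the node is not moved onto itself or into its own subtree.

```python
POSITIONS = {"first-child", "last-child", "first-sibling", "last-sibling", "left", "right"}


def parent_of(tree, node):
    """Return the parent of `node` (None stands for the virtual root)."""
    for parent, children in tree.items():
        if node in children:
            return parent
    raise KeyError(node)


def move(tree, node, target, position):
    """Move `node` relative to `target`; its subtree follows implicitly."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    # Detach the node from its current parent.
    tree[parent_of(tree, node)].remove(node)
    if position in ("first-child", "last-child"):
        siblings = tree.setdefault(target, [])
        index = 0 if position == "first-child" else len(siblings)
    else:
        siblings = tree[parent_of(tree, target)]
        index = {
            "first-sibling": 0,
            "last-sibling": len(siblings),
            "left": siblings.index(target),
            "right": siblings.index(target) + 1,
        }[position]
    siblings.insert(index, node)


tree = {None: ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
move(tree, "a", "b", "right")  # "a" becomes the sibling just after "b"
```

Because children are stored under their parent's key, the subtree of the moved node stays attached to it, which mirrors the last sentence of the description.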

qbey

I guess the only things to fix are the collation and the alphabet; otherwise this looks good to me :) I took a few ideas from this PR for mine ^^

qbey · 2025-01-07T14:57:44Z

src/backend/core/api/viewsets.py

+        return (
+            serializers.ListDocumentSerializer
+            if self.action == "list"
+            else self.serializer_class


I know this is a no-op, but since you made a change, don't you want to consider using SerializerPerActionMixin (docs/src/backend/core/api/viewsets.py, line 109 in 0279304: class SerializerPerActionMixin:) or, better, use the same as in people https://github.com/numerique-gouv/people/blob/42d7d00772baf30ff93dc2b8bfca57ff3e6dccb1/src/backend/core/api/client/viewsets.py#L81 ?

qbey · 2025-01-07T15:00:29Z

src/backend/core/admin.py

    model = models.DocumentAccess
    extra = 0


 @admin.register(models.Document)
-class DocumentAdmin(admin.ModelAdmin):
+class DocumentAdmin(TreeAdmin):


Not mandatory, but as stated in the documentation you might add

form = movenodeform_factory(models.Document)

to add the "tree" representation in the admin list (maybe you did not add it on purpose)

qbey · 2025-01-07T15:05:33Z

src/backend/core/models.py

@@ -358,6 +359,10 @@ class Document(BaseModel):

    _content = None

+    # Tree structure
+    steplen = 7  # nb siblings max: 78,364,164,096 / max depth: 255/7=36
+    node_order_by = None  # Manual ordering


The default for MP_Node is [], why do you need to override it?
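The figures in the steplen comment above can be checked with a short calculation. This is a plain arithmetic sketch with hypothetical helper names. The 36-character case corresponds to an alphabet of digits plus uppercase letters; the 62-character case assumes the extended ALPHABET suggested earlier in this review is adopted.

```python
def step_values(alphabet_size, steplen):
    """Number of distinct values a single path step can encode."""
    return alphabet_size ** steplen


def max_depth(path_max_length, steplen):
    """Number of whole steps that fit in the path column."""
    return path_max_length // steplen


siblings_36 = step_values(36, 7)  # digits + uppercase letters
siblings_62 = step_values(62, 7)  # digits + uppercase + lowercase letters
depth_limit = max_depth(255, 7)   # path is a CharField(max_length=255)
```

With 36 characters the result matches the 78,364,164,096 in the comment; with 62 characters the same step length allows many more siblings, so the comment would need updating if the alphabet changes.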

qbey · 2025-01-08T09:47:10Z

src/backend/core/migrations/0014_set_path_on_existing_documents.py

+        migrations.AlterField(
+            model_name='document',
+            name='path',
+            field=models.CharField(max_length=255, unique=True),


@sampaccoud I suggest adding a test like

def test_models_documents_tree_alphabet():
    """Test the creation of documents with treebeard methods."""
    models.Document.load_bulk(
        [
            {
                "data": {
                    "title": f"document-{i}",
                }
            }
            for i in range(len(models.Document.alphabet) * 2)
        ]
    )

which will assert that the alphabet is good and that the collation is properly configured.

qbey · 2025-01-08T09:51:36Z

src/backend/core/tests/documents/test_api_document_versions.py

+    if via == USER:
+        models.DocumentAccess.objects.create(
+            document=grand_parent,
+            user=user,
+            role=random.choice(models.RoleChoices.choices)[0],
+        )
+    elif via == TEAM:
+        mock_user_teams.return_value = ["lasuite", "unknown"]
+        models.DocumentAccess.objects.create(
+            document=grand_parent,
+            team="lasuite",
+            role=random.choice(models.RoleChoices.choices)[0],
+        )


I'm not a big fan of "logic" in tests, but since this reduces the number of lines a lot, I guess it's ok ^^

qbey · 2025-01-08T09:51:50Z

src/backend/core/tests/documents/test_api_document_versions.py

+        models.DocumentAccess.objects.create(
+            document=grand_parent,
+            user=user,
+            role=random.choice(models.RoleChoices.choices)[0],


Nit

Suggested change

-            role=random.choice(models.RoleChoices.choices)[0],
+            role=random.choice(models.RoleChoices.values),

qbey · 2025-01-08T09:52:04Z

src/backend/core/tests/documents/test_api_document_versions.py

+        models.DocumentAccess.objects.create(
+            document=grand_parent,
+            team="lasuite",
+            role=random.choice(models.RoleChoices.choices)[0],


Nit

Suggested change

-            role=random.choice(models.RoleChoices.choices)[0],
+            role=random.choice(models.RoleChoices.values),

sampaccoud requested review from qbey, AntoLC and PanchoutNathan December 18, 2024 10:48

sampaccoud changed the title ~~Add treebeard for document trees~~ Allow organizing documents in a tree structure Dec 18, 2024

AntoLC reviewed Dec 19, 2024

src/backend/core/api/serializers.py

AntoLC reviewed Dec 19, 2024

src/backend/core/tests/documents/test_api_documents_children_list.py

sampaccoud added 11 commits December 27, 2024 21:06

✨(backend) add django-treebeard to allow tree structure on documents

a12834c

We chose django-treebeard for its quality, performance and stability. Adding a tree structure to documents is as simple as inheriting from the MP_Node class.
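For readers unfamiliar with the materialized path approach behind MP_Node, here is a minimal plain-Python sketch of how such paths encode a tree. It is an illustration, not treebeard's code: the helper names are hypothetical, the alphabet is digits plus uppercase letters, and the step length of 7 is the value used in this PR.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
STEPLEN = 7  # value set on Document in this PR


def encode_step(position):
    """Encode a sibling position as a fixed-width string in base len(ALPHABET)."""
    digits = []
    for _ in range(STEPLEN):
        position, remainder = divmod(position, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    if position:
        raise OverflowError("position does not fit in one step")
    return "".join(reversed(digits))


def child_path(parent_path, position):
    """A child's path is its parent's path plus one encoded step."""
    return parent_path + encode_step(position)


def depth(path):
    """Depth is the number of steps in the path (roots have depth 1)."""
    return len(path) // STEPLEN


def ancestor_paths(path):
    """Every proper prefix made of whole steps is an ancestor."""
    return [path[:end] for end in range(STEPLEN, len(path), STEPLEN)]


root = child_path("", 1)
child = child_path(root, 2)
```

The useful property is that ancestors and descendants can be found with prefix comparisons on a single indexed column, without recursive queries.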

fixup! ✨(backend) add django-treebeard to allow tree structure on doc…

5d659f6

…uments

🚚(backend) split test files to make place for tests on tree structure

8f15a23

The test_api_documents_list file was getting too long. We can extract tests on filters and ordering.

✨(backend) list only the first visible parent document for a user

f5b7934

Now that we have a tree structure, we should only include parents of a visible subtree in list results.
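The rule can be sketched on raw paths. This is a plain-Python illustration with a hypothetical function name, not the queryset used in the PR: given the paths a user can access, it keeps only those that have no accessible ancestor. The step length of 2 is chosen only to keep the example readable.

```python
def top_visible_paths(accessible_paths, steplen):
    """Keep the accessible paths whose ancestors are all inaccessible."""
    accessible = set(accessible_paths)

    def has_accessible_ancestor(path):
        # Ancestors are the proper prefixes made of whole steps.
        return any(
            path[:end] in accessible
            for end in range(steplen, len(path), steplen)
        )

    return sorted(path for path in accessible if not has_accessible_ancestor(path))


visible = top_visible_paths(["01", "0101", "0203", "020301"], steplen=2)
```

Here `0203` is listed even though it is not a root, because its parent `02` is not accessible to the user; `0101` and `020301` are reachable through their listed ancestors.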

✨(backend) retrieve & update a document taking into account ancestors

79d8651

A document should inherit the access rights a user has on any of its ancestors.
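As an illustration of this inheritance rule, the following plain-Python sketch collects the roles granted on a document or on any of its ancestors. The data layout (a dict from path to list of roles), the function name and the role strings are assumptions made for the example; the PR computes this in the database.

```python
def inherited_roles(path, accesses, steplen):
    """Collect roles granted on the document itself or on any ancestor.

    `accesses` maps a document path to the roles the user holds there.
    """
    roles = []
    # Walk from the root step down to the document's own path.
    for end in range(steplen, len(path) + 1, steplen):
        roles.extend(accesses.get(path[:end], []))
    return roles


accesses = {"01": ["reader"], "010203": ["editor"]}
roles = inherited_roles("010203", accesses, steplen=2)
```

A document with no access of its own, such as `0102` in this example, still gets the `reader` role from its ancestor `01`.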

fixup! ✨(backend) retrieve & update a document taking into account an…

31e20e1

…cestors

✨(backend) add depth, path and numchild to serialized document

eb20ecb

This information is useful for the frontend to display the document tree structure and is cheap to expose.
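As a rough idea of how a client could use these fields, the sketch below rebuilds a nested tree from a flat list of serialized documents using only `path`. The function name, the `children` key and the step length of 2 are assumptions for the example; documents whose parent is missing from the list are treated as top-level entries.

```python
def build_tree(documents, steplen):
    """Nest flat documents under their parent, using path prefixes."""
    nodes = {doc["path"]: {**doc, "children": []} for doc in documents}
    roots = []
    # Sorting paths guarantees that a parent is visited before its children.
    for path in sorted(nodes):
        parent = nodes.get(path[:-steplen])
        if parent is None:
            roots.append(nodes[path])
        else:
            parent["children"].append(nodes[path])
    return roots


documents = [
    {"title": "B", "path": "02"},
    {"title": "A", "path": "01"},
    {"title": "A1", "path": "0101"},
    {"title": "A2", "path": "0102"},
]
tree = build_tree(documents, steplen=2)
```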

✨(backend) add API endpoint to list a document's children

a31eaba

This endpoint is nested under a document's detail endpoint.

✨(backend) add API endpoint to create children for a document

013db7d

We add a POST method to the existing children endpoint.

fixup! ♻️(backend) remove content from list serializer and introduce …

5c2dd70

…excerpt

sampaccoud force-pushed the add-treebeard-for-document-trees branch from f9de1eb to 3485d66 Compare January 2, 2025 22:18

sampaccoud self-assigned this Jan 2, 2025

sampaccoud requested a review from AntoLC January 2, 2025 22:25

sampaccoud mentioned this pull request Jan 2, 2025

Subpages #435

Open

AntoLC reviewed Jan 3, 2025

src/backend/core/api/viewsets.py

sampaccoud added 3 commits January 3, 2025 17:56

✨(backend) add API endpoint action to restore a soft deleted document

f0be8ba

Only owners can see and restore deleted documents. They can only do it during the grace period before the document is considered hard deleted and hidden from everybody on the API.

sampaccoud force-pushed the add-treebeard-for-document-trees branch from 3485d66 to 3201612 Compare January 3, 2025 16:56

qbey reviewed Jan 8, 2025
