Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow organizing documents in a tree structure #516

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

sampaccoud
Copy link
Member

@sampaccoud sampaccoud commented Dec 18, 2024

Purpose

We want to allow nesting documents in a tree structure.

Proposal

This pull request introduces a hierarchical organization for documents by implementing a tree structure. It integrates the django-treebeard library to manage the tree hierarchy, allowing documents to be nested within parent documents. This enhancement will turn Docs into a knowledge management platform.

Key Changes:

  • Add the django-treebeard library to dependencies to support tree structures for documents
  • Update document models to include parent-child relationships.
  • Modify and optimize the document retrieval logic to take into account the tree structure, ensuring that only the first visible parent document for a user is shown in list views.
  • Add soft delete to allow trashbin restore (important when deleting complete subtrees)
  • Improve admin interface and activate treebeard's nested views
  • Modify the test suites to take into account the tree structure of documents

⚠️ Note: Existing documents are transformed to root nodes in the new tree structure.

@sampaccoud sampaccoud changed the title Add treebeard for document trees Allow organizing documents in a tree structure Dec 18, 2024
Copy link
Collaborator

@AntoLC AntoLC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is nice!!

It is a bit hard to review from a frontend side perspective without too much context, but it looks what we need to do our tree structure.
For me, if you put back abilities, we could merge it because except that, it does not seems to impact negatively the frontend; then we will iterate when we will start to integrate the tree on the frontend.

src/backend/core/models.py Show resolved Hide resolved
src/backend/core/api/serializers.py Show resolved Hide resolved
queue.push(
models.Document(
depth=1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add some children docs, with different depth as well ?

We choose to use Django-treebeard for its quality, performance and
stability. Adding tree structure to documents is as simple as
inheriting from the MP_Node class.
The test_api_documents_list file was getting too long. We can extract
tests on filters and ordering.
Now that we have a tree structure, we should only include parents of
a visible subtree in list results.
A document should inherit the access rights a user has on any of its
ancestors.
This information is useful for the frontend to display the document
tree structure and is cheap to expose.
This endpoint is nested under a document's detail endpoint.
We add a POST method to the existing children endpoint.
Including the content field in the list view is not efficient as we need
to query the object storage to retrieve it. We want to display an excerpt
of the content on the list view so we should store it in database. We
let the frontend compute it and save it for us in the new "excerpt" field
because we are not supposed to have access to the content (E2EE feature coming)
@virgile-dev
Copy link
Collaborator

virgile-dev commented Jan 2, 2025

Linked : #435

user roles were already computed as an annotation on the query for
performance as we must look at all the document's ancestors to determine
the roles that apply recursively. We can easily expose them as readonly
via the serializer.
@sampaccoud sampaccoud force-pushed the add-treebeard-for-document-trees branch from f9de1eb to 3485d66 Compare January 2, 2025 22:18
@sampaccoud sampaccoud self-assigned this Jan 2, 2025
@sampaccoud sampaccoud requested a review from AntoLC January 2, 2025 22:25
@sampaccoud sampaccoud mentioned this pull request Jan 2, 2025
Copy link
Collaborator

@AntoLC AntoLC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except the collation problem, all seems ok.

It exists the extension pg_stat_statements, it could be maybe nice to add it, it would help us to track slow queries. Can be in another PR though.

migrations.AlterField(
model_name='document',
name='path',
field=models.CharField(max_length=255, unique=True),
Copy link
Collaborator

@AntoLC AntoLC Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During my tests I had a problem during the creation of a doc, about the path unicity, the alphabet that we set uses lowercase values, so it seems to be needed to update the collation accordingly:

Suggested change
field=models.CharField(max_length=255, unique=True),
field=models.CharField(max_length=255, unique=True, db_collation="C"),

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to have a certain amount of document to see the error, the algo has to reach the lowercase values.
If you add this test, it will showcase the problem:

def test_api_documents_create_authenticated_success_check_path():
    """
    Check that the path of the document is correctly set when creating a document.
    """
    user = factories.UserFactory()

    client = APIClient()
    client.force_login(user)

    # Create 50 docs
    for i in range(50):
        response = client.post(
            "/api/v1.0/documents/",
            {
                "title": f"my document {i}",
            },
            format="json",
        )

        assert response.json() != ['Document with this Path already exists.']
        assert response.status_code == 201 

Copy link
Collaborator

@qbey qbey Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, had the same behavior on numerique-gouv/people#560

Warning, the alphabet must be fixed also, because uppercase are "before" lowercase when sorting.

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

see

    @classmethod
    def get_root_nodes(cls):
        """:returns: A queryset containing the root nodes in the tree."""
        return get_result_class(cls).objects.filter(depth=1).order_by('path')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sampaccoud I suggest to add a test like

def test_models_documents_tree_alphabet():
    """Test the creation of teams with treebeard methods."""
    models.Document.load_bulk([
        {
            "data": {
                "title": f"document-{i}",
            }
        }
        for i in range(len(models.Document.alphabet) * 2)
    ])

Which will assert the alphabet is good + the collation is properly configured.

Now that we have introduced a document tree structure, it is not
possible to allow deleting documents anymore as it impacts the whole
subtree below the deleted document and the consequences are too big.

We introduce soft delete in order to give a second though to the
document's owner (who is the only one to be allowed to deleted a
document). After a document is soft deleted, the owner can still
see it in the trashbin (/api/v1.0/documents/?is_deleted=true).
After a grace period (30 days be default) the document disappears
from the trashbin and can't be restored anymore. Note that even
then it is still kept in database. Cleaning the database to erase
deleted document after the grace period can be done as a maintenance
script.
Only owners can see and restore deleted documents. They can only do
it during the grace period before the document is considered hard
deleted and hidden from everybody on the API.
Only administrators or owners of a document can move it to a target
document for which they are also administrator or owner.

We allow different moving modes:
- first-child: move the document as the first child of the target
- last-child: move the document as the last child of the target
- first-sibling: move the document as the first sibling of the target
- last-sibling: move the document as the last sibling of the target
- left: move the document as sibling ordered just before the target
- right: move the document as sibling ordered just after the target

The whole subtree below the document that is being moved, moves as
well and remains below the document after it is moved.
@sampaccoud sampaccoud force-pushed the add-treebeard-for-document-trees branch from 3485d66 to 3201612 Compare January 3, 2025 16:56
Copy link
Collaborator

@qbey qbey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only thing to fix is the collation and the alphabet, otherwise this looks good to me :) I took few ideas of this PR for mine ^^

Comment on lines +417 to +420
return (
serializers.ListDocumentSerializer
if self.action == "list"
else self.serializer_class
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is noop, but since you made a change, don't you want to consider using SerializerPerActionMixin (

class SerializerPerActionMixin:
) or better use the same as in people https://github.com/numerique-gouv/people/blob/42d7d00772baf30ff93dc2b8bfca57ff3e6dccb1/src/backend/core/api/client/viewsets.py#L81 ?

model = models.DocumentAccess
extra = 0


@admin.register(models.Document)
class DocumentAdmin(admin.ModelAdmin):
class DocumentAdmin(TreeAdmin):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not mandatory, but as stated in the documentation you might add

form = movenodeform_factory(models.Document)

to add the "tree" representation in the admin list (maybe you did not add it on purpose)

@@ -358,6 +359,10 @@ class Document(BaseModel):

_content = None

# Tree structure
steplen = 7 # nb siblings max: 78,364,164,096 / max depth: 255/7=36
node_order_by = None # Manual ordering
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default for MPNode is [], why do you need to override it?

migrations.AlterField(
model_name='document',
name='path',
field=models.CharField(max_length=255, unique=True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sampaccoud I suggest to add a test like

def test_models_documents_tree_alphabet():
    """Test the creation of teams with treebeard methods."""
    models.Document.load_bulk([
        {
            "data": {
                "title": f"document-{i}",
            }
        }
        for i in range(len(models.Document.alphabet) * 2)
    ])

Which will assert the alphabet is good + the collation is properly configured.

Comment on lines +209 to +221
if via == USER:
models.DocumentAccess.objects.create(
document=grand_parent,
user=user,
role=random.choice(models.RoleChoices.choices)[0],
)
elif via == TEAM:
mock_user_teams.return_value = ["lasuite", "unknown"]
models.DocumentAccess.objects.create(
document=grand_parent,
team="lasuite",
role=random.choice(models.RoleChoices.choices)[0],
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of "logic" in tests, but while this reduces the number of lines a lot, I guess it's ok ^^

models.DocumentAccess.objects.create(
document=grand_parent,
user=user,
role=random.choice(models.RoleChoices.choices)[0],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
role=random.choice(models.RoleChoices.choices)[0],
role=random.choice(models.RoleChoices.values),

models.DocumentAccess.objects.create(
document=grand_parent,
team="lasuite",
role=random.choice(models.RoleChoices.choices)[0],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
role=random.choice(models.RoleChoices.choices)[0],
role=random.choice(models.RoleChoices.values),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants