-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow organizing documents in a tree structure #516
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is nice!!
It is a bit hard to review from a frontend side perspective without too much context, but it looks what we need to do our tree structure.
For me, if you put back abilities
, we could merge it because except that, it does not seems to impact negatively the frontend; then we will iterate when we will start to integrate the tree on the frontend.
queue.push( | ||
models.Document( | ||
depth=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add some children docs, with different depth as well ?
We choose to use Django-treebeard for its quality, performance and stability. Adding tree structure to documents is as simple as inheriting from the MP_Node class.
The test_api_documents_list file was getting too long. We can extract tests on filters and ordering.
Now that we have a tree structure, we should only include parents of a visible subtree in list results.
A document should inherit the access rights a user has on any of its ancestors.
This information is useful for the frontend to display the document tree structure and is cheap to expose.
This endpoint is nested under a document's detail endpoint.
We add a POST method to the existing children endpoint.
Including the content field in the list view is not efficient as we need to query the object storage to retrieve it. We want to display an excerpt of the content on the list view so we should store it in database. We let the frontend compute it and save it for us in the new "excerpt" field because we are not supposed to have access to the content (E2EE feature coming)
Linked : #435 |
user roles were already computed as an annotation on the query for performance as we must look at all the document's ancestors to determine the roles that apply recursively. We can easily expose them as readonly via the serializer.
f9de1eb
to
3485d66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except the collation problem, all seems ok.
It exists the extension pg_stat_statements
, it could be maybe nice to add it, it would help us to track slow queries. Can be in another PR though.
migrations.AlterField( | ||
model_name='document', | ||
name='path', | ||
field=models.CharField(max_length=255, unique=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During my tests I had a problem during the creation of a doc, about the path unicity, the alphabet
that we set uses lowercase values, so it seems to be needed to update the collation accordingly:
field=models.CharField(max_length=255, unique=True), | |
field=models.CharField(max_length=255, unique=True, db_collation="C"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to have a certain amount of document to see the error, the algo has to reach the lowercase values.
If you add this test, it will showcase the problem:
def test_api_documents_create_authenticated_success_check_path():
"""
Check that the path of the document is correctly set when creating a document.
"""
user = factories.UserFactory()
client = APIClient()
client.force_login(user)
# Create 50 docs
for i in range(50):
response = client.post(
"/api/v1.0/documents/",
{
"title": f"my document {i}",
},
format="json",
)
assert response.json() != ['Document with this Path already exists.']
assert response.status_code == 201
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, had the same behavior on numerique-gouv/people#560
Warning, the alphabet must be fixed also, because uppercase are "before" lowercase when sorting.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
see
@classmethod
def get_root_nodes(cls):
""":returns: A queryset containing the root nodes in the tree."""
return get_result_class(cls).objects.filter(depth=1).order_by('path')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sampaccoud I suggest to add a test like
def test_models_documents_tree_alphabet():
"""Test the creation of teams with treebeard methods."""
models.Document.load_bulk([
{
"data": {
"title": f"document-{i}",
}
}
for i in range(len(models.Document.alphabet) * 2)
])
Which will assert the alphabet is good + the collation is properly configured.
Now that we have introduced a document tree structure, it is not possible to allow deleting documents anymore as it impacts the whole subtree below the deleted document and the consequences are too big. We introduce soft delete in order to give a second though to the document's owner (who is the only one to be allowed to deleted a document). After a document is soft deleted, the owner can still see it in the trashbin (/api/v1.0/documents/?is_deleted=true). After a grace period (30 days be default) the document disappears from the trashbin and can't be restored anymore. Note that even then it is still kept in database. Cleaning the database to erase deleted document after the grace period can be done as a maintenance script.
Only owners can see and restore deleted documents. They can only do it during the grace period before the document is considered hard deleted and hidden from everybody on the API.
Only administrators or owners of a document can move it to a target document for which they are also administrator or owner. We allow different moving modes: - first-child: move the document as the first child of the target - last-child: move the document as the last child of the target - first-sibling: move the document as the first sibling of the target - last-sibling: move the document as the last sibling of the target - left: move the document as sibling ordered just before the target - right: move the document as sibling ordered just after the target The whole subtree below the document that is being moved, moves as well and remains below the document after it is moved.
3485d66
to
3201612
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the only thing to fix is the collation and the alphabet, otherwise this looks good to me :) I took few ideas of this PR for mine ^^
return ( | ||
serializers.ListDocumentSerializer | ||
if self.action == "list" | ||
else self.serializer_class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is noop, but since you made a change, don't you want to consider using SerializerPerActionMixin
(
docs/src/backend/core/api/viewsets.py
Line 109 in 0279304
class SerializerPerActionMixin: |
people
https://github.com/numerique-gouv/people/blob/42d7d00772baf30ff93dc2b8bfca57ff3e6dccb1/src/backend/core/api/client/viewsets.py#L81 ?
model = models.DocumentAccess | ||
extra = 0 | ||
|
||
|
||
@admin.register(models.Document) | ||
class DocumentAdmin(admin.ModelAdmin): | ||
class DocumentAdmin(TreeAdmin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not mandatory, but as stated in the documentation you might add
form = movenodeform_factory(models.Document)
to add the "tree" representation in the admin list (maybe you did not add it on purpose)
@@ -358,6 +359,10 @@ class Document(BaseModel): | |||
|
|||
_content = None | |||
|
|||
# Tree structure | |||
steplen = 7 # nb siblings max: 78,364,164,096 / max depth: 255/7=36 | |||
node_order_by = None # Manual ordering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default for MPNode
is []
, why do you need to override it?
migrations.AlterField( | ||
model_name='document', | ||
name='path', | ||
field=models.CharField(max_length=255, unique=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sampaccoud I suggest to add a test like
def test_models_documents_tree_alphabet():
"""Test the creation of teams with treebeard methods."""
models.Document.load_bulk([
{
"data": {
"title": f"document-{i}",
}
}
for i in range(len(models.Document.alphabet) * 2)
])
Which will assert the alphabet is good + the collation is properly configured.
if via == USER: | ||
models.DocumentAccess.objects.create( | ||
document=grand_parent, | ||
user=user, | ||
role=random.choice(models.RoleChoices.choices)[0], | ||
) | ||
elif via == TEAM: | ||
mock_user_teams.return_value = ["lasuite", "unknown"] | ||
models.DocumentAccess.objects.create( | ||
document=grand_parent, | ||
team="lasuite", | ||
role=random.choice(models.RoleChoices.choices)[0], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of "logic" in tests, but while this reduces the number of lines a lot, I guess it's ok ^^
models.DocumentAccess.objects.create( | ||
document=grand_parent, | ||
user=user, | ||
role=random.choice(models.RoleChoices.choices)[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
role=random.choice(models.RoleChoices.choices)[0], | |
role=random.choice(models.RoleChoices.values), |
models.DocumentAccess.objects.create( | ||
document=grand_parent, | ||
team="lasuite", | ||
role=random.choice(models.RoleChoices.choices)[0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
role=random.choice(models.RoleChoices.choices)[0], | |
role=random.choice(models.RoleChoices.values), |
Purpose
We want to allow nesting documents in a tree structure.
Proposal
This pull request introduces a hierarchical organization for documents by implementing a tree structure. It integrates the django-treebeard library to manage the tree hierarchy, allowing documents to be nested within parent documents. This enhancement will turn Docs into a knowledge management platform.
Key Changes: