.msg file upload #4961

Open: wants to merge 5 commits into base: master
162 changes: 80 additions & 82 deletions src/openforms/conf/locale/nl/LC_MESSAGES/django.po
Member

Can you please exclude backend translations from PRs? They get sorted out during the release preparation.

They're a frequent source of merge conflicts otherwise.

Large diffs are not rendered by default.

16 changes: 15 additions & 1 deletion src/openforms/formio/api/validators.py
@@ -56,7 +56,8 @@ def __init__(self, allowed_mime_types: Iterable[str] | None = None):

     def __call__(self, value: UploadedFile) -> None:
         head = value.read(2048)
-        ext = value.name.split(".")[-1]
+        file_name_parts = value.name.split(".")
Member

This way of grabbing the extension is actually quite flawed (it already was): try, for example, the file name foo/bar.baz/example.png :)

Instead, it is more correct to use pathlib:

from pathlib import Path

ext = Path(value.name).suffix[1:]

which gives you the actual extension (without dot)
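
As a hedged illustration of the difference (not code from this PR), the sketch below compares the two approaches on a few made-up file names. The helper names `ext_by_split` and `ext_by_pathlib` are hypothetical. Note that the naive split only goes wrong when the final path component has no dot of its own, for example `foo/bar.baz/example`.

```python
from pathlib import Path


def ext_by_split(name: str) -> str:
    # Naive approach from the diff: everything after the last dot in the whole string.
    return name.split(".")[-1]


def ext_by_pathlib(name: str) -> str:
    # Reviewer's suggestion: suffix of the final path component, without the dot.
    return Path(name).suffix[1:]


# Made-up sample names, chosen only to show where the two approaches diverge.
for name in ("example.png", "foo/bar.baz/example", "example"):
    print(f"{name!r}: split={ext_by_split(name)!r} pathlib={ext_by_pathlib(name)!r}")
```

With pathlib, a missing extension shows up as an empty string, so a plain `if not ext:` check is enough to detect it.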

+        ext = file_name_parts[-1]
         mime_type = magic.from_buffer(head, mime=True)

         # gh #2520
@@ -76,6 +77,14 @@ def __call__(self, value: UploadedFile) -> None:
                 _("The provided file is not a valid file type.")
             )

+        if len(file_name_parts) == 1:
+            raise serializers.ValidationError(
+                _(
+                    "Could not determine the file type. Please make sure the file name "
+                    "has an extension."
+                )
+            )
Comment on lines +80 to +86
Member

Suggested change
-        if len(file_name_parts) == 1:
-            raise serializers.ValidationError(
-                _(
-                    "Could not determine the file type. Please make sure the file name "
-                    "has an extension."
-                )
-            )
+        if not ext:
+            raise serializers.ValidationError(
+                _(
+                    "Could not determine the file type. Please make sure the file name "
+                    "has an extension."
+                )
+            )

this file_name_parts check is really a convoluted way to say "I couldn't determine an extension".


         # Contents is allowed. Do extension or submitted content_type agree?
         if value.content_type == "application/octet-stream":
             m = magic.Magic(extension=True)
@@ -111,6 +120,11 @@ def __call__(self, value: UploadedFile) -> None:
             "image/heif",
         ):
             return
+        # 4795
+        # The sdk cannot determine the file type of .msg files, which result into
+        # content_type "". So we have to validate these for ourselves
+        elif mime_type == "application/vnd.ms-outlook" and ext == "msg":
+            return
Comment on lines +123 to +127
Member

rather than adding another hardcoded check here, we can do better, but we need to restructure the validator code a bit!

First, start with figuring out the provided content type:

provided_content_type = value.content_type or "application/octet-stream"

and replace all value.content_type checks with provided_content_type.

The or clause is important here - in situations where the mime type could not be determined by the browser and it returns an empty string, the backend will fall back to the generic binary data mime type. But, the interesting part is that in the generic binary data check:

if value.content_type == "application/octet-stream": ...

the extensions for the binary data are looked up and checked against the provided file extension. So, we invert the process here - given the binary data, what should the extension be, and validate this.
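
A minimal sketch of that restructuring, under stated assumptions: `EXTENSIONS_BY_MIME` is a hypothetical lookup table standing in for libmagic's extension database (the validator itself uses `magic.Magic(extension=True)`), and `normalise_content_type` and `extension_agrees_with_content` are illustrative names, not functions from the code base. The non-binary branch is deliberately simplified.

```python
from pathlib import Path
from typing import Optional

GENERIC_BINARY = "application/octet-stream"

# Hypothetical lookup table standing in for libmagic's extension database;
# the validator in this PR asks ``magic.Magic(extension=True)`` instead.
EXTENSIONS_BY_MIME = {
    "application/vnd.ms-outlook": {"msg"},
    "application/zip": {"zip"},
}


def normalise_content_type(content_type: Optional[str]) -> str:
    """Treat a missing or empty browser-provided type as generic binary data."""
    return content_type or GENERIC_BINARY


def extension_agrees_with_content(
    file_name: str, detected_mime: str, content_type: Optional[str]
) -> bool:
    """Simplified stand-in for the generic-binary branch of the validator."""
    provided_content_type = normalise_content_type(content_type)
    if provided_content_type != GENERIC_BINARY:
        # Assumption: the real validator applies more nuanced rules here.
        return provided_content_type == detected_mime
    # Inverted check: given the detected content, which extensions are expected?
    ext = Path(file_name).suffix[1:].lower()
    return ext in EXTENSIONS_BY_MIME.get(detected_mime, set())
```

In this sketch an empty `content_type` for a `.msg` upload lands in the generic-binary branch, so no dedicated `.msg` special case is needed.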


         # gh #4658
         # Windows use application/x-zip-compressed as a mimetype for .zip files, which
9 changes: 1 addition & 8 deletions src/openforms/formio/components/vanilla.py
@@ -338,14 +338,7 @@ class FileSerializer(serializers.Serializer):
     originalName = serializers.CharField(trim_whitespace=False)
     size = serializers.IntegerField(min_value=0)
     storage = serializers.ChoiceField(choices=["url"])
-    type = serializers.CharField(
-        error_messages={
-            "blank": _(
-                "Could not determine the file type. Please make sure the file name "
-                "has an extension."
-            ),
-        }
-    )
+    type = serializers.CharField(required=False, allow_blank=True)
Member

Suggested change
-    type = serializers.CharField(required=False, allow_blank=True)
+    type = serializers.CharField(required=True, allow_blank=True)

I would definitely still require the key to be present, but we can accept empty strings and process them during validation indeed.

     url = serializers.URLField()
     data = FileDataSerializer()  # type: ignore

Binary file added src/openforms/formio/tests/files/test.msg
Binary file not shown.
16 changes: 16 additions & 0 deletions src/openforms/formio/tests/test_validators.py
@@ -202,6 +202,22 @@ def test_allowed_mime_types_for_csv_files(self):

         validator(sample)

+    def test_allowed_mime_types_for_msg_files(self):
+        valid_type = "application/vnd.ms-outlook"
+        msg_file = TEST_FILES / "test.msg"
+        validator = validators.MimeTypeValidator(allowed_mime_types=[valid_type])
+
+        # 4795
+        # The sdk cannot determine the content_type for .msg files correctly.
+        # Because .msg is a windows specific file, and linux and MacOS don't know it.
+        # So we simulate the scenario where content_type is unknown
+        sample = SimpleUploadedFile(
+            "test.msg",
+            msg_file.read_bytes(),
+        )
Comment on lines +214 to +217
Member

This results in the file having content_type="text/plain" because it's the default of SimpleUploadedFile. You need to explicitly reproduce the case where it's empty as provided by the frontend:

Suggested change
-        sample = SimpleUploadedFile(
-            "test.msg",
-            msg_file.read_bytes(),
-        )
+        sample = SimpleUploadedFile(
+            "test.msg",
+            msg_file.read_bytes(),
+            content_type="",
+        )


+        validator(sample)
+
     def test_validate_files_multiple_mime_types(self):
         """Assert that validation of files associated with multiple mime types works
4 changes: 2 additions & 2 deletions src/openforms/formio/tests/validation/test_file.py
@@ -602,10 +602,10 @@ def test_attach_upload_validates_unknown_file_type(self):
         }

         is_valid, errors = validate_formio_data(component, data, submission=submission)
-        error = extract_error(errors["foo"][0], "type")
+        error = extract_error(errors["foo"][0], "non_field_errors")

         self.assertFalse(is_valid)
-        self.assertEqual(error.code, "blank")
+        self.assertEqual(error.code, "invalid")
         self.assertEqual(
             error,
             _(
Binary file added src/openforms/tests/e2e/data/test.msg
Binary file not shown.
61 changes: 61 additions & 0 deletions src/openforms/tests/e2e/test_file_upload.py
@@ -77,3 +77,64 @@ def setUpTestData():
                 await expect(
                     page.get_by_text("Een moment geduld", exact=False)
                 ).to_be_visible()
+
+    async def test_form_with_msg_file_upload(self):
+        # If using the ci.py settings locally, the SDK_RELEASE variable should be set to 'latest', otherwise the
+        # JS/CSS for the SDK will not be found (since they will be expected to be in the folder
+        # openforms/static/sdk/<SDK version tag> instead of openforms/static/sdk
+        @sync_to_async
+        def setUpTestData():
+            # set up a form
+            form = FormFactory.create(
+                name="Form with file upload",
+                slug="form-with-file-upload",
+                generate_minimal_setup=True,
+                formstep__form_definition__name="First step",
+                formstep__form_definition__slug="first-step",
+                formstep__form_definition__configuration={
+                    "components": [
+                        {
+                            "type": "file",
+                            "key": "fileUpload",
+                            "label": "File Upload",
+                            "storage": "url",
+                            "validate": {
+                                "required": True,
+                            },
+                        }
+                    ]
+                },
+                translation_enabled=False,  # force Dutch
+                ask_privacy_consent=False,
+                ask_statement_of_truth=False,
+            )
+            return form
+
+        form = await setUpTestData()
+        form_url = str(
+            furl(self.live_server_url)
+            / reverse("forms:form-detail", kwargs={"slug": form.slug})
+        )
+
+        with patch("openforms.utils.validators.allow_redirect_url", return_value=True):
+            async with browser_page() as page:
+                await page.goto(form_url)
+
+                await page.get_by_role("button", name="Formulier starten").click()
+
+                async with page.expect_file_chooser() as fc_info:
+                    await page.get_by_text("blader").click()
+
+                file_chooser = await fc_info.value
+                await file_chooser.set_files(TEST_FILES / "test.msg")
+
+                await page.wait_for_load_state("networkidle")
+
+                uploaded_file = page.get_by_role("link", name="test.msg")
+                await expect(uploaded_file).to_be_visible()
+
+                await page.get_by_role("button", name="Volgende").click()
+                await page.get_by_role("button", name="Verzenden").click()
+                await expect(
+                    page.get_by_text("Een moment geduld", exact=False)
+                ).to_be_visible()
6 changes: 3 additions & 3 deletions src/openforms/tests/e2e/test_input_validation.py
@@ -940,7 +940,7 @@ def test_unknown_file_type(self):
         # The frontend validation will *not* create a TemporaryFileUpload,
         # as the frontend will block the upload because of the invalid file type.
         # However the user could do an handcrafted API call.
-        # For this reason, we manually create an invalid TemporaryFileUpload
+        # For this reason, we manually try to create an invalid TemporaryFileUpload
         # and use it for the `api_value`:

         with open(TEST_FILES / "unknown-type", "rb") as infile:
@@ -966,8 +966,8 @@ def test_unknown_file_type(self):
             ],
         )

-        # Make sure the frontend did not create one:
-        self.assertEqual(TemporaryFileUpload.objects.count(), 1)
+        # Make sure that no temporary files were created
+        self.assertEqual(TemporaryFileUpload.objects.count(), 0)
Comment on lines +969 to +970
Member

I don't really understand what changed here? Why was there initially a temporary file upload and now not anymore?



class SingleAddressNLTests(ValidationsTestCase):