You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At TU Graz, one of our users attempted to upload a filename containing a # character to our invenio-instance.
Upload of such filenames-containing-# currently fails.
Other invenio-instances have the same issue: zenodo-rdm issue #981
in the javascript-function, startUploadURL is set to the link built as per RDMFileDraftServiceConfig.file_links_item["content"]'s URI-template
the then-called this._doUpload eventually calls axiosWithConfig.put(URLPassedToDoUpload)
To see the issue with filenames-containing-# let's walk through steps 1.-4. with a file named foo#spam.txt that a user attempts to upload to record-id asdfg-hjk42:
this._initializeUpload works and the backend correctly initializes the file foo#spam.txt for record-id asdfg-hjk42
backend builds the "content"-FileLink to "/api/records/asdfg-hjk42/draft/files/foo#spam.txt/content"
this link is received by frontend and made available under initializedMetadata.links.content
_doUpload(...) calls axiosWithConfig.put("/api/records/asdfg-hjk42/draft/files/foo#spam.txt/content"), but the outgoing request has everything after # cut off, causing the upload to fail at this point
Background Information
URI-templates
URI-templates (RFC 6570) are used by invenio to build URIs.
In a URI-template, {key} signifies to replace itself with the value of key, percent-encoding some characters when doing so.
The + in {+key} turns off percent-encoding (for some characters) when expanding key.
e.g. for key = "foo#spam.text", {key} expands to foo%25spam.txt whereas {+key} expands to foo#spam.txt.
When to percent-encode?
URI-syntax is defined in RFC3986.
In section 2.2, the opening paragraph dictates when to percent-encode characters: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed."
flask's percent-decoding behavior
The above dictate on when to percent-encode means that compliantly built URIs will usually have their delimiting characters percent-encoded.
Hence, servers will often receive percent-encoded URIs and it would be sensible for frameworks to percent-decode them early on. flask does indeed do so:
Note that in the above image the outgoing GET request maintains percent-encoding, but the mirror-function receives the decoded URI either way.
The above uses the versions of flask and werkzeug that invenio is currently on.
However I did also test it with most recent versions of flask and werkzeug, the behavior is the same.
Further, stepping through flask-internals with a debugger shows that percent-decoding of requested URI:
is done before matching it against registered routes (i.e. before computing which @flask_app.route("/route/<key>") applies to the requested URI)
is done before the first line of invenio-code is executed
Reason {+key} contains the + to begin with
invenio used to use {key} (without the +) in URI-templates earlier on.
This was changed in invenio-records-resources' PR #535, which allows uploading filenames-with-/ by making two kinds of changes:
change routes from "/files/<key>" to "/files/<path:key>"
change URI-templates from "{+api}/records/{id}/files/{key}" to "{+api}/records/{id}/files/{+key}" (adding + in front of key)
(The + was also added to corresponding classes in invenio-rdm-records in another PR.)
Putting things together
As the filename is arguably one component of the URI, not percent-encoding it is non-compliant with RFC3986.
This suggests we ought to percent-encode the filename (by dropping the + in front of the URI-template's key).
In local testing this fixed uploading of filenames-containing-#.
Further, as URIs get percent-decoded by flask before any invenio-code is even executed, this difference isn't really observable to invenio backend code.
Percent-encoding the filename-part of the URI thus shouldn't break anything.
As for the PR that introduced the + to the URI-template, I think that this addition was unnecessary to achieve its stated goal of fixing upload of filenames-containing-/.
As stated above, that PR did two kinds of changes and I think that:
changing routes from "/files/<key>" to "/files/<path:key>" was sufficient to fix uploading filenames-containing-/
changing URI-templates from "{+api}/records/{id}/files/{key}" to "{+api}/records/{id}/files/{+key}" breaks uploading of filenames-containing-# for seemingly no gain
Sadly, I have no easy way of testing uploading of filenames-containing-/, so this would need to tested by someone with the setup for it...
Suggested Changes
Remove + in {+key} for all file-upload related URI-templates:
+ seems inconsistently used across invenio's URI-templates
For example, media-files already don't include + for their key in file-upload URI-templates (see here).
Other issues caused by incorrect choice of {thing} vs {+thing} in URI-templates might be lurking in the codebase...
path: in routes makes some filenames match more than one route
Here's two routes after path: was added to them:
"item": "/files/<path:key>"
"item-content": "/files/<path:key>/content"
Now a file named something/content would match both:
the "item" route with key = "something/content"
the "item-content" route with key = "something"
content and commit (from the "item-commit" route) seem like somewhat generic names that may eventually pop up
Seems lower priority to me, so this might be more of an FYI should you ever run into this...
The text was updated successfully, but these errors were encountered:
Motivation
At TU Graz, one of our users attempted to upload a filename containing a
#
character to our invenio-instance.Upload of such filenames-containing-
#
currently fails.Other invenio-instances have the same issue: zenodo-rdm issue #981
Issue Investigation
To summarize the the code involved in the issue:
key
is the filename)startUploadURL
is set to the link built as perRDMFileDraftServiceConfig.file_links_item["content"]
's URI-templatethis._doUpload
eventually callsaxiosWithConfig.put(URLPassedToDoUpload)
To see the issue with filenames-containing-
#
let's walk through steps 1.-4. with a file namedfoo#spam.txt
that a user attempts to upload to record-idasdfg-hjk42
:this._initializeUpload
works and the backend correctly initializes the filefoo#spam.txt
for record-idasdfg-hjk42
"content"
-FileLink to"/api/records/asdfg-hjk42/draft/files/foo#spam.txt/content"
initializedMetadata.links.content
_doUpload(...)
callsaxiosWithConfig.put("/api/records/asdfg-hjk42/draft/files/foo#spam.txt/content")
, but the outgoing request has everything after#
cut off, causing the upload to fail at this pointBackground Information
URI-templates
URI-templates (RFC 6570) are used by invenio to build URIs.
In a URI-template,
{key}
signifies to replace itself with the value ofkey
, percent-encoding some characters when doing so.The
+
in{+key}
turns off percent-encoding (for some characters) when expandingkey
.e.g. for
key = "foo#spam.text"
,{key}
expands tofoo%25spam.txt
whereas{+key}
expands tofoo#spam.txt
.When to percent-encode?
URI-syntax is defined in RFC3986.
In section 2.2, the opening paragraph dictates when to percent-encode characters:
"If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed."
flask
's percent-decoding behaviorThe above dictate on when to percent-encode means that compliantly built URIs will usually have their delimiting characters percent-encoded.
Hence, servers will often receive percent-encoded URIs and it would be sensible for frameworks to percent-decode them early on.
flask
does indeed do so:Note that in the above image the outgoing
GET
request maintains percent-encoding, but themirror
-function receives the decoded URI either way.The above uses the versions of
flask
andwerkzeug
that invenio is currently on.However I did also test it with most recent versions of
flask
andwerkzeug
, the behavior is the same.Further, stepping through
flask
-internals with a debugger shows that percent-decoding of requested URI:@flask_app.route("/route/<key>")
applies to the requested URI)Reason
{+key}
contains the+
to begin withinvenio used to use
{key}
(without the+
) in URI-templates earlier on.This was changed in
invenio-records-resources
' PR #535, which allows uploading filenames-with-/
by making two kinds of changes:"/files/<key>"
to"/files/<path:key>"
"{+api}/records/{id}/files/{key}"
to"{+api}/records/{id}/files/{+key}"
(adding+
in front ofkey
)(The
+
was also added to corresponding classes ininvenio-rdm-records
in another PR.)Putting things together
As the filename is arguably one component of the URI, not percent-encoding it is non-compliant with RFC3986.
This suggests we ought to percent-encode the filename (by dropping the
+
in front of the URI-template'skey
).In local testing this fixed uploading of filenames-containing-
#
.Further, as URIs get percent-decoded by
flask
before any invenio-code is even executed, this difference isn't really observable to invenio backend code.Percent-encoding the filename-part of the URI thus shouldn't break anything.
As for the PR that introduced the
+
to the URI-template, I think that this addition was unnecessary to achieve its stated goal of fixing upload of filenames-containing-/
.As stated above, that PR did two kinds of changes and I think that:
"/files/<key>"
to"/files/<path:key>"
was sufficient to fix uploading filenames-containing-/
"{+api}/records/{id}/files/{key}"
to"{+api}/records/{id}/files/{+key}"
breaks uploading of filenames-containing-#
for seemingly no gainSadly, I have no easy way of testing uploading of filenames-containing-
/
, so this would need to tested by someone with the setup for it...Suggested Changes
Remove
+
in{+key}
for all file-upload related URI-templates:invenio-records-resources
it's these linesinvenio-rdm-records
it's these linesRelated Issues (outside of this RFCs scope)
+
seems inconsistently used across invenio's URI-templatesFor example,
media-files
already don't include+
for theirkey
in file-upload URI-templates (see here).Other issues caused by incorrect choice of
{thing}
vs{+thing}
in URI-templates might be lurking in the codebase...path:
in routes makes some filenames match more than one routeHere's two routes after
path:
was added to them:"item": "/files/<path:key>"
"item-content": "/files/<path:key>/content"
Now a file named
something/content
would match both:"item"
route withkey = "something/content"
"item-content"
route withkey = "something"
content
andcommit
(from the"item-commit"
route) seem like somewhat generic names that may eventually pop upSeems lower priority to me, so this might be more of an FYI should you ever run into this...
The text was updated successfully, but these errors were encountered: