Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Uploads world readable #4387

Closed
wants to merge 2 commits into from
Closed

Fix Uploads world readable #4387

wants to merge 2 commits into from

Conversation

adnrs96
Copy link
Member

@adnrs96 adnrs96 commented Apr 3, 2017

This PR contains the extension work to the #769. The work left of was

  • Speed up the tests to run upon latest master.
  • Investigate tests and finish enforcing the logic.
  • Update the serve_s3 to use logic we just built for validation.
  • Fix breaking tests.
  • Finally make django-sendfile serve the local files to finally enforce the security rules. We would like nginx to serve the files after authentication.
    All these have been achieved in the commits within this PR.
    This should fix: Uploads world readable #320 and close [WIP] Fix uploads being world readable #769.

@zulipbot
Copy link
Member

zulipbot commented Apr 3, 2017

Hello @zulip/server-misc members, this pull request references an issue with the "area: uploads" label, so you may want to check it out!

@smarx
Copy link

smarx commented Apr 3, 2017

Automated message from Dropbox CLA bot

@adnrs96, it looks like you've already signed the Dropbox CLA. Thanks!

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 3, 2017

@timabbott I have been doubtful regarding bumping the provision_version since this adds a new dependency django-sendfile. Also I was not able to figure out the work you mentioned was needed on tests. I believe you have already added the tests for the cases that may arise. If anything is needed, could you please point out, i might not have been able to recognise some case that needs tests to be added.
I have tested in Dev environment and this works great i guess. Didn't had a production setup yet, but i have studied the nginx setup little bit from here and configuration looks good to me.

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 3, 2017

Also just noticed we might wanna update the docs for this. Will give it an update.

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 4, 2017

Just while investigating #291, realised that in production the nginx would still have been serving uploads as a static entity due to a bug in last commit. The last commit has been updated to check to that issue.
This in return also in theory makes it possible to rectify issue being discussed in #291 and close it.

@timabbott
Copy link
Member

The first commit didn't pass tests alone, so I squashed the first 2 commits and merged them. Thanks @adnrs96!

I think we can't merge the django-sendfile piece, because that library doesn't support the default nginx version of Ubuntu Trusty (1.4.6). See johnsensible/django-sendfile#58 and the things linked from there for context.

I think there are 2 options: (1) try to modify django-sendfile to work with either version or (2) ship an upgraded nginx for Trusty for Zulip production systems, which potentially means using https://launchpad.net/~nginx/+archive/ubuntu/stable so that we're not stuck maintaining an nginx release.

I would probably start with investigating whether we can fix that django-sendfile limitation, since that feels like a lot more convenient approach if it's doable. Can you look into that @adnrs96?

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 8, 2017

The first commit not passing test's was strange. I checked them before pushing (Maybe forgot to test them after editing commits). Anyways thanks for squashing and merging.
Yeah i will start to look into this.

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 10, 2017

Hey @timabbott,
I finished investigating the problem with the django-sendfile being incompatible with Nginx(1.4.6). For investigating, first i tried setting up a Zulip production env which did not turn out to be a great way to deal with this, so then i went to setup a very basic Django app which accepts URL's and then uses Django-sendfile, simulating the behaviour we want in our app.
So the investigation reveals:

  • According to me, Django-sendfile doesn't really need to drop support for older Nginx for supporting unicode's. This is in contrast to the statement in this comment. I believe that it is not the case that Django-sendfile either can support older Nginx release or Unicode Char's.
  • The above is revealed when we try to use Django-sendfile 0.3.11 with Nginx 1.4.6. This causes the Nginx to have 404's due to trying to access files which doesn't exist. As said in this comment the Nginx version before 1.5.9 does not expect the URL's to be quoted. Django-sendfile 0.3.11 sends quoted URL's and hence causing 404's with older versions.
  • The proposed fix to this issue is that make Django-sendfile send quoted URL's to NGINX(>=1.5.9) and non-quoted URL's to NGINX(<1.5.9). This won't be breaking the Unicode support for Django-sendfile because i believe having send quoted URL's to NGINX was never the actual change which enabled support for Unicode filenames.
    To test this, I copied down files from Django-sendfile 0.3.11 to the testing app and had a minor edit to it so as to send URL's which are not quoted. Since i was using Nginx 1.4.6 this in theory should work and it did. I was able to access Unicode names files with names as á.txt and Б.txt using Nginx 1.4.6 and an edit of Django-sendfile sending non-quoted URL's.
  • So this leads me to propose a solution for the problem.
    Possible solutions:
  • Try to make people at Django-sendfile agree to merge a change which starts to check NGINX version and quotes URL's according to the need.
  • Try to make people at Django-sendfile agree to merge a change which allows sendfile function to accept an extra parameter relating to choice of whether URL's to NGINX should be quoted or not. And then we can check NGINX version on our end and decide what to pass in that parameter.

Thoughts?

@timabbott
Copy link
Member

timabbott commented Apr 12, 2017

Try to make people at Django-sendfile agree to merge a change which starts to check NGINX version and quotes URL's according to the need.

Let's open a PR to django-sendfile implementing this approach. We'll need to be a bit thoughtful in figuring out how to check the nginx version without creating a perf issue (wouldn't want to be calling dpkg -l every time one serves a file (I suppose one could use caching). If they're interested in the change, but don't merge it for a while, we can always run Zulip off a temporary fork. If they're not, we can try the other strategy where we use a separate function / extra argument. It seems like that should be a good long-term solution.

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 12, 2017

@timabbott I opened up a PR to django-sendfile. You can take a look at PR here.

@timabbott
Copy link
Member

Awesome.

fp_path_id = re.sub('/user_uploads/', '', uri)
body = "First message ...[zulip.txt](http://localhost:9991/user_uploads/" + fp_path_id + ")"
self.send_message("[email protected]", "test-subscribe", Recipient.STREAM, body, "test")
self.client_post('/accounts/logout/')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Django's test classes support self.logout().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@showell I doubt that. I tried using self.logout() which gave a error. I took a look at docs and no logout is documented there, just login. Am i doing something wrong that resulted in self.logout() to error out if it was supposed to function correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said self.logout(), but what Django actually supports isself.client.logout():

https://docs.djangoproject.com/en/1.11/topics/testing/tools/#django.test.Client.logout

I think what we should do here is parallel what we did with login. We have a login wrapper in ZulipTestCase that essentially calls self.client.login (with a few other minor details). We should have a similar wrapper for logout. This way tests that don't really care about the details of our logging-out mechanism don't need to specify the exact URL we use.

@timabbott
Copy link
Member

@adnrs96 I think it might be reasonable to update this PR to be based on the version of django-sendfile from your PR?

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 16, 2017

@timabbott I have updated the PR to fetch Django-sendfile from my fork of Django sendfile. Please take a look.
Also gave the docs and production_template an update regarding the changes.

FYI: We ended up using the approach for Django-sendfile where one can specify NGINX version in django settings if required. Requirement is for Nginx <1.5.9.

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 17, 2017

(Made and deleted a comment as figured it didn't made sense)

Copy link
Member

@timabbott timabbott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @adnrs96! I posted a few comments.

Is there a good way for us to add some tests for this feature? E.g. in the test, send SENDFILE_BACKEND and verify that the right nginx headers come back in the response?

@@ -86,6 +86,9 @@ def get_secret(key):
# Import prod_settings after determining the deployment/machine type
if PRODUCTION:
from .prod_settings import *
SENDFILE_BACKEND = 'sendfile.backends.nginx'
SENDFILE_ROOT = os.path.join(LOCAL_UPLOADS_DIR, 'files')
SENDFILE_URL = '/serve_uploads'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put this in another section of the file, probably DEFAULT_SETTINGS makes the most sense? Then users can override these in their production settings files. See http://zulip.readthedocs.io/en/latest/settings.html

response = FileResponse(open(local_path, 'rb'),
content_type = mimetypes.guess_type(filename))
return response
return sendfile(request, local_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be passing something based on NGINX_VERSION here? Or is NGINX_VERSION just checked in the sendfile code? I guess that makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NGINX_VERSION is checked in sendfile. sendfile automatically picks it from django settings.

@@ -166,21 +166,20 @@
# directly on the Zulip server. If file storage in Amazon S3 is
# desired, you can configure that as follows:
#
# (1) Set s3_key and s3_secret_key in /etc/zulip/zulip-secrets.conf to
# Set s3_key and s3_secret_key in /etc/zulip/zulip-secrets.conf to
# be the S3 access and secret keys that you want to use, and setting
# the S3_AUTH_UPLOADS_BUCKET and S3_AVATAR_BUCKET to be the S3 buckets
# you've created to store file uploads and user avatars, respectively.
# Then restart Zulip (scripts/restart-zulip).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document here, in the instructions, that they need to set NGINX_VERSION if the version installed on their server isn't the same as what nginx version is in front of Zulip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess line 178 should be fine for that?

# Setup the version of NGINX which is being used for this installation.
# By default a NGINX version greater that 1.5.9 is assumed but if you are using
# a older version, it is necessary to specify it here.
# NGINX_VERSION = '1.4.6'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While django-sendfile doesn't want to auto-detect, I think we should auto-detect, because which nginx version is being used is mostly a function of whether it's Trusty or Xenial, and the software should know that. It's really fast to check if the server has nginx installed.

time nginx -v
nginx version: nginx/1.4.6 (Ubuntu)

real	0m0.009s
user	0m0.009s
sys	0m0.000s

What I'd do is something like this block in zproject/settings.py:

if settings.PRODUCTION and settings.LOCAL_UPLOADS_DIR:
    detected_nginx_version = subprocess.check_output(["nginx", "-v"]) # Needs further string processing to be correct.
else:
    detected_nginx_version = None
DEFAULT_SETTINGS["NGINX_VERSION"] = detected_nginx_version

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timabbott I was worried about the case where nginx server is separate from Django server (maybe dockers is right example) then we might end up checking versions in wrong place where Nginx if installed might be different from actual Nginx serving the files.
What are your thoughts, should we just go ahead with checking versions as this might not a case that Zulip should worry about? (Having different servers for Nginx and Django also increase scope for scalability right?)
(Above example is same as what I initially did in PR for Django-sendfile and they suggested dockers as an example where this might be a problem)

@adnrs96
Copy link
Member Author

adnrs96 commented Apr 20, 2017

Let me think about the ways of adding tests.

@zulipbot
Copy link
Member

zulipbot commented May 8, 2017

@adnrs96, your pull request has developed a merge conflict! Please review the most recent commit (b8afd24) for conflicting changes and resolve your pull request's merge conflicts.

@zulipbot
Copy link
Member

zulipbot commented May 8, 2017

@adnrs96, your pull request has developed a merge conflict! Please review the most recent commit (d63fdf5) for conflicting changes and resolve your pull request's merge conflicts.

@adnrs96
Copy link
Member Author

adnrs96 commented May 13, 2017

@timabbott we cannot use same value of backend by setting it in zproject/test_settings because some of our tests expect a file in the response which is exactly what they get when backend is set to development but a NGINX backend will return a response with a X-Accel-Redirect header set which is checked by nginx when response reaches it using reverse proxies and then nginx replaces this response with the desired file but that cannot happen in testing environment.

Though upon further investigation it turns out that we can use the fact that importing same module in different files points to same copy of the imported module and reading django-sendfile code provided with the fact that there is a way to clear the cache it generated for backend settings, we can add testing with just minor modification to the approach used in function previously depicted.

PS: I have updated this PR with code to test whether correct headers are being returned. It might be ready for a merge now.

@adnrs96
Copy link
Member Author

adnrs96 commented May 13, 2017

Hold review on this one, looking into what broke.

@adnrs96
Copy link
Member Author

adnrs96 commented May 14, 2017

I investigated the issue, though i am unclear on calling on the culprit here(although i feel like Django's way of handling stuff might be) but the issue seems to originate due to the reason that Django Http Response class decodes and replace header value using latin-1 encoding when we are running things with python 3 and on the contrary it only decodes (using latin-1' encoding) and does not replace header value when things are run using python 2. Note that django-sendfile is already encoding header to utf-8 which is then passed onto django http response implementation which performs decoding actions.

Snippet from sendfile under consideration

response['X-Accel-Redirect'] = url.encode('utf-8')

Snippet from Django http responce implementation

if six.PY3:
                if isinstance(value, str):
                    # Ensure string is valid in given charset
                    value.encode(charset)
                else:
                    # Convert bytestring using given charset
                    value = value.decode(charset)
            else:
                if isinstance(value, str):
                    # Ensure string is valid in given charset
                    value.decode(charset)
                else:
                    # Convert unicode string to given charset
                    value = value.encode(charset)

This creates a difference in file names which are not ascii and apparently there seems no clear way to resolve this issue. This works good for python 2 but would fail in python 3 under non ascii file names.
Thoughts on how to proceed?

@adnrs96
Copy link
Member Author

adnrs96 commented May 14, 2017

ping @timabbott ^^^

This commit will be introducing support for using django-sendfile for
serving uploads (attachments) when using LocalStorage. For the dev
environment the django backend will be used but for the Production nginx
will be used to serve files but after django performs authentication
checks.

Fixes: zulip#320, zulip#291.
@timabbott
Copy link
Member

I feel like probably the right solution here is going to involve fixing Django. Is it possible to monkey-patch the function in question safely?

@rht
Copy link
Collaborator

rht commented May 17, 2017

@adnrs96 I tested with a unicode string,

>> s = u'πάντα χωρεῖ καὶ οὐδὲν μένει'
>> t = s.encode(s)
>> isinstance(t, str)
False  # True in py2
>> t.decode('latin-1')
'Ï\x80άνÏ\x84α Ï\x87Ï\x89Ï\x81εá¿\x96 καὶ οá½\x90δὲν μÎ\xadνει'
# '\xcf\x80\xce\xac\xce\xbd\xcf\x84\xce\xb1 \xcf\x87\xcf\x89\xcf\x81\xce\xb5\xe1\xbf\x96 \xce\xba\xce\xb1\xe1\xbd\xb6 \xce\xbf\xe1\xbd\x90\xce\xb4\xe1\xbd\xb2\xce\xbd \xce\xbc\xce\xad\xce\xbd\xce\xb5\xce\xb9' in py2```
>> t.decode('utf-8')
'πάντα χωρεῖ καὶ οὐδὲν μένει'

So it will be correct as long as the string is latin-1-encoded instead?
If so, you could further patch the django-sendfile lib to use that (at the expense of full utf-8).

@rht
Copy link
Collaborator

rht commented May 17, 2017

@timabbott it is because Django is complying to the HTTP/1.1 spec[1],

Words of *TEXT MAY contain characters from character sets other than
ISO-8859-1 [22] only when encoded according to the rules of RFC 2047[14]

The other choice for later, if we still want transferring filenames over http to have the unicode chars intact, could be to put the filename in the body instead (I'd have to survey the existing file-upload libs across the ecosystems of which choice they make, brb).

[1] https://www.ietf.org/rfc/rfc2616.txt

@timabbott
Copy link
Member

timabbott commented May 17, 2017

Let's open an issue with django-sendfile to see what they think. It's possible the fix here is that django-sendfile should be using the standard HTTP encoding, not UTF-8.

@rht
Copy link
Collaborator

rht commented May 17, 2017

The issue in question is johnsensible/django-sendfile#45 (it looks like it is better solved with johnsensible/django-sendfile#45 (comment), reusing Django's FileSystemStorage).

@adnrs96
Copy link
Member Author

adnrs96 commented May 31, 2017

I took a look into the FileSystemStorage approach and discovered that it won't be of any help to us. Basically the encoding issue is solved by FileSystemStorage by quoting the URL's it generates. Problem with that is we would be standing again at the place where NGINX 1.4.6 will not unquote the URL's and try to access invalid filenames. @rht
I guess this was the original situation that lead the django-sendfile drop support for NGINX <= 1.5.9 since it was either support python2/3 with unicode filenames by quoting url or support older version of NGINX (since django's responce class would be complying to the HTTP/1.1 spec[1] by keeping latin-1 support for headers). Supporting older version of NGINX has further two cases where we can choice to drop support for python 3 and still keep unicode support with python2 (This is what is currently happening in this PR, although unintentionally ).
I will try to make situation clear with following snippets:
This one depicts python 2 behaviour:

>>> s = 'π'
>>> s
'\xcf\x80'
>>> # We will use smart_text to convert to unicode
>>> smart_text(s).encode('latin-1') # To ensure we have char out of latin-1 range
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u03c0' in position 0: ordinal not in range(256)
>>> smart_text(s).encode('utf-8')
'\xcf\x80'
>>> print(smart_text(s).encode('utf-8')) # This is what is saved in x-accel-redirect header using py2
π
>>> 

Until now everything works fine since we have been dealing in python 2

This will depict py3 behaviour:

>>> s = 'π'
>>> s
'π'
>>> s.encode('latin-1') # To ensure we have char out of latin-1 range
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u03c0' in position 0: ordinal not in range(256)
>>> s.encode('utf-8')
b'\xcf\x80'
>>> # Django decodes using latin-1 which leads to invalid filenames in python 3.
>>> print(s.encode('utf-8').decode('latin-1')) # This is what is saved in x-accel-redirect header using py3
�
>>> print(s.encode('utf-8'))  # We are bound to decode strings in py3 due to its method of representation.
b'\xcf\x80'
>>> print(s.encode('utf-8').decode('utf-8')) # This is what is required to support py3 with unicode
π
>>> 

The requirement to support py3 with Unicode can be achieved only by patching django in someway but that won't be possible unless django stops complying with the spec @rht has mentioned.(Which is unlikely to happen)
@timabbott This is the whole story I believe. The solution that I can see is:

  • Either we stop supporting unicode filenames and adopt policy of having only latin-1 filenames.
  • We find a way to patch Django (Eg. modify django-sendfile to extend the class httpresponce and override the method enforcing latin-1 decoding in case of py3.)
    What are your thoughts?

@zulipbot
Copy link
Member

Heads up @adnrs96, we just merged some commits (latest: e1f5a4e) that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

@zulipbot
Copy link
Member

Heads up @adnrs96, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

1 similar comment
@zulipbot-test
Copy link

Heads up @adnrs96, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

@rht
Copy link
Collaborator

rht commented Oct 11, 2017

ping

@adnrs96
Copy link
Member Author

adnrs96 commented Oct 11, 2017

Hey @rht I do have this in my mind and since Tim recently forked the django-sendfile, I believe this will start moving again soon. I am yet to look again at the django-sendfile to isolate a PR for making it work with py3 + Unicode's. As soon as that is achieved this can be updated.

@zulipbot
Copy link
Member

zulipbot commented Nov 21, 2017

Heads up @adnrs96, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

@timabbott
Copy link
Member

@adnrs96 since this PR needs a lot of updating and is pretty cluttered, I'm going to close it -- just open a new PR (and link to this one) when you have this ready for review.

@timabbott timabbott closed this Feb 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Uploads world readable
7 participants