Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic uploader #25

Open
wants to merge 37 commits into
base: master
Choose a base branch
from

Conversation

gerbyzation
Copy link

@gerbyzation gerbyzation commented Jan 31, 2018

As promised in #23 (and #2), here is my work so far on adding support for generic file uploads. It's a little messy, but getting there

  • Add ability to upload generic files such as logo's to cloud storage. In order to do so I have refactored a few things such as the upload method.
  • Add support for google storage
  • Make request fail if secure urls is set to true, but unable to generate one
  • Add some testing

I still want to do some more work on the tests. Still figuring out how to properly do testing in CKAN. Initially I borrowed the testing approach from s3filestore, but they have a fake backend so need to work out if those tests are actually of any use when mocked out. I also want to write some more tests to make sure certain parts behave as expected. Everything (but especially the tests) need an overall clean as well.

And sorry for the somewhat polluted commit history, was trying to get a docker setup going to make things easy, ended up with https://github.com/okfn/docker-ckan which is quick to setup.

If there's any feedback please let me know!

@gerbyzation gerbyzation mentioned this pull request Jan 31, 2018
# Returning None here will use the default Uploader.
return None
# Custom uploader for generic file uploads
print('get uploader')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debugging print

@TkTech
Copy link
Owner

TkTech commented Jan 31, 2018

Looking good, I'll give it a test this weekend. Do you know if there are any open-source implementations of Google's API that can be used for testing, such as Minio for S3?

@gerbyzation
Copy link
Author

Not that I'm aware, but the google storage libcloud driver is using the S3 compatible API (see https://libcloud.readthedocs.io/en/latest/storage/drivers/google_storage.html). Not exactly sure what version of the S3 api that would be, but I would guess this might work with minio as long as you disable secure urls.

Bit of a shame I'm only finding out about minio now, looks amazing and could probably have saved me some work!

@gerbyzation
Copy link
Author

I forgot to mention how to authenticate with GCP, which I need to write up for the docs as well. For the driver options, the key is the service account ID with access to the bucket, secret is the path to the JSON key of this service account. It also needs a third option, project, which is the google project ID.

@gerbyzation
Copy link
Author

@TkTech did you get a chance to test this?

@gerbyzation
Copy link
Author

I've pushed some updates. Most noteworthy is the fact that the use_secure_urls is being ignored for generic files. I don't think there's much point in making these files private, especially as it is bad for performance as the browser won't be able to cache any assets.

Also I've fixed some tests that weren't working.

Would be great if you get a chance to have a look and let me know what you think.

@TkTech
Copy link
Owner

TkTech commented Mar 15, 2018

Currently everywhere a generic resource can be uploaded, it can be retrieved without being authenticated. From a security perspective I'm okay with this. However, many buckets that are using secure URLs entirely disable non-secure access to the bucket (versus a specific path using the advanced ACL rules). It would be better to have use_secure_urls_for_generics (?) as an option instead.

Please remove the cover report, we don't want that in git.

Otherwise, definitely looking good! 👍

Google Cloud Storage doesn't allow folder-level permissions,
so to have private resources but public generic files
we need to set the ACL on upload.

CKAN's redirect is a 302 with headers disallowing caching,
for generic files we want caching so this adds a manual
permanent redirect with headers allowing caching.
@gerbyzation
Copy link
Author

@TkTech Thanks. I've added the option. So now use_secure_urls only applies to resource uploads, and use_secure_urls_for_generics to the generic files. The first setting could do with renaming for clarification, but that would of course break backwards compatibility.

In the last few commits I've implemented a custom 301 redirect that is cacheable for public generic files, to prevent asset reloading on every new page.

Another thing worth mentioning is google's ACL. Google Cloud Storage doesn't do folder level permissions, so when the resource files are private and generics public, I'm setting the ACL on generics on upload.

I'm going to deploy this on some of my CKAN instances to test the plugin. Given I don't find any bugs I'm pretty much finished with this PR. Only other thing left to do is update documentation.

@jqnatividad
Copy link

Hi @gerbyzation , @TkTech , cc @akariv
Was just wondering where this PR stands.
Thanks!

@TkTech
Copy link
Owner

TkTech commented Apr 3, 2019

It's usable with a small conflict, if you need it right away @gerbyzation's work is 👍

v2 of cloudstorage is coming in the next few days as part of work funded by TBS open-data, and is a complete rewrite.

@gerbyzation
Copy link
Author

@jqnatividad Been running my version with the generic uploader for static files for a year on a bunch of instances now, haven't had any issues.

@gerbyzation gerbyzation changed the title Add generic uploader (WIP) Add generic uploader Apr 12, 2019
@amercader
Copy link

@TkTech any update on the new cloudstorage version? We can definitely help test or work on missing pieces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants