Add support for private phyloreferences #38

Open
gaurav opened this issue Jun 25, 2018 · 4 comments

Comments

@gaurav
Member

gaurav commented Jun 25, 2018

As we start curating phyloreferences from unpublished sources, we will need to store and process them somewhere without making them public. I'm currently experimenting with the following setup:

  • A private but shareable directory in Dropbox to store curated phyloreferences, allowing multiple people to modify or view these files as needed.
  • A directory in this repository into which those files can be copied or linked. These files will not be committed to the GitHub repository, but will be tested when the test suite is executed locally.
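
As a minimal sketch of the second point, test discovery could collect PHYX files from the committed directory and, only when it exists locally, from an uncommitted private directory. The directory arguments and the `.json` extension are assumptions for illustration, not this repository's actual layout.

```python
from pathlib import Path


def collect_phyx_files(public_dir, private_dir):
    """Return sorted PHYX file paths from the public directory, plus any
    found in the private directory when it exists on this machine."""
    files = sorted(Path(public_dir).glob("**/*.json"))
    private = Path(private_dir)
    if private.is_dir():
        # Present only on curators' machines; absent in a fresh clone.
        files += sorted(private.glob("**/*.json"))
    return files
```

For this to work, the private directory would also need to be listed in `.gitignore` so that its contents are never staged by accident.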

We could also set up a private GitHub repository just for the private phyloreferences. This is probably not needed for now, but it might be useful if we ever have multiple curators or want to track multiple branches of PHYX files.

@hlapp
Member

hlapp commented Jun 25, 2018

I think having two very different data store technologies (GitHub and Dropbox) is not a good proposition. We're too small for that. I think the viable possibilities include the following:

  1. A private Git repo. Especially if the need for this is time-limited (e.g., until the Phylonyms volume is published), this may well be viable. On the downside, assuming that we would want phyloreferences that have no barrier to being public to be stored in a public place, we would then need to manage two repos for storing phyloreferences: a public one, and a private one for the others. This is the same whether the private Git repo is on GitHub (where we'd have to pay for it) or on GitLab (where it would be free).

  2. Encrypting embargoed phyloreferences. Encryption is routinely used to store secrets in repos, such as credentials or system-specific information needed for testing or deployment, and would allow us to keep everything in a single repo. Lifting the embargo on a phyloreference would merely mean decrypting it (and perhaps removing it from the list of embargoed, and thus to-be-encrypted, entries).
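
Option 2 implies keeping some list of embargoed entries. A minimal sketch of that bookkeeping follows, assuming a hypothetical plain-text manifest with one repository-relative path per line. The manifest format and function names are illustrative only; the decryption itself would be done by whichever tool is chosen.

```python
def read_embargo_list(text):
    """Parse manifest text: one repository-relative path per line,
    ignoring blank lines and '#' comments."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def lift_embargo(paths, path):
    """Return a new list without `path`; raise if it was not embargoed."""
    if path not in paths:
        raise ValueError(f"{path} is not on the embargo list")
    return [p for p in paths if p != path]
```

Raising on an unknown path is a deliberate choice in this sketch: a typo when lifting an embargo then fails loudly instead of silently doing nothing.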

@ncellinese

ncellinese commented Jun 25, 2018 via email

@gaurav
Member Author

gaurav commented Jun 26, 2018

I like the idea of a private Git repo. This repository would contain just the embargoed phyloreferences, and we could use Git submodules to make the second repository slightly easier to manage. I think this would be easy to set up, would hide filenames, and would reduce the possibility of someone forgetting to encrypt a file. I don't think we're going to have other embargoed phyloreferences after the Phylonyms ones, so I think the low setup cost (especially if the repository is free on GitLab!) works in our favour here. I have a Developer account on GitHub, so I could also host a private repository there if needed.

If we do want to use encryption within our existing repository, we can use BlackBox, git-secret or git-crypt to do that. The first two look best to me: they have similar workflows in which the user manually decrypts and encrypts files as needed. git-crypt is automated: files marked as encrypted are automatically decrypted when checked out locally and automatically encrypted when changes are committed. If we decide to go for encrypting individual files, I'll try out BlackBox on our existing repository unless anyone would like to recommend another tool.
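
git-crypt decides which files to encrypt from `.gitattributes` entries that set `filter=git-crypt`. As a rough sketch of a safety check for that workflow, the functions below report which of a given set of paths no such entry would cover. The paths are hypothetical, and `fnmatch` only approximates Git's own pattern rules (for example, it ignores the fact that later `.gitattributes` lines can override earlier ones).

```python
from fnmatch import fnmatch


def git_crypt_patterns(gitattributes_text):
    """Return the patterns whose attributes include filter=git-crypt."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if "filter=git-crypt" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def unprotected(paths, gitattributes_text):
    """List the paths that no git-crypt pattern matches.

    fnmatch is only an approximation of Git's pattern matching.
    """
    patterns = git_crypt_patterns(gitattributes_text)
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]
```

A check like this could run in the test suite against the list of embargoed files, so that a missing `.gitattributes` entry is caught before a plaintext file is committed.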

Note that encrypting files won't hide their names: people looking at our GitHub repository will be able to see the list of phyloreferences we have curated (and so the list of clade definitions in Phylonyms), but they won't be able to read the clade definitions or any other metadata about the phyloreferences. If this is a problem, we could easily rename the files to opaque filenames (e.g. phyloref_15.json) to hide this.
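
The renaming could be as simple as the following sketch, which assigns sequential opaque names in the style of the example above. The function name, the numbering scheme, and its parameters are assumptions for illustration.

```python
def opaque_names(filenames, prefix="phyloref_", start=1):
    """Map each filename to an opaque name such as phyloref_15.json."""
    mapping = {}
    # Sort first so the same set of files always gets the same numbers.
    for offset, name in enumerate(sorted(filenames)):
        mapping[name] = f"{prefix}{start + offset}.json"
    return mapping
```

The mapping itself reveals the original names, so it would have to be kept private (for instance, encrypted alongside the files).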

@hlapp
Member

hlapp commented Jun 26, 2018

There seem to be numerous potential problems with maintaining two Git repos as phyloreference storage instead of one, and I think the notion that we won't have embargoed phyloreferences after the publication of the Phylonyms volume is both naïve and risky (what do we do if it turns out to be false? is that really so unlikely?).

Also, we have had very good experiences here with git-crypt, so I suggest that this be looked into first. Manual encryption and decryption seem fragile and error-prone.

3 participants