This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Commit

Merge pull request #254 from danielquinn/mcronce-disable_encryption
Allow encryption to be disabled
danielquinn authored Jun 17, 2018
2 parents d5876cc + 631d316 commit 3b72d38
Showing 22 changed files with 391 additions and 115 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -59,8 +59,8 @@ target/

# Stored PDFs
media/documents/*.gpg
media/documents/thumbnails/*.gpg
media/documents/originals/*.gpg
media/documents/thumbnails/*
media/documents/originals/*

# Sqlite database
db.sqlite3
5 changes: 3 additions & 2 deletions docker-compose.env.example
@@ -1,8 +1,9 @@
# Environment variables to set for Paperless
# Commented out variables will be replaced by a default within Paperless.

# Passphrase Paperless uses to encrypt and decrypt your documents
PAPERLESS_PASSPHRASE=CHANGE_ME
# Passphrase Paperless uses to encrypt and decrypt your documents, if you want
# encryption at all.
# PAPERLESS_PASSPHRASE=CHANGE_ME

# The amount of threads to use for text recognition
# PAPERLESS_OCR_THREADS=4
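
With the passphrase line commented out, the variable is simply absent from the container's environment. The following is a minimal sketch of how a settings module might read it, not the code Paperless itself uses. It assumes that an unset variable and an empty string both mean "no encryption", and the helper name `passphrase_from_env` is hypothetical.

```python
import os


def passphrase_from_env(environ=os.environ):
    """Return the configured passphrase, or None when encryption is off.

    Assumption (not confirmed by the diff): an unset variable and an empty
    string are treated the same way.
    """
    value = environ.get("PAPERLESS_PASSPHRASE", "")
    return value or None
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise without touching the real environment.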
29 changes: 29 additions & 0 deletions docs/changelog.rst
@@ -1,6 +1,35 @@
Changelog
#########

2.0.0
=====

This is a big release as we've changed a core piece of Paperless functionality:
we no longer encrypt files with GPG by default.

The reasons for this are many, but it boils down to this: the encryption wasn't
really all that useful, as files on-disk were still accessible so long as you
had the key, and the key was most typically stored in the config file. In
other words, your files are only as safe as the ``paperless`` user is. In
addition to that, *the contents of the documents were never encrypted*, so
important numbers etc. were always accessible simply by querying the database.
Still, it was better than nothing, but the consensus from users appears to be
that it was more an annoyance than anything else, so this feature is now turned
off unless you explicitly set a passphrase in your config file.

Migrating from 1.x
------------------

Encryption isn't gone; it's just off for new users. So long as you have
``PAPERLESS_PASSPHRASE`` set in your config or your environment, Paperless
should continue to operate as it always has. If, however, you want to drop
encryption too, you only need to do two things:

1. Run ``./manage.py migrate && ./manage.py change_storage_type gpg unencrypted``.
   This will go through your entire database and Decrypt All The Things.
2. Once step 1 has finished, remove ``PAPERLESS_PASSPHRASE`` from your
   ``paperless.conf`` file, or simply stop declaring it in your environment.
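
Before removing the passphrase in step 2, it may be worth confirming that step 1 left nothing encrypted behind. The sketch below is a hypothetical pre-flight check, not part of Paperless. It assumes that encrypted files carry a ``.gpg`` suffix and live under ``documents/`` in the media directory, which the ``.gitignore`` entries in this commit suggest but do not guarantee.

```python
from pathlib import Path


def remaining_gpg_files(media_root):
    """List files under <media_root>/documents that still end in .gpg.

    Hypothetical helper: the directory layout and the .gpg suffix are
    assumptions based on the .gitignore entries, not on the Paperless code.
    """
    documents = Path(media_root) / "documents"
    if not documents.is_dir():
        return []
    return sorted(str(p) for p in documents.rglob("*.gpg") if p.is_file())
```

An empty list means no ``.gpg`` files were found. A non-empty list names the files that would become unreadable once the passphrase is gone.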

1.4.0
=====

3 changes: 2 additions & 1 deletion docs/consumption.rst
@@ -17,7 +17,8 @@ The primary method of getting documents into your database is by putting them in
the consumption directory. The ``document_consumer`` script runs in an infinite
loop looking for new additions to this directory and when it finds them, it goes
about the process of parsing them with the OCR, indexing what it finds, and
encrypting the PDF, storing it in the media directory.
encrypting the PDF (if ``PAPERLESS_PASSPHRASE`` is set), storing it in the
media directory.

Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if
2 changes: 1 addition & 1 deletion docs/migrating.rst
@@ -16,7 +16,7 @@ Backing Up
----------

So you're bored of this whole project, or you want to make a remote backup of
the unencrypted files for whatever reason. This is easy to do, simply use the
your files for whatever reason. This is easy to do: simply use the
:ref:`exporter <utilities-exporter>` to dump your documents and database out
into an arbitrary directory.

23 changes: 13 additions & 10 deletions docs/setup.rst
@@ -63,17 +63,18 @@ Standard (Bare Metal)

1. Install the requirements as per the :ref:`requirements <requirements>` page.
2. Within the extract of master.zip go to the ``src`` directory.
3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` also the virtual
envrionment look there for it and open it in your favourite editor.
Because this file contains passwords it should only be readable by user root
and paperless ! Set the values for:
3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
   your favourite editor. Because this file contains passwords, it should only
   be readable by the users root and paperless. Set the values for:

* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
dumped to be consumed by Paperless.
* ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
encrypt/decrypt the original document.
* ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
will spawn to process document pages in parallel.
* ``PAPERLESS_PASSPHRASE``: this is only required if you want to use GPG to
encrypt your document files. This is the passphrase Paperless uses to
encrypt/decrypt the original documents. Don't worry about defining this
if you don't want to use encryption (the default).

4. Initialise the SQLite database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with
@@ -139,7 +140,8 @@ Docker Method

``PAPERLESS_PASSPHRASE``
This is the passphrase Paperless uses to encrypt/decrypt the original
document.
document. If you aren't planning on using GPG encryption, you can just
leave this undefined.

``PAPERLESS_OCR_THREADS``
This is the number of threads the OCR process will spawn to process
@@ -265,10 +267,11 @@ Vagrant Method
3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
``/etc/paperless.conf`` and set the values for:

* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
* ``PAPERLESS_CONSUMPTION_DIR``: This is where your documents will be
dumped to be consumed by Paperless.
* ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
encrypt/decrypt the original document.
* ``PAPERLESS_PASSPHRASE``: This is the passphrase Paperless uses to
encrypt/decrypt the original document. It's only required if you want
your original files to be encrypted; otherwise, just leave it unset.
* ``PAPERLESS_EMAIL_SECRET``: this is the "magic word" used when consuming
documents from mail or via the API. If you don't use either, leaving it
blank is just fine.
4 changes: 2 additions & 2 deletions docs/utilities.rst
@@ -59,8 +59,8 @@ for documents to parse and index. The process is pretty straightforward:
4. Attempt to automatically assign document attributes by doing some guesswork.
Read up on the :ref:`guesswork documentation<guesswork>` for more
information about this process.
5. Encrypt the document and store it in the ``media`` directory under
``documents/originals``.
5. Encrypt the document (if you have a passphrase set) and store it in the
``media`` directory under ``documents/originals``.
6. Go to #1.
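
The storage part of step 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the consumer's actual code. The function name `store_file` is hypothetical, and `encrypt` stands in for ``GnuPG.encrypted``: any callable that takes a binary file object and returns bytes.

```python
def store_file(source, target, encrypt=None):
    """Copy source to target, optionally passing the bytes through encrypt.

    encrypt is a hypothetical stand-in for GnuPG.encrypted. When it is None
    (no passphrase set), the file is stored byte-for-byte.
    """
    with open(source, "rb") as read_file, open(target, "wb") as write_file:
        if encrypt is None:
            # Unencrypted storage: a plain copy
            write_file.write(read_file.read())
        else:
            # Encrypted storage: write whatever the callable returns
            write_file.write(encrypt(read_file))
```

Taking the encryption step as a parameter keeps the copy logic testable without a GPG installation.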


16 changes: 8 additions & 8 deletions paperless.conf.example
@@ -59,19 +59,19 @@ PAPERLESS_EMAIL_SECRET=""
#### Security ####
###############################################################################

# You must have a passphrase in order for Paperless to work at all. If you set
# this to "", GNUGPG will "encrypt" your PDF by writing it out as a zero-byte
# file.
#
# The passphrase you use here will be used when storing your documents in
# Paperless, but you can always export them in an unencrypted format by using
# document exporter. See the documentation for more information.
# Paperless can be instructed to attempt to encrypt your PDF files with GPG
# using the PAPERLESS_PASSPHRASE specified below. If, however, you're not
# concerned about encrypting these files (for example, if you have disk
# encryption locally), then you don't need this and can safely leave this value
# unset.
#
# One final note about the passphrase. Once you've consumed a document with
# one passphrase, DON'T CHANGE IT. Paperless assumes this to be a constant and
# can't properly export documents that were encrypted with an old passphrase if
# you've since changed it to a new one.
PAPERLESS_PASSPHRASE="secret"
#
# The default is to not use encryption at all.
#PAPERLESS_PASSPHRASE="secret"


# The secret key has a default that should be fine so long as you're hosting
1 change: 1 addition & 0 deletions src/documents/__init__.py
@@ -0,0 +1 @@
from .checks import changed_password_check
39 changes: 39 additions & 0 deletions src/documents/checks.py
@@ -0,0 +1,39 @@
import textwrap

from django.conf import settings
from django.core.checks import Error, register
from django.db.utils import OperationalError


@register()
def changed_password_check(app_configs, **kwargs):

    # Imported here rather than at module level so the check can be
    # registered before the app's models are loaded
    from documents.models import Document
    from paperless.db import GnuPG

    try:
        encrypted_doc = Document.objects.filter(
            storage_type=Document.STORAGE_TYPE_GPG).first()
    except OperationalError:
        return []  # No documents table yet

    if encrypted_doc:

        if not settings.PASSPHRASE:
            return [Error(
                "The database contains encrypted documents but no password "
                "is set."
            )]

        if not GnuPG.decrypted(encrypted_doc.source_file):
            return [Error(textwrap.dedent(
                """
                The current password doesn't match the password of the
                existing documents.
                If you intend to change your password, you must first export
                all of the old documents, start fresh with the new password
                and then re-import them.
                """))]

    return []
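
The decision logic of this check can be isolated from Django and GnuPG. The sketch below is a hypothetical restatement for illustration only. `passphrase_problems` is not a Paperless function, and `can_decrypt` is an assumed callable standing in for the trial decryption of one stored document.

```python
def passphrase_problems(has_encrypted_docs, passphrase, can_decrypt):
    """Return a list of error messages; an empty list means no problem.

    can_decrypt is a hypothetical callable that takes the passphrase and
    reports whether a stored document could be decrypted with it.
    """
    if not has_encrypted_docs:
        # Nothing is encrypted, so the passphrase is irrelevant
        return []
    if not passphrase:
        return ["The database contains encrypted documents but no password "
                "is set."]
    if not can_decrypt(passphrase):
        return ["The current password doesn't match the password of the "
                "existing documents."]
    return []
```

The ordering matters: the missing-passphrase case is reported before any decryption is attempted, mirroring the check above.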
31 changes: 18 additions & 13 deletions src/documents/consumer.py
@@ -29,7 +29,7 @@ class Consumer:
Loop over every file found in CONSUMPTION_DIR and:
1. Convert it to a greyscale pnm
2. Use tesseract on the pnm
3. Encrypt and store the document in the MEDIA_ROOT
3. Store the document in the MEDIA_ROOT with optional encryption
4. Store the OCR'd text in the database
5. Delete the document and image(s)
"""
@@ -50,6 +50,10 @@ def __init__(self, consume=settings.CONSUMPTION_DIR,

os.makedirs(self.scratch, exist_ok=True)

self.storage_type = Document.STORAGE_TYPE_UNENCRYPTED
if settings.PASSPHRASE:
self.storage_type = Document.STORAGE_TYPE_GPG

if not self.consume:
raise ConsumerError(
"The CONSUMPTION_DIR settings variable does not appear to be "
@@ -213,7 +217,8 @@ def _store(self, text, doc, thumbnail, date):
file_type=file_info.extension,
checksum=hashlib.md5(f.read()).hexdigest(),
created=created,
modified=created
modified=created,
storage_type=self.storage_type
)

relevant_tags = set(list(Tag.match_all(text)) + list(file_info.tags))
@@ -222,22 +227,22 @@ def _store(self, text, doc, thumbnail, date):
self.log("debug", "Tagging with {}".format(tag_names))
document.tags.add(*relevant_tags)

# Encrypt and store the actual document
with open(doc, "rb") as unencrypted:
with open(document.source_path, "wb") as encrypted:
self.log("debug", "Encrypting the document")
encrypted.write(GnuPG.encrypted(unencrypted))

# Encrypt and store the thumbnail
with open(thumbnail, "rb") as unencrypted:
with open(document.thumbnail_path, "wb") as encrypted:
self.log("debug", "Encrypting the thumbnail")
encrypted.write(GnuPG.encrypted(unencrypted))
self._write(document, doc, document.source_path)
self._write(document, thumbnail, document.thumbnail_path)

self.log("info", "Completed")

return document

    def _write(self, document, source, target):
        with open(source, "rb") as read_file:
            with open(target, "wb") as write_file:
                if document.storage_type == Document.STORAGE_TYPE_UNENCRYPTED:
                    # No encryption requested: copy the bytes through as-is
                    write_file.write(read_file.read())
                    return
                self.log("debug", "Encrypting")
                write_file.write(GnuPG.encrypted(read_file))

def _cleanup_doc(self, doc):
self.log("debug", "Deleting document {}".format(doc))
os.unlink(doc)
119 changes: 119 additions & 0 deletions src/documents/management/commands/change_storage_type.py
@@ -0,0 +1,119 @@
import os

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError
from termcolor import colored as coloured

from documents.models import Document
from paperless.db import GnuPG


class Command(BaseCommand):

    help = (
        "This is how you migrate your stored documents from an encrypted "
        "state to an unencrypted one (or vice-versa)"
    )

    def add_arguments(self, parser):

        parser.add_argument(
            "from",
            choices=("gpg", "unencrypted"),
            help="The state you want to change your documents from"
        )
        parser.add_argument(
            "to",
            choices=("gpg", "unencrypted"),
            help="The state you want to change your documents to"
        )
        parser.add_argument(
            "--passphrase",
            help="If PAPERLESS_PASSPHRASE isn't set already, you need to "
                 "specify it here"
        )

    def handle(self, *args, **options):

        try:
            print(coloured(
                "\n\nWARNING: This script is going to work directly on your "
                "document originals, so\nWARNING: you probably shouldn't run "
                "this unless you've got a recent backup\nWARNING: handy. It "
                "*should* work without a hitch, but be safe and backup your\n"
                "WARNING: stuff first.\n\nHit Ctrl+C to exit now, or Enter to "
                "continue.\n\n",
                "yellow",
                attrs=("bold",)
            ))
            __ = input()
        except KeyboardInterrupt:
            return

        if options["from"] == options["to"]:
            raise CommandError(
                'The "from" and "to" values can\'t be the same.'
            )

        # Both directions need the passphrase: one to decrypt, one to encrypt
        passphrase = options["passphrase"] or settings.PASSPHRASE
        if not passphrase:
            raise CommandError(
                "Passphrase not defined. Please set it with --passphrase or "
                "by declaring it in your environment or your config."
            )

        if options["from"] == "gpg" and options["to"] == "unencrypted":
            self.__gpg_to_unencrypted(passphrase)
        elif options["from"] == "unencrypted" and options["to"] == "gpg":
            self.__unencrypted_to_gpg(passphrase)

    @staticmethod
    def __gpg_to_unencrypted(passphrase):

        encrypted_files = Document.objects.filter(
            storage_type=Document.STORAGE_TYPE_GPG)

        for document in encrypted_files:

            print(coloured("Decrypting {}".format(document), "green"))

            # Remember the encrypted paths before the storage type changes
            old_paths = [document.source_path, document.thumbnail_path]
            raw_document = GnuPG.decrypted(document.source_file, passphrase)
            raw_thumb = GnuPG.decrypted(document.thumbnail_file, passphrase)

            document.storage_type = Document.STORAGE_TYPE_UNENCRYPTED

            with open(document.source_path, "wb") as f:
                f.write(raw_document)

            with open(document.thumbnail_path, "wb") as f:
                f.write(raw_thumb)

            document.save(update_fields=("storage_type",))

            for path in old_paths:
                os.unlink(path)

    @staticmethod
    def __unencrypted_to_gpg(passphrase):

        unencrypted_files = Document.objects.filter(
            storage_type=Document.STORAGE_TYPE_UNENCRYPTED)

        for document in unencrypted_files:

            print(coloured("Encrypting {}".format(document), "green"))

            # Remember the unencrypted paths before the storage type changes
            old_paths = [document.source_path, document.thumbnail_path]
            with open(document.source_path, "rb") as raw_document:
                with open(document.thumbnail_path, "rb") as raw_thumb:
                    document.storage_type = Document.STORAGE_TYPE_GPG
                    with open(document.source_path, "wb") as f:
                        f.write(GnuPG.encrypted(raw_document, passphrase))
                    with open(document.thumbnail_path, "wb") as f:
                        f.write(GnuPG.encrypted(raw_thumb, passphrase))

            document.save(update_fields=("storage_type",))

            for path in old_paths:
                os.unlink(path)
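
The argument handling of this command can be tried out without Django. The sketch below mirrors it with the standard library's `argparse`; it is an illustration under assumptions, not a replacement for the command. `build_parser` and `parse_options` are hypothetical names, `configured_passphrase` stands in for ``settings.PASSPHRASE``, and `ValueError` replaces Django's ``CommandError``.

```python
import argparse

STORAGE_TYPES = ("gpg", "unencrypted")


def build_parser():
    """Build a parser with the same arguments the command declares."""
    parser = argparse.ArgumentParser(prog="change_storage_type")
    parser.add_argument("from", choices=STORAGE_TYPES)
    parser.add_argument("to", choices=STORAGE_TYPES)
    parser.add_argument("--passphrase")
    return parser


def parse_options(argv, configured_passphrase=None):
    """Validate argv and return a (from, to, passphrase) tuple.

    configured_passphrase is a hypothetical stand-in for settings.PASSPHRASE.
    """
    # "from" is a Python keyword, so read the namespace as a dict
    options = vars(build_parser().parse_args(argv))
    if options["from"] == options["to"]:
        raise ValueError('The "from" and "to" values can\'t be the same.')
    # A passphrase given on the command line wins over the configured one
    passphrase = options["passphrase"] or configured_passphrase
    if not passphrase:
        raise ValueError("Passphrase not defined.")
    return options["from"], options["to"], passphrase
```

For example, `parse_options(["gpg", "unencrypted", "--passphrase", "secret"])` yields the two storage types and the passphrase, while identical "from" and "to" values or a missing passphrase raise `ValueError`.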