[Feature Request] Read-Only archive-like file system backend #1649

Closed
mikedolx opened this issue Jul 12, 2022 · 4 comments

Comments

mikedolx commented Jul 12, 2022

Hi,

as discussed in the matrix-chat, here is my feature request.

User story

I want to have a copy of my docspell documents stored in a file system that is outside of docspell.

Export

All docspell documents shall be exported either periodically or directly after a successful import.

Storage

Documents shall be stored in the folders given by the item property "folder".

File names

Documents are named by defining an export file syntax. This syntax contains placeholders that docspell replaces during the export.
File names must be sanitized to meet the target OS's rules on illegal characters (see also: https://en.m.wikipedia.org/wiki/Filename).

Example:

# syntax
{item.date:yyyy-mm-dd} - {item.name} -- {item.tags}.pdf

# result
2022-01-14 - invoice_air_conditioner -- invoice,  paid, Bobs AC Company.pdf

Notes on the export syntax

  • the item properties in the listing are examples. The real docspell syntax might be different.
  • the item date can be formatted by using common placeholders for date and time (yyyy, mm, hh).
  • the item name is sanitized
  • the tags property in this example is an array. During serialization the values of the tags array are concatenated with a predefined separator (for example comma, semicolon etc.). The separator is configurable via UI or configuration file.
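
The notes above could be sketched roughly as follows. This is only an illustration under assumptions: `render`, `sanitize` and the item dictionary are hypothetical names, not docspell or dsc API, and only the date placeholders yyyy, mm and dd from the example are handled.

```python
import re
from datetime import date

# Characters that are illegal in file names on Windows ("/" is also illegal on POSIX).
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Matches placeholders such as {item.name} or {item.date:yyyy-mm-dd}.
_PLACEHOLDER = re.compile(r"\{item\.(\w+)(?::([^}]*))?\}")


def sanitize(part: str) -> str:
    """Replace illegal characters and drop trailing dots and spaces."""
    cleaned = _ILLEGAL.sub("_", part).strip()
    return cleaned.rstrip(". ") or "_"


def render(template: str, item: dict, tag_separator: str = ", ") -> str:
    """Fill the placeholders of an export template from a hypothetical item dict."""

    def replace(match: re.Match) -> str:
        key, fmt = match.group(1), match.group(2)
        value = item[key]
        if isinstance(value, date):
            # Translate the example's date placeholders into strftime codes.
            fmt = fmt or "yyyy-mm-dd"
            fmt = fmt.replace("yyyy", "%Y").replace("mm", "%m").replace("dd", "%d")
            return value.strftime(fmt)
        if isinstance(value, (list, tuple)):
            # Arrays such as tags are joined with the configurable separator.
            return tag_separator.join(sanitize(str(v)) for v in value)
        return sanitize(str(value))

    return _PLACEHOLDER.sub(replace, template)
```

Called with the template from the example, an item dated 2022-01-14, the name "invoice_air_conditioner" and the three tags, this yields the file name shown under "# result" (with a single space after each comma).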

Updates

Changes in docspell items shall be reflected to the file system.
Example: An item has been tagged with a tag. According to the export file syntax this tag needs to be persistent in the file name. Docspell updates the exported file name in the export file system.

  • updates are always propagated from docspell to the filesystem
  • updates that are made in the file system are not propagated to docspell. Hence, the idea to make the file system read-only.
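
A minimal sketch of this one-way propagation, assuming the exporter keeps its own index from item id to the last exported path. The function `sync_name` and the index structure are hypothetical and only illustrate the idea.

```python
from pathlib import Path


def sync_name(export_root: Path, index: dict, item_id: str, new_name: str) -> Path:
    """Rename an item's exported file to new_name and record the new location.

    `index` maps item ids to the path written by the previous export run.
    Nothing is ever read back from the file system into docspell.
    """
    target = export_root / new_name
    current = index.get(item_id)
    if current is not None and current != target and current.exists():
        # The metadata changed, so the old file name is stale: move the file.
        current.rename(target)
    index[item_id] = target
    return target
```

The index is the part that has to be persisted somewhere between runs; without it, a renamed item cannot be matched to its previously exported file.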

Item document export

  • As each item can have multiple documents, all documents shall be exported by using the export syntax as listed above.
  • If there are multiple versions of a document, the latest version is exported.
  • In any case, only the docspell-generated PDF is exported, whether the source document is a non-PDF type (csv, docx, xlsx) or already a PDF (which might be missing the OCR layer).

Am I missing something?

I am aware that the dsc tool might provide similar functionality. Nevertheless, I still think such a feature would be valuable to have integrated into docspell. The automatic file renaming in particular is of high interest to me.

Owner

eikek commented Jul 13, 2022

Hi @mikedolx, thank you for the detailed description. It's only that I'm afraid it is a very different request from what I understood in the chat!

This is a big effort, and I'm not sure I would like it this way. There are some subtleties to it: it requires keeping track of the location of every file, so that it can be renamed when metadata changes. Docspell may consist of many servers, possibly spread across multiple machines, so it must be decided where the files are actually put. But the nodes are "equal", which means it is not deterministic which one will run any long-running job. I don't think it is feasible to integrate this into the docspell application.

I currently think this is better solved by a client application, like dsc for example. There is this ticket already: docspell/dsc#114 (original #1262). With this (and maybe together with the other similar requests), I think a very close feature is possible: you could run this export periodically, or you could use webhooks to propagate changes and trigger the "export". The advantage is that it is not restricted to run on the server; you could choose the machine where the "export" is done, and it could run multiple times etc.
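
To illustrate the webhook variant, here is a rough sketch of a small receiver on the client machine that starts an export whenever a change notification arrives. Everything in it is an assumption: `make_handler`, `serve`, the port and the `run_export` callback are hypothetical, the payload is simply treated as opaque JSON, and what `run_export` actually does (for instance invoking dsc) is left open.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(run_export):
    """Build a request handler that calls run_export(payload) for every POST."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            raw = self.rfile.read(length) if length else b""
            try:
                payload = json.loads(raw or b"{}")
            except json.JSONDecodeError:
                # Reject bodies that are not valid JSON.
                self.send_response(400)
                self.end_headers()
                return
            run_export(payload)  # hypothetical callback that performs the export
            self.send_response(204)
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return Handler


def serve(run_export, host="127.0.0.1", port=8099):
    """Block forever and run an export for every incoming notification."""
    HTTPServer((host, port), make_handler(run_export)).serve_forever()
```

Because the receiver is an ordinary client process, it can run on whichever machine should hold the exported files, independent of where the docspell nodes run.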

Author

mikedolx commented Aug 9, 2022

Hi @eikek, my apologies for the late response.

I understand your argument. In addition, it seems that the architecture of docspell is more complex than I thought. Therefore you may want to close this as "not feasible" or similar.

Nevertheless, any automated way of renaming the files in docspell would be beneficial for me when I implement the export using the dsc tool. I think I have seen an open feature request describing this. I might comment on that.

Thanks

Owner

eikek commented Aug 10, 2022

Hi @mikedolx - no worries for being late! It is true that being a distributed architecture does make some things more complex. It has its own pros, but also cons obviously. There is no free lunch as they say…. :-) Theoretically this could be done, but it would mean that the feature is only for single machines and only for filesystem backends. The effort is big and changes would impact a lot of places even when the feature is not being used.

Doing it at the client side has the disadvantage of having two copies of the files, one in the database (or s3, internal for docspell) and the exported file structure. But I think space is not such a big problem. It may also allow for other useful scenarios, like running this on multiple machines. For dsc the ticket in question is docspell/dsc#114 - please comment there if you like.

eikek closed this as completed Aug 10, 2022

Contributor

madduck commented Sep 10, 2023

Linking #2270
