Specification for File Naming Codes and File Repository Structure #48

Open
da2ce7 opened this issue Oct 7, 2021 · 3 comments · May be fixed by #49
Labels
specification Generalisable Aspects of the Project

Comments

@da2ce7
Contributor

da2ce7 commented Oct 7, 2021

Define a more complete specification for the file naming and file repository folder structure conventions.

I have opened a draft pull request, #49.
It reworks the current documentation; however, it is not a formal specification.

@da2ce7 da2ce7 linked a pull request Oct 7, 2021 that will close this issue
@josecelano
Member

Hi @da2ce7, I was thinking about this while working on the first version of the filename validation: https://github.com/Nautilus-Cyberneering/chinese-ideographs/blob/main/.github/actions/validate-filenames/src/validation/validate_filenames.py

For this first milestone (POC01) I decided to keep it simple, but I can share my initial thoughts on the topic. There are at least two obvious implementations, and a few things to discuss:

Basic implementations

  1. Regular expressions
  2. Custom string parser (approximately what I did)

Requirements (how friendly we want the validation to be)

Regarding filename validation, I think we first need to define the requirements. This is the simplest format for a basic filename:

{ARTWORK_ID}-{PURPOSE_CODE}.{TRANSFORMATION_CODE}.{TYPE_CODE}.{EXTENSION}

And a concrete example is: 000001-32.600.2.tif

The simplest error handling could be for example:

`000001-.600.2.tif` -> error in filename
`000001-32..2.tif` -> error in filename
`000001-999.600.2.tif` -> error in filename
`000001-32.999.2.tif` -> error in filename

But I wanted more precise error reporting, like this:

`000001-.600.2.tif` -> error in filename: missing purpose_code at char number 6
`000001-32..2.tif` -> error in filename: missing transformation_code at char number 9
`000001-999.600.2.tif` -> error in filename: invalid purpose_code at char number 6. Valid codes are: bla, bla, bla
`000001-32.999.2.tif` -> error in filename: invalid transformation_code at char number 9. Valid codes are: bla, bla, bla
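A minimal sketch of a custom string parser that produces messages of this kind and stops at the first error. It is not the code in `validate_filenames.py`; the names `FIELD_SPECS`, `FilenameError` and `validate_filename` are hypothetical, the type codes and the six-digit artwork ID rule are assumptions, and only the 32/42 and 600/700 codes come from this thread. Positions here are 0-based offsets of where the field begins, so the numbers differ from the illustrative ones above.

```python
# Assumed code tables: 32/42 and 600/700 appear in this thread;
# the type codes and the extension set are placeholders.
FIELD_SPECS = [
    ("purpose_code", {"32", "42"}),
    ("transformation_code", {"600", "700"}),
    ("type_code", {"2"}),
    ("extension", {"tif"}),
]


class FilenameError(ValueError):
    """First problem found in a filename; `position` is a 0-based offset."""

    def __init__(self, message, position, hint=""):
        super().__init__(
            f"error in filename: {message} at char number {position}{hint}"
        )
        self.position = position


def validate_filename(name):
    """Return None if `name` is valid, otherwise raise FilenameError."""
    artwork_id, separator, rest = name.partition("-")
    # Assumption: artwork IDs are exactly six ASCII digits, as in 000001.
    if not (len(artwork_id) == 6 and artwork_id.isascii() and artwork_id.isdigit()):
        raise FilenameError("invalid artwork_id", 0)
    if not separator:
        raise FilenameError("missing '-'", len(artwork_id))
    fields = rest.split(".")
    position = len(artwork_id) + 1  # offset where the next field starts
    for index, (field, valid) in enumerate(FIELD_SPECS):
        if index >= len(fields):
            raise FilenameError(f"missing {field}", len(name))
        value = fields[index]
        if value == "":
            raise FilenameError(f"missing {field}", position)
        if value not in valid:
            codes = ", ".join(sorted(valid))
            raise FilenameError(
                f"invalid {field}", position, f". Valid codes are: {codes}"
            )
        position += len(value) + 1  # skip the field and the '.' after it
    if len(fields) > len(FIELD_SPECS):
        raise FilenameError("unexpected extra field", position)
```

With these assumed tables, `validate_filename("000001-32.600.2.tif")` returns without raising, while `validate_filename("000001-.600.2.tif")` raises `FilenameError` with the message `error in filename: missing purpose_code at char number 7`.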

We also need to decide how to process the input string. We could validate the string and fail at the first error, or we could try to generate a full report with all errors. I think the second option is more complicated to implement, and I do not think we need it.
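To make the trade-off concrete, here is a hedged sketch of the "full report" variant. The names `FIELD_SPECS`, `collect_errors` and `first_error` are hypothetical, and the code tables are the same assumptions as elsewhere in this thread (only 32/42, 600/700 and tif appear in it). For a flat, delimiter-separated format the extra cost may be small, but recovery gets harder once a separator is missing, which is why this sketch gives up early in that case.

```python
# Assumed code tables; type codes and the artwork ID rule are placeholders.
FIELD_SPECS = [
    ("purpose_code", {"32", "42"}),
    ("transformation_code", {"600", "700"}),
    ("type_code", {"2"}),
    ("extension", {"tif"}),
]


def collect_errors(name):
    """Return a list with every problem found, in left-to-right order."""
    artwork_id, separator, rest = name.partition("-")
    if not separator:
        # Without the '-' we cannot tell where the fields start: stop here.
        return [f"missing '-' separator at char number {len(name)}"]
    errors = []
    if not (len(artwork_id) == 6 and artwork_id.isascii() and artwork_id.isdigit()):
        errors.append("invalid artwork_id at char number 0")
    fields = rest.split(".")
    position = len(artwork_id) + 1  # 0-based offset of the next field
    for index, (field, valid) in enumerate(FIELD_SPECS):
        if index >= len(fields):
            errors.append(f"missing {field} at char number {len(name)}")
            continue
        value = fields[index]
        if value == "":
            errors.append(f"missing {field} at char number {position}")
        elif value not in valid:
            errors.append(f"invalid {field} at char number {position}")
        position += len(value) + 1
    if len(fields) > len(FIELD_SPECS):
        errors.append(f"unexpected extra field at char number {position}")
    return errors


def first_error(name):
    """Fail-fast behaviour expressed on top of the full report."""
    errors = collect_errors(name)
    return errors[0] if errors else None
```

In this shape the fail-fast behaviour is just the first entry of the report, so choosing one does not rule out the other later.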

Readability and maintainability

Regarding the obvious solution (regular expressions), I think we could build it easily and try to write it in an understandable way, but I'm not sure we can get the exact location of a validation error from it (I suppose it's possible).
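One possible way to get an error location out of regular expressions, sketched under assumptions: a strict pattern answers valid or invalid, and a second, lenient pattern with the same named groups is used only to find which field failed, via `Match.start()`. The names `FIELD_PATTERNS`, `build_pattern`, `is_valid` and `locate_error` are hypothetical, and the type code and artwork ID patterns are placeholders. Positions are 0-based offsets.

```python
import re

# Assumed field patterns; only 32/42, 600/700 and tif are taken from this thread.
FIELD_PATTERNS = {
    "artwork_id": r"[0-9]{6}",
    "purpose_code": r"32|42",
    "transformation_code": r"600|700",
    "type_code": r"2",
    "extension": r"tif",
}
SEPARATORS = ["-", ".", ".", "."]


def build_pattern(field_patterns):
    """Join one named group per field with the literal separators between them."""
    names = list(field_patterns)
    parts = [f"(?P<{names[0]}>{field_patterns[names[0]]})"]
    for separator, name in zip(SEPARATORS, names[1:]):
        parts.append(f"{re.escape(separator)}(?P<{name}>{field_patterns[name]})")
    return re.compile("".join(parts))


STRICT = build_pattern(FIELD_PATTERNS)
# Same shape, but every field accepts any text that contains no separator.
LENIENT = build_pattern({
    name: (r"[^-.]*" if name == "artwork_id" else r"[^.]*")
    for name in FIELD_PATTERNS
})


def is_valid(name):
    """Plain pass/fail check with the strict pattern."""
    return STRICT.fullmatch(name) is not None


def locate_error(name):
    """Return a message for the first bad field, or None if the name is valid."""
    loose = LENIENT.fullmatch(name)
    if loose is None:
        return "error in filename: unexpected number of fields"
    for field, pattern in FIELD_PATTERNS.items():
        value = loose.group(field)
        if re.fullmatch(pattern, value) is None:
            kind = "missing" if value == "" else "invalid"
            return f"error in filename: {kind} {field} at char number {loose.start(field)}"
    return None
```

The cost of this approach is that the structure is described twice (strict and lenient), which is part of the maintainability concern raised here.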

On the other hand, I think a formal grammar could help us define and maintain the rules. I wanted to have something like this:

filename: artwork_id-(image|metadata|index)
image: transformation*.extension
transformation: purpose_code.transformation_code
purpose_code: (32|42)
transformation_code: (600|700)
extension: tif
...

I think with that kind of grammar we could create a parser and get the exact syntax error.
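As a rough illustration of the idea, the sketch below keeps the grammar as data and walks it with a tiny hand-written interpreter that reports which rule was expected and where. It is a simplification, not a parser generator: it handles only sequences, quoted literals and regex terminals (no alternatives such as `image|metadata|index`, no repetition), and the rule layout follows the basic filename format rather than the grammar above. `GRAMMAR`, `FilenameSyntaxError`, `parse` and `parse_filename` are hypothetical names, and the type code and artwork ID patterns are assumptions.

```python
import re

# Nonterminals map to a list of symbols; terminals map to a compiled regex.
# Symbols wrapped in single quotes are literal separators.
GRAMMAR = {
    "filename": ["artwork_id", "'-'", "image"],
    "image": ["purpose_code", "'.'", "transformation_code",
              "'.'", "type_code", "'.'", "extension"],
    "artwork_id": re.compile(r"[0-9]{6}"),       # assumed: six ASCII digits
    "purpose_code": re.compile(r"32|42"),
    "transformation_code": re.compile(r"600|700"),
    "type_code": re.compile(r"2"),               # assumed placeholder
    "extension": re.compile(r"tif"),
}


class FilenameSyntaxError(ValueError):
    """Raised at the first position where the grammar cannot continue."""


def parse(symbol, text, position, fields):
    """Match `symbol` at `position`; return the new position or raise."""
    if symbol.startswith("'"):  # quoted literal such as '-'
        literal = symbol[1:-1]
        if not text.startswith(literal, position):
            raise FilenameSyntaxError(f"expected {symbol} at char number {position}")
        return position + len(literal)
    rule = GRAMMAR[symbol]
    if isinstance(rule, list):  # nonterminal: match each part in order
        for part in rule:
            position = parse(part, text, position, fields)
        return position
    match = rule.match(text, position)  # terminal: regex anchored at position
    if match is None:
        raise FilenameSyntaxError(
            f"expected {symbol} ({rule.pattern}) at char number {position}"
        )
    fields[symbol] = match.group()
    return match.end()


def parse_filename(name):
    """Return the parsed fields as a dict, or raise FilenameSyntaxError."""
    fields = {}
    end = parse("filename", name, 0, fields)
    if end != len(name):
        raise FilenameSyntaxError(f"unexpected text at char number {end}")
    return fields
```

Because the error message is derived from the rule that failed, adding or changing a code only means editing the `GRAMMAR` table. A parser library such as those listed in the link below would add alternatives and repetition without hand-written code.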

If we try this kind of solution there are plenty of options out there: https://tomassetti.me/parsing-in-python/

Links

Maybe there is another, simpler solution. I stopped at that point in order to discuss which solution we need and, if we consider this a good one, when we should start the implementation.

@josecelano
Member

I've seen your PR now. Maybe you only wanted to define the complete specification "by example" and not in a more formal way. Is that the intention for this issue? Or do you think it could also be a good idea to write a formal specification?

@da2ce7
Contributor Author

da2ce7 commented Oct 7, 2021

@josecelano

I've seen your PR now. Maybe you only wanted to define the complete specification "by example" and not in a more formal way. Is that the intention for this issue? Or do you think it could also be a good idea to write a formal specification?

For the moment I have taken a "by example" approach. I think I have included enough examples to illustrate the basic idea and form of the proposed naming scheme.

Later, a formal specification (including some sort of abstract syntax tree template that formally defines the codes) would of course be a very welcome development. It is normal for such a specification to be written after the initial implementation.

@da2ce7 da2ce7 self-assigned this Oct 7, 2021
@da2ce7 da2ce7 added the specification Generalisable Aspects of the Project label Oct 7, 2021