Files should be tracked and stored as first-class objects with their own attributes. #15

Open
tskluzac opened this issue Apr 23, 2020 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@tskluzac
Collaborator

We can borrow some of this thinking from Skluma:

Files should be treated as first-class entities along with groups (and families). For instance, we no longer check whether files should be inflated (because in MDF, some files are pushed into workflows as deflated objects). Moreover, we're not tracking individual file sizes, etc., throughout.

This matters because we want more granular bookkeeping of files in the system so that we can start extending towards other extractors. Is the file compressed? Decompress it. Is it near other interesting files? Check its context.
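As a rough illustration of the "is the file compressed? Decompress it" check, here is a minimal sketch that assumes gzip is the only compression format we care about. The helper names `is_gzip` and `inflate_if_needed` are hypothetical and not part of the current codebase; a real version would likely need to cover other formats too.

```python
import gzip

# Assumption: gzip is the only format handled here. Gzip streams
# start with the two magic bytes 0x1f 0x8b.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(data: bytes) -> bool:
    """Return True if the bytes look like a gzip stream."""
    return data[:2] == GZIP_MAGIC


def inflate_if_needed(data: bytes) -> bytes:
    """Decompress gzip data; return anything else unchanged."""
    return gzip.decompress(data) if is_gzip(data) else data
```

Checking the magic bytes rather than the file extension means deflated objects pushed into a workflow without a `.gz` suffix would still be caught.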

Each file should have its own object with keyword args:

  • file_id
  • file_name
  • content (e.g., the bytes)
  • default_inflated?
  • inflated_size
  • deflated_size
  • metadata
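
One possible shape for such an object, sketched as a Python dataclass. The field names come from the list above; the class name `FileRecord`, the types, the defaults, and the reading of `default_inflated` as "the content arrived already inflated" are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical per-file object; types and defaults are assumptions."""
    file_name: str
    content: bytes
    # Assumed meaning: True if `content` was handed to us already inflated.
    default_inflated: bool = True
    file_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inflated_size: Optional[int] = None
    deflated_size: Optional[int] = None
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Record the size of whichever form we received; the other
        # size stays None until the file is inflated or deflated.
        if self.default_inflated and self.inflated_size is None:
            self.inflated_size = len(self.content)
        elif not self.default_inflated and self.deflated_size is None:
            self.deflated_size = len(self.content)
```

For example, `FileRecord("notes.txt", b"hello")` would get a generated `file_id` and an `inflated_size` of 5, with `deflated_size` left unset.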

Then we should also create a separate database of files and their necessary metadata so we can answer questions like "how many of this type of file do you have?"
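
A minimal sketch of what that file database could look like, using SQLite from the Python standard library. The table layout, the `file_type` column, and the function names are hypothetical; the issue does not say which store or schema to use.

```python
import json
import sqlite3


def create_file_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the file database. Schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               file_id       TEXT PRIMARY KEY,
               file_name     TEXT NOT NULL,
               file_type     TEXT,      -- hypothetical column, e.g. 'csv'
               inflated_size INTEGER,
               deflated_size INTEGER,
               metadata      TEXT       -- JSON-encoded dict
           )"""
    )
    return conn


def add_file(conn, file_id, file_name, file_type,
             inflated_size=None, deflated_size=None, metadata=None):
    """Insert one file row, storing metadata as a JSON string."""
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
        (file_id, file_name, file_type, inflated_size,
         deflated_size, json.dumps(metadata or {})),
    )
    conn.commit()


def count_by_type(conn, file_type: str) -> int:
    """Answer 'how many of this type of file do you have?'"""
    row = conn.execute(
        "SELECT COUNT(*) FROM files WHERE file_type = ?", (file_type,)
    ).fetchone()
    return row[0]
```

With a table like this, the question above becomes a single call such as `count_by_type(conn, "csv")`.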

@tskluzac tskluzac added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Apr 23, 2020
@tskluzac
Collaborator Author

I think much of this is handled in the file list as part of a Group object. I think the key here is adding inflated_size and deflated_size in the case of inflated data.
