Generalize the edited-file check across separate Jobs #1594

Open
ag-eitilt opened this issue Jul 10, 2024 · 0 comments
Labels
backlog Task that isn't actively being worked on enhancement New feature or request

@ag-eitilt
Collaborator

Split from #1589: If a user edits a file that Wake output, from outside of Wake, before rerunning the same Job that originally generated it, Wake detects the difference in content hash and panics without overwriting the changes:

$ wake -x 'makePlan "test" Nil "echo -n test > test.txt" | runJob' ; echo edit >> test.txt ; wake -x 'makePlan "test" Nil "echo -n test > test.txt" | runJob'
Job 9428
PANIC: The hashcode of output file 'test.txt' has changed from 928b20366943e2afd11ebc0eae2e53a93bf177a4fcf35bcc64d503704e65e202 (when wake last ran) to d319334a830d32f8ab3b4c9d641360c834f712a36bfa6909277217569ea4ca33 (when inspected this time). Presumably it was hand edited. Please move this file out of the way. Aborting the build to prevent loss of your data.
$ grep edit test.txt
testedit

However, if the file is instead overwritten by a different Job, the hash check is bypassed and any edits are lost:

$ rm -f test.txt
$ wake -x 'makePlan "test" Nil "echo -n test > test.txt" | runJob' ; echo edit >> test.txt ; wake -x 'makePlan "test" Nil "echo -n test with different contents > test.txt" | runJob'
Job 9515
Job 9528
$ grep edit test.txt
$ echo $?
1

Our determination from the meeting is that this is a known limitation: the check deliberately iterates only over the files output by the previous run of that same Job, rather than over the much larger set of every file in the database. This keeps it O(n) rather than O(n^2), or more accurately O(nm) where m > n (n being the Job's files and m every file in the database).
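A minimal Python sketch of the behaviour described above, modelling the check with a toy in-memory table rather than Wake's real database. The class `PerJobChecker`, the `demo` helper, identifying a Job by its command string, and SHA-256 as the content hash are all assumptions made for illustration; this is not Wake's implementation.

```python
import hashlib
import os
import tempfile


def file_hash(path):
    """Hex SHA-256 of a file's contents (a stand-in for Wake's own hash)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


class PerJobChecker:
    """Hypothetical toy model of the current check: only the outputs recorded
    for the *same* Job's previous run are compared against the files on disk."""

    def __init__(self):
        # job key (assumed: the command string) -> {output path -> recorded hash}
        self.outputs_by_job = {}

    def run(self, job_key, path, contents):
        # O(n) in the number of outputs this Job produced last time;
        # outputs recorded by any *other* Job are never consulted.
        for out, recorded in self.outputs_by_job.get(job_key, {}).items():
            if os.path.exists(out) and file_hash(out) != recorded:
                raise RuntimeError(
                    f"The hashcode of output file '{out}' has changed; "
                    "refusing to overwrite it")
        # Simulate the Job writing its output, then record the new hash.
        with open(path, "w") as f:
            f.write(contents)
        self.outputs_by_job[job_key] = {path: file_hash(path)}


def demo(second_job, second_contents):
    """Run the first Job, hand-edit its output, then run second_job.
    Returns (panicked, edit_survived)."""
    first_job = "echo -n test > test.txt"
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test.txt")
        checker = PerJobChecker()
        checker.run(first_job, path, "test")
        with open(path, "a") as f:
            f.write("edit\n")  # the edit made from outside of Wake
        panicked = False
        try:
            checker.run(second_job, path, second_contents)
        except RuntimeError:
            panicked = True
        with open(path) as f:
            edit_survived = "edit" in f.read()
    return panicked, edit_survived


# Same Job: the edit is detected and preserved.
print(demo("echo -n test > test.txt", "test"))  # (True, True)
# Different Job: no recorded outputs to check, so the edit is overwritten.
print(demo("echo -n test with different contents > test.txt",
           "test with different contents"))  # (False, False)
```

Keying the recorded outputs by Job is what keeps the check cheap in this model; catching the second case would mean finding whichever Job last recorded a given path, which a table keyed only by Job cannot answer without scanning every Job.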

It would still be desirable to run the hash check in that case, but to our knowledge the combined table of files maintained by the database doesn't contain sufficient information to do so, and we definitely wouldn't want to iterate through every Job to collect all of their files. Marked backlog until someone gets a chance to look into the database specifically for a better algorithm over the current schema, or until a database-breaking change is otherwise queued up that the information required for a better algorithm can ride in on.

ag-eitilt added the enhancement and backlog labels on Jul 10, 2024