-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retrofit of upload logic #178
Conversation
1cccadb
to
c79eda2
Compare
Ran out of time today. Will review tomorrow. |
MUST send metadata about the files | | X | | X | ||
|
||
Creates a packfile from uploaded files | | | | X | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"connector" is a Flywheel marketing term. The correct scitran terminology is "reaper". Please change globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Looks good overall. Little confused by the two commits. I know we've talked about various optimization opportunities, but how usable is the current implementation? Is it usable for a realistic dataset on a realistic storage system (not an idle SSD)? @rentzso: Please also take a look. If you see obvious de-dup opportunities, especially with your existing code, please note them in another ticket. Maybe we could all get together on Friday to discuss next steps. Optimization, de-dup, schema unification. |
Yeah, that was meant to be squashed. I'll leave it until merge to avoid disturbing the review threads.
Not a clue; I think the easiest way to test is going to be in the browser. Alternatively, generating a very long curl command. This version doesn't allow repeated form fields - accessing a form field name that was repeated returns a list, and I don't handle that case - so it's not as easy as |
@dpuccetti: Please let us know what you find? |
if strategy == Strategy.targeted and len(file_fields) > 1: | ||
raise Exception("Targeted uploads can only send one file") | ||
|
||
for field in file_fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we processing all the file information in the metadata?
It should be possible in the engine to send metadata about a file without sending the binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point.
The two use cases hooked up in this PR, Targeted and Packfile, will work as designed - they will always send files. When I hook up Engine - ref #159 - the metadata processing should probably be in EnginePlacer.check()
or EnginePlacer.check()
such that metadata processing happens even if no files were uploaded.
Summary - no change required now, will be kept in mind for later in #159.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Just thinking that a more appropriate name for Placer.check
could be Placer.initialize
.
0fe75d1
to
26b83a1
Compare
All comments addressed; rebased and squashed. Ready for merge. |
LGTM |
Shouldn't the |
I chose not to do that because I need control over when that function is called, which might become important to accomplish a few things in #159. This reduces the likelihood that I'll have to break the interface while Renzo is working on things, etc. I'll add a point on #159 at the end to consider changing up the naming and such. |
} | ||
|
||
# Get or create a session based on the hierarchy and provided labels. | ||
s = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noticing that the packfile placer is not setting the group on the session. This is a required field as from the datamodel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait... okay help me reconcile that with the schemas.
The input session schema has no group key, but the mongo session schema does?
Are you asking that I grab the group ID from the project object and place it on the session object? Should I also do that to the acquisition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah you need to get the group id from the project object. The input session schema doesn't have the group so that you just need the parent project to create a session inside it. I don't remember the reason why we need to store the group id in the session (Cc @gsfr) but it is a required field.
The acquisition doesn't need any other key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the reason why we need to store the group id in the session (Cc @gsfr) but it is a required field.
The field is needed to easily filter the sessions list by group.
LGTM! |
This constitutes a significant overhaul of file upload processing into logic that can be deduplicated, made consistent, and eventually optimized.
Two of four upload scenarios are covered by this change: targeted (files to a single container), and a new mode packfile (files to a single zip archive). Connector and engine uploads have not been modified, but clear groundwork is laid for that transition.
Relates #159.