
Retrofit of upload logic #178

Merged (3 commits merged into master on Mar 4, 2016)
Conversation

@kofalt (Contributor) commented Mar 1, 2016

This constitutes a significant overhaul of file upload processing into logic that can be deduplicated, made consistent, and eventually optimized.

Two of four upload scenarios are covered by this change: targeted (files uploaded to a single container), and a new mode, packfile (files bundled into a single zip archive). Connector and engine uploads have not been modified, but clear groundwork is laid for that transition.

Relates to #159.
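
For orientation, here is a minimal sketch of the shape this unification implies. Strategy.targeted and Placer.check appear elsewhere in this review; the remaining names (process_file_field, finalize, and the constructor signature) are illustrative assumptions, not necessarily the identifiers used in this PR.

class Strategy(object):
    # Upload strategies; targeted and packfile are the two wired up in this PR.
    targeted = 'targeted'   # files go to a single existing container
    packfile = 'packfile'   # files are bundled into a single zip archive


class Placer(object):
    """Per-strategy object that decides where uploaded files end up."""

    def __init__(self, container, metadata):
        # Construction only records state; validation happens in check().
        self.container = container
        self.metadata = metadata

    def check(self):
        """Validate the request before any file is processed."""
        raise NotImplementedError()

    def process_file_field(self, field):
        """Handle one uploaded file."""
        raise NotImplementedError()

    def finalize(self):
        """Return the response payload once all files are placed."""
        raise NotImplementedError()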

@kofalt force-pushed the upload-unification branch 2 times, most recently from 1cccadb to c79eda2 on March 1, 2016 at 23:08
@gsfr (Member) commented Mar 2, 2016

Ran out of time today. Will review tomorrow.

MUST send metadata about the files     |   | X |   | X
Creates a packfile from uploaded files |   |   |   | X
"""
Member (commented on the diff):

"connector" is a Flywheel marketing term. The correct scitran terminology is "reaper". Please change globally.

Contributor (Author) replied:

Fixed.

@gsfr (Member) commented Mar 2, 2016

Looks good overall. Little confused by the two commits.

I know we've talked about various optimization opportunities, but how usable is the current implementation? Is it usable for a realistic dataset on a realistic storage system (not an idle SSD)?

@rentzso: Please also take a look. If you see obvious de-dup opportunities, especially with your existing code, please note them in another ticket.

Maybe we could all get together on Friday to discuss next steps. Optimization, de-dup, schema unification.

@kofalt (Contributor, Author) commented Mar 2, 2016

> Little confused by the two commits.

Yeah, that was meant to be squashed. I'll leave it until merge to avoid disturbing the review threads.

> I know we've talked about various optimization opportunities, but how usable is the current implementation? Is it usable for a realistic dataset on a realistic storage system (not an idle SSD)?

Not a clue; I think the easiest way to test is going to be in the browser, or alternatively by generating a very long curl command. This version doesn't allow repeated form fields (accessing a form field name that was repeated returns a list, and I don't handle that case), so it's not as simple as file=@a, file=@b, etc. I've passed that URL to Dan for testing.
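
For reference, a test upload with distinct field names could be built in Python with requests; a sketch follows, in which the URL, field names, and file names are placeholders rather than the actual API route.

import requests

# Placeholder endpoint; substitute the real upload route and authentication.
url = 'https://localhost:8443/api/upload/packfile'

# One distinct field name per file, since repeated field names are not handled.
files = [
    ('file1', open('scan_a.nii.gz', 'rb')),
    ('file2', open('scan_b.nii.gz', 'rb')),
]

resp = requests.post(url, files=files, verify=False)
print(resp.status_code, resp.text)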

@gsfr (Member) commented Mar 2, 2016

@dpuccetti: Please let us know what you find?

if strategy == Strategy.targeted and len(file_fields) > 1:
    raise Exception("Targeted uploads can only send one file")

for field in file_fields:
Contributor (commented on the diff):

Are we processing all the file information in the metadata?
It should be possible in the engine to send metadata about a file without sending the binary.

Contributor (Author) replied:

That's a great point.

The two use cases hooked up in this PR, Targeted and Packfile, will work as designed: they will always send files. When I hook up Engine (ref #159), the metadata processing should probably live in EnginePlacer.check(), so that metadata is processed even if no files were uploaded.

Summary: no change required now; this will be kept in mind for #159.
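
As a rough sketch of that future direction, building on the Placer sketch above (EnginePlacer is referenced in this thread, but the body and the process_metadata helper below are assumptions for illustration only):

class EnginePlacer(Placer):

    def check(self):
        # Process file-level metadata up front, so an engine upload that
        # carries metadata but no binaries is still handled (ref #159).
        if self.metadata is not None:
            self.process_metadata(self.metadata)

    def process_metadata(self, metadata):
        # Hypothetical helper: validate and stage per-file metadata here.
        pass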

Contributor replied:

Sounds good. Just thinking that a more appropriate name for Placer.check could be Placer.initialize.

@kofalt force-pushed the upload-unification branch from 0fe75d1 to 26b83a1 on March 2, 2016 at 23:01
@kofalt (Contributor, Author) commented Mar 2, 2016

All comments addressed; rebased and squashed. Ready for merge.

@rentzso (Contributor) commented Mar 2, 2016

LGTM
Please consider renaming Placer.check to Placer.initialize

@gsfr (Member) commented Mar 3, 2016

Shouldn't the check() code simply go into __init__()?

@kofalt (Contributor, Author) commented Mar 3, 2016

I chose not to do that because I need control over when that function is called, which may be important for a few things in #159. It also reduces the likelihood that I'll have to break the interface while Renzo is working on things.

I'll add a point at the end of #159 to consider changing the naming.
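
A sketch of that two-phase setup, with construction kept cheap and check() invoked explicitly by the upload handler (TargetedPlacer, process_file_field, and finalize are assumed names from the sketch above, not necessarily the real identifiers):

# Construction only records state; the handler decides when validation runs.
placer = TargetedPlacer(container, metadata)

# The handler can inspect the request and pick a strategy before this point.
placer.check()

for field in file_fields:
    placer.process_file_field(field)

response = placer.finalize()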

@kofalt self-assigned this on Mar 3, 2016
}

# Get or create a session based on the hierarchy and provided labels.
s = {
Contributor (commented on the diff):

Just noticing that the packfile placer is not setting the group on the session. This is a required field according to the data model.

Contributor (Author) replied:

Wait... okay, help me reconcile that with the schemas.
The input session schema has no group key, but the mongo session schema does?

Are you asking that I grab the group ID from the project object and place it on the session object? Should I also do that to the acquisition?

Contributor replied:

Yeah, you need to get the group id from the project object. The input session schema doesn't include the group because you only need the parent project to create a session inside it. I don't remember the reason why we need to store the group id in the session (Cc @gsfr) but it is a required field.

The acquisition doesn't need any other key.

Member replied:

> I don't remember the reason why we need to store the group id in the session (Cc @gsfr) but it is a required field.

The field is needed to easily filter the sessions list by group.

@kofalt (Contributor, Author) commented Mar 4, 2016

@rentzso Check out 4fdbb93, which adds group id to the upserted session.
If that looks good to you then I'll merge :)
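
The change is presumably along these lines (a sketch only, not the actual contents of 4fdbb93; the surrounding field names are assumptions):

# Copy the required group id from the parent project onto the session
# document before upserting it; the other fields here are placeholders.
s = {
    'project': project['_id'],
    'group':   project['group'],   # required by the mongo session schema
    'label':   session_label,
    # ... remaining session fields ...
}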

@rentzso (Contributor) commented Mar 4, 2016

LGTM!

kofalt added a commit that referenced this pull request on Mar 4, 2016
@kofalt merged commit 4402735 into master on Mar 4, 2016
@kofalt deleted the upload-unification branch on March 4, 2016 at 19:28