DirectToDiskFileUpload

= Handling of big file uploads directly to disk =
This page adapts the example at http://www.cherrypy.org/wiki/FileUpload to CherryPy version 3.0.

== Main differences ==

 * The filter is replaced by a tool that disables CherryPy's request body processing.
 * The default timeouts are raised.
 * The default request body size limit is removed.
 * The temporary file used by cgi.!FieldStorage is changed to tempfile.!NamedTemporaryFile, so the uploaded data does not have to be copied once the HTTP upload finishes. For big files this matters for both speed and disk space.
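
The hard-link idea behind the last point can be seen in isolation in the sketch below. The helper name `keep_temp_upload` is hypothetical and not part of the original recipe. It assumes a POSIX system and that the temporary file and the destination live on the same filesystem, since hard links cannot cross filesystems.

```python
import os
import tempfile


def keep_temp_upload(data, dest):
    """Write data to a named temporary file, then give it a permanent name.

    Hypothetical helper for illustration only. The temporary file is
    created next to dest so both names are on the same filesystem.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    tmp = tempfile.NamedTemporaryFile(dir=dest_dir)
    try:
        tmp.write(data)
        tmp.flush()  # push buffered bytes to the file before linking
        # second directory entry for the same file; no data is copied
        os.link(tmp.name, dest)
    finally:
        # closing removes only the temporary name; dest keeps the data
        tmp.close()
    return dest
```

Because `os.link` only adds a directory entry, the cost does not depend on the size of the upload.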

== The code ==
{{{
#!python

#!/usr/bin/python2.4

import cherrypy
import cgi
import tempfile
import os


__author__ = "Ex Vito"




class myFieldStorage(cgi.FieldStorage):
    """Our version uses a named temporary file instead of the default
    non-named file; keeping it visible (named) allows us to create a
    2nd link after the upload is done, thus avoiding the overhead of
    making a copy to the destination filename."""
    
    def make_file(self, binary=None):
        return tempfile.NamedTemporaryFile()


def noBodyProcess():
    """Sets cherrypy.request.process_request_body = False, giving
    us direct control of the file upload destination. By default
    cherrypy loads it into memory; we direct it to disk instead."""
    cherrypy.request.process_request_body = False

cherrypy.tools.noBodyProcess = cherrypy.Tool('before_request_body', noBodyProcess)


class fileUpload:
    """fileUpload cherrypy application"""
    
    @cherrypy.expose
    def index(self):
        """Simplest possible HTML file upload form. Note that the encoding
        type must be multipart/form-data."""
        
        return """
            <html>
            <body>
                <form action="upload" method="post" enctype="multipart/form-data">
                    File: <input type="file" name="theFile"/> <br/>
                    <input type="submit"/>
                </form>
            </body>
            </html>
            """
    
    @cherrypy.expose
    @cherrypy.tools.noBodyProcess()
    def upload(self, theFile=None):
        """upload action
        
        We use our variation of cgi.FieldStorage to parse the MIME
        encoded HTML form data containing the file."""
        
        # the file transfer can take a long time; by default cherrypy
        # limits responses to 300s; we increase it to 1h
        cherrypy.response.timeout = 3600
        
        # convert the header keys to lower case
        lcHDRS = {}
        for key, val in cherrypy.request.headers.iteritems():
            lcHDRS[key.lower()] = val
        
        # at this point we could limit the upload on content-length...
        # incomingBytes = int(lcHDRS['content-length'])
        
        # create our version of cgi.FieldStorage to parse the MIME encoded
        # form data where the file is contained
        formFields = myFieldStorage(fp=cherrypy.request.rfile,
                                    headers=lcHDRS,
                                    environ={'REQUEST_METHOD':'POST'},
                                    keep_blank_values=True)
        
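        # we now create a 2nd link to the file, using the submitted
        # filename; if we renamed, there would be a failure because
        # the NamedTemporaryFile, used by our version of cgi.FieldStorage,
        # explicitly deletes the original filename when it is closed.
        # Only the last path component of the submitted name is used, so
        # a client cannot write outside /tmp with a name like "../x"
        theFile = formFields['theFile']
        destName = os.path.basename(theFile.filename.replace('\\', '/'))
        os.link(theFile.file.name, os.path.join('/tmp', destName))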
        
        return "ok, got it filename='%s'" % theFile.filename


# remove any limit on the request body size; cherrypy's default is 100MB
# (maybe we should just increase it ?)
cherrypy.server.max_request_body_size = 0

# increase server socket timeout to 60s; we are more tolerant of bad
# quality client-server connections (cherrypy's default is 10s)
cherrypy.server.socket_timeout = 60

cherrypy.quickstart(fileUpload())
}}}
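
The commented-out Content-Length check in `upload` could be fleshed out roughly as follows. This is only a sketch: the helper `check_content_length`, the exception `UploadRejected` and the 2 GiB cap `MAX_UPLOAD_BYTES` are hypothetical names and values, not part of the original code. It expects a dict with lower-cased keys, like `lcHDRS` above.

```python
MAX_UPLOAD_BYTES = 2 * 1024 ** 3  # hypothetical cap (2 GiB); pick your own


class UploadRejected(Exception):
    """Raised when the declared request size is missing, invalid or too big."""

    def __init__(self, status, message):
        Exception.__init__(self, message)
        self.status = status  # suggested HTTP status code


def check_content_length(headers, max_bytes=MAX_UPLOAD_BYTES):
    """Return the declared body size in bytes, or raise UploadRejected.

    headers is a dict with lower-cased keys.
    """
    raw = headers.get('content-length')
    if raw is None:
        raise UploadRejected(411, "Content-Length header is required")
    try:
        size = int(raw)
    except ValueError:
        raise UploadRejected(400, "Content-Length is not a number")
    if size < 0:
        raise UploadRejected(400, "Content-Length is negative")
    if size > max_bytes:
        raise UploadRejected(413, "upload is larger than %d bytes" % max_bytes)
    return size
```

Inside `upload`, the helper could be called right after `lcHDRS` is built, with the exception translated via `raise cherrypy.HTTPError(e.status, str(e))`. Note that the value is only what the client declares, so it limits well-behaved clients rather than measuring the bytes actually received.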

== Possible Improvements ==

 * Maybe the headers do not need to be lower-cased for the cgi.!FieldStorage invocation?
 * os.link fails if the destination name already exists; this case should be handled.
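
One possible way to handle an existing destination name is to retry with a numbered suffix, as in the sketch below. The helper `link_unique` and its naming scheme are hypothetical, not part of the original recipe. It relies on `os.link` failing with `EEXIST` rather than overwriting, so no separate existence check is needed.

```python
import errno
import os


def link_unique(src, dest_dir, filename, max_tries=1000):
    """Hard-link src into dest_dir without overwriting anything.

    Hypothetical helper. Tries "name.ext", then "name-1.ext",
    "name-2.ext", ... and returns the path that was created.
    """
    # keep only the last path component of the client-supplied name
    filename = os.path.basename(filename.replace('\\', '/'))
    if filename in ('', '.', '..'):
        filename = 'upload'
    stem, ext = os.path.splitext(filename)
    for n in range(max_tries):
        candidate = filename if n == 0 else "%s-%d%s" % (stem, n, ext)
        dest = os.path.join(dest_dir, candidate)
        try:
            os.link(src, dest)  # fails instead of overwriting
            return dest
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # some other problem, e.g. a cross-device link
    raise OSError(errno.EEXIST, "no free name found for %r" % filename)
```

In `upload`, a call such as `link_unique(theFile.file.name, '/tmp', theFile.filename)` would stand in for the bare `os.link` call.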

== Final Notes ==

My Python and CherryPy experience is limited. You are welcome to improve or correct the code and style.

Note: It seems `FieldStorage` does not call `make_file()` when the uploaded file is small (e.g. <= 1000 bytes?), so the file might actually be an in-memory file-like object (e.g. StringIO) instead of a named temporary file.

Note 2: That is correct: `FieldStorage` does not call `make_file()` if the file is smaller than 1000 bytes. As a workaround, I simply edit cgi.py as follows:
 * Locate the definition of the method `__write(self, line)`.
 * Delete the line `if self.__file.tell() + len(line) > 1000:` and dedent the lines that were nested under it, so the module still parses.
 * This or some other workaround is necessary if you need to handle files smaller than 1000 bytes.
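
An alternative to editing cgi.py might be to let the handler cope with both kinds of file object: link the upload when it is backed by a named file on disk, and write it out when it was kept in memory. The sketch below uses a hypothetical helper `save_upload` and assumes the in-memory object holds bytes. For such small uploads the copy is cheap.

```python
import os
import shutil


def save_upload(fileobj, dest):
    """Store an uploaded file object at dest.

    Hypothetical helper. Disk-backed uploads (objects whose .name is an
    existing path, such as tempfile.NamedTemporaryFile) are hard-linked;
    in-memory ones (such as StringIO or BytesIO) are written out.
    Returns "linked" or "copied".
    """
    name = getattr(fileobj, 'name', None)
    if isinstance(name, str) and os.path.exists(name):
        fileobj.flush()  # make sure buffered data reached the file
        os.link(name, dest)
        return "linked"
    # small upload kept in memory: copy its bytes to the destination
    fileobj.seek(0)
    with open(dest, 'wb') as out:
        shutil.copyfileobj(fileobj, out)
    return "copied"
```

In `upload`, a call such as `save_upload(theFile.file, destination)` would stand in for the direct `os.link` call, where `destination` is whatever path the application chooses.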