Python Forum

When a file is uploaded from the client-side to the server, how can you save this uploaded file without it being read into memory (on a WSGI Python application)?

I also want to do this without using a third-party module or framework. The Python standard library has a cgi module that can parse POST form data, including enctype="multipart/form-data" uploaded files. Here is a minimal code example:

Code snippet of the HTML form:

return """
    <!doctype html>
    <html>
        <h1>Upload new File</h1>
        <form method=post enctype=multipart/form-data>
            <input type=file name=file>
            <input type=submit value=Upload>
        </form>
    </html>
"""

Code snippet for processing the uploaded file with Python:

    # Note: 'environ' is the variable that contains environment & request data that the WSGI server passes to application()

    import cgi
    field_storage = cgi.FieldStorage(
        fp=environ['wsgi.input'],
        environ=environ,
        keep_blank_values=True
    )

    for item in field_storage.list:

        # if it's a POST file
        if item.filename:
        
            storage_file_path = '/path/to/storage_dir/' + item.filename
            
            # Read the uploaded file
            file_content = item.file.read()
            
            # Save file
            with open(storage_file_path, 'wb') as file:
                file.write(file_content)
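
A side note on the storage path above: item.filename comes from the client, so joining it to the storage directory unchanged lets a name such as ../../etc/passwd escape that directory. A minimal sketch of one way to guard against that, assuming only the last path component should be kept (safe_storage_path is a hypothetical helper name, not part of cgi):

```python
import os

def safe_storage_path(storage_dir, client_filename):
    """Return a path inside storage_dir for a client-supplied filename.

    Hypothetical helper: keeps only the final path component of the name.
    """
    # Some browsers send Windows-style paths, so normalise separators first.
    name = os.path.basename(client_filename.replace('\\', '/'))
    if name in ('', '.', '..'):
        raise ValueError('unusable upload filename: %r' % (client_filename,))
    return os.path.join(storage_dir, name)
```

It could then replace the string concatenation, e.g. storage_file_path = safe_storage_path('/path/to/storage_dir', item.filename).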

The problem with this is that the whole file content is read into memory before the file is saved, because of the line file_content = item.file.read(). Very large uploads can therefore use too much memory, or even all of it.

This problem can be fixed by reading the file in chunks, so that only one chunk of the file content is held in memory at a time instead of the whole file content.

    import cgi
    field_storage = cgi.FieldStorage(
        fp=environ['wsgi.input'],
        environ=environ,
        keep_blank_values=True
    )
    
    for item in field_storage.list:
    
        # if it's a POST file
        if item.filename:
        
            storage_file_path = '/path/to/storage_dir/' + item.filename
    
            # Save file (in chunks of 100000 bytes)
            # Note: each chunk is released once 'chunk' is rebound by the next read, so you don't need to use 'del chunk' at the end of the loop
            with open(storage_file_path, 'wb') as file:
                while True:
                    chunk = item.file.read(100000)
                    if not chunk:
                        break
                    file.write(chunk)
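
The same chunked loop is available in the standard library as shutil.copyfileobj, which reads from one file-like object and writes to another a buffer at a time. A minimal sketch, assuming the source is any readable binary file-like object such as item.file (save_stream and the 64 KiB chunk size are illustrative choices, not anything cgi requires):

```python
import shutil

def save_stream(source, dest_path, chunk_size=64 * 1024):
    """Copy a readable binary file-like object to dest_path in chunks."""
    with open(dest_path, 'wb') as dest:
        # copyfileobj keeps calling source.read(chunk_size) until it
        # returns an empty bytes object, writing each chunk to dest.
        shutil.copyfileobj(source, dest, chunk_size)
```

With this, the loop above becomes save_stream(item.file, storage_file_path). Memory use for the copy is bounded by roughly one chunk per upload in progress, so many simultaneous uploads cost about that many chunks rather than that many whole files.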

This works, but is it possible to save an uploaded file without reading any of the file contents into memory? There will still be a problem of using too much memory if many users upload files at the same time. Any help appreciated.

Quote: When a file is uploaded from the client-side to the server, how can you save this uploaded file without it being read into memory (on a WSGI Python application)?

If you have uploaded the file to the server, it's already in memory. That's where it goes when uploaded!

Is it possible to make the uploaded file go straight to the hard drive instead of memory though?

A bit like how a swapfile works (it uses space on the hard drive when the memory is fully utilised), except that the uploaded file content would go to the hard drive straight away, without any of it being held in memory.
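
For what it's worth, a WSGI application receives the request body as the stream environ['wsgi.input'], so the application decides how much of it to read at a time. A minimal sketch of writing that stream to disk a chunk at a time, under the assumption that the client sends the file as the raw request body (for example a PUT, or a POST with application/octet-stream) rather than as multipart/form-data, which would still need parsing. stream_body_to_file is a hypothetical name:

```python
def stream_body_to_file(environ, dest_path, chunk_size=64 * 1024):
    """Write the raw request body to dest_path; return the bytes written.

    Assumes the body is the file itself (not multipart/form-data).
    """
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0
    stream = environ['wsgi.input']
    written = 0
    with open(dest_path, 'wb') as dest:
        # Never read past CONTENT_LENGTH: some servers block if you do.
        while remaining > 0:
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break  # the client stopped sending early
            dest.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written
```

Some bytes always pass through a small buffer on their way from the socket to the disk, but only about chunk_size of them per request at any moment.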

(Nov-21-2019, 06:58 AM) andym118 Wrote: Is it possible to make the uploaded file go straight to the hard drive instead of memory though?

If the system supports sendfile(2), you can do a zero-copy transfer from a socket to a file descriptor and vice versa.
Here is a module that supports it: https://pypi.org/project/pysendfile/
Under the hood they use mmap.

Another very important point is the following:

Quote:Also, it must be clear that the file can only be sent “as is” (e.g. you can’t modify the content while transmitting). There might be problems with non regular filesystems such as NFS, SMBFS/Samba and CIFS. For this please refer to proftpd documentation.

This means that you can't modify the stream on the fly.

EDIT: It seems that this has been implemented since Python 3.3: http://michaldul.com/python/sendfile/
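
In the standard library the call is os.sendfile(out_fd, in_fd, offset, count). A rough sketch of using it to copy one regular file to another, with a chunked fallback where the call is missing or refuses the descriptors (copy_with_sendfile is a hypothetical name). One caveat worth checking for the upload case: the Linux man page says the input descriptor must support mmap-like operations and so cannot be a socket, which means this helps for file-to-file or file-to-socket copies rather than for reading an upload off the network.

```python
import os
import shutil

def copy_with_sendfile(src_path, dst_path):
    """Copy src_path to dst_path, using os.sendfile where it is usable."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        try:
            while offset < size:
                # The kernel moves the bytes; nothing is read into Python.
                sent = os.sendfile(dst.fileno(), src.fileno(),
                                   offset, size - offset)
                if sent == 0:
                    break
                offset += sent
        except (AttributeError, OSError):
            # os.sendfile is absent (e.g. Windows) or rejected these
            # descriptors (e.g. a non-socket destination on macOS):
            # fall back to an ordinary chunked copy from where we stopped.
            src.seek(offset)
            dst.seek(offset)
            shutil.copyfileobj(src, dst)
    return os.path.getsize(dst_path)
```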

andym118

Larz60+

andym118

DeaD_EyE