Python Forum

best way to copy a big binary file

i have a big file to copy. it could be a binary file. i have the source and target filenames. i might like to also show a progress bar. the file could be larger than RAM. what is the best and safest way to copy it? what is the best way to verify it was copied uncorrupted?
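A minimal sketch of the chunked approach, with a simple percent progress readout and on-the-fly SHA-256 hashing for verification; the paths, chunk size, and function names here are only illustrative assumptions:

import hashlib
import os

def copy_with_progress(src, dst, chunk=1024 * 1024):
    """Copy src to dst in fixed-size chunks, print progress, return the SHA-256 hex digest."""
    total = os.path.getsize(src) or 1
    digest = hashlib.sha256()
    copied = 0
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            data = fin.read(chunk)
            if not data:
                break
            fout.write(data)
            digest.update(data)   # hash while copying, so the source is read only once
            copied += len(data)
            print(f'\r{copied * 100 // total}%', end='', flush=True)
    print()
    return digest.hexdigest()

def sha256_of(path, chunk=1024 * 1024):
    """Re-read a file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            digest.update(data)
    return digest.hexdigest()

# verify by re-reading the destination and comparing digests
# src_hash = copy_with_progress('movie.mkv', '/backup/movie.mkv')
# assert sha256_of('/backup/movie.mkv') == src_hash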
On Linux, use rsync
(Aug-26-2019, 07:36 PM)Larz60+ Wrote: On Linux, use rsync

+1
(Aug-26-2019, 07:36 PM)Larz60+ Wrote: On Linux, use rsync
how do i do it within python, with the progress handled by that code?
see: https://tylercipriani.com/blog/2017/07/0...in-python/
Python example with dissected rsync explanation
nice article. but my need gives rsync no advantage. the target does not yet exist. this is an in-host copy of a very large file such as a movie video. it may be copying between two different filesystems. it may be doing a first-time full backup (even rsync has to do this the first time). i am wondering how good copy code determines what is a good chunk size to better manage memory load, or if it uses (some module).sendfile() and how.
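On the sendfile question: a minimal sketch assuming Linux, where os.sendfile can write to a regular file and the copy stays in the kernel (shutil.copyfile uses a similar fast path since Python 3.8); the function name and chunk size are assumptions:

import os

def sendfile_copy(src, dst, chunk=64 * 1024 * 1024):
    """Copy src to dst with os.sendfile; data never passes through a Python-level buffer."""
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        remaining = os.fstat(fin.fileno()).st_size
        offset = 0
        while remaining > 0:
            sent = os.sendfile(fout.fileno(), fin.fileno(), offset, min(chunk, remaining))
            if sent == 0:
                break
            offset += sent
            remaining -= sent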
destination file doesn't have to exist. Have you tried it?
i've used rsync and many other programs. i have made code that does copying in many languages. and i know i did it right in C and Pike. i want to also do it in Python. i am not interested in calling some program to do it. i want to know if this is appropriate or what.
buffer_size = 16 * 1024 * 1024  # read and write in chunks of this many bytes
with open(source_name, 'rb') as input_file, open(target_name, 'wb') as output_file:
    while True:
        data = input_file.read(buffer_size)
        if not data:
            break
        output_file.write(data)
it would be a bad idea to call .read() without a size. the file might be larger than ram+swap.
(Sep-04-2019, 02:49 AM)Skaperen Wrote: i want to know if this is appropriate or what.
Something like that; shutil.copyfileobj already does this.
Implemented as:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
So by default it buffers a 16 KB chunk in memory; it will be faster to read in larger chunks if you have a big file.
Output:
# 3.2 GB size file
50 KB:  29.539s
1 MB:   26.261s
10 MB:  25.521s
100 MB: 24.886s
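A larger buffer is just the third argument to shutil.copyfileobj; the filenames below are made up:

import shutil

# copy with a 10 MiB buffer instead of the 16 KiB default
with open('bigfile.bin', 'rb') as fsrc, open('/mnt/backup/bigfile.bin', 'wb') as fdst:
    shutil.copyfileobj(fsrc, fdst, length=10 * 1024 * 1024)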
The optimal buffer size ultimately depends on the amount of RAM you have available as well as the file size.
Speed also depends on the hardware/media (HDD/SSD/eMMC/flash) and the filesystem used.
Smaller blocksize -> more IOPS
Bigger blocksize -> fewer IOPS

If you choose too small a blocksize, you lose speed.
If you choose too big a blocksize, you may also lose speed.

A good value is 64 KiB - 512 KiB.
An SSD can have a blocksize of 4 MiB.
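If you want a starting point from the filesystem itself, os.stat exposes the preferred I/O block size on Unix; the path below is just an example:

import os

st = os.stat('movie.mkv')   # example path
print(st.st_blksize)        # filesystem's preferred I/O block size in bytes
# benchmarking buffer sizes that are multiples of st_blksize is a reasonable start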

The program disk dump (dd) uses a default blocksize of 512 bytes.
https://superuser.com/questions/234199/g...iskdump-dd