Python Forum
best way to copy a big binary file
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
best way to copy a big binary file
#1
i have a big file to copy. it could be a binary file. i have the source and target filenames. i might like to also show a progress bar. the file could be larger than RAM. what is the best and safest way to copy it? what is the best way to verify it was copied uncorrupted?
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#2
On Linux, use rsync
Reply
#3
(Aug-26-2019, 07:36 PM)Larz60+ Wrote: On Linux, use rsync

+1
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#4
(Aug-26-2019, 07:36 PM)Larz60+ Wrote: On Linux, use rsync
how to do it within python with the progress handled by that code.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#5
see: https://tylercipriani.com/blog/2017/07/0...in-python/
Python example with dissected rsync explanation
Reply
#6
nice article. but my need gives rsync no advantage. the target does not yet exist. this is an in-host copy of a very large file such as a movie video. it may be copying between two different filesystems. it may be doing a first-time full backup (even rsync has to do this the first time). i am wondering how good copy code determines what is a good chunk size to better manage memory load, or if it uses (some module).sendfile() and how.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#7
destination file doesn't have to exist. Have you tried it?
Reply
#8
i've used rsync and many other programs. i have made code that does copying in many languages. and i know i did it right in C and Pike. i want to also do it in Python. i am not interested in calling some program to do it. i want to know if this is appropriate or what.
   while True:
        data = input_file.read(buffer_size)
        if not data:
            break
        output_file.write(data)
it would be a bad idea to call .read() without a size. the file might be larger than ram+swap.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#9
(Sep-04-2019, 02:49 AM)Skaperen Wrote: i want to know if this is appropriate or what.
Something like that,shutil.copyfileobj already dos this.
Implemented as:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
So default buffer a chunk of 16 KB in memory,it will be faster to read larger chunk if have a big file.
Output:
# 3.2 GB size file 50 KB: 29.539s 1 MB: 26.261s 10 MB: 25.521s 100 MB: 24.886s
The optimal buffer size ultimately depends on the amount of RAM you have available as well as the file size.
Reply
#10
Speed depends also on hardware/media (HDD/SSD/EMMC/Flash) and used filesystem.
Smaller blocksize -> more IOPS
Bigger blocksize -> lesser IOPS

If you choose a too small blocksize, you lose speed.
If you choose a to big blocksize, you may lose also speed.

A good value is 64KiB - 512KiB.
A SSD can have a blocksizes of 4MiB.

The program disk dump (dd) uses a default blocksize of 512 bytes.
https://superuser.com/questions/234199/g...iskdump-dd
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  how would you copy a file? Skaperen 17 4,990 Oct-04-2019, 07:15 AM
Last Post: Skaperen

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020