Python Forum
file transfer
#1
Hi,
My business is converting legacy files, lists, databases, images, PDFs, cards, etc.
into a format that can be searched by genealogy enthusiasts.
In doing this I create a lot of small files (millions) that need to be transferred to the server.
We are talking about a Windows 10/11 environment.
It is becoming a bottleneck, even using robocopy (from the Windows command line).
My transfer speed is roughly 30,000-50,000 files/hour.
Note: the files are mostly very small, around 25 KB (small PNG images).
Any suggestions to speed up the process, given that I cannot change the hardware configuration?
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#2
My first idea was to put everything into a database to get a single big file, but that would add overhead.

You could keep a database on both sides containing only metadata such as path, mtime, size, and hash. Before transferring, you update the database, and then you transfer only the new and changed files. On the other side, you must delete the files that were removed from the source.
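The metadata approach above can be sketched with the standard library alone. This is a minimal sketch, assuming a JSON file stands in for the "database"; the function names and the index format are mine, not from any existing tool:

```python
import json
from pathlib import Path

def scan_tree(root):
    """Build a metadata index: relative path -> (size, mtime)."""
    root = Path(root)
    return {
        str(p.relative_to(root)): (p.stat().st_size, p.stat().st_mtime)
        for p in root.rglob("*") if p.is_file()
    }

def diff_indexes(new_index, old_index):
    """Return (new_or_changed, deleted) relative paths."""
    changed = [p for p, meta in new_index.items() if old_index.get(p) != meta]
    deleted = [p for p in old_index if p not in new_index]
    return changed, deleted

def load_index(path):
    """Load the previously saved index, or an empty one on the first run."""
    try:
        with open(path) as f:
            return {k: tuple(v) for k, v in json.load(f).items()}
    except FileNotFoundError:
        return {}

def save_index(index, path):
    """Persist the index for the next run."""
    with open(path, "w") as f:
        json.dump(index, f)
```

Each run you would `scan_tree` the source, diff against the saved index, transfer only the `changed` list, and tell the server to delete the `deleted` list. A hash column could be added for extra safety, at the cost of reading every file.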
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
If you can install rsync on Windows, you could try synchronizing the directories with rsync. This program transfers only the files that have changed.
« We can solve any problem by introducing an extra level of indirection »
#4
Let me reflect on these proposals.
rsync does not seem like a solution, because all the files are new (and they don't change afterwards).
Making one big file might be better, but how?
Think, think, think ...
thx,
Paul
#5
(Feb-15-2024, 06:24 PM)DPaul Wrote: Making one big file might be better, but how?
If you need a single big file, you can just compress the directory containing all the files and images and transfer the compressed file.
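That suggestion maps directly onto Python's standard library. A minimal sketch, assuming hypothetical directory and archive names:

```python
import shutil

def pack_batch(src_dir, archive_base):
    """Bundle a directory of small files into one zip archive.

    Returns the path of the created archive
    (archive_base plus a ".zip" extension).
    """
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)

def unpack_batch(archive_path, dest_dir):
    """Restore the individual files on the receiving side."""
    shutil.unpack_archive(archive_path, dest_dir)
```

Transferring the single archive replaces millions of per-file open/close round trips with one large sequential copy, which is exactly where small-file transfers lose time.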
#6
It is becoming a real problem, so I'm ready to try anything Smile
Compress, you say. Not zip?

- As long as the compression, added to the transfer time, does not take longer overall. Rolleyes
- I will need to establish what batch size to transfer per compression run.
- I'll try 25,000 files first, then 50,000, and see whether it scales linearly.
- The number of MB/GB must be proportional, of course.
- And the total time must be less than robocopy's.
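The measurement plan above can be sketched as a small timing harness, so each stage can be compared against a plain robocopy run of the same batch. A sketch only; the paths and function name are mine:

```python
import shutil
import time
from pathlib import Path

def time_batch(src_dir, archive_base, dest_dir):
    """Time the full pack -> copy -> unpack cycle for one batch.

    Returns (pack_s, copy_s, unpack_s) in seconds, so the three stages
    can be compared separately and across batch sizes.
    """
    t0 = time.perf_counter()
    archive = shutil.make_archive(archive_base, "zip", root_dir=src_dir)
    t1 = time.perf_counter()
    copied = shutil.copy(archive, dest_dir)          # the actual transfer step
    t2 = time.perf_counter()
    shutil.unpack_archive(copied, Path(dest_dir) / "extracted")
    t3 = time.perf_counter()
    return t1 - t0, t2 - t1, t3 - t2
```

Running it at 25,000 and again at 50,000 files would show whether the per-stage times grow linearly, and which stage dominates.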
I'll report back.
thanks,
Paul
#7
(Feb-15-2024, 08:28 PM)Gribouillis Wrote: compress the directory containing all the files and images and transfer the compressed file.
I did a small preliminary test before trying big transfers.
Batch = 11,629 PNG files = roughly 100 MB.
1) Benchmark: Ctrl-C, Ctrl-V to SSD via USB 3 => time 3 min 6 s.
2) Send to -> zip -> Ctrl-C, Ctrl-V -> SSD.
The transfer itself is of course lightning fast, but then you need to unzip. The whole process takes 3 min 8 s.

Unless I am missing something, this is not very encouraging to continue.
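One variable worth ruling out before abandoning the archive approach: PNG files are already compressed, so the deflate pass in a standard zip spends CPU for essentially no size reduction. The `zipfile` module can store files uncompressed instead. A sketch under that assumption; the function name is mine:

```python
import zipfile
from pathlib import Path

def pack_stored(src_dir, archive_path):
    """Bundle already-compressed files (PNG, JPEG, ...) into one archive
    without recompressing them.

    ZIP_STORED skips the deflate pass, which saves almost nothing in size
    on PNG data but can take a noticeable share of the packing time.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(archive_path, "w",
                         compression=zipfile.ZIP_STORED) as zf:
        for p in sorted(src.rglob("*")):
            if p.is_file():
                zf.write(p, p.relative_to(src))
```

Whether this beats the "send to -> zip" timing above would need measuring; it only removes the compression cost, not the cost of writing thousands of small files on extraction.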
Paul
#8
I don't know how you create the files or how long that takes, but right now it looks like your workflow is to create all the files (or a batch of them) and then move them at once, and that is the time you clock. Would it be a plausible approach to monitor the folder(s) and transfer each new file right after it is created, while more files are still being created? Basically the destination folder would mirror the source folder in [almost] real time.
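The mirroring idea could be sketched with a simple polling loop using only the standard library (a library such as watchdog would do this with filesystem events instead). A sketch with hypothetical paths and parameters:

```python
import shutil
import time
from pathlib import Path

def mirror_new_files(src_dir, dest_dir, poll_seconds=2.0, rounds=None):
    """Poll src_dir and copy files to dest_dir as soon as they appear.

    rounds=None polls forever; a number limits the iterations
    (handy for testing). Files are assumed never to change once written.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    seen = set()
    n = 0
    while rounds is None or n < rounds:
        for p in src.rglob("*"):
            if p.is_file() and p not in seen:
                target = dest / p.relative_to(src)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(p, target)   # copy with timestamps preserved
                seen.add(p)
        n += 1
        if rounds is None or n < rounds:
            time.sleep(poll_seconds)
```

This spreads the transfer over the creation time instead of adding it afterwards, which is the point of the suggestion above; a real version would also need to skip files still being written.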
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#9
(Feb-18-2024, 06:42 AM)buran Wrote: transfer any new file right after it is created
Yes, you are right. I did think of that, but:
The place where I create these "files" is not physically the same
as where the server is. Hence the need to transfer.
But as it is a real problem, if I don't find another solution,
I'll have to do the development over there, at least for large batches.

The trick with "send to" (zip), transfer, unzip would be OK,
but why does the unzip take so much more time than the zip?
thx,
Paul
#10
(Feb-18-2024, 07:10 AM)DPaul Wrote: Yes, you are right. I did think of that, but:
The place where I create these "files", is not the same (physically)
as where the server is. Hence.
I don't see why this would be an issue as long as they are on the same network, or you otherwise establish a connection between the two. Actually, I never thought they were the same, hence the need to transfer...

