Posts: 741
Threads: 122
Joined: Dec 2017
Hi,
My business is converting legacy files, lists, databases, images, PDFs, cards, etc.
into a format that can be searched by (genealogy) enthusiasts.
In doing this I create a lot of small files (millions) that need to be transferred to the server.
We are talking about a Windows 10/11 environment.
It is becoming a bottleneck, even using robocopy (from the Windows command line).
My transfer speed is roughly 30,000-50,000 files/hour.
Note: the files are mostly very small, around 25 KB each (small PNG images).
Any suggestions to speed up the process, given that I cannot change the hardware configuration?
thx,
Paul
It is more important to do the right thing, than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = the French version of 'KISS'.
Posts: 2,126
Threads: 11
Joined: May 2017
My first idea was to put everything in a database to get a single big file, but this would add overhead.
You could keep a database on both sides containing only metadata such as path, mtime, size, and hash. Before a transfer you update the database, then you transfer only the new and changed files. On the destination side you also delete the files that have been removed from the source.
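For illustration, a minimal sketch of such a metadata index on the sending side, assuming SQLite; the table layout, paths, and choice of SHA-256 are my own invention, not anything prescribed in the thread:

```python
import hashlib
import os
import sqlite3

DB_PATH = "file_index.db"    # hypothetical index database
ROOT = r"C:\export\batch"    # hypothetical source tree

def file_hash(path, chunk_size=1 << 16):
    """SHA-256 of a file, read in chunks so large files stay cheap."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def new_or_changed(root=ROOT, db_path=DB_PATH):
    """Update the metadata index and return the paths that need transferring."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files "
                "(path TEXT PRIMARY KEY, mtime REAL, size INTEGER, hash TEXT)")
    todo = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            row = con.execute("SELECT mtime, size FROM files WHERE path = ?",
                              (path,)).fetchone()
            if row != (st.st_mtime, st.st_size):    # new file, or metadata changed
                con.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                            (path, st.st_mtime, st.st_size, file_hash(path)))
                todo.append(path)
    con.commit()
    con.close()
    return todo
```

The same index on the receiving side would let you detect deletions by comparing the two sets of paths.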
Posts: 4,790
Threads: 76
Joined: Jan 2018
Feb-15-2024, 11:01 AM
(This post was last modified: Feb-15-2024, 11:02 AM by Gribouillis.)
If you can install rsync on Windows, you could try to synchronize the directories with rsync. This program transfers only the files that have changed.
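As a hedged sketch only: on Windows, rsync typically comes from cwRsync, Cygwin, or WSL, and could be driven from Python like this (the paths and host are invented):

```python
import subprocess

# Invented paths/host for illustration; adjust to the real share or SSH target.
SRC = "/cygdrive/c/export/batch/"      # trailing slash = copy contents, not the dir itself
DEST = "user@server:/data/genealogy/"

# -a recurses and preserves attributes; --ignore-existing skips files already
# on the server, which fits a workload where files are only ever added.
subprocess.run(["rsync", "-a", "--ignore-existing", SRC, DEST], check=True)
```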
« We can solve any problem by introducing an extra level of indirection »
Posts: 741
Threads: 122
Joined: Dec 2017
Let me reflect on these proposals.
rsync does not seem to be a solution, because all the files are new (and they don't change).
Making one big file might be better, but how?
Think, think, think ...
thx,
Paul
It is more important to do the right thing, than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = the French version of 'KISS'.
Posts: 4,790
Threads: 76
Joined: Jan 2018
Feb-15-2024, 08:28 PM
(This post was last modified: Feb-15-2024, 08:29 PM by Gribouillis.)
(Feb-15-2024, 06:24 PM)DPaul Wrote: Making one big file might be better, but how?
If you need a single big file, you can just compress the directory containing all the files and images and transfer the compressed file.
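In Python that round trip might look like this sketch (all paths are invented for illustration):

```python
import shutil

SRC_DIR = r"C:\export\batch_0001"      # invented: a directory full of small PNGs
DROPBOX = r"\\server\incoming"         # invented: a share on the server

# Pack the whole directory into a single archive, move that one file,
# then unpack it on the server side.
archive = shutil.make_archive(SRC_DIR, "zip", SRC_DIR)   # creates batch_0001.zip
shutil.copy(archive, DROPBOX)                            # one big transfer instead of thousands
# On the server: shutil.unpack_archive(r"\\server\incoming\batch_0001.zip", r"D:\data")
```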
« We can solve any problem by introducing an extra level of indirection »
Posts: 741
Threads: 122
Joined: Dec 2017
It is becoming a real problem, so I'm ready to try anything.
Compress, you say. Not zip?
- As long as compressing plus transferring does not take more time overall than the plain copy.
- I will need to establish what batch size to transfer per compressed archive.
- I'll try 25,000 files first, then 50,000, and see whether it scales linearly.
- The number of MB/GB must be proportional, of course.
- And the total time must be less than robocopy's (a timing sketch follows below).
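A small timing harness for those batch tests could look like this sketch (the batch directories are invented):

```python
import shutil
import time

def timed_zip(src_dir):
    """Pack src_dir into a .zip next to it and report the elapsed seconds."""
    t0 = time.perf_counter()
    archive = shutil.make_archive(src_dir, "zip", src_dir)
    return archive, time.perf_counter() - t0

# Invented batch directories of ~25,000 and ~50,000 files, as in the plan above.
for batch in (r"C:\export\batch_25k", r"C:\export\batch_50k"):
    archive, seconds = timed_zip(batch)
    print(f"{batch}: packed as {archive} in {seconds:.1f} s")
```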
I'll report back.
thanks,
Paul
It is more important to do the right thing, than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = the French version of 'KISS'.
Posts: 741
Threads: 122
Joined: Dec 2017
Feb-16-2024, 07:48 AM
(This post was last modified: Feb-16-2024, 07:49 AM by DPaul.)
(Feb-15-2024, 08:28 PM)Gribouillis Wrote: compress the directory containing all the files and images and transfer the compressed file.
I did a small preliminary test, before I try big transfers.
Batch = 11,629 PNG files = roughly 100 MB.
1) Benchmark: Ctrl-C, Ctrl-V to SSD via USB 3 => time: 3 min 6 s.
2) Send to -> zip -> Ctrl-C, Ctrl-V -> SSD.
The transfer itself is of course lightning fast, but then you need to unzip. The whole process takes 3 min 8 s.
Unless I am missing something, this is not very encouraging.
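For what it's worth, one variable in this test is the deflate step itself: PNGs are already deflate-compressed, so zipping them burns CPU for almost no size gain. A sketch of building a stored (uncompressed) archive with Python's zipfile, with invented paths:

```python
import os
import zipfile

SRC_DIR = r"C:\export\batch_0001"          # invented test batch
ARCHIVE = r"C:\export\batch_0001.zip"

# ZIP_STORED packs the files without compressing them: the archive still
# collapses thousands of tiny transfers into one, but skips the deflate
# work on both the zip and the unzip side.
with zipfile.ZipFile(ARCHIVE, "w", compression=zipfile.ZIP_STORED) as zf:
    for dirpath, _dirs, names in os.walk(SRC_DIR):
        for name in names:
            full = os.path.join(dirpath, name)
            zf.write(full, arcname=os.path.relpath(full, SRC_DIR))
```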
Paul
It is more important to do the right thing, than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = the French version of 'KISS'.
Posts: 8,160
Threads: 160
Joined: Sep 2016
Feb-18-2024, 06:42 AM
(This post was last modified: Feb-18-2024, 06:43 AM by buran.)
I don't know how you create the files or how long that takes, but right now it looks like your workflow is to create all the files (or a batch of them), then move them at once, and that is the time you are clocking. Would it be a plausible approach to monitor the folder(s) and transfer each new file right after it is created, while more files are still being created? Basically, the destination folder would mirror the source folder in [almost] real time.
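A minimal sketch of that mirroring idea, assuming the third-party watchdog package (paths are invented, and a production version would also have to wait until each file is fully written before copying it):

```python
import shutil
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

SRC = r"C:\export\outgoing"     # invented: where the files are created
DEST = r"\\server\incoming"     # invented: the mirror on the server

class CopyOnCreate(FileSystemEventHandler):
    """Copy every newly created file to the destination as it appears."""
    def on_created(self, event):
        if not event.is_directory:
            shutil.copy2(event.src_path, DEST)

observer = Observer()
observer.schedule(CopyOnCreate(), SRC, recursive=True)
observer.start()
try:
    observer.join()             # run until interrupted
except KeyboardInterrupt:
    observer.stop()
    observer.join()
```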
Posts: 741
Threads: 122
Joined: Dec 2017
(Feb-18-2024, 06:42 AM)buran Wrote: transfer any new file right after it is created
Yes, you are right. I did think of that, but:
the place where I create these "files" is not the same (physically) as where the server is. Hence.
But as it is a real problem, and if I don't find another solution,
I'll have to do the development over there, at least for large batches.
The trick with the "send to" (zip), transfer, unzip would be OK,
but why does the unzip take so much more time than the zip?
thx,
Paul
It is more important to do the right thing, than to do the thing right. (P. Drucker)
Better is the enemy of good. (Montesquieu) = the French version of 'KISS'.
Posts: 8,160
Threads: 160
Joined: Sep 2016
(Feb-18-2024, 07:10 AM)DPaul Wrote: Yes, you are right. I did think of that, but:
the place where I create these "files" is not the same (physically) as where the server is. Hence.
I don't see why this would be an issue, as long as they are on the same network or you can otherwise establish a connection between the two. Actually, I never thought they were the same; hence the need to transfer...