
On Tue, 21 Feb 2017 04:58:52 PM Craig Sanders via luv-main wrote:
> On Fri, Feb 17, 2017 at 06:25:38PM +1100, Joel W. Shea wrote:
> > Are you maxing out your disk/network bandwidth already?
> This is key, IMO, to whether running multiple rsyncs in parallel is worth it or not. Almost all of the time rsync is going to be I/O bound (disk and network) rather than CPU bound, so adding more rsync processes just slows them all down. A single rsync process can saturate the disk I/O and network bandwidth of most common disk subsystems and network connections.
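As a rough check of that (this assumes the sysstat tools are installed), watching the disk and the network while an ordinary rsync runs will show whether one of them is already flat out:

  # Per-disk utilisation; %util near 100 on the relevant disk means
  # adding more rsync processes is unlikely to help.
  iostat -x 1

  # Per-interface throughput, to see if the network link is the limit.
  sar -n DEV 1
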
If you have a RAID-1 array then you should be able to benefit from having as many reading processes as there are mirrors of the data (reads happen at both ends: the sender reads the source files and the receiver reads its existing copies to compare against). If you have RAID-5 then you should get some benefit from multiple readers, but it's not as easy to predict. The same applies to command queuing within a single device, but for a much smaller benefit: Linux does some queuing of requests and it's theoretically possible to gain something from multiple processes accessing a single disk at once, but the gain will probably be small.

If a process does some CPU work as well as I/O there is also potential for improvement from running multiple processes, provided nothing else is using the disk. For example, if the process spends 10% of its time on CPU and 90% in iowait, a second process could give you roughly a 10% improvement, because there will then almost always be at least one process with a disk request outstanding.

Apart from the case of two processes reading from a RAID-1 device, the benefits of all of these are small. But if, for example, you want to migrate a server to new hardware or a new DC in an 8 hour downtime window and the transfer looks like it will take 9 hours, these are exactly the things you want to do.
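In that situation a rough sketch (the host name and paths here are invented) is just to run rsync over disjoint subtrees in parallel and wait for both:

  # Two rsyncs over disjoint subtrees; on RAID-1 each mirror can be
  # servicing reads for one of them.
  rsync -aH /srv/data/projects/ newhost:/srv/data/projects/ &
  rsync -aH /srv/data/home/ newhost:/srv/data/home/ &
  wait

Whether that actually wins anything depends on the points above, so it's worth timing it both ways on a subset of the data first.
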
> splitting up the transfer into multiple smaller rsync jobs to be run consecutively, not simultaneously, can be useful... especially if you intend to run the transfer multiple times to pick up new/changed/deleted files since the last run. There's a lot of startup overhead (and RAM & CPU usage) with rsync on every run, comparing file lists and file timestamps and/or checksums to figure out what needs to be transferred. Multiple smaller transfers (e.g. of entire subdirectory trees) tend to be noticeably faster than one large transfer.
Yes, especially if you are running out of dentry cache.
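For what it's worth, a sketch of that approach (the host name and paths are made up) can be as simple as one job per top-level subtree, run one after the other:

  # One rsync per top-level directory, run consecutively; each run only
  # has a fraction of the file list to build and compare.
  for d in /srv/data/*/ ; do
      rsync -aH --delete "$d" "newhost:/srv/data/$(basename "$d")/"
  done
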
> In other words, running multiple rsyncs in parallel is usually a false optimisation.
The thing that concerns me most about such things is the potential for mistakes. For everything you do there is some probability of stuffing it up. Is the probability of a stuff-up a reasonable trade-off for a performance improvement?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/