
On Tue, 21 Feb 2017 04:58:52 PM Craig Sanders via luv-main wrote:
> On Fri, Feb 17, 2017 at 06:25:38PM +1100, Joel W. Shea wrote:
> > Are you maxing out your disk/network bandwidth already?
> This is key, IMO, to whether running multiple rsyncs in parallel is worth it or not. Almost all of the time rsync is going to be I/O bound (disk and network) rather than CPU bound, so adding more rsync processes just slows them all down. A single rsync process can saturate the disk I/O and network bandwidth of most common disk subsystems and network connections.
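As a rough check of that (this assumes the sysstat tools are installed), watching the disk and the network while an ordinary rsync runs will show whether one of them is already flat out:

  # Per-disk utilisation; %util near 100 on the relevant disk means
  # adding more rsync processes is unlikely to help.
  iostat -x 1

  # Per-interface throughput, to see if the network link is the limit.
  sar -n DEV 1
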
If you have a RAID-1 array then you should be able to benefit from having as many reading processes as there are mirrors of the data (reads happen at both ends: the sender reads the source files and the receiver reads its existing copies to compare against). If you have RAID-5 then you should get some benefit from multiple readers, but it's not as easy to predict. The same applies to command queuing within a single device, but for a much smaller benefit: Linux does some queuing of requests and it's theoretically possible to gain something from multiple processes accessing a single disk at once, but the gain will probably be small.

If a process does some CPU work as well as I/O there is also potential for improvement from running multiple processes, provided nothing else is using the disk. For example, if the process spends 10% of its time on CPU and 90% in iowait, a second process could give you roughly a 10% improvement, because there will then almost always be at least one process with a disk request outstanding.

Apart from the case of two processes reading from a RAID-1 device, the benefits of all of these are small. But if, for example, you want to migrate a server to new hardware or a new DC in an 8 hour downtime window and the transfer looks like it will take 9 hours, these are exactly the things you want to do.
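In that situation a rough sketch (the host name and paths here are invented) is just to run rsync over disjoint subtrees in parallel and wait for both:

  # Two rsyncs over disjoint subtrees; on RAID-1 each mirror can be
  # servicing reads for one of them.
  rsync -aH /srv/data/projects/ newhost:/srv/data/projects/ &
  rsync -aH /srv/data/home/ newhost:/srv/data/home/ &
  wait

Whether that actually wins anything depends on the points above, so it's worth timing it both ways on a subset of the data first.
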
> splitting up the transfer into multiple smaller rsync jobs to be run consecutively, not simultaneously, can be useful... especially if you intend to run the transfer multiple times to pick up new/changed/deleted files since the last run. There's a lot of startup overhead (and RAM & CPU usage) with rsync on every run, comparing file lists and file timestamps and/or checksums to figure out what needs to be transferred. Multiple smaller transfers (e.g. of entire subdirectory trees) tend to be noticeably faster than one large transfer.
Yes, especially if you are running out of dentry cache.
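For what it's worth, a sketch of that approach (the host name and paths are made up) can be as simple as one job per top-level subtree, run one after the other:

  # One rsync per top-level directory, run consecutively; each run only
  # has a fraction of the file list to build and compare.
  for d in /srv/data/*/ ; do
      rsync -aH --delete "$d" "newhost:/srv/data/$(basename "$d")/"
  done
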
> In other words, running multiple rsyncs in parallel is usually a false optimisation.
The thing that concerns me most about such things is the potential for mistakes. For everything you do there is some probability of stuffing it up. Is the probability of a stuff-up a reasonable trade-off for a performance improvement?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/