
On Tue, 21 Feb 2017, Craig Sanders wrote:
On Fri, Feb 17, 2017 at 06:25:38PM +1100, Joel W. Shea wrote:
Are you maxing out your disk/network bandwidth already?
This is key, IMO, to whether running multiple rsyncs in parallel is worth it or not. Almost all of the time, rsync is going to be I/O bound (disk and network) rather than CPU bound - so adding more rsync processes is just going to slow them all down even more. A single rsync process can saturate the disk and network bandwidth of most common disk subsystems and network connections.
About the only time more rsync processes might help is if you're transferring between two servers with SSD storage arrays via a direct-connect 10+Gbps link, and even then only if the disk + network throughput is at least a few multiples of what a single rsync job (incl. child processes for ssh and/or compression, if any) can cope with.
Or if the source AND destination of each of the multiple rsyncs are on completely separate disks/storage-arrays, so they don't compete with each other for disk I/O. e.g. an rsync from server1/disk1 to server2/disk1 can run at the same time as an rsync from server1/disk2 to server2/disk2, especially if you can use separate network interfaces for each rsync.
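As a rough illustration of the separate-disks case above, here is a minimal Python sketch that starts one rsync per source/destination pair and waits for all of them. The helper names (build_rsync_cmd, run_parallel) and the example paths in the comment are hypothetical; only the standard rsync -a (archive) option is assumed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_rsync_cmd(src, dest):
    """Return the argv list for one rsync job (-a is archive mode)."""
    return ["rsync", "-a", src, dest]


def run_parallel(commands, max_workers=None):
    """Run each argv list as its own process; return exit codes in input order."""
    if not commands:
        return []
    # Default to one worker per command so every job starts immediately.
    workers = max_workers or len(commands)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread just blocks on its child process, so threads are enough here.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


# Hypothetical usage, one job per independent disk pair:
#   pairs = [("/mnt/disk1/data/", "server2:/mnt/disk1/data/"),
#            ("/mnt/disk2/data/", "server2:/mnt/disk2/data/")]
#   exit_codes = run_parallel([build_rsync_cmd(s, d) for s, d in pairs])
```

Whether this actually beats a single rsync depends on the disks and network really being independent, which is the point under discussion in this thread.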
Not quite. It matters on read (which can be both sides when you're rewriting data), and not just for arrays with multiple spindles. A single rsync issues one read, the array does its seek and finds the relevant spindles, and the reading rsync then sends that to the remote, which can cache and reorder as necessary when writing. If you have multiple independent rsyncs, then one rsync blocks on read, a second rsync blocks on read but its required data is closer to the heads, and a third rsync finds another disk in the array that the other rsyncs haven't made busy yet. You can get a benefit from running more rsyncs than the number of spindles, because with multiple in-flight SCSI commands your block scheduler/RAID controller/disk controller knows that one bit of data is closer than another.
For writes, you get no benefit unless rsync issues blocking fsyncs (I can't remember if it does - if I had to optimise for data transfer, I'd investigate this and consider using libeatmydata, with the caveat that I'd need to manually rerun rsync in the event of a hardware fault soon after any transfers were run).
-- Tim Connors
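To make the read-side argument concrete, here is a toy Python model, not a description of any real block scheduler or of rsync itself. It assumes a single disk head, treats each concurrent rsync as one outstanding read, and lets the "scheduler" always serve the nearest of the reads currently in flight. The function name total_seek and the block positions are made up for illustration.

```python
def total_seek(requests, depth, start=0):
    """Total head travel when at most `depth` reads are in flight at once.

    The nearest in-flight request is always served next (a greedy toy policy).
    """
    pending = list(requests)
    in_flight = []
    head = start
    travelled = 0
    while pending or in_flight:
        # Top up the in-flight window, as if another rsync issued its next read.
        while pending and len(in_flight) < depth:
            in_flight.append(pending.pop(0))
        nearest = min(in_flight, key=lambda pos: abs(pos - head))
        in_flight.remove(nearest)
        travelled += abs(nearest - head)
        head = nearest
    return travelled


reads = [90, 10, 80, 20, 70, 30]   # made-up block positions
print(total_seek(reads, depth=1))  # 390: one read at a time, served in order
print(total_seek(reads, depth=3))  # 150: three reads in flight
print(total_seek(reads, depth=6))  # 90: everything visible to the scheduler
```

With only one read outstanding the head has to follow the request order; with several in flight it can pick the closer one first, which is the effect described above. Real hardware adds caching, multiple spindles, and fairness limits that this sketch ignores.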