
From: "James Harper" <james@ejbdigital.com.au>
Rsync integration
> Not claimed - no patches yet - Not in kernel yet
> Now that we have code to efficiently find newly updated files, we need to tie it into tools such as rsync and dirvish. (For bonus points, we can even allow rsync to use btrfs's builtin checksums and, when a file has changed, tell rsync _which blocks_ inside that file have changed. Would need to work with the rsync developers on that one.)
> Update rsync to preserve NOCOW file status.
Means: make rsync work like btrfs send/receive ;-) and put filesystem-specific code in it.

I am not sure whether this is a great idea. Most of the time you will have the same filesystem on both ends; then you can use the zfs/btrfs etc. tools, or rsync if it's not a COW system. It is more "code polluting" than it's worth, I think. And if btrfs send/receive isn't stable, there is a good chance we would end up with an unstable rsync as well.

[Slightly polemic: later, assume that rsync is running on btrfs and make it a requirement ;-) (See Unix desktops, which became Linux/systemd/upstart/udev/dbus/hald... only.)]

Regards
Peter
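As a rough illustration of the file-level half of the wiki idea quoted above, the sketch below drives 'btrfs subvolume find-new' (which lists files changed since a given subvolume generation) and feeds the resulting paths to rsync through its --files-from option. The find-new output parsing and the overall command shape are assumptions for illustration, not a tested integration:

    #!/usr/bin/env python3
    # Sketch only: let btrfs report changed files instead of having
    # rsync scan the whole tree. Assumes the usual find-new output,
    # one "inode ... gen N flags FLAGS path" line per changed file.
    import subprocess
    import sys

    def changed_files(subvol, last_gen):
        """Yield paths (relative to subvol) touched since last_gen."""
        out = subprocess.run(
            ["btrfs", "subvolume", "find-new", subvol, str(last_gen)],
            check=True, capture_output=True, text=True).stdout
        for line in out.splitlines():
            if line.startswith("transid marker"):
                continue  # trailer: the generation to remember for the next run
            # The path is everything after the "flags <FLAGS>" columns.
            _, _, rest = line.partition(" flags ")
            if " " in rest:
                yield rest.split(" ", 1)[1]

    def incremental_rsync(subvol, last_gen, dest):
        file_list = "\n".join(changed_files(subvol, last_gen))
        # --files-from=- reads the newline-separated list from stdin;
        # paths in the list are taken relative to the source argument.
        subprocess.run(["rsync", "-a", "--files-from=-", subvol, dest],
                       input=file_list, text=True, check=True)

    if __name__ == "__main__":
        incremental_rsync(sys.argv[1], int(sys.argv[2]), sys.argv[3])

Storing the generation from the "transid marker" trailer after each run would let every subsequent backup walk only what changed since the last one.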

From: "James Harper" <james@ejbdigital.com.au>
Rsync integration
> > Not claimed - no patches yet - Not in kernel yet
> > Now that we have code to efficiently find newly updated files, we need to tie it into tools such as rsync and dirvish. (For bonus points, we can even allow rsync to use btrfs's builtin checksums and, when a file has changed, tell rsync _which blocks_ inside that file have changed. Would need to work with the rsync developers on that one.)
> > Update rsync to preserve NOCOW file status.
> Means: make rsync work like btrfs send/receive ;-) and put filesystem-specific code in it.
Not really... it's just putting code in rsync to get existing metadata from the filesystem rather than calculating the metadata itself. I'm just testing out some of the deduplication stuff in btrfs, and was actually a little shocked to find it calculating the hashes itself. btrfs already has checksums, and if nothing else it could have used them to trivially reject blocks that are different before calculating a stronger hash. There is talk about exposing the btrfs checksums to userspace, but of course that puts constraints on further development, as they now have to consider userspace compatibility. It would be a huge speedup for dedup though.
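As a sketch of that two-stage check, with zlib's crc32 standing in for the btrfs crc32c checksum (which, as noted, isn't exposed to userspace, so stage 1 is still computed from the data here, purely to show the control flow):

    # Two-stage dedup scan: a cheap checksum rejects most differing
    # blocks, and the strong hash is only computed for blocks whose
    # cheap checksum collides. If the filesystem checksum were
    # exposed, stage 1 would come free from existing metadata.
    from collections import defaultdict
    import hashlib
    import zlib

    def find_duplicate_blocks(blocks):
        """blocks: iterable of (block_id, data) pairs. Returns groups
        of ids whose data is identical (up to SHA-256 collisions)."""
        by_crc = defaultdict(list)
        for block_id, data in blocks:
            by_crc[zlib.crc32(data)].append((block_id, data))
        groups = []
        for candidates in by_crc.values():
            if len(candidates) < 2:
                continue  # unique cheap checksum: strong hash never computed
            by_sha = defaultdict(list)
            for block_id, data in candidates:
                by_sha[hashlib.sha256(data).digest()].append(block_id)
            groups.extend(ids for ids in by_sha.values() if len(ids) > 1)
        return groups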
> I am not sure whether this is a great idea.
> Most of the time you will have the same filesystem on both ends; then you can use the zfs/btrfs etc. tools, or rsync if it's not a COW system.
> It is more "code polluting" than it's worth, I think.
Maybe. It depends on the speedup. In a lot of cases the above optimisations would speed up the processing that rsync has to do, but if 90% of the time taken in your rsync was actually moving data then you're never going to get any more than 10% faster. For LAN links, though, I normally just use -W for rsync, because computing changes just adds overhead (you have to read the file at both ends anyway, and unless your disk can pull data faster than 1 GByte/second you're not going to saturate your 10 GBit/second link, so don't bother computing changes). If you got the change computation "for free", then it's a big win.
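A back-of-envelope model of that trade-off, with illustrative numbers rather than measurements:

    # Toy model for the -W argument above: delta transfer only pays
    # when the link, not the disk, is the bottleneck. It ignores the
    # CPU cost of computing the deltas, which only strengthens the
    # case for -W on fast links. All numbers are assumptions.
    def transfer_seconds(file_bytes, changed_fraction, link_bps, disk_bps,
                         use_delta):
        read_time = file_bytes / disk_bps      # both ends read the file anyway
        wire = file_bytes * changed_fraction if use_delta else file_bytes
        return max(read_time, wire / link_bps)  # reads overlap the transfer

    GB = 10**9
    # 10 Gbit/s LAN (1.25 GB/s), 0.5 GB/s disks, 10% of a 100 GB file changed:
    lan = dict(link_bps=1.25 * GB, disk_bps=0.5 * GB)
    print(transfer_seconds(100 * GB, 0.10, use_delta=True, **lan))   # 200.0 s
    print(transfer_seconds(100 * GB, 0.10, use_delta=False, **lan))  # 200.0 s: -W loses nothing
    # Same job over a 100 Mbit/s WAN: delta transfer is now a 10x win.
    wan = dict(link_bps=0.0125 * GB, disk_bps=0.5 * GB)
    print(transfer_seconds(100 * GB, 0.10, use_delta=True, **wan))   # 800.0 s
    print(transfer_seconds(100 * GB, 0.10, use_delta=False, **wan))  # 8000.0 s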
> And if btrfs send/receive isn't stable, there is a good chance we would end up with an unstable rsync as well.
(I think) Russell was supposing that there weren't many bugs reported for send/receive because not many people were using it. I'm not sure how we got from there to "send/receive isn't stable". But yes, new code has bugs :)

James
participants (2)
- James Harper
- Peter Ross