On Fri, Jul 24, 2015 at 09:35:51PM +1000, Russell Coker wrote:
> > Running low on space (>10%) also really hurt performance badly
> > (my limited experience with zfs is that it hangs badly when free
> > space gets under about 20%) [...]
>
> I haven't noticed such problems on either filesystem. I think that
> it depends on what your usage is, maybe my usage happens to miss the
> corner cases where lack of space causes performance problems.
i've used zfs extensively since at least 2011 and i've run into this
several times. When the pool gets over about 80% full, performance turns
to shit...really, awfully, abysmally bad for both reads and writes.
in fact, i ran into it again this week with my backup pool here at home.
i'd foolishly allowed it to get to 87% full and performance was abysmal,
on the order of kilobytes/second rather than 200+MB/s.
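fwiw, 'zpool list' is the quick way to keep an eye on how full a pool
is getting (the pool name here is just an example):

  zpool list -o name,size,allocated,free,capacity backup   # 'backup' = example pool name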
i had to replace the 4x1TB drives (in RAIDZ1 configuration) in my backup
pool with 4x4TB (configured as two mirrored pairs).
i've found that once the pool gets over 80%, it gets slower the more
it's used - even fairly light usage over a few hours will make it slow
to a crawl...and you can forget about rsync to or from a fs > 80% full
(oddly, a zfs send to another pool runs reasonably fast), and daily cron
jobs like updating the mlocate db tend to just hang before completion.
the only solutions are to add more vdevs to the pool, replace all the
disks in one or more vdevs with larger disks, or create a new pool with
larger/more disks and replicate to it with 'zfs send -R'.
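roughly, those three options look something like this - the pool and
device names are placeholders, not my actual setup:

  # 1. add another vdev (e.g. a new mirrored pair) to grow the pool
  zpool add backup mirror /dev/sdc /dev/sdd

  # 2. replace each disk in a vdev with a bigger one and let the vdev expand
  zpool set autoexpand=on backup
  zpool replace backup /dev/sda /dev/sde   # repeat per disk, wait for each resilver

  # 3. replicate everything to a new, bigger pool
  zfs snapshot -r backup@migrate
  zfs send -R backup@migrate | zfs receive -F newbackup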
btrfs probably handles this slightly better because it's easier to add
a single disk or two to increase capacity...and, of course, you can
rebalance to redistribute your data evenly over all disks in the pool.
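for comparison, growing and rebalancing a btrfs filesystem is just
something like this (device and mountpoint are examples only):

  # example device and mountpoint
  btrfs device add /dev/sdc /mnt/data
  btrfs balance start /mnt/data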
btw, 'zfs send' is so awesome that i'm seriously considering converting
all my systems here to having root on zfs - i back up some filesystems
with zfs send and some with rsync, and backing up a system with zfs
send is many orders of magnitude faster than rsync because zfs knows
*exactly* which blocks have changed between any two snapshots without
having to stat or compare any files on source or destination.
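an incremental backup between two snapshots looks something like this
(the dataset, snapshot and host names are made up for the example):

  # example dataset/host names
  zfs snapshot pool/home@2015-07-24
  # ...next day...
  zfs snapshot pool/home@2015-07-25
  zfs send -i pool/home@2015-07-24 pool/home@2015-07-25 | \
      ssh backuphost zfs receive -F backup/home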
same goes for btrfs too (in fact, from what i've read btrfs has more
flexible snapshot handling for send/receive) but since my backup pool
is zfs, btrfs isn't really an option for me. i could convert to btrfs
but i can't afford the extra disks or the time or hassle it would take -
rebalancing and resizing are tempting features but not tempting enough.
anyway: IMO, snapshots, error detection & correction, and send/receive
capability are more than enough reasons to use either btrfs or zfs. if
you're not already using one of them, you should seriously consider
switching.
> The problem I had is that I had mythtv storage in a subvolume, and
> the only
actually, i was wrong earlier when i said there were only three options
for fixing a fs over 80% full. You can also delete files to get it back
well below 80% again. i had to do this on my mythtv zpool because it got
to about 85% full with recordings i either hadn't got around to watching
or had little intention of ever watching again. i deleted enough to get
it below 60% and performance went back to normal. i've since done a
ruthless purge and got it down to 35% full.
note: it will be painfully slow if you try to cp or rsync the files to
somewhere else before deleting them. zfs send is still fast (and if you
send to another pool on the same system you can set the mountpoint of
the destination fs so that it mounts in the same place as the source fs
used to be). the downside is that send only sends snapshots of entire
filesystems - you can't pick and choose which files to move...so,
useless for my myth situation but useful for a pool with multiple
filesystems on it.
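the move-to-another-pool-on-the-same-box trick looks roughly like this
(dataset names and the mountpoint are just examples):

  # example dataset names and mountpoint
  zfs snapshot pool/recordings@move
  zfs send pool/recordings@move | zfs receive backup/recordings
  zfs set mountpoint=none pool/recordings    # or destroy it once you're happy
  zfs set mountpoint=/var/lib/mythtv backup/recordings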
> control you have over mythtv is that you can tell it to leave a
> certain amount of GB free, and the maximum that can be is 200GB, so
> obviously that's a problem on all but the smallest installations. I
> ended up creating a 1TB file (with nocow enabled) and used ext4 on
> loopback as my mythtv store. Performance is probably badly impacted
> but I don't notice it.
>
> If you had a 5TB RAID-1 array (the smallest I would consider buying
> for home use nowadays) then 200G would be 4%.
and using all but 4% of the pool would mean using 96% of it.
the free space issue on zfs is not a "rule of thumb" but a well-known
fact about it - getting over 80% full is really bad for performance, and
just about every tuning or setup guide will mention it. i don't know
btrfs as well but it wouldn't surprise me if the <10% free issue that
was mentioned is also a hard rule (as in "don't do it").
my understanding is that at >80%, zfs changes the algorithm it uses to
allocate space to files from "best fit" to "wherever there's some
space". this tends to cause massive fragmentation (even more than is
common on COW filesystems).
> If I wanted good performance on a BTRFS array I would make the
> filesystem as a RAID-1 array of SSDs. Then I would create a huge
> number of small files to allocate many gigs of metadata space. A 4TB
> array can have 120G of metadata so I might use 150G of metadata space.
> Then I'd add 2 big disks to the array which would get used for data
> chunks and delete all the small files. Then as long as I never did
> a balance all the metadata chunks would stay on the SSD and the big
> disks would get used for data. I expect that performance would be
> great for such an array.
with zfs, you'd just do:
zfs set secondarycache=metadata <fs>
that tells it to use the L2ARC (e.g. ssd or other fast block device
cache) to only cache metadata. this can be set on a per-filesystem
("subvolume" in btrfs terms) basis.
you can also set primarycache (i.e. the ARC in RAM) to the same values -
all, none, or metadata - with the default being all (actually, inherit
from the parent, with the ultimate parent's default being all).
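e.g. (the dataset name is just an example):

  # example dataset name
  zfs set primarycache=all pool/home
  zfs set secondarycache=metadata pool/home
  zfs get primarycache,secondarycache pool/home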
zfs doesn't do rebalancing or resizing (unfortunately - they're the
key features that btrfs has and zfs doesn't), but if it did, you wouldn't
have to avoid using them just so that a kludge like that keeps working.
clever tricks can be cool, but deliberately designed, reliable features are better.
craig
--
craig sanders <cas@taz.net.au>