File fragmentation and wasted disk space

Assembled cognoscenti: a general question about file systems and file fragmentation. As the price of storage per byte drops, do file systems really need the economy of fragmenting files to fill fragmented empty disk space? That is, if there is insufficient space between two existing files, the new file is just written to the first interstitial space which is large enough, or to the unfragmented space starting after the last file, and small interstitial empty spaces are simply wasted. Do any file systems do this, and if not, is the reason simply storage economy?
regards, Rohan McLeod
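As a rough sketch of the allocation policy being proposed here: a toy free-extent allocator that either finds one gap big enough for the whole file or refuses to write it at all. The function and data layout are invented for illustration and do not correspond to any real filesystem's code.

# Toy model of the proposed "never fragment" allocator: a file either
# fits entirely in one free extent (first-fit), or the write fails.
# All names here are illustrative; no real filesystem API is implied.

def allocate_contiguous(free_extents, size):
    """free_extents: list of (start, length) tuples, sorted by start.
    Returns (start, size) for the chosen extent, or None if the file
    cannot be stored without fragmenting it."""
    for start, length in free_extents:
        if length >= size:          # first gap big enough wins;
            return (start, size)    # smaller gaps are simply skipped (wasted)
    return None                     # no single gap fits: refuse rather than fragment

if __name__ == "__main__":
    # 10 GB free in total, but the largest single gap is 4 GB.
    gaps = [(100, 2), (500, 4), (900, 4)]   # (start, length), in GB for brevity
    print(allocate_contiguous(gaps, 3))     # (500, 3): fits in one piece
    print(allocate_contiguous(gaps, 5))     # None: would have to fragment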

Rohan McLeod wrote:
a general question about file-systems and file fragmentation.
What file fragmentation? It should be negligible unless you're using a legacy fs like FAT, have an unusual workload, or are routinely filling ext's reserved space (either by setting it to 0%, or by filling the disk as root).

Likewise, naïve defragmentation on ext is trivial: for each file f, copy f to f', then move f' back onto f. (For btrfs, zfs and log-oriented filesystems, there is dedicated infrastructure to coalesce/rebalance/scrub/resilver/&c. for you, which will do a better job.)

If you're asking "why are we wasting code even having this feature", consider copying a DVD to a disk with 1TB free, but no contiguous free segment exceeding 4GB. Should the user just be told "just buy a bigger disk, they're really cheap"? That seems ludicrous to me.
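A rough Python sketch of that copy-then-rename rewrite, assuming there is enough free space for the temporary copy; it ignores hard links, sparse files, ownership/xattrs and files that are open for writing, and the helper names are made up:

# Rough sketch of the naive "copy f to f', move f' back onto f" rewrite
# described above for ext. Rewriting the file lets the allocator pick a
# (hopefully) contiguous extent for the new copy. Treat it as an
# illustration, not a tool.
import os
import shutil

def rewrite_file(path):
    tmp = path + ".defrag-tmp"          # hypothetical temporary name
    shutil.copy2(path, tmp)             # copy data + basic metadata to a fresh allocation
    os.replace(tmp, path)               # atomically move the copy back over the original

def rewrite_tree(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rewrite_file(os.path.join(dirpath, name))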

Trent W. Buck wrote:
..........snip
If you're asking "why are we wasting code even having this feature",
No, the question I think I am asking is: is it really necessary to fragment the file during writing in the first place? Then efficient use of disk space just amounts to defragmenting the free space, i.e. arranging the files contiguously. I don't doubt your contention that FAT is particularly evil in this respect, or that defragmentation is quite simple in ext or automatic in btrfs and zfs.
consider copying a DVD to a disk with 1TB free, but no contiguous free segment exceeding 4GB. Should the user just be told "just buy a bigger disk, they're really cheap"? That seems ludicrous to me.
On second thought, I am not even convinced that 'non-fragmented' writes imply any more than a temporary loss of space. Can't the OS just write the 4GB DVD to RAM until the FS makes the free space contiguous? But on the subject of economy, we already accept a trade-off between disk-space utilisation and speed of access etc.; it seems to be widely accepted that once a disk gets down to about 20% free space, it is time to get a bigger drive?
regards Rohan McLeod
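A toy block-level illustration of the scheme being floated here: buffer the new file (e.g. in RAM), compact existing data toward the front of the disk, then write the buffered file into the now-contiguous tail. The whole model is invented for the example and glosses over everything a real OS would have to handle (crash safety, memory pressure, concurrent writers).

FREE = None

def compact(disk):
    """Move every used block toward the front, preserving order."""
    used = [b for b in disk if b is not FREE]
    return used + [FREE] * (len(disk) - len(used))

def write_buffered(disk, file_blocks):
    disk = compact(disk)
    tail = len(disk) - disk.count(FREE)          # start of the contiguous free run
    if disk.count(FREE) < len(file_blocks):
        raise IOError("not enough space even after compaction")
    disk[tail:tail + len(file_blocks)] = file_blocks
    return disk

if __name__ == "__main__":
    disk = ["a", FREE, "b", FREE, FREE, "c", FREE, FREE]
    print(write_buffered(disk, ["d", "d", "d", "d"]))
    # ['a', 'b', 'c', 'd', 'd', 'd', 'd', None]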

Rohan McLeod <rhn@jeack.com.au> wrote:
No, the question I think I am asking is: is it really necessary to fragment the file during writing in the first place? Then efficient use of disk space just amounts to defragmenting the free space, i.e. arranging the files contiguously.
It does, but that is surely where the performance deficit lies: it is incurred whenever defragmentation is required. To make the case, you would have to show that the benefit of having contiguously allocated files outweighs the cost of performing the defragmentation. This is most obviously so if the files are written once, never modified, and read often; but clearly, most file system usage doesn't follow that pattern.

Jason White wrote:
..............snip
To make the case, you would have to show that the benefit of having contiguously allocated files outweighs the cost of performing the defragmentation. This is most obviously so if the files are written once, never modified, and read often; but clearly, most file system usage doesn't follow that pattern.
So to your knowledge: 1/ no FS does this 'contiguous allocation'? 2/ the reason being that 'the cost of re-arranging the files contiguously' is greater than 'the cost of defragmentation'?
thanks for clarifying that, Rohan McLeod

Rohan McLeod <rhn@jeack.com.au> wrote:
So to your knowledge: 1/ no FS does this 'contiguous allocation'?
I'm not aware of any.
2/ the reason being that 'the cost of re-arranging the files contiguously' is greater than 'the cost of defragmentation'?
I'm confident it would be done if there were a performance benefit. XFS performs delayed writes to reduce, but not eliminate, fragmentation, as does ext4 if I remember correctly.
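A toy sketch of the delayed-allocation idea mentioned above: writes accumulate in a buffer and a single extent is chosen only at flush time, when the final size is known, so one contiguous run can be picked instead of extending the file piecemeal. The class and allocator are invented for illustration and bear no relation to the actual XFS or ext4 code.

class DelayedAllocFile:
    def __init__(self, allocator):
        self.buffer = bytearray()
        self.allocator = allocator      # callable: size -> (start, size) extent

    def write(self, data):
        self.buffer += data             # no on-disk allocation happens yet

    def flush(self):
        # Allocate once, for the whole buffered length.
        extent = self.allocator(len(self.buffer))
        print(f"allocated one extent {extent} for {len(self.buffer)} bytes")
        self.buffer.clear()
        return extent

if __name__ == "__main__":
    f = DelayedAllocFile(allocator=lambda size: (4096, size))  # dummy allocator
    for _ in range(10):
        f.write(b"x" * 100)             # ten small writes, still zero extents on disk
    f.flush()                           # one 1000-byte extent instead of ten fragments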

Jason White wrote:
Rohan McLeod <rhn@jeack.com.au> wrote:
So to your knowledge: 1/ no FS does this 'contiguous allocation'?
I'm not aware of any.
2/ the reason being that 'the cost of re-arranging the files contiguously' is greater than 'the cost of defragmentation'?
I'm confident it would be done if there were a performance benefit.
Jason, thanks for replying. No doubt you are right, but it is far from intuitive. If I imagine a disk containing, say, n groups of contiguous files comprising m bytes in total, separated by n-1 free spaces comprising f bytes in total, then making those groups contiguous would seem simply to involve moving the m bytes of the n contiguous groups toward the front of the disk, so that the free space ends up at the other end of the disk.
Whereas in a file system which allows fragmentation, if those n-1 free spaces were occupied by, say, a fragmented file, then getting to a single contiguous run of contiguous files would seem to involve:
1/ moving the f bytes of the fragmented file to the end of the disk to defragment it;
2/ moving the m bytes of the contiguous files to the front of the disk;
3/ moving the f bytes of the now-defragmented file again, up against the end of the rest of the contiguous files.
i.e. m bytes moved in the first case and m + 2f bytes in the second!
regards Rohan McLeod
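A small worked version of that bookkeeping, with arbitrary example figures, just totalling the bytes moved in the two scenarios:

def bytes_moved_no_fragmentation(m, f):
    # n contiguous groups (m bytes) separated by free gaps (f bytes):
    # compacting just slides the m bytes of file data toward the front.
    return m

def bytes_moved_with_fragmented_file(m, f):
    # The gaps hold a fragmented file of f bytes. Per the steps above:
    # 1/ move its f bytes to the end of the disk to defragment it,
    # 2/ move the m bytes of contiguous files to the front,
    # 3/ move the f bytes again, up against the rest of the files.
    return f + m + f

if __name__ == "__main__":
    m, f = 800_000_000, 100_000_000     # 800 MB of files, 100 MB in the gaps
    print(bytes_moved_no_fragmentation(m, f))        # 800000000  (m)
    print(bytes_moved_with_fragmented_file(m, f))    # 1000000000 (m + 2f)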

In many systems the bottleneck is writing, not reading. For example, on a typical mail server writes will outnumber reads, because if you have adequate cache then most mail is read from cache: the users who poll mail the most frequently tend to be the ones who receive the most mail.

When writes are the bottleneck you don't want to do much to defragment disk allocation, which is good for reads; you want to interleave the writes for multiple files to defragment the writes. WAFL, ZFS, and I believe BTRFS do this.

Automatic defragment on BTRFS is new and experimental; I won't use it for a while except on the most experimental systems. Also I think that the balance bug that affects systemd systems would also affect autodefragment.

--
Sent from my Samsung Galaxy Note 2 with K-9 Mail.
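A toy contrast of the two layouts being weighed here: appending interleaved writes from several busy files in arrival order (cheap writes, fragmented files) versus keeping each file in its own contiguous run (cheap reads, seek-heavy writes). The block model is invented and is not how WAFL, ZFS or BTRFS actually allocate.

from itertools import chain, zip_longest

def interleaved_log_layout(files):
    """files: dict name -> number of blocks being written concurrently.
    Returns the on-disk block order when writes are appended as they arrive."""
    streams = [[name] * count for name, count in files.items()]
    # round-robin arrival of one block per file at a time
    return [b for b in chain.from_iterable(zip_longest(*streams)) if b is not None]

def per_file_contiguous_layout(files):
    """Each file gets its own contiguous run (good for later reads)."""
    return [name for name, count in files.items() for _ in range(count)]

if __name__ == "__main__":
    busy = {"mbox-alice": 3, "mbox-bob": 3, "log": 2}
    print(interleaved_log_layout(busy))
    # ['mbox-alice', 'mbox-bob', 'log', 'mbox-alice', 'mbox-bob', 'log', 'mbox-alice', 'mbox-bob']
    print(per_file_contiguous_layout(busy))
    # ['mbox-alice', 'mbox-alice', 'mbox-alice', 'mbox-bob', 'mbox-bob', 'mbox-bob', 'log', 'log']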

On Sat, 25 Jan 2014 12:25:00 Trent W. Buck wrote:
Russell Coker wrote:
Also I think that the balance bug that affects systemd systems [...]
I didn't hear about that. Have you a cite?
There was a mention in one of the list discussions, but it's something you can verify yourself: just run a balance on a filesystem that has a systemd journal and it will report corruption. Apparently systemd itself will also report corruption, so it's actually corrupting the file data.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
participants (4):
- Jason White
- Rohan McLeod
- Russell Coker
- Trent W. Buck