From the course: Exploring Linux Internals: Advanced Insights and Practical Applications
Sparse files - Linux Tutorial
Sparse files
- All right, now that we know about generic file system features, let's investigate some specifics. To start with, the sparse file. So what is a sparse file? Well, a sparse file is a way to use disk space efficiently when a file is partially empty. In a sparse file, blocks are only written to disk if the block contains real data. If it's an empty block, it's not committed to disk, and the result is that you can have efficient storage of disk images, snapshots, log files, and much more.

If sparse files are used, then the utilities that handle them must support them. cp has an option --sparse=auto, which is the default, and this option keeps the sparse nature of files. That's pretty important, because if your 10-gigabyte file, being a sparse file, only occupies 100 megabytes, and then you copy it over and suddenly it needs 10 gigabytes, you're in trouble. In rsync you need the option -S to keep the sparse nature of the file. So particularly with rsync, you need to pay attention.

Let me demonstrate how to work with sparse files. So just to be sure, df -h is showing me available disk space. That's all right, 12 gigabytes available. That's good. dd if=/dev/zero of=/sparsefile.img bs=1 count=0 seek=10G. So dd, you may have heard of it before. It's a utility that you can use to clone devices. if is the input file, /dev/zero; of is the output file, so it'll create an image. bs=1, we'll use a block size of one byte. Then we use count=0, which means that it's not going to write anything. And seek is 10G, which is doing what? Well, it skips 10G one-byte blocks ahead in the output file, so the file ends up marked as a 10-gigabyte file, but it's a 10-gigabyte sparse file. Seek marks the end of the file and that's pretty cool.

So in the output, we now have a very big file. If you use ls -l on /sparsefile.img, you can see it, it's a 10-gigabyte file. Oh boy. But let's add -h as well for human-readable. And let's make that ls -lsh. And there we can see the actual size that is used on disk. That's the zero in the first column.
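The dd and ls steps described above can be tried without touching the root filesystem. What follows is a minimal sketch, assuming GNU coreutils (dd, stat, cp) and a filesystem that supports holes. The scratch directory, the variable names, and the smaller 1G size are illustrative choices, not part of the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: create a sparse file and compare its apparent size with its allocated size.
set -eu

workdir=$(mktemp -d)                 # hypothetical scratch directory instead of "/"
img="$workdir/sparsefile.img"

# count=0 writes no data; seek=1G moves the end of the file to the 1 GiB mark.
dd if=/dev/zero of="$img" bs=1 count=0 seek=1G status=none

# %s is the size that ls -l reports; %b * %B is the space actually allocated.
apparent=$(stat -c %s "$img")
allocated=$(( $(stat -c %b "$img") * $(stat -c %B "$img") ))
echo "apparent:       $apparent bytes"
echo "allocated:      $allocated bytes"

# Copy while keeping the holes. --sparse=always also turns runs of zeros into holes.
cp --sparse=always "$img" "$workdir/copy.img"
copy_allocated=$(( $(stat -c %b "$workdir/copy.img") * $(stat -c %B "$workdir/copy.img") ))
echo "copy allocated: $copy_allocated bytes"

rm -rf "$workdir"
```

On most Linux filesystems the allocated figure should be zero or close to it, which matches the first column of ls -lsh.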
So this sparse file is reported to be 10 gigabytes, but really it's using no disk space at all. Can we see that? Well, df -h is still showing 12 gigabytes available in the root file system. And that is how sparse files can really be efficient. And if you have a lot of sparse files, the total size of all files as reported by ls -l may be bigger than the actual size of your entire volume.

Next I'm going to use mkfs.xfs -b size=2048 /sparsefile.img, so we create an XFS file system with a block size of two kilobytes on /sparsefile.img. And now I'm going to mount this file system using mount -o loop /sparsefile.img /mnt, so we mount it on the /mnt directory. I'm copying /etc/hosts to /mnt and I'm using ls -lsh on /mnt/hosts, which is showing what? Well, it's showing a 158-byte file, but the size used on disk is two kilobytes. That's for the simple reason that this small file occupies one block, because a block is the minimal allocation unit.

Now let me use dd if=/dev/urandom of=/mnt/bigfile bs=1M count=10. So here we are creating /mnt/bigfile. The block size is one megabyte and we allocate ten 1-megabyte blocks. So that's a 10-megabyte, or better, 10-mebibyte file. If next I'm using ls -lsh on /mnt, then what do we see? We see that a 10-megabyte file actually is 10 megabytes. That's because of the nature of the device /dev/urandom. The /dev/urandom device generates random data, and that data is stored in the output file. So that 10 megabytes is really 10 megabytes in this case.

How about ls -lsh on /sparsefile.img? Well, here we can see the sparse file occupying 75 megabytes. 75 megabytes, that's a shock. Well, is it really? No, not really, because we have a sparse file image that is mounted as an XFS file system. When we created the XFS file system on top of it, XFS metadata was generated, and the XFS metadata occupies space. So we have 10 megabytes and two kilobytes for the real files and 65 megabytes of administrative overhead for the XFS file system.
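The loop-mount steps need root, but the size comparison itself can be reproduced as a regular user. This is a rough sketch, assuming GNU coreutils. The scratch directory, the file names, and the allocated_bytes helper are made up for illustration, and the numbers depend on the filesystem underneath. A tiny file usually rounds up to one block, although some filesystems store very small files inline and report zero blocks.

```shell
#!/usr/bin/env bash
# Sketch: apparent size versus allocated size for a tiny file, random data, and a sparse file.
set -eu

workdir=$(mktemp -d)                 # hypothetical scratch directory

# Illustrative helper: print the number of bytes allocated on disk for a file.
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# A 158-byte file, the same size as the hosts file in the demonstration.
head -c 158 /dev/urandom > "$workdir/small"

# Ten 1 MiB blocks of random data; conv=fsync flushes the data before dd exits.
dd if=/dev/urandom of="$workdir/bigfile" bs=1M count=10 conv=fsync status=none

# A sparse file with the same apparent size as bigfile.
truncate -s 10M "$workdir/sparse"

for f in small bigfile sparse; do
    printf '%-8s apparent=%-9s allocated=%s\n' \
        "$f" "$(stat -c %s "$workdir/$f")" "$(allocated_bytes "$workdir/$f")"
done

small_apparent=$(stat -c %s "$workdir/small")
big_apparent=$(stat -c %s "$workdir/bigfile")
big_allocated=$(allocated_bytes "$workdir/bigfile")
sparse_allocated=$(allocated_bytes "$workdir/sparse")

rm -rf "$workdir"
```

The random data has to be stored in full, while the sparse file of the same apparent size needs little or no space.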
And that's what I wanted to show you about sparse files. Now, a bit related to sparse files, we need to talk about file block allocation. When creating files, filesystems that support the fallocate system call can allocate blocks without actually writing anything in these blocks. And by marking these blocks as occupied, the files can be written much faster, in particular when a non-sparse file contains a lot of unused space. The alternative would be to fill the allocated blocks with zeros, and that's inefficient. If unused space has been allocated anyway, the fallocate command can be used to deallocate those blocks and mark them as free again.

Let me show you. So let's use dd if=/dev/zero of=/bigfile2.img bs=1M count=1024. That creates a one-gigabyte file. ls -lsh on /bigfile2.img is showing a one-gigabyte file that really occupies one gigabyte. But hey, the problem is that this one-gigabyte file is filled with zeros, and that's where fallocate can help. So fallocate --dig-holes on /bigfile2.img. And next, I'm using ls -lsh on /bigfile2.img. And you see the difference: the file is still reported to be one gigabyte, whereas the size on disk in the very first column is now set to zero. Hey, can you see how sparse files and fallocate together can be used as a very primitive way to apply some level of compression? Isn't that cool?
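The two uses of fallocate mentioned here can be sketched as follows, assuming the fallocate command from util-linux and GNU coreutils. The scratch directory, the file names, and the smaller sizes are illustrative. Both calls are guarded because not every filesystem supports preallocation or hole punching.

```shell
#!/usr/bin/env bash
# Sketch: preallocate blocks with fallocate, then dig holes in a zero-filled file.
set -eu

workdir=$(mktemp -d)                 # hypothetical scratch directory

# Illustrative helper: print the number of bytes allocated on disk for a file.
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# Preallocation: reserve 16 MiB of blocks without writing any data into them.
if fallocate -l 16M "$workdir/prealloc.img" 2>/dev/null; then
    echo "prealloc allocated: $(allocated_bytes "$workdir/prealloc.img") bytes"
else
    echo "preallocation is not supported here"
fi

# A fully written, zero-filled file (64 MiB instead of the 1 GiB in the demonstration).
dd if=/dev/zero of="$workdir/bigfile2.img" bs=1M count=64 conv=fsync status=none
before_apparent=$(stat -c %s "$workdir/bigfile2.img")
before_allocated=$(allocated_bytes "$workdir/bigfile2.img")

# --dig-holes scans for blocks of zeros and deallocates them; the file size stays the same.
if ! fallocate --dig-holes "$workdir/bigfile2.img" 2>/dev/null; then
    echo "hole punching is not supported here"
fi
after_apparent=$(stat -c %s "$workdir/bigfile2.img")
after_allocated=$(allocated_bytes "$workdir/bigfile2.img")

echo "before: apparent=$before_apparent allocated=$before_allocated"
echo "after:  apparent=$after_apparent allocated=$after_allocated"

rm -rf "$workdir"
```

Where hole punching is supported, the allocated figure after the call should drop to zero or close to it while the apparent size is unchanged.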