Tuesday, July 16, 2019

4 - Filesystems, Mounting










Toucan Linux Project - 4

A Roll Your Own Distribution
Goal 1 – Base System


Stage 1 – Building the Tool Chain


Step 4 - File Systems, LFS variable, Mounting

With the partitions created, it is time to create the filesystems. Some partition management programs will also create filesystems, but fdisk does not. You’ll need to make a map of the devices and the partitions. If you used the recommended scheme you’ll have something like this:

/dev/sda1 /boot
/dev/sda2 swap
/dev/sda3 / (root)
/dev/sda4 /usr/local (primary)
/dev/sda5 /usr/local (secondary)
/dev/sda6 /var
/dev/sda7 /home


/boot
UEFI requires a FAT type file system. The vfat variation is fine. We will dispense with using the mkfs dispatch command and call the creation utilities directly. To make the boot filesystem use:

mkfs.vfat /dev/sdX1

where sdX1 is the proper device (/dev/sda1 in the above example).
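
Because mkfs overwrites whatever is on its target, it can help to check the device name before running it. The following is a minimal sketch and not part of the original procedure; the function name require_block_dev is made up for illustration.

```shell
# Hypothetical guard: succeed only if the argument is an existing block device.
require_block_dev() {
    if [ -b "$1" ]; then
        return 0
    fi
    echo "refusing: $1 is not a block device" >&2
    return 1
}

# Usage (placeholder device name left unchanged on purpose):
# require_block_dev /dev/sdX1 && mkfs.vfat /dev/sdX1
```

The guard only catches names that do not exist or are not disks; it cannot tell whether an existing partition is the one you meant.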

Swap

The swap space is a raw partition as opposed to a filesystem, but it still has a structure. Set it up with

mkswap /dev/sdX2

where sdX2 is the proper device for the swap partition.

If you are using a host with Linux already installed with a swap partition, this isn’t necessary. Multiple installations of Linux can share a single swap partition.

File System Choices

It’s once again decision time, but first a quick history. There are many choices for file systems in Linux. The first (besides the original Minix filesystem) was the extended filesystem (ext), written by Rémy Card specifically for Linux. He further enhanced it to create the ext2 filesystem, and for a long time this was the best and standard filesystem for Linux. Since then many companies, such as IBM and Silicon Graphics, have ported their filesystems to Linux, and ext2 itself has been enhanced further. Today we have a lot of choices, including filesystems designed specifically for flash drives (F2FS).

There are a lot of good choices depending on your needs. One feature you need to be sure you have is journaling or another crash-safety mechanism such as copy-on-write. One of the first journaled filesystems available for Linux was SGI’s XFS from IRIX. Around the same time the Linux developer community created the ext3 filesystem and IBM ported their JFS to Linux. A journaled filesystem first records its changes in a journal and applies them only once the transaction is committed, so that after a system crash the journal can be used to clean up the disk and undo any incomplete transactions.

The second feature we will require in TTLP is a multi-threaded filesystem. This is pretty easy because, unless you are enabling some very old kernel options, the filesystems available to you are generally multi-threaded unless they are special purpose. Like everything else, some filesystems carry a lot of bloat around that you probably don’t need. For this project we will choose from only four: ext4, xfs, btrfs, and f2fs.

Ext4
The ext4 filesystem is the fourth generation of the original ext filesystem. It is journaled, 64-bit, and decently fast. It is the most widely supported filesystem for Linux, but it is still considered a stop-gap choice until a “next generation” filesystem comes along. Its performance is decent, but overall it sits in the middle of the pack.

BTRFS
Oracle’s BTRFS (“butter fuzz” or “better fs”) is not only a filesystem but also a volume manager, as it allows you to combine multiple disks and partitions into a single volume. It is designed to be transaction safe using “copy-on-write,” and has a lot of features including auto-defragmentation, built-in compression, sub-volumes, and self-healing. It is a good, safe filesystem, but all the features it offers come at a price: it is not fast, and it contains many options you will probably rarely use. A great choice, except that it suffers poor performance for most operations.

F2FS
F2FS is designed for use on SSDs. This is not a journaled filesystem (at least directly) but a log-structured file system (LFS, not to be confused with Linux From Scratch). There are different internal flash memory management schemes for SSDs and F2FS is designed to allow access to some of those internals. Flash drives are unique in that each sub-unit of memory (as designed by the SSD) can be written only a limited number of times, prompting the manufacturers to add logic to spread data as equally as possible across the sub-units so the disk lasts as long as possible. They also employ various cleaning routines for the same purpose. While F2FS should theoretically outperform other filesystems on a flash memory disk, that isn’t always the case. It is a very good performer on SSDs but should not be used on anything else.

XFS
This was the first 64-bit, multi-threaded filesystem, written in 1994 by the now defunct Silicon Graphics, Inc. (SGI) for their IRIX operating system (a Unix variant). Even though it is twenty-five years old (or more depending on when you are reading this), it has many features that defined future filesystems. It is a high performer even today. It supports extended attributes, variable block sizes, and extents, and uses B-trees for directories, extents, and free space. It is very fast and still outperforms most other filesystems. It even has an area designated as “realtime” for very fast data access (which is available in Linux, but considered unstable). Its weakest area is continuously writing a lot of very small files (where extents offer no help), especially using random access. It is journaled and very safe except for silent disk failures (when operating as a single partition that is not part of a RAID), like all filesystems that don’t use block checksums. It is not easy to resize: an XFS filesystem can be grown into free space but cannot be shrunk.

All of these are good choices. Go with ext4 if you expect your partitions to be mounted on different Linux systems, and keep in mind that much Linux testing is done using an ext4 filesystem. Portability is really the only reason to use ext4. Choose BTRFS if you want its built-in compression and volume manager. It is large, feature-rich, and combines both a logical volume manager and a filesystem. If you’d prefer to split that functionality instead of combining it, use the LVM in Linux and XFS. Choose F2FS if it can deliver higher performance on your SSD. If you’re using only spinning magnetic disks it will not provide you with any benefit (and XFS on NVMe and SSD is still faster in all but a few cases).

Since The Toucan Linux Project (TTLP) is designed for high performance, I suggest sticking to XFS and letting mkfs.xfs decide on the parameters of the filesystem. Unless you have special needs (we don’t), it will pick the best parameters. If you do some research you will find many people who write that XFS has no real advantages over ext4 (this is because most of the ext4 ideas came directly from XFS), but run any set of good performance tests and the difference becomes obvious quickly. I think many people hesitate to recommend a filesystem that is so old just because old must mean bad (I’ve read articles where it is called a “dinosaur”; good time to stop reading that article). But when you get it right, you get it right, and that’s what SGI did. One of our principles is, don’t fix what isn’t broken, and XFS certainly isn’t broken. It only tries to be a very fast, safe filesystem, nothing else. Fortunately it gets that very right, and so far no other general purpose filesystem has challenged it in pure performance.

Here are some performance metrics from Phoronix that show how well these filesystems perform when compared to each other:

https://www.phoronix.com/scan.php?page=article&item=linux-50-filesystems&num=1

One more note regarding filesystems. There is work being done on ZFS (OpenZFS) which, like BTRFS, is a combined LVM and filesystem (plus RAID), and is the only 128-bit file system, should you plan on videoing your complete life to a single file. Its performance is better than BTRFS but slower than XFS. XFS is highly parallel, stable, and safe, and it is only a filesystem. Use it with LVM2 if you want dynamic partitions (though shrinking means copying all data to a new partition, which the xfs tools will help with). Since performance and simplicity rule in TTLP, not complexity and buzzwords like “new generation,” TTLP will use XFS, which is still the fastest filesystem with extreme stability.

To make file systems on the remaining partitions as root run:

mkfs.xfs /dev/sdX3 && mkfs.xfs /dev/sdX4 && mkfs.xfs /dev/sdX5 && mkfs.xfs /dev/sdX6 && mkfs.xfs /dev/sdX7

This assumes you are using the partitioning scheme above. If you are using only a boot and root use this command to make only the root filesystem:

mkfs.xfs /dev/sdX3

Of course, you’ll need to use the proper device names instead of the sdX device prototypes in the text.
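
The chained command above can also be written as a loop. The sketch below only prints the commands so they can be reviewed first; the function name plan_mkfs is an assumption made for this example, and the sdX names are still placeholders.

```shell
# Hypothetical helper: print one mkfs.xfs command per partition given.
# Nothing is executed; the output is only a plan to review.
plan_mkfs() {
    for part in "$@"; do
        printf 'mkfs.xfs %s\n' "$part"
    done
}

plan_mkfs /dev/sdX3 /dev/sdX4 /dev/sdX5 /dev/sdX6 /dev/sdX7
```

Once the printed list shows the right devices, it could be piped to sh -e as root, which stops at the first failure much like the && chain does.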

Super Speed XFS - OPTIONAL


1) If you have an NVMe drive (it also works very well with an SSD) you can make a super fast filesystem using XFS. XFS allows the journal area to be separated from the data area. Since changes are first written to the journal and then to the data area, separating them allows parallel operations and reduces the number of disk seeks. To do this create a partition on the NVMe (or SSD) for the journal (called the log in XFS documentation). To find out how big it should be, first run mkfs.xfs with the -N option, which only prints the information and does not actually make the filesystem.

mkfs.xfs -N /dev/sdX

It will return something like this:

# mkfs.xfs -N /dev/sdc7
meta-data=/dev/sdc7              isize=512    agcount=4, agsize=1923136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=7692544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3756, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


(Note you will need to add -f to the command if there is a filesystem present on the partition.) This shows all the values mkfs.xfs will use to create the file system. The log line shows the size of the log area and that it is currently internal to the filesystem. The blocks value gives the block count; use that to know how large the partition needs to be. In this case it is 3,756 * the block size (which is 4,096 bytes) for a total of 15,384,576 bytes. Now create a partition that is slightly larger than that. In this case, make it 16 MB. For this example we’ll assume that the data partition (the big one) is /dev/sdc7 (/home) and the log partition is /dev/sda1 (on the NVMe drive). If you’ve already made the /home partition you can delete it and split it by re-partitioning.
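
The arithmetic above can be checked in the shell. This sketch assumes the log line format shown in the example output; the function name xfs_log_bytes is invented for the example, and it reads text on standard input instead of running mkfs.xfs itself.

```shell
# Hypothetical helper: read `mkfs.xfs -N` output on stdin and print the
# size of the log area in bytes (bsize * blocks from the "log" line).
xfs_log_bytes() {
    awk '/^log/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^bsize=/)  { split($i, a, "="); bsize  = a[2] + 0 }
            if ($i ~ /^blocks=/) { split($i, b, "="); blocks = b[2] + 0 }
        }
        printf "%d\n", bsize * blocks
    }'
}

# The log line from the example output.
sample='log      =internal log           bsize=4096   blocks=3756, version=2'
log_bytes=$(printf '%s\n' "$sample" | xfs_log_bytes)

# Round up to whole MiB to get a lower bound for the partition size.
log_mib=$(( (log_bytes + 1048575) / 1048576 ))
echo "log area: $log_bytes bytes, at least $log_mib MiB"
```

On a real system the function would be fed with mkfs.xfs -N /dev/sdX | xfs_log_bytes. The lower bound of 15 MiB agrees with the 16 MB suggested in the text once a little headroom is added.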

Now make the filesystem using the two partitions:

mkfs.xfs -l logdev=/dev/sXa1 /dev/sXc7

This uses the /dev/sXa1 partition for the log (journal) and the /dev/sXc7 device for the data and realtime areas. Note that the command above uses sXa and sXc so that it will fail if you don’t change them. Replace them with the proper devices. The key is to put the log area on the fastest disk possible, but the trade-off is that the filesystem now depends on two disks instead of one: if either fails you’ll lose the data (backups still rule).

Now to mount it you’ll need to supply a mount option:

mount -o logdev=/dev/sXa1 /dev/sXc7 $LFS/home

where sXa1 and sXc7 are the proper partitions.

This will increase performance even on standard magnetic disks. Of course, the fastest is to put both the log and data areas on the NVMe, but if you have limited space on the fastest storage media, separating the log and data will vastly improve performance.

2) XFS uses allocation groups to allow parallel operations. The number of allocation groups determines that parallelism. Increasing it means faster disk operations but more CPU given to the filesystem. If you have a fast machine with lots of cores, you might want to increase the number of allocation groups by one or two for disk partitions you intend to use a lot (/home for instance, since users access all their data here). This can be changed when the filesystem is created using the agcount suboption of the -d argument.

Generally, you will get an agcount of 4 or above. The larger the data partition, the higher the agcount. But we can set it higher if we choose. Again we can use mkfs.xfs -N to find out the agcount it will choose for a given partition (xfs_info reports the same values for a filesystem that already exists). Using the example from above for our /home partition:

# mkfs.xfs -N /dev/sdc7
meta-data=/dev/sdc7              isize=512    agcount=4, agsize=1923136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=7692544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3756, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


That shows it will use four allocation groups for the filesystem. Let’s instead use five to increase the speed (at the cost of more CPU dedicated to the filesystem). Here’s the command to make the filesystem:

mkfs.xfs -d agcount=5 /dev/sXc7
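
To avoid hard-coding the number, the current agcount can be read from the same output and increased by one. This is only a sketch: xfs_agcount is a made-up helper name, and the sample line is copied from the example output instead of being produced by a real device.

```shell
# Hypothetical helper: read `mkfs.xfs -N` (or xfs_info) output on stdin
# and print the agcount value found on the meta-data line.
xfs_agcount() {
    sed -n 's/.*agcount=\([0-9][0-9]*\).*/\1/p'
}

sample='meta-data=/dev/sdc7              isize=512    agcount=4, agsize=1923136 blks'
current=$(printf '%s\n' "$sample" | xfs_agcount)
next=$(( current + 1 ))

# Print the command instead of running it; sXc7 is still a placeholder.
echo "mkfs.xfs -d agcount=$next /dev/sXc7"
```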

To combine these two (a separate log partition and a higher number of allocation groups) do the following:

mkfs.xfs -d agcount=5 -l logdev=/dev/sXa1 /dev/sXc7

Then still mount with (see below):

mount -o logdev=/dev/sXa1 /dev/sXc7 $LFS/home

It will read the agcount from the metadata in the filesystem; no mount option is necessary.


Step 5 - The LFS Variable

Creating the system using the LFS book requires you to create a mount point called /mnt/lfs to mount the filesystem of the system you are building on the host. It then uses an environment variable called LFS to reference the mount point. This variable must always be set until the LFS base system is built and can boot itself. To ensure that the variable is added, run the following:

echo "export LFS=/mnt/lfs" >> ~/.bashrc

Do that for your own user account and for the root account which you will need for LFS (though rarely). See the LFS section 2.7 (http://www.linuxfromscratch.org/lfs/view/development/chapter02/aboutlfs.html) for more information.
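
Since every later step depends on the variable, it is worth confirming that it is set in a new shell before continuing. A small sketch follows; the function name check_lfs is an assumption and does not come from the LFS book.

```shell
# Hypothetical check: print the value of LFS, or fail if it is unset or empty.
check_lfs() {
    if [ -z "${LFS:-}" ]; then
        echo "LFS is not set; run: export LFS=/mnt/lfs" >&2
        return 1
    fi
    echo "LFS=$LFS"
}

check_lfs || echo "fix ~/.bashrc and open a new shell before continuing"
```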

Step 6 - Mounting the filesystems

Each time you work on building the base LFS system you’ll need to mount the filesystems appropriately. First create an LFS work area in your home directory.

mkdir -v ~/lfs
cd ~/lfs


Then create a script to mount the filesystems. If you used method 1 (a root and a boot) do the following:

cat > mount.sh << EOF
mkdir -pv $LFS
mount -v /dev/<sdX> $LFS
mkdir -v $LFS/boot
mount -v /dev/<sdY> $LFS/boot
EOF


The <sdX> should be replaced, angle brackets included, with the partition for the root (most likely sda1, giving /dev/sda1) and <sdY> with the boot partition (most likely sda2).
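
One detail of the script above is easy to miss: because the EOF delimiter is not quoted, the shell expands $LFS while it writes mount.sh, so the variable must already be set at that point. The sketch below shows the difference using a temporary file (created with mktemp, which is not part of the original steps).

```shell
LFS=/mnt/lfs
demo=$(mktemp)

# Unquoted delimiter: $LFS is expanded while the file is written.
cat > "$demo" << EOF
mkdir -pv $LFS
EOF
expanded=$(cat "$demo")

# Quoted delimiter: the text is written literally, $LFS included.
cat > "$demo" << 'EOF'
mkdir -pv $LFS
EOF
literal=$(cat "$demo")

rm -f "$demo"
echo "unquoted: $expanded"
echo "quoted:   $literal"
```

With the unquoted form used in this post, mount.sh ends up containing /mnt/lfs literally, so it keeps working when it is later run from a shell where LFS is not exported.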

If you used method 2 (seven partitions) do the following:

cat > mount.sh << EOF
mkdir -pv $LFS
mount -v /dev/<sdV> $LFS
mkdir -v $LFS/boot
mount -v /dev/<sdW> $LFS/boot
mkdir -pv $LFS/usr/local
mount -v /dev/<sdX> $LFS/usr/local
mkdir -v $LFS/var
mount -v /dev/<sdY> $LFS/var
mkdir -v $LFS/home
mount -v /dev/<sdZ> $LFS/home
EOF


Match the following device prototypes to the partitions:

sdV - / (root)
sdW - /boot
sdX - /usr/local (primary)
sdY - /var
sdZ - /home

If you are booting from the primary disk of the system these will be /dev/sda3, /dev/sda1, /dev/sda4, /dev/sda6, and /dev/sda7 respectively.

If you are using the Super Speed XFS from above for the /home directory then the script might look like this:

cat > mount.sh << EOF
mkdir -pv $LFS
mount -v /dev/<sdV> $LFS
mkdir -v $LFS/boot
mount -v /dev/<sdW> $LFS/boot
mkdir -pv $LFS/usr/local
mount -v /dev/<sdX> $LFS/usr/local
mkdir -v $LFS/var
mount -v /dev/<sdY> $LFS/var
mkdir -v $LFS/home
mount -v -o logdev=/dev/sXa1 /dev/<sdZ> $LFS/home
EOF


If you used your own partitioning scheme, you’ll need to adjust the mount commands to mount your partitions with the root at /mnt/lfs.

Now mark the script as executable and run it:

chmod 750 mount.sh && ./mount.sh && df

The output should show the partitions at their mount point. Check them and modify the file if anything is wrong. Then umount them and run the script again until it is correct.
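
Reading the df output by eye works, but the check can also be scripted. This is a minimal sketch, assuming the kernel mount table is available at /proc/mounts; the function name is_mounted and its optional second argument are inventions for this example.

```shell
# Hypothetical helper: succeed if $1 appears as a mount point in the mount
# table. $2 is the table to read and defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Check every mount point of the seven-partition layout.
# Uses LFS if it is set and falls back to /mnt/lfs otherwise.
for dir in "" /boot /usr/local /var /home; do
    target="${LFS:-/mnt/lfs}$dir"
    if is_mounted "$target"; then
        echo "ok:      $target"
    else
        echo "missing: $target"
    fi
done
```

Any line reported as missing points at the mount command in mount.sh that needs another look.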

Next time we’ll set up the host to create the first tool chain (which is temporary), download all the necessary files, and set up a working environment to use for the build.
