Toucan Linux Project: 3 - Partitioning

Toucan Linux Project - 3

A Roll Your Own Distribution

Goal 1 – Base System
Stage 1 – Building the Tool Chain

Step 3 - Partition the Disk

Partitioning a disk ought to be easy, correct? Doing it is, but choosing which way to do it is much more difficult. Neal Stephenson said in In the Beginning...Was the Command Line:

The file system on Unix machines all have the same general structure. On your flimsy operating systems, you can create directories (folders) and give them names like Frodo or My Stuff and put them pretty much anywhere you like. But under Unix the highest level—the root—of the filesystem is always designated with the single character “/” and it always contains the same set of top-level directories:
/usr
/etc
/var
/bin
/proc
/boot
/home
/root
/sbin
/lib
/tmp
and each of these directories typically has its own distinct structure of subdirectories. Note the obsessive use of abbreviations and avoidance of capital letters; this is a system invented by people to whom repetitive stress disorder is what black lung is to miners. Long names get worn down to three-letter nubbins, like stones smoothed by a river.

This is not the place to try to explain why each of the above directories exists, and what is contained in it. At first it all seems obscure. When I started using Linux I was accustomed to being able to create directories wherever I wanted and to give them whatever names struck my fancy. Under Unix you are free to do that, of course (you are free to do anything) but as you gain experience with the system you come to understand that the directories listed above were created for the best reasons and that your life will be much easier if you follow along.

Yet over time changes have been made, such as adding /opt, moving /home to /usr/home or, worst of all, installing everything in the wrong place (more later). In truth, the gentlemen who gave us Unix (primarily Thompson and Ritchie) used it a lot, and used solely the command line, because in the beginning there was the command line. It is an old operating system, having been originally bootstrapped in 1969, and a wealth of users were experienced with it by the time PCs came about in the 1980s. There was not only a reason for everything they did, but a well-tested reason. So with that said, we move on to the discussion of partitioning.

Unlike Stephenson’s book, this is the place to try to explain why each of the above directories exists. It goes something like this:

/usr - originally the directory holding user home directories; its use has changed, and it now holds executables, libraries, and shared resources that are not system critical but instead designated for “user” use

/etc - Contains configuration files and some system databases

/var - short for "variable"; a place for files that may change often, such as the contents of a database, log files (usually stored in /var/log), email stored on a server, and files waiting to be printed

/bin - stands for "binaries," meaning executable files; contains the set of utilities needed by a system administrator

/home - contains the home directories for the users; originally this was /usr, but it eventually changed to /home when it became apparent that data needed to be separated from configuration, libraries, and executables

/mnt - the default location to mount external devices like hard disk drives or memory sticks; other mount points were created under it

/lib - This is the depository of all integral UNIX system libraries

/tmp - a place for temporary files; many systems clear this directory upon start up, enforcing it is indeed temporary

/dev - short for devices; contains file representations of every peripheral device attached to the system (though that is not always true today)

/proc – Not part of the original Unix, this is short for processes and contains information about every process (and lots of system information), published using a virtual filesystem to make it easy to access for non-root users

/boot – Not part of original Unix, this is to store the files necessary to boot the operating system, usually the kernel and necessary modules with boot options; originally the kernel was simply a file in the root

/root – Not part of the original Unix, this is the home directory for the superuser, root; in classic Unix, root’s home directory was (not surprisingly) the root (/)

/sbin – Not part of the original Unix, this was a set of utilities useful only for the super user root and thus was not mapped into the search path for normal users.

This all seemed fine, except that as Unix expanded and became more widely used, other vendors started adding their programs. The question became, “Where do I put my files?” If you were using AT&T Unix and AT&T decided to upgrade the system by installing all the new binaries in a directory called /usr/bin.new, then deleting all of /usr/bin, and renaming /usr/bin.new to /usr/bin, then /usr/bin was no longer a safe place for your files. Or even simpler, what if your binary used a name that AT&T later decided to use themselves? When they upgraded they would overwrite your binary with theirs. Essentially, vendors needed a place to install binaries, libraries, and configuration files that were not part of the base operating system, which was controlled by the OS vendor.

To solve this problem, AT&T created underneath /usr a directory called local that mirrored the base directories: etc, lib, include, and bin. This directory was meant to contain files for the “local” system, as in those not part of the operating system distribution. This worked well, except vendors didn’t like the idea of putting their binaries right along with other vendors’ binaries, or the fact that they didn’t control the whole directory. Along then came /opt (originally /vol), which allowed a vendor to create its own directory and, under it, have complete control, including the ability to delete it later without fear of destroying other applications. The problem became adding the binaries to the user’s search path and the libraries to the linker search path (though with static linking this didn’t matter much). Regardless, /opt became a place for vendor additions for many versions of Unix for many years. Why? I’m not sure. It was certainly possible to create a vendor directory under /usr/local that did the same; perhaps vendors avoided it because /usr/local already had a well-established and understood structure.

Now in modern Linux the distro maintainer gets to choose where to install everything. And you are now your own distro maintainer. There are reasons to give each of the directories above its own partition. There are also reasons not to. Decision time.

/boot
The first one is easy. The boot area /boot needs to be on its own partition. The boot loader that the system calls from the BIOS (or primary boot loader) must find here everything it needs to boot the OS, including all the pieces of the kernel necessary to boot it on the hardware. This is the kernel file itself, any support modules it might need, and kernel configuration and boot options. The boot loader is a program (that used to be very simple) that loads the kernel; the kernel then does what it needs to bring up the complete system before handing off to the initial program. One job the kernel must accomplish is mounting the root directory.

Modern systems use a system called UEFI (Unified Extensible Firmware Interface) to be able to boot from a plethora of media such as hard disks, SSDs, USB memory devices, SD devices, and even over a network. Almost any general purpose computer designed after 2006 will boot using this system. The alternative is called the MBR for master boot record, which is a legacy method for the primary bootloader to load the secondary bootloader or, in older days, the operating system itself. In this case, there is a reserved area on the disk where the primary bootloader will look for code to load and execute as the secondary bootloader. This is now known as “legacy boot.”

The compressed Linux kernel will reside here along with the secondary bootloader and its configuration files. This partition doesn’t need to be too large because it shouldn’t contain more than two or three different kernels to allow safe experimenting (experimental, known good, safe standby) and the boot loader. A minimum of 150MB is suggested but 250MB is better. The EFI area is generally around 25MB, a kernel about 5-8MB and the secondary bootloader about 9MB. I never have more than about 3 kernels and my boot partition is about 100MB total. With most distros there will be a large (25-30MB) file that contains the initial RAM disk (initrd) which is actually a compressed file system itself. The Toucan Linux Project will not use an initrd unless CPU firmware is required and in that case it will be very small. If you really need disk space 35MB is plenty for a single kernel on a system that doesn’t dual boot.
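The arithmetic behind those figures can be sketched in a few lines of Python. This is only a rough estimate under the numbers quoted above (about 25MB for the EFI area, 8MB per kernel as the upper end of the range, 9MB for the secondary bootloader); the function name and the optional initrd parameter are made up for illustration.

```python
def boot_partition_mb(kernels=3, kernel_mb=8, efi_mb=25, bootloader_mb=9, initrd_mb=0):
    """Estimate the space used in /boot, in MB.

    The defaults are the rough figures from the text, not measurements.
    initrd_mb is per kernel; TTLP normally uses none.
    """
    per_kernel = kernel_mb + initrd_mb
    return efi_mb + bootloader_mb + kernels * per_kernel


print(boot_partition_mb())              # three kernels, no initrd -> 58
print(boot_partition_mb(kernels=1))     # a single kernel -> 42
print(boot_partition_mb(initrd_mb=30))  # three kernels, each with a 30MB initrd -> 148
```

The last case shows why a distro that ships an initrd with every kernel lands near the 150MB suggestion, while a TTLP-style /boot without one stays well under 100MB.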

If you are using a system where Linux is already installed and you are using extra disk space for The Toucan Linux Project, you don’t need another boot partition. You can use the one from the host OS unless you specifically want them separate.

swap
This is a special partition the kernel will use as virtual memory to swap data pages in and out of memory as needed. With a slim, trim system this won’t be necessary and is generally avoided anyway since it can lead to SHP (Serious Horrible Performance) and, worse, to a condition called thrashing. But the swap partition is also used to hibernate (suspend to disk), which might be useful on a laptop. I would suggest a size of 4GB minimum for a workstation; for a laptop where hibernation might be a nice feature, the minimum is the same size as the system’s RAM. If you have plenty of space, make it the same size as your system RAM. If you are using a system where Linux is already installed and you are using extra disk space for The Toucan Linux Project, you don’t need another swap partition. You can use the one from the host OS.
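As a rough sketch, the sizing rule above can be written as a small Python function. The function and parameter names are hypothetical, and combining the "4GB minimum" with the "same size as RAM" rule by taking the larger of the two is one interpretation of the text, not a hard requirement.

```python
def swap_size_gb(ram_gb, hibernate=False, plenty_of_space=False):
    """Suggest a swap size in GB from the rules of thumb in the text.

    - 4GB is the suggested minimum for a workstation.
    - For hibernation, or when disk space is plentiful, use at least
      the size of RAM (interpretation: never go below the 4GB floor).
    """
    minimum_gb = 4
    if hibernate or plenty_of_space:
        return max(minimum_gb, ram_gb)
    return minimum_gb


print(swap_size_gb(16))                       # workstation -> 4
print(swap_size_gb(16, hibernate=True))       # laptop that hibernates -> 16
print(swap_size_gb(2, plenty_of_space=True))  # small RAM, big disk -> 4
```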

/ (root)
Here it gets a bit harder. This is the last required partition. All the other directories can be created using the space of this partition without any problem to the system (except for performance when a system can write to multiple partitions in parallel). The real danger is that if the root partition fills up, the system might crash or simply be unable to boot after being shut down. I would say a minimum is 15GB, but that really does vary depending on what you intend to use it for. I believe in a separate root. It might be best to table this discussion until we understand more about the other directories.

/usr
I would argue to NEVER make this a separate partition. It should be part of the root drive along with /lib, /bin, and /sbin. It contains many libraries that programs will need to link against. While /bin, /sbin, and /lib are intended to contain everything required to repair the system should the other partitions get trashed, there are simply too many useful tools that require libraries in /usr/lib. Otherwise /usr only contains code, configurations, data shared across system applications, and files used for development (like C include files). The biggest single entity it will contain in The Toucan Linux Project is the kernel source code for one to three kernels, which can be quite big (around 10GB required per kernel to compile), along with all the source code for the other libraries and applications for TTLP.

/usr/local
This should be a separate partition under this project. The reason is that the applications that are not firmly in the base install reside here. If you have a stable system and decide to do something experimental (which you will), then having two of these will save you a lot of headaches. You will have one as the primary, which will contain your stable system before you start the experiment. You will back it up to the secondary partition, then replace the primary with the secondary at the mount point. Compile. Build. Install. Test. Rinse and repeat. If the experiment fails you just remount the primary and all is well. If it works, copy the secondary to the primary and your experiment becomes “stable.” We will create two of these for this purpose in the project (for those who are aware of the overlay file system this will make even more sense).
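To make the primary/secondary dance concrete, here is a minimal Python sketch that only builds and prints the command sequence; it never runs anything. The device names (/dev/sdX3, /dev/sdX4) and the staging directories under /mnt are placeholders, and using rsync for the copy is an assumption, since any faithful copy tool would do. The staging directories must already exist, and nothing may be holding /usr/local open when it is unmounted.

```python
import shlex

MOUNT_POINT = "/usr/local"


def begin_experiment(secondary_dev, staging="/mnt/usrlocal2"):
    """Back up the live /usr/local onto the secondary, then switch to it."""
    return [
        ["mount", secondary_dev, staging],
        ["rsync", "-a", "--delete", MOUNT_POINT + "/", staging + "/"],
        ["umount", staging],
        ["umount", MOUNT_POINT],
        ["mount", secondary_dev, MOUNT_POINT],
    ]


def abandon_experiment(primary_dev):
    """Experiment failed: put the untouched primary back at the mount point."""
    return [["umount", MOUNT_POINT], ["mount", primary_dev, MOUNT_POINT]]


def promote_experiment(primary_dev, staging="/mnt/usrlocal1"):
    """Experiment worked: copy the live secondary over the primary, then switch back."""
    return [
        ["mount", primary_dev, staging],
        ["rsync", "-a", "--delete", MOUNT_POINT + "/", staging + "/"],
        ["umount", staging],
        ["umount", MOUNT_POINT],
        ["mount", primary_dev, MOUNT_POINT],
    ]


# Print the plan for review; /dev/sdX4 is a placeholder device name.
for step in begin_experiment("/dev/sdX4"):
    print(shlex.join(step))
```

Keeping the plan as printed text rather than executing it is deliberate: these commands need root and touch live mount points, so they are best read over before being typed by hand.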

/var
Since data will grow here as the system logs are written, it might fill the root partition if it is not separate. We will use a log management system to archive and remove old logs, so a separate partition is not strictly essential. Keep in mind, though, that one attack is simply to cause the system to log so much that the partition fills, which fills the root if /var is not a separate partition. In The Toucan Linux Project (TTLP) we will use /var as a build area for large applications (though this can be changed), so it will need extra space. We will also make sure the system logs can't fill the partition. Compiling the Chromium browser currently requires around 20GB depending on the options. For TTLP, 6-8GB is more of a target, unless you intend to build the bigger browsers and office suites like LibreOffice or OpenOffice, in which case you should expect around 20GB for the large build area. For security and system stability, /var should be a separate partition.
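Before starting a large build it is worth checking that the build area really has the room. A minimal sketch using Python's standard shutil.disk_usage follows; the function name is hypothetical, and the 20GB figure passed in the usage note is just the Chromium estimate from above.

```python
import shutil


def has_room(path, needed_gb):
    """Return True if the filesystem holding `path` has at least
    `needed_gb` gigabytes free (1GB counted as 10**9 bytes)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 10**9
```

Calling has_room("/var", 20) before a Chromium-sized build, or has_room("/var", 8) for the more typical TTLP target, gives a quick yes or no without parsing the output of df.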

/home
Home should definitely be a separate partition where possible. It is the user data area and it can easily fill up just by video or sound editing and is an easy target for someone with ill intent. Since your data, videos, audio, your own programs and code if you’re a programmer, and all the configuration and data for such things as games will be here, it should be whatever space is left after you create all the other partitions. Since there is a lot of change on this drive, the filesystem will become fragmented over time making it a good target for cleaning and defragmentation.

The Don’ts
Kernel modules are found in /lib/modules in the root drive. Now suppose the root filesystem is a JFS filesystem and the kernel needs to load the jfs.ko module in order to mount the root drive (at boot the bootloader generally can only access the /boot partition, so the kernel will first need to mount the root just to access the files it needs to start up). We have a problem here because the jfs.ko file resides on the root drive that we are trying to mount. The kernel will not be able to boot since it cannot access the modules it needs to boot. The idea of the root (/) partition is to contain everything the system needs to fully load the kernel to a state in which it can use all the hardware and features of the system. It will need some libraries in /usr/lib later in the boot stages, but this really isn’t booting the kernel anymore, but booting the system and bringing it into a desired state. DO NOT under any circumstances make /bin, /lib, or /sbin into a separate partition. Under the rule of KISS, the goal of this project is NOT to require a complicated initrd to boot the system, and creating any of these partitions, including /usr, as a separate partition might break that rule. Creating one for /usr/src is okay but never /usr.
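One way to avoid the jfs.ko trap without an initrd is to build the root filesystem driver into the kernel rather than as a module. The sketch below checks a kernel configuration for that; it is illustrative only. CONFIG_JFS_FS, CONFIG_EXT4_FS and CONFIG_XFS_FS are real kernel option names, but the sample text is made up, and in practice you would read the .config file from your kernel source tree.

```python
def driver_state(config_text, symbol):
    """Return 'built-in', 'module', or 'absent' for a kernel config symbol.

    '=y' means compiled into the kernel image (usable before root is mounted),
    '=m' means a loadable module living under /lib/modules on the root drive.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"{symbol}=y":
            return "built-in"
        if line == f"{symbol}=m":
            return "module"
    return "absent"


# Made-up sample; a real check would read the kernel's .config file.
sample_config = """\
CONFIG_EXT4_FS=y
CONFIG_JFS_FS=m
# CONFIG_XFS_FS is not set
"""

for symbol in ("CONFIG_EXT4_FS", "CONFIG_JFS_FS", "CONFIG_XFS_FS"):
    print(symbol, driver_state(sample_config, symbol))
```

With this sample, a JFS root would fail to mount without an initrd because its driver is only a module, while an ext4 root would be fine.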

Using a GUID Partition Table

Part of the UEFI specification was a new type of partition table called the GUID Partition Table, best known as GPT for short. UEFI removed the requirement for a Master Boot Record (MBR), which only one OS could own at a time. If you used to dual-boot Linux and Windows and Windows did an update, it would also wipe out the bootloader (probably LILO or Grub version 1) and you’d have to repair it. UEFI allows multiple operating systems to have their own bootloaders as well as providing a standard boot method for any media like DVD, hard disk, USB drive, or SD drive. The older type of partition table is the MBR (among many others). Fortunately, the designers of the UEFI standard allowed for UEFI + MBR booting, called “Legacy” booting. We now have the choice of booting UEFI for systems that support it, MBR for systems that don’t, and generally a UEFI+MBR hybrid for UEFI systems. We will use a GPT partition table because it alleviates many issues that existed for the MBR type (4 primary partitions with a kludgy “extended” partition, only about 446 bytes for boot code in the MBR itself, etc.).

If you’re using a system old enough that it doesn’t support UEFI, you’ll need to create a very small partition as primary partition number 1. It can be as small as 1MB in size to allow room for the boot loader. Primary partition 2 should be the swap partition. Primary partition 3 should be the root partition, and primary partition 4 should be an extended partition covering the remaining space of the disk, to allow you to create additional logical partitions. The instructions below assume you are using UEFI; adjust accordingly if you are not.
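If you are not sure which case applies, one common check on a running Linux host is whether the kernel exposes the /sys/firmware/efi directory, which it does only when it was booted through UEFI. A minimal Python sketch (the function name is hypothetical, and the path is a parameter only to keep the function easy to exercise):

```python
import os


def booted_with_uefi(sysfs_efi="/sys/firmware/efi"):
    """Return True if the running kernel was started through UEFI.

    The directory exists only on UEFI boots; on a legacy (MBR) boot,
    or on a non-Linux host, it is missing.
    """
    return os.path.isdir(sysfs_efi)


print("UEFI boot" if booted_with_uefi() else "legacy boot (or not a Linux host)")
```

Note that this reports how the host was booted this time; a machine with UEFI firmware that was started in legacy mode will still report a legacy boot.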

Partitioning – Method 1 – Simple and reasonable for a single user system

/boot – 250MB
swap – 2GB or RAM size
/usr/local (primary) – 4GB
/usr/local (secondary) – 4GB
/ (root) – remaining disk space (kernel and application source will be here)

There is no reason for the /boot partition to be bigger unless you intend to test kernels. The two /usr/local partitions will hold all the binaries, include files, and libraries for the non-base portion of the system, so they can be much larger, say 8GB each, if you have the space.

Partitioning – Method 2 – Safer

/boot – 250MB
swap – 2GB or RAM size
/usr/local (primary) – 2GB to 8GB
/usr/local (secondary) – 2GB to 8GB
/var – 4GB or 25GB for a large build area
/ (root) – 20GB up to 60GB (kernel source and package source will be here)
/home – remaining disk space

For both methods you could make one /usr/local smaller and the other around 25GB. The larger one (the secondary) could be used for build space for large packages using a symbolic link. This is how I choose to partition. If you are using a 1TB or greater drive you can easily extend all of these sizes by 40%.
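To see what a Method 2 layout leaves for /home on a given disk, the sums can be sketched in Python. The dictionary below uses the low end of each range above; the names, the 500GB example disk, and the scale parameter (1.4 for the "extend by 40%" suggestion on large drives) are illustrative choices, not fixed parts of the method.

```python
# Low-end Method 2 sizes in GB, taken from the list above.
METHOD_2_MIN_GB = {
    "/boot": 0.25,
    "swap": 2,
    "/usr/local (primary)": 2,
    "/usr/local (secondary)": 2,
    "/var": 4,
    "root": 20,
}


def plan_layout(disk_gb, fixed_gb, scale=1.0):
    """Return partition sizes in GB, giving /home whatever is left.

    scale multiplies every fixed size, e.g. 1.4 to extend them by 40%.
    Raises ValueError if the fixed partitions leave nothing for /home.
    """
    layout = {name: round(size * scale, 2) for name, size in fixed_gb.items()}
    home_gb = round(disk_gb - sum(layout.values()), 2)
    if home_gb <= 0:
        raise ValueError("fixed partitions leave no space for /home")
    layout["/home"] = home_gb
    return layout


for name, size in plan_layout(500, METHOD_2_MIN_GB).items():
    print(f"{name:24} {size:8.2f} GB")
```

On a 500GB disk the minimal sizes add up to 30.25GB, leaving 469.75GB for /home; swapping in the larger values from each range shows quickly how much a big build area costs.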

What About “Dynamic” Partition Types?

It is possible to use the Logical Volume Manager (LVM) to build all of these partitions in such a manner that you can resize them on the fly, such as shrinking one and adding the space to another, or even adding another drive and mapping its space into existing partitions. If you know what you are doing and want to do that, you certainly can. LVM is somewhat slower and certainly more complicated (violating TTLP’s KISS principle), but I can understand if you choose to use it. I think with the size of modern drives this isn’t necessary except for servers with large arrays. If you choose to do this and already know how, then you’re experienced enough to do it well.

Another choice is some of the newer filesystems that are a volume manager and file system combined into one. We will discuss that next time.

Now It’s Time to Partition

In MX Linux you will find the fdisk and parted programs to handle partitioning. You can also use a graphical tool, gparted, if you prefer. If you have a dedicated drive for The Toucan Linux Project as recommended, I suggest you first make a new GPT partition table to make sure everything is cleared. Most disks come set up for Windows, with a partition table and file system already installed. Clear it by making a new blank GPT.

Create the partitions first, then set the partition types. Here are the types as listed by fdisk.

For the swap partition you need code 19 (type 8200 Linux swap). For the Linux partitions you need code 20 (type 8300 Linux filesystem). If you are using a UEFI boot scheme you need to mark the boot partition as code 1 (type ef00). You can mark the root partition (/) with code 22 (type 8304 Linux root (x86)), which I recommend. You can mark /home with code 28 (type 8302) if you’d like. The codes come from fdisk; they will be different with a different partitioning tool.
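For those who prefer scripting the partition-creation step with parted, the following Python sketch turns a layout into a list of parted commands and prints them for review; it does not run them. The device /dev/sdX and the partition names are placeholders, sizes are treated as MiB for simplicity, and the helper name is made up. Only the swap partition gets a file system hint (linux-swap); the remaining type codes would still be set as described above.

```python
import shlex


def parted_commands(device, parts, start_mib=1, esp_index=1):
    """Build parted commands for a fresh GPT layout.

    parts is a list of (name, size_mib, fs_hint); a size of None means
    "rest of the disk" and is only allowed for the last entry. fs_hint
    may be None. esp_index marks that partition as the EFI System
    Partition (pass None on a non-UEFI system).
    """
    cmds = [["parted", "-s", device, "mklabel", "gpt"]]
    pos = start_mib
    for index, (name, size_mib, fs_hint) in enumerate(parts, start=1):
        if size_mib is None and index != len(parts):
            raise ValueError("only the last partition may take the remaining space")
        end = "100%" if size_mib is None else f"{pos + size_mib}MiB"
        cmd = ["parted", "-s", device, "mkpart", name]
        if fs_hint:
            cmd.append(fs_hint)
        cmd += [f"{pos}MiB", end]
        cmds.append(cmd)
        if size_mib is not None:
            pos += size_mib
    if esp_index is not None:
        cmds.append(["parted", "-s", device, "set", str(esp_index), "esp", "on"])
    return cmds


# Method 1 on a placeholder device; review the output before running anything.
method_1 = [
    ("boot", 250, None),
    ("swap", 2048, "linux-swap"),
    ("usrlocal1", 4096, None),
    ("usrlocal2", 4096, None),
    ("root", None, None),
]
for cmd in parted_commands("/dev/sdX", method_1):
    print(shlex.join(cmd))
```

Because mklabel destroys the existing partition table, the sketch stops at printing: check the device name twice before pasting any of the lines into a root shell.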

Since we’re building an LFS as our starter system you might want to check the comments in the book regarding partitioning: http://www.linuxfromscratch.org/lfs/view/development/chapter02/creatingpartition.html.

Copyright (C) 2019 by Michael R Stute

Tuesday, July 30, 2019
