This is a LinuxChix course page.

The UNIX Filesystem: Lesson 3

Introduction

In the previous lessons, we have covered what the filesystem is, how it's structured, and various concepts involved, such as symbolic links. Soon, it'll be time to get onto the concrete stuff - mounting volumes.

First, however, it's time to get acquainted with yet another of the big principles of the filesystem:

Everything's a file!

One of the brilliant design moves of the UNIX operating system is that everything that can be represented as a file, is represented as a file. A hard disk is a file, a terminal is a file, your webcam is a file - everything.
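
As a small taste of what this means in practice, here is a minimal Python sketch. It assumes a Linux-like system where /dev/urandom (a source of random bytes) and /dev/null (a device that discards whatever is written to it) exist; the function names are made up for illustration. The point is that both devices are opened, read and written with exactly the same calls you would use on an ordinary file.

```python
def read_random_bytes(count):
    """Read `count` bytes from the kernel's random number generator."""
    # /dev/urandom is a device, but we open and read it like any file.
    with open("/dev/urandom", "rb") as device:
        return device.read(count)


def discard(data):
    """Write bytes to /dev/null, a device that throws away all input."""
    with open("/dev/null", "wb") as device:
        # write() returns the number of bytes accepted.
        return device.write(data)


if __name__ == "__main__":
    print(read_random_bytes(8).hex())
    print(discard(b"hello"), "bytes written to /dev/null")
```

Nothing in the code knows or cares that these paths are devices rather than files on a disk, which is the whole idea.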

"What, everything?" I hear you cry. Well, in fact, I don't, because the usual response to this revelation is blank incomprehension. So let's try another route. Try this:

Well, I hope I now have you at least partially convinced of the power that comes from treating everything as a file. Now to actually make use of it.

Commonly used devices

The devices we will use most to start with are the hard drives and the floppy drive. On a PC, /dev/hda is the primary master IDE disk. If you don't know what that means, and just bought a computer you've never opened and poked inside, then it's a safe bet that /dev/hda represents your one and only hard disk. Your second IDE disk (primary slave) is /dev/hdb. Your third IDE device (secondary master) is /dev/hdc - on the vast majority of systems, this is not a hard disk but a CD-ROM drive. To make things easier, many systems have a symbolic link at /dev/cdrom, pointing to the actual device (almost always /dev/hdc). Secondary slave is /dev/hdd, and so on. Personally, I administer a system which goes all the way up to /dev/hdg, and we're thinking about adding more.

Your first floppy drive will be /dev/fd0, your second /dev/fd1, and so on.

Any of these devices may be present or absent - the fact that there is a file in /dev/ for it doesn't mean anything!

All of these devices, and in fact most disk devices, are block devices. This means that they can only be written to, read from, and skipped around in multiples of the block size - usually 512 bytes. By contrast, devices such as terminals, serial ports and sound cards are character devices, which can be read from and written to like ordinary files, byte by byte. This becomes important later, when using loopback-mounted filesystems.
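
If you want to check which kind of device a given file in /dev/ is, the file's metadata will tell you. The following is a minimal Python sketch using the standard os and stat modules; the describe() helper is a name I have made up for illustration, and the paths in the demonstration are only examples which may or may not exist on your system.

```python
import os
import stat


def describe(path):
    """Say what kind of file `path` is, using only its metadata."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"


if __name__ == "__main__":
    # Example paths only; adjust to whatever exists on your system.
    for path in ["/dev/hda", "/dev/fd0", "/dev/null", "/dev/tty"]:
        print(path, "->", describe(path))
```

Note that this only inspects the file in /dev/. A result of "block device" says nothing about whether any hardware is actually attached; that only becomes apparent when something tries to open the device.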

Anatomy of a filesystem

Now, back to the UNIX filesystem. Each volume is represented by a block device. As far as the filesystem is concerned, each block device is a one-dimensional array - just a long string of blocks, one after another. Device drivers in the operating system take care of the fact that most data is in fact stored in tracks or spirals on spinning disks of various kinds. This is our first layer of abstraction - anything that reads or writes data on a device doesn't need to care how that data is physically stored, and can talk to every device in exactly the same way. This is good - it means that the driver to handle DOS-formatted floppy disks does not need to be rewritten to handle DOS-formatted hard drives - the layout of the data on the disk is the same, and the hardware drivers take care of the differences for us.
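
To make the "one-dimensional array of blocks" idea concrete, here is a minimal Python sketch. It assumes the usual 512-byte block size, and it uses an in-memory buffer as a stand-in for a real device so that it is safe to run; read_block() is a hypothetical helper, not part of any real driver.

```python
import io

BLOCK_SIZE = 512  # the usual block size; an assumption for this sketch


def read_block(device, index, block_size=BLOCK_SIZE):
    """Return block number `index` from a file-like `device`."""
    # Block N simply starts N * block_size bytes from the beginning.
    device.seek(index * block_size)
    return device.read(block_size)


if __name__ == "__main__":
    # Stand-in "device": three blocks filled with 'A', 'B' and 'C'.
    fake_device = io.BytesIO(b"A" * 512 + b"B" * 512 + b"C" * 512)
    print(read_block(fake_device, 1)[:4])
```

The same function would work unchanged on a real block device opened in binary mode (given permission to read it), because the drivers hide the physical layout from us.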

A block device isn't much use to us as-is - we need to be able to store files on it. So, we need some coherent system for storing arbitrary-length streams of data (files), in an organised hierarchy (directories), in a fixed-length stream of data with a fixed block size and no organisation (a storage device). Square peg, round hole.

This is, as you may have guessed, a difficult problem, and as is the way with such things, it has spawned a legion of solutions, each competing on its various merits (speed, efficiency, reliability, and so on). If I mention a few examples, some of them may well sound familiar: FAT16, FAT32, ext2, ext3, reiserfs, UFS, HFS, NTFS... and those are only the ones I see on a regular basis. These I will call filesystem implementations, to distinguish them from "the filesystem" as an entity, but their normal name is "filesystems" ("Which filesystem does Solaris usually use?"). A volume formatted with one of these filesystem implementations is also called a "filesystem" ("Is there a filesystem on that disk already, or should I format it myself?"), but I will call them "formatted volumes" to avoid confusing everyone, myself included.

We don't need to go into specifics of how a filesystem implementation works - entire books have been written about it, and we're still innovating. However, it is good to know how the general system is organised.

The important thing to realise is that this is all about layers of abstraction. The operating system kernel itself is one gigantic abstraction, providing a uniform interface to programs (called system calls), allowing them to use completely artificial constructs such as files and directories, text input and output, and network interfaces, rather than forcing them to fiddle with blocks on hard disks, flip bits on a video card, or catch interrupts from an ethernet adaptor, just to do the simplest tasks. This layer of abstraction also allows it to control access to these resources.
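
To show what talking to the kernel through system calls looks like, here is a minimal Python sketch. The functions os.open(), os.write(), os.read() and os.close() in Python's standard os module are thin wrappers around the system calls of the same names; write_then_read() and the file name demo.txt are made up for illustration.

```python
import os
import tempfile


def write_then_read(path, data):
    """Write `data` to `path` and read it back using low-level calls."""
    # os.open returns a file descriptor, a small integer handle
    # that the kernel uses to identify the open file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(data))
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(write_then_read(os.path.join(folder, "demo.txt"), b"hello"))
```

At no point does the program mention a disk, a block or a controller. It asks for a file by name, and the kernel works out the rest.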

Anyway, back to the subject. Let's see how one filesystem transaction happens:

  1. System call
    A program (my text editor) wants to open the file /home/meredydd/Lesson3.html. It asks the kernel to open that file, using the open() system call (with which some of you will be very familiar).
  2. VFS
    Control is now passed to a section of the kernel called the VFS (Virtual Filesystem), whose job it is to figure out what to do with this request. It consults the mount table - a table within the kernel that stores which bits of the filesystem are stored on which devices, using which filesystem implementation. You can view the mount table with the mount command:

    As you can see, my mount table tells me that my root volume (mounted under /) is /dev/hda1, and that it is using the reiserfs filesystem implementation. There are also a few other filesystems mounted, all of them irrelevant to this example. The VFS sorts through this table, and figures out which volume is responsible for the directory /home/meredydd - in this case /dev/hda1.

  3. Filesystem driver
    The VFS then calls up the driver for the filesystem implementation in use by that volume (in this example, reiserfs), and asks it for information about the file in question. The filesystem driver then reads the appropriate blocks from the device in question (/dev/hda1 here), interprets them however it should, and returns the information to the VFS, which then makes vital decisions such as whether the file exists, whether this user has permission to open the file, and so on.
  4. Hardware
    As I have said before, the filesystem driver does not actually need to know anything about the hardware on which this data is being stored. The appropriate driver (in this case, the driver for my IDE controller) handles reading and writing the actual blocks on the disk, freeing the filesystem driver from the need to handle the physical stuff.

    In fact, the system is a bit more complicated - just above the hardware driver sits a layer of buffers and caches, which keeps frequently-used blocks in memory (which is fast) so that as little time as possible is wasted on reading and writing the hardware (which is slow). Caching also helps to speed up write operations - modified blocks are held in memory until they can be written to the device all at once, which is much more efficient. Unfortunately, this means that if the power is unexpectedly removed, any data still waiting in memory never actually gets written to disk, which is why turning off your computer without properly shutting down is a Bad Thing, and can lose data. Filesystems have evolved mechanisms for minimising the damage caused by this and other nasty events - a topic which I will handle in later lessons.
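
The lookup in step 2 can be sketched in a few lines of Python. This is only an illustration of the idea of picking the volume responsible for a path, not how the kernel really does it (the kernel walks the path one component at a time rather than comparing strings). The sample table is an assumption: its first line matches the root volume described above, while the mount options and the other two lines are invented for the example.

```python
# Hypothetical output in the style of the mount command.
# Only the root entry (/dev/hda1, reiserfs) comes from the lesson.
SAMPLE = """\
/dev/hda1 on / type reiserfs (rw)
proc on /proc type proc (rw)
/dev/hdb1 on /mnt/data type ext2 (rw)
"""


def parse_mount_output(text):
    """Parse lines of the form 'DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)'."""
    table = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[1] == "on" and fields[3] == "type":
            table.append((fields[0], fields[2], fields[4]))
    return table


def find_volume(path, table):
    """Return the (device, mount point, fstype) entry responsible for `path`."""
    best = None
    best_length = -1
    for entry in table:
        mount_point = entry[1].rstrip("/") or "/"
        inside = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        # The deepest (longest) matching mount point wins.
        if inside and len(mount_point) > best_length:
            best = entry
            best_length = len(mount_point)
    return best


if __name__ == "__main__":
    table = parse_mount_output(SAMPLE)
    print(find_volume("/home/meredydd/Lesson3.html", table))
```

With this table, /home/meredydd/Lesson3.html falls through to the root volume because no deeper mount point contains it, whereas a path under /mnt/data would be handed to the ext2 volume instead.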

Next Lesson

I apologise for giving you another heavy theory lesson. I promise that next time we'll actually get down to some practical work. Really!