Kernel Hacking Lesson #6: Overview of the Kernel Source
In this lesson, you'll get a general idea of where the various parts of
the kernel are located in the source tree, what order they execute in,
and how to go looking for a particular piece of code.
Where is all the code?
Let's start with the top-level directory of the Linux source tree,
which is usually but not always in
/usr/src/linux-<version>. We won't
get too detailed, because the Linux source changes constantly, but
we'll try to give you enough information to figure out where a certain
driver or function is.
Makefile
This file is the top-level Makefile for the whole source tree. It
defines a lot of useful variables and rules, such as the default gcc
compilation flags.
Documentation/
This directory contains a lot of useful (but often out of date)
information about configuring the kernel, running with a ramdisk, and
similar things. The help entries corresponding to different configuration
options are not found here, though - they're found in
files in each source directory.
arch/
All the architecture specific code is in this directory and in the
include/asm-<arch> directories. Each architecture has its own
directory underneath this directory. For example, the code for a
PowerPC based computer would be found under
arch/ppc. You will find
low-level memory management, interrupt handling, early initialization,
assembly routines, and much more in these directories.
crypto/
This is a cryptographic API for use by the kernel itself.
drivers/
As a general rule, code to run peripheral devices is found in
subdirectories of this directory. This includes video drivers,
network card drivers, low-level SCSI drivers, and other similar
things. For example, most network card drivers are found in
drivers/net. Some higher level code to glue all the drivers of one
type together may or may not be included in the same directory as the
low-level drivers themselves.
fs/
Both the generic filesystem code (known as the VFS, or Virtual File
System) and the code for each different filesystem are found in this
directory. Your root filesystem is probably an ext2 filesystem; the
code to read the ext2 format is found in
fs/ext2. Not all of the
filesystems compile or run, and the more obscure filesystems are
always a good candidate for someone looking for a kernel project.
include/
Most of the header files included at the beginning of a .c file are
found in this directory. Architecture specific include files are in
asm-<arch>. Part of the kernel build process creates the symbolic
link asm, which points to asm-<arch>, so that #include <asm/file.h> will
get the proper file for that architecture without having to hard code
it into the .c file. The other directories contain non-architecture
specific header files. If a structure, constant, or variable is used
in more than one .c file, it should probably be in one of these
header files.
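The symbolic link idea can be tried out in a throwaway directory. This is
only a sketch: the directory layout and the header name file.h are made up
for illustration, and in a real tree the kernel build sets up the link itself.

```shell
# Sketch of the include/asm symbolic link in a scratch directory.
# The layout and the header name "file.h" are hypothetical.
scratch=$(mktemp -d)
mkdir -p "$scratch/include/asm-ppc" "$scratch/include/asm-i386"
echo '/* ppc version */'  > "$scratch/include/asm-ppc/file.h"
echo '/* i386 version */' > "$scratch/include/asm-i386/file.h"

# Point the generic name "asm" at the directory for the chosen architecture.
arch=ppc
ln -s "asm-$arch" "$scratch/include/asm"

# A path that goes through "asm" now resolves to the ppc header.
resolved=$(cat "$scratch/include/asm/file.h")
echo "$resolved"
```

Changing the arch variable to i386 and recreating the link would make the
same path resolve to the other header, which is why the .c files never need
to name the architecture.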
init/
This directory contains the files main.c and version.c, and code for
creating "early userspace". version.c defines the Linux version string.
main.c can be thought of as the kernel "glue". We'll talk more
about main.c in the next section. Early userspace provides
functionality that needs to be available while a Linux kernel is coming up, but
that doesn't need to be run inside the kernel itself.
ipc/
"IPC" stands for "Inter-Process Communication". It contains the code
for shared memory, semaphores, and other forms of IPC.
kernel/
Generic kernel level code that doesn't fit anywhere else goes in here.
The upper level system call code is here, along with the
code, the scheduler, signal handling code, and much more. The files
have informative names, so you can type
ls kernel/ and guess fairly
accurately at what each file does.
lib/
Routines of generic usefulness to all kernel code are put in here.
Common string operations, debugging routines, and command line parsing
code are all in here.
mm/
High level memory management code is in this directory. Virtual
memory (VM) is implemented through these routines, in conjunction with
the low-level architecture specific routines usually found in
arch/<arch>/mm/. Early boot memory management (needed before the
memory subsystem is fully set up) is done here, as well as memory
mapping of files, management of page caches, memory allocation, and
swap out of pages in RAM (along with many other things).
net/
The high-level networking code is here. The low-level network drivers
pass received packets up to and get packets to send from this level,
which may pass the data to a user-level application, discard the data,
or use it in-kernel, depending on the packet. The net/core
directory contains code useful to most of the different network
protocols, as do some of the files in the net/ directory itself.
Specific network protocols are implemented in subdirectories of
net/. For example, IP (version 4) code is found in the net/ipv4
directory.
scripts/
This directory contains scripts that are useful in building the
kernel, but does not include any code that is incorporated into the
kernel itself. The various configuration tools keep their files in
here, for example.
security/
Code for different Linux security models can be found here, such as
NSA Security-Enhanced Linux and socket and network security hooks.
sound/
Drivers for sound cards and other sound related code are placed here.
usr/
This directory contains code that builds a cpio-format archive containing a
root filesystem image, which will be used for early userspace.
Where does it all come together?
The central connecting point of the whole Linux kernel is the file
init/main.c. Each architecture executes some low-level setup
functions and then executes the function called start_kernel(), which
is found in init/main.c.
The order of execution of code looks something like this:
- Architecture-specific setup code (in arch/<arch>/*)
- The function start_kernel() (in init/main.c)
- The function init() (in init/main.c)
- The user level "init" program
In more detail, this is what happens:
- Architecture-specific setup code does:
  - Unzip and move the kernel code itself, if necessary
  - Initialize the hardware
    - This may include setting up low-level memory management
  - Transfer control to the function start_kernel()
- start_kernel() does, among other things:
  - Print out the kernel version and command line
  - Start output to the console
  - Enable interrupts
  - Calibrate the delay loop
  - Call rest_init(), which does:
    - Start a kernel thread to run the init() function
    - Enter the idle loop
- init() does:
  - Start the other processors (on SMP machines)
  - Start the device subsystems
  - Mount the root filesystem
  - Free up unused kernel memory
  - Run the user level "init" program
At this point, the userlevel
init program is running, which will
do things like start networking services and run
getty (the login
program) on your console(s).
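To see where the hand-off from architecture code to start_kernel() happens,
a recursive grep over arch/ and init/ is usually enough. The sketch below
builds a tiny scratch tree so that it can be run anywhere. The file
contents are simplified, hypothetical stand-ins, not real kernel code.

```shell
# Scratch tree standing in for a kernel source tree. The file contents are
# simplified, hypothetical stand-ins, not real kernel code.
tree=$(mktemp -d)
mkdir -p "$tree/arch/ppc/kernel" "$tree/init"
cat > "$tree/arch/ppc/kernel/head.S" <<'EOF'
	/* low-level setup would happen here */
	bl	start_kernel
EOF
cat > "$tree/init/main.c" <<'EOF'
void start_kernel(void)
{
	rest_init();
}
EOF

# List every line that mentions start_kernel, with file name and line number.
callers=$(grep -rn "start_kernel" "$tree/arch" "$tree/init")
echo "$callers"
```

In a real source tree, the same grep run from the top-level directory
against arch/<arch> and init/ shows both the call site in the architecture
code and the definition in init/main.c.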
You can figure out when a subsystem is initialized from
init() by putting in your own printk's and seeing when the printk's
from that subsystem appear with regard to your own printk's. For
example, if you wanted to find out when the ALSA sound system was
initialized, put printk's at the beginning of
init() and look for where "Advanced Linux Sound Architecture [...]" is printed out
relative to your printk's. (See Kernel Hacking
Lesson #5 for help with using printk().)
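Once the kernel has booted with your printk's in place, comparing positions
in the boot messages can be done with grep. The following is a sketch that
uses a small sample file in place of real boot output (which you might save
with dmesg). The "my marker" lines are hypothetical messages, and the
"[...]" is kept exactly as elided above.

```shell
# Sample log standing in for saved boot output. The two "my marker" lines
# are hypothetical printk messages added for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
my marker 1
Advanced Linux Sound Architecture [...]
my marker 2
EOF

# -n prefixes each match with its line number; -F treats patterns as fixed
# strings, so characters like "[" are not interpreted as a regex.
grep -nF -e "my marker" -e "Advanced Linux Sound Architecture" "$log"

# Line number of the subsystem's message, for comparison with the markers.
alsa_line=$(grep -nF "Advanced Linux Sound Architecture" "$log" | cut -d: -f1)
echo "subsystem message is on line $alsa_line"
```

If the subsystem's line number falls between those of your two markers, the
subsystem was initialized between the two points where you placed them.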
Finding things in the kernel source tree
So, you want to start working on, say, the USB driver. Where do you
start looking for the USB code?
First, you can try a find command from the top-level kernel
directory:
$ find . -name \*usb\*
This command will print out every filename that has the string "usb"
anywhere in it.
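Two variations on this find command are often handy: matching regardless of
case, and listing only directories. The sketch below runs against a scratch
tree with made-up file names so that it works anywhere. The -iname option
is an extension found in GNU and BSD find rather than plain POSIX.

```shell
# Scratch tree standing in for a kernel source tree (hypothetical file names).
tree=$(mktemp -d)
mkdir -p "$tree/drivers/usb" "$tree/drivers/net" "$tree/include/linux"
touch "$tree/drivers/usb/usb-ohci.c" \
      "$tree/drivers/net/USBnet.c" \
      "$tree/include/linux/usb.h"

# -type d keeps only directories; -iname matches "usb" in any letter case.
dirs=$(find "$tree" -type d -iname '*usb*')
echo "$dirs"

# -type f keeps only regular files.
files=$(find "$tree" -type f -iname '*usb*')
echo "$files"
```

Listing the directories first tends to show where a subsystem lives, after
which the file list tells you what else refers to it.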
Another thing you might try is looking for a unique string. This
unique string can be the output of a
printk(), the name of a file in
/proc, or any other unique string that might be found in the source
code for that driver. For example, USB prints out the message:
usb-ohci.c: USB OHCI at membase 0xcd030000, IRQ 27
So you might try using a recursive grep to find the part of that
printk that is not a conversion character like %d:
$ grep -r "USB OHCI at" .
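A few extra grep options make the result easier to act on. The sketch below
uses a scratch tree whose single source file is a hypothetical stand-in for
the real driver. The --include option is available in GNU grep and may be
missing from other implementations.

```shell
# Scratch tree with one hypothetical stand-in for a driver source file.
tree=$(mktemp -d)
mkdir -p "$tree/drivers/usb"
cat > "$tree/drivers/usb/usb-ohci.c" <<'EOF'
/* hypothetical stand-in for a driver source file */
printk("USB OHCI at membase 0x%lx, IRQ %d\n", membase, irq);
EOF

# -r recurses, -n prints line numbers, -F searches for a fixed string,
# and --include limits the search to .c files.
hits=$(grep -rnF --include='*.c' "USB OHCI at" "$tree")
echo "$hits"
```

The file name and line number in the output take you straight to the
printk, and the surrounding code is usually the driver you were looking for.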
Another way you might try to find the USB source code is by looking in
/proc. If you type
find /proc -name usb, you might find that there
is a directory named
/proc/bus/usb. You might be able to find a
unique string to grep for by reading the entries in that directory.
If all else fails, try descending into individual directories and
listing the files, or looking at the output of
ls -lR. You may see
a filename that looks related. But this should really be a last
resort, and something to be tried only after you have run many
different find and grep commands.
Once you've found the source code you are interested in, you can start
reading it. Reading and understanding Linux kernel code is another lesson
in itself. Just remember that the more you read kernel code, the easier it
gets. Have fun exploring the kernel!