bitwiser.dev A journal of programming insights that made me a bit wiser

Layman's Guide to the Linux Filesystem Hierarchy

Ethos

It’s a continual goal of mine to gain a more cohesive understanding of Linux. As a small step towards that goal, I set out to understand the philosophy behind its filesystem hierarchy.

If you’re not a Linux wizard, it can be daunting to understand the intention behind its tersely-named directories: three-to-four letter names like opt and etc don’t always make it easy to grok their purpose.

I’m not a Linux wizard myself, though I strive to understand its madness on my journey to become a more holistic software engineer. Beyond sheer curiosity, Linux exhibits many clever design decisions that may influence the way you approach other problems (see my previous post about applying Unix philosophy to your note-taking setup).

I wrote this layman’s guide to not only solidify my own understanding, but to explain the basics of each directory to the other Linux laymen of the world as well.

Let’s dive in.

/

In a Linux terminal, / denotes the root directory: it’s the highest point in the hierarchy and houses all other directories below it.

Underneath root, you’ll find a collection of directories each with their own purpose:

.
├── bin
├── boot
├── dev
├── etc
├── home
├── lib
├── media
├── mnt
├── opt
├── proc
├── root
├── run
├── sbin
├── srv
├── sys
├── tmp
├── usr
│   ├── bin
│   ├── lib
│   ├── local
│   ├── sbin
│   └── share
└── var
    ├── log
    └── tmp

Let’s explore what each of these directories is for.

NOTE: This guide is not comprehensive of every directory in the Filesystem Hierarchy Standard, but I’ll attempt to cover most of them to the best of my ability.

/bin

/bin contains many of the common built-in executable programs such as mv, ln, etc. These executables are essential building blocks that the system requires to minimally operate.

If you echo $PATH, you’ll notice that it includes /bin in its colon-separated list of directories. When you type a command such as ls, your shell searches the directories in $PATH, in order, for an executable program with that name. As such, /bin is likely where your system finds and runs a lot of common commands such as ls and mv.
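
To make that lookup concrete, here’s a rough sketch of the search in plain shell. find_in_path is a hypothetical helper I made up for illustration; it walks each directory in $PATH in order and stops at the first executable with a matching name. In practice, the built-in command -v answers the same question for you.

```shell
# find_in_path is a hypothetical helper that mimics how the shell
# resolves a command name against the directories listed in $PATH.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # $PATH is a colon-separated list
    for dir in $PATH; do
        # An empty entry in $PATH means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

find_in_path ls     # prints the first match found in $PATH
command -v ls       # the shell's own way to ask the same question
```

The order of directories in $PATH matters: if two directories contain an executable with the same name, the one listed first wins.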

/boot

/boot contains files necessary to the system boot process.

NOTE: The Linux boot process is outside the scope of this article. For now, just be aware that this directory is imperative to it.

/dev

/dev is a virtual file system that contains device files. To understand what a virtual file system is, see /proc.

Device files provide an interface that allows users to interact with devices. Typically, they represent physical devices like disk drives, modems, RAM, etc.

However, there are also pseudo-device files that implement the interface but do not represent actual hardware. For example, /dev/urandom is a “device” file that attempts to gather random environmental noise from device drivers, populates an entropy pool, then generates random output from it:

~$ cat /dev/urandom
uKϯՙ, ;-3MǦ)bL/c0qbUG Xh:8IES
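
One caveat: cat /dev/urandom never stops on its own, since the device hands out bytes for as long as you keep reading. A tamer sketch, assuming the common head and od utilities are installed, reads a fixed number of bytes and prints them as hex instead of raw binary:

```shell
# Read exactly 16 bytes from the pseudo-device and print them as hex.
sample=$(head -c 16 /dev/urandom | od -An -tx1)
echo "$sample"

# A second read yields different bytes (with overwhelming probability).
again=$(head -c 16 /dev/urandom | od -An -tx1)
if [ "$sample" != "$again" ]; then
    echo "the two samples differ"
fi
```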

/etc

/etc contains system configuration files. Generally, these files are not executable but simply configure how your system operates.

For example, /etc/hosts contains IP addresses that map to host names. When the system tries to fetch a URL such as foobar.com, it contacts a DNS server to resolve the host name foobar.com to an actual IP address. Before it does that, the system checks if the host name exists locally in /etc/hosts.

Let’s say I have a local web server running on port 5555 that I want to address by foobar.com. If I add 127.0.0.1 foobar.com to /etc/hosts and curl foobar.com:5555, the system resolves that host name to my local web server due to my entry in the configuration file.
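
To get a feel for the file format, here’s a minimal sketch of that local lookup. Each line of a hosts file is an IP address followed by one or more host names, and anything after # is a comment. lookup_host is a hypothetical helper of my own, and it only imitates the file scan, not the full resolver. The demo writes to a scratch file because editing the real /etc/hosts requires root.

```shell
# lookup_host is a hypothetical helper: print the first IP address that a
# hosts-format file maps to the given name (default file: /etc/hosts).
lookup_host() {
    awk -v host="$1" '
        { sub(/#.*/, "") }                 # drop comments
        {
            for (i = 2; i <= NF; i++)      # field 1 is the IP, the rest are names
                if ($i == host) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}

# Try it against a scratch copy instead of editing the real file.
hosts_file=$(mktemp "${TMPDIR:-/tmp}/hosts.XXXXXX")
printf '%s\n' '127.0.0.1 localhost' '127.0.0.1 foobar.com  # local web server' > "$hosts_file"
lookup_host foobar.com "$hosts_file"    # prints 127.0.0.1
rm -f "$hosts_file"
```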

This is just one example, but /etc contains many other files that dictate system-wide configurations.

/home

/home contains a directory for each user on the system. A user’s home directory contains personal files as well as user-specific configuration files for applications on your system.

These configuration files are often referred to as dotfiles since their names start with a dot (.). For example, ~/.bashrc and ~/.bash_profile configure how your shell behaves.

NOTE: ~/ is simply a shortcut for /home/yourusername/.

/lib

/lib contains shared library images and kernel modules that help the system boot.

Shared libraries contain pieces of C code used by programs to perform common tasks.

You can easily identify libraries by their .so or .a suffix: .so signifies a dynamic (shared) library while .a signifies a static library.

Executables in /bin and /sbin often include these libraries when they link or compile to leverage the code they contain. For example, libc.so.6 is a dynamic library that contains many standard C functions such as malloc, read, open, etc. If an executable wants to open a file via open, it links against this shared library.

NOTE: The numbers that follow .so denote the shared library’s version.

NOTE: Dynamic vs. static libraries is outside the scope of this article. For now, just be aware that they exist.

Kernel modules contain pieces of compiled C code used by the kernel to extend its functionality.

You can easily identify kernel modules by their .ko suffix.

To put it simply, they are special shared libraries used solely by the kernel.
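
The suffix conventions in this section can be summed up in a few lines of shell. classify_file is a hypothetical helper written only to illustrate the naming rules; it looks at the file name and nothing else, so it’s a sketch rather than a reliable detector.

```shell
# classify_file is a hypothetical helper that labels a file by its suffix,
# following the naming conventions described above.
classify_file() {
    case $1 in
        *.so)    echo "shared library" ;;
        *.so.*)  echo "shared library, version ${1##*.so.}" ;;
        *.a)     echo "static library" ;;
        *.ko)    echo "kernel module" ;;
        *)       echo "unknown" ;;
    esac
}

classify_file libc.so.6     # prints: shared library, version 6
classify_file libm.a        # prints: static library
classify_file ext4.ko       # prints: kernel module
```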

/media

/media contains external file systems from removable devices that the system may automatically mount for you (ex. USB flash drives, CD-ROMs, etc.). This is opposed to /mnt where you manually mount external file systems yourself.

Mounting is the process that makes an external file system available to your system.

/mnt

/mnt contains external file systems that you temporarily and manually mount yourself. This is opposed to /media where the system may automatically mount removable media such as USB flash drives, CD-ROMs, etc.

For example, you could mount a remote file system over SSH in this directory for easy access.
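
Mounting itself usually requires root, but checking what’s currently mounted does not. Here’s a small sketch, assuming a Linux system that exposes its mount table at /proc/mounts, where the second column of each line is the mount point. is_mounted is a hypothetical helper, and the optional second argument exists only so it can read a different table.

```shell
# is_mounted is a hypothetical helper: succeed if something is mounted at the
# given directory, according to a mount table in /proc/mounts format.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted /mnt; then
    echo "/mnt is in use"
else
    echo "/mnt is free"
fi
```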

/opt

/opt contains third-party software that does not exist in the default installation. These programs typically do not abide by Unix standards. As such, each one is usually self-contained and lives in a single directory named after itself.

/proc

/proc is a virtual file system that represents information about running processes and other system data. The files it contains aren’t actually real: they are an abstraction used to represent information about your system.

To reiterate, these files are not actual data stored on disk: ls -l /proc shows that the size of almost every file is 0. Let’s explore why.

On Linux, everything is a file. A file is simply an interface to a stream of bytes and does not necessarily represent static data that lives somewhere on disk.

Why does Linux abstract the concept of files to such a wide definition? Think about it from the perspective of a programmer: it’s simpler to write code and tools for a single interface rather than n arbitrarily different interfaces. At the end of the day, you just want to interact with data and probably don’t care where it actually lives.

/proc leverages this interface to make data about your system accessible. Specifically, it contains a directory named after every running process ID or PID. For example, let’s locate the corresponding folder for cron by its PID:

~$ ps aux | grep cron
root       798  0.0  0.1  23652   636 ?        Ss    2019   0:07 cron
...
~$ ls /proc | grep 798
798

This directory has files that contain information about PID 798. For example, let’s print the contents of /proc/798/status to see data about the process’ status.

~$ cat /proc/798/status
Name:   cron
State:  S (sleeping)
Tgid:   798
Ngid:   0
Pid:    798
...

This example just scratches the surface. For now, just be aware that /proc is a special directory that lets you access data about your system.
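
Because these are just files, ordinary text tools work on them. As a minimal sketch, here’s a hypothetical helper, proc_field, that pulls a single field out of a process’s status file. The optional third argument (which defaults to /proc) is my own addition so the function can be pointed at a different directory.

```shell
# proc_field is a hypothetical helper: print one field from a process's
# status file. Usage: proc_field PID FIELD [ROOT], where ROOT defaults to /proc.
proc_field() {
    awk -F':[ \t]*' -v key="$2" '$1 == key { print $2; exit }' "${3:-/proc}/$1/status"
}

# $$ expands to the PID of the current shell.
if [ -r "/proc/$$/status" ]; then
    proc_field "$$" Name
    proc_field "$$" State
fi
```

Run in a terminal, this prints the name and state of the shell itself rather than cron.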

/root

/root is the root user’s home directory. It serves the exact same purpose as /home/yourusername but it exists as a top-level directory instead of nested under /home.

/run

/run contains temporary files created by running programs. It’s a fairly new directory established in 2011 that’s similar to /tmp but slightly different. From my understanding, it limits privileges a bit more than /tmp (which can be a bit of a wild west).

NOTE: Since /run is meant to be ephemeral, your system clears its contents on reboot.

/sbin

/sbin contains many built-in executable programs such as reboot, fdisk, etc. It’s very similar to /bin, but unlike /bin, a regular user’s $PATH does not always include it because the executables concern system administration (and thus should not be universally discoverable).

For example, reboot is an executable that restarts the system. Since a system reboot is a very administrative action, it lives in /sbin away from less disruptive programs like ls.

NOTE: From my understanding, /sbin was not split off from /bin for security reasons. In fact, all users likely have read and execute permissions on its executables. It’s merely a way to separate common everyday utilities from system administrative ones.

/srv

/srv contains data for specific services that require a self-contained directory. You’ll often find the subdirectories named by protocol such as ftp, rsync, www, etc.

For example, FTP may create a /srv/ftp directory here. FTP stands for File Transfer Protocol which allows other computers to upload or download files from a server. By default, it may host /srv/ftp as a directory to upload and download files.

/sys

/sys is a virtual file system that represents information about devices, drivers, the kernel, and more. To understand what a virtual file system is, see /proc.

For example, you can inspect the size of the CPU’s L1 cache:

~$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K
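
The neighboring files in that directory describe the cache as well. Here’s a rough sketch that loops over every cache index for one CPU and prints its level, type, and size. list_caches is a hypothetical helper, and it assumes the level, type, and size files sit next to each other under each index directory; on machines (or containers) that don’t expose them, it simply prints nothing.

```shell
# list_caches is a hypothetical helper: print the level, type, and size of
# each cache that sysfs reports for a CPU. The directory argument is optional.
list_caches() {
    for idx in "${1:-/sys/devices/system/cpu/cpu0/cache}"/index*; do
        [ -r "$idx/size" ] || continue     # skip if the glob matched nothing
        printf 'L%s %s: %s\n' "$(cat "$idx/level")" "$(cat "$idx/type")" "$(cat "$idx/size")"
    done
}

list_caches
```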

/tmp

/tmp contains temporary files created by running programs. For example, some programs may create lock files here to ensure that only one thread of execution performs a certain task at a time. Other programs may create files here to store temporary data that only needs to exist for a short period of time.

NOTE: Since /tmp is meant to be ephemeral, it’s likely that your system clears its contents on reboot (but this behavior is not guaranteed).
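
Since cleanup on reboot isn’t guaranteed, well-behaved scripts tidy up after themselves. Here’s a minimal sketch, assuming the widely available mktemp utility: it creates a uniquely named scratch file and registers a trap that deletes it when the script exits. The demo. prefix is an arbitrary name I chose for this example.

```shell
# Create a uniquely named scratch file under /tmp (or $TMPDIR if set).
scratch=$(mktemp "${TMPDIR:-/tmp}/demo.XXXXXX")

# Remove the file when the script exits, rather than relying on a reboot.
trap 'rm -f "$scratch"' EXIT

echo "intermediate result" > "$scratch"
cat "$scratch"      # prints: intermediate result
```

Letting mktemp pick the name avoids collisions when two copies of the same script run at once.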

/usr

/usr likely holds the lion’s share of data on your system: it contains many user-space programs and data.

You can think of user-space as a “sandbox”: programs run with lower privileges than those that run in kernel space. This is a bit of an oversimplification (and frankly, a topic that deserves its own blog post), but for now, just be aware of the distinction.

/bin

/usr/bin is similar to /bin, but contains executables that are not required on boot.

/lib

/usr/lib is similar to /lib, but contains libraries that are not required on boot.

/local

/usr/local contains third-party software that does not exist in the default installation. If you think this sounds very similar to /opt, you’d be correct!

From my understanding, the topic of whether to use /opt vs. /usr/local is a classic holy war amongst Linux wizards. I won’t get into the weeds in this post, but this article documents a fun debate on the topic.

/sbin

/usr/sbin is similar to /sbin, but contains system administrative executables that are not required on boot.

/share

/usr/share contains files (ex. fonts, manual pages, etc.) that do not depend on your system’s architecture (ex. x86_64, ARM, etc.).

Different companies manufacture CPUs that understand different sets of instructions (or “architectures”). As such, programmers often compile their program to support multiple architectures, and place non-compilable files in /usr/share since they are architecture-independent.

From my understanding, this folder is a bit of a relic. In the past, system administrators would share this directory across networks of machines to save space. Disk space is cheap these days, though, so people rarely do this anymore.

/var

/var contains variable data files (as opposed to /usr which contains static data files). Programs often use this directory to record runtime information.

/log

/var/log contains log files that document the chronology of events that occur while a program runs.

For example, if a user fails to authenticate and you’re not sure why, you could tail -f /var/log/auth.log for some hints.

/tmp

/var/tmp is similar to /tmp, but preserves files between reboots.

Addendum

Filesystem Hierarchy Standard

The Filesystem Hierarchy Standard, or FHS for short, is a set of guiding principles maintained by the Linux Foundation to standardize the location and purpose of directories and their associated files.

I sourced much of my knowledge from the FHS (supplemented by information from other sources as well). If you’d like to know more or learn about a directory I did not cover here, seek out this standard for more information.

Archaism

When you learn about Linux, it’s important to understand that it has continually evolved since its release almost 30 years ago. As a result, archaic design decisions may persist alongside newer iterations of the same idea.

For example, /proc started as a virtual filesystem to contain information about running processes. Over time, it exposed system-level data as well (ex. cpuinfo, etc.). Since this cluttered /proc’s original focus on individual processes, /sys was created to house system-level data exclusively. However, a lot of the same system-level data still exists in /proc to maintain compatibility with programs that expect it there.

Linux was not immediately created in its final form, and often lives in a transitional state between old and new ideas. So if the distinction between certain directories feels unclear, it may be a result of this continual evolution.

Beyond the context of Linux, archaism is everywhere.

JavaScript was popularized as the language that all browsers understand despite its notorious reputation. To address the flaws of JavaScript, TypeScript was invented: a language that sits “on top” of JavaScript. If you did not know the history of JavaScript as the arbitrarily-standard language of all browsers, the fact that TypeScript compiles to JavaScript might seem incredibly bizarre. Only recently did we standardize WebAssembly, a low-level compilation target for the web that lets browsers run code written in languages other than JavaScript!

My point is that almost no standardized software was made in its perfect form. In order to maintain backwards compatibility while iterating towards a better future, icky bits of archaism may hang around longer than we like.

The realization that software is incredibly human has vastly improved my ability to synthesize new information and empathize with the way things are rather than the way they should be. The sooner you realize this, the clearer the picture becomes and the better a software engineer you will be.