I kind of dislike the standardized FHS of the POSIX world. There are many things to dislike about POSIX. That said, I do believe standards are very important. Without them, chaos would be the standard.
Now, since I love standards, why am I still creating yet-another-better-standard? [insert xkcd://927 here]
Each day, more people care about the XDG Base Directory standard, which aims to allow users to set up their own paths for all their things. If all programs respected this, we could do whatever we wanted!
We should delegate the job of telling apps where to find their stuff to runtime environment queries. I feel it would be better to have a program similar to command(1) or which(1) to ask for these things.
Several languages/frameworks already implement this. Neovim uses a function `stdpath()` to figure out what the path should be like. Portage uses `doexe` as a wrapper to install executables onto the system. Why is this? The answer is simple: It makes our lives easier.
If you're anything like me, you've written several scripts. And you've probably written something similar to `XDG_CONFIG_HOME="${XDG_CONFIG_HOME:-${HOME:-/home/${USER:?Didn't find the user}}/.config}"`, only to then `test -d "${XDG_CONFIG_HOME}" || exit 1`. This is very ugly. And we need to do it for all the variables! In all my scripts! Why not delegate that work to a single library?
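As a minimal sketch of what such a library could look like, here is a single sourced function that centralizes the fallback dance. The name `xdg_path` is an assumption of mine, not an existing tool; it simply encodes the well-known XDG Base Directory defaults in one place.

```shell
# Hypothetical helper: resolve an XDG base directory in one place,
# so individual scripts never repeat the ${VAR:-fallback} boilerplate.
xdg_path() {
    case "$1" in
        config) printf '%s\n' "${XDG_CONFIG_HOME:-${HOME:?HOME is unset}/.config}" ;;
        cache)  printf '%s\n' "${XDG_CACHE_HOME:-${HOME:?HOME is unset}/.cache}" ;;
        data)   printf '%s\n' "${XDG_DATA_HOME:-${HOME:?HOME is unset}/.local/share}" ;;
        state)  printf '%s\n' "${XDG_STATE_HOME:-${HOME:?HOME is unset}/.local/state}" ;;
        *)      echo "xdg_path: unknown key: $1" >&2; return 1 ;;
    esac
}
```

A script would then just do `confdir=$(xdg_path config)` and test the result once, instead of re-deriving the path everywhere.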
Enough rambling. Let's just suppose we have a functional way to figure out where the paths are. Someone else™ will take care of that.
Before we move on, let's discuss history, so as not to fall victim to the same errors as our past selves.
The FHS itself, which is honestly better described on the Red Hat documentation pages, has a set of directories for different purposes. I'll describe them briefly here:
There's a bit of redundancy in these directories, and some unrequired directories. Most of this comes from historical reasons. The FHS is a bodge on top of a bodge. And it has become standardized out of sheer need.
Let's look at the directories containing binaries: /bin, /sbin, /usr/bin, /usr/sbin, /usr/local/bin, /usr/local/sbin, /opt/bin, ~/.local/bin. Excuse me, whAT?! Why are there 8 different bin directories?
And that's without accounting for the modern libexec directories, and /var/lib stuff. Let's digest. The standard is very clear on the purpose of /usr: «It's the second major section of the filesystem, and contains shareable, read-only data. [...] must not be written to». Hold on, so we can never modify our systems, ever? That'd mean we can never update our systems. That's honestly just silly.
When we look at /usr historically, it used to be mounted from the network in the early phases of boot. It makes sense that our own host should never write to it. It also makes sense that the user should never write to it. Hell, it even makes sense that the system administrator of the host should never write to it. Someone else™ should do it. Someone else, today, means the package manager. Back then, it was the server we mounted it from.
What does /usr even stand for today? Or back then, for that matter? Some people think it's short for «user», since user files were mounted there from the network due to a lack of storage on the host device itself. Related: «Understanding the bin, sbin, usr/bin, usr/sbin split», Rob Landley. Others claim it's short for «Unix Shared Resources», which is probably just a backronym.
I believe it's important, especially for newcomers to system administration, to understand the purpose of each directory just by reading its name. We shouldn't need to open up a manpage or a wiki just to figure out what elementary stuff does.
Since we use package managers today, it only makes sense to rename it to /packages. Only the package manager should be allowed to write to files under it. All files under it should be owned by the package manager's user account. Not root.
What are the /usr/local directories for, then? Per the standard, it's intended for the system administrator to install software locally. Through a modern interpretation, it's a temporary directory for the package manager to use while installing stuff.
The last variable here is the bin vs sbin split. They're mostly the same, except sbin is intended for system management, while bin is intended for user executables. On this note, I believe calling them binaries is silly. Let's call them executable files instead: a picture is also a binary file, but it is not executable. I think namespaces are cool, and we should do more of those. But not through this arbitrary split.
/opt was a mistake. Let's not discuss it.
There's a few directories containing temporary files, and although some people prefer more static approaches to it, most distributions treat /dev, /tmp, /run, /proc, /sys as volatile. If you're a newcomer, you'd probably never guess what any of these directories do, except perhaps /tmp.
/tmp is considered a dumping ground for everyone to just throw garbage in, and is thus considered an unsafe directory. Applications should assume stuff there will be deleted at any random moment, and the system should assume anything saved there is hazardous. /run is the system's runtime directory, thus that's where we'll find most of the socket files used by system daemons.
/dev contains device files (block and character devices). The heck are those, anyway? Mostly they're just... kernel devices. A device is mostly just an instance of a driver. That's quite a handwavy way of putting it, but it's good enough for our needs. We used to have all kernel-specific files under /proc, but this was eventually split into /proc for software, and /sys for hardware stuff. I've already said I like namespaces.
No system can be entirely monolithic. Everyone will want to customize their systems, and configuration files are good for that. These days, applications generally store their configuration defaults in /usr/share, and allow users to configure stuff under /etc.
Aside from the apparently random names used for these directories, I think it's a decent approach, although I believe we should separate host configuration from package configuration. Why? Because most users do not know the difference between «system files» and «program files», and we should make it easy for everyone to tinker responsibly.
You might have noticed /home and /root are both home directories, and wondered why the root user is not in /home like everyone else. The explanation, again, is historical. At some point, /usr was actually where home directories lived. This was before Linux existed.
If a user wants to log in to the system, they need a home directory. Here's the issue: what would happen if the user's home directory didn't exist (because it wasn't mounted, for instance)? Well, they wouldn't be able to log in. This, understandably, was unacceptable for the root account.
I speak in the past tense, because I don't believe this is relevant anymore. Let's just set / as the homedir for root, if needed. Also, as a matter of fact, let's rename root to system. I've seen so many people confuse /root, the root account, and the root path /.
/
├── config
│   ├── environment
│   │   ├── hostname
│   │   └── nameserver
│   ├── paths
│   ├── packages
│   └── system
├── exec
├── info
│   └── logs
├── packages
│   ├── defaults
│   │   └── config
│   ├── exec
│   │   ├── system
│   │   └── user
│   └── libs
│       └── x86_64-pc-linux-gnu
├── users
│   └── mazunki
│       ├── cache
│       ├── config
│       │   ├── environment
│       │   └── paths
│       ├── home
│       │   ├── downloads
│       │   ├── images
│       │   ├── projects
│       │   └── videos
│       ├── appdata
│       ├── runtime
│       └── state
└── volatile
    ├── devices
    ├── hardware
    ├── runtime
    └── processes
You might have noticed a subtle change in the layout of the user directory. There are no dotfiles; they are a historical mistake, and even Rob Pike expressed his opposition to them in «A lesson in shortcuts».
Regardless, most users would rather not see a bunch of garbage in their home directory, yet the average user's home directory usually contains many dotfiles anyway. Some of them are rational, others are not. The XDG Base Directory specification explains the need for some of these directories, but I believe the average user should not need to know about them unless they actually go do the research.
The rationale for this layout is this: users need their own local application files. On multiuser systems (which are not as common on desktops today as they used to be), this is a necessary split. Since the user owns the files, it only makes sense to keep them under the user's directory. On the flip side, most users don't want to see these files when browsing their home, since most of them are handled by wrappers anyway.
When the user is logged in, they'll be sent to their home. Their home is always /users/${username}/home. Most shells have `~/` as an alias for the home directory, and `${HOME}` should point to it. `XDG_*_HOME` directories will generally be located in `${USER_DIR}/`, which corresponds to /users/${username}/.
Regarding these paths, I believe we should split our concepts of environment variables and path variables. Let's keep using `${envvar}` for environment variables. Let's also add `@{pathvar}` for paths. Here's a list of path variables which should be available in a compliant environment, with their corresponding defaults:
@{user_dir}: /users/${USER}
@{config}: @{user_dir}/config
@{cache}: @{user_dir}/cache
@{state}: @{user_dir}/state
@{run}: @{user_dir}/runtime
@{data}: @{user_dir}/appdata
@{home}: @{user_dir}/home
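As a sketch of how a compliant shell might resolve these defaults, here is a small function encoding the table above. The function name `pathvar` is my own assumption; the mapping itself is taken directly from the list.

```shell
# Hypothetical resolver for the proposed path variables, falling back
# to the defaults listed above when nothing else is configured.
pathvar() {
    user_dir="/users/${USER:?USER is unset}"
    case "$1" in
        user_dir) printf '%s\n' "$user_dir" ;;
        config)   printf '%s\n' "$user_dir/config" ;;
        cache)    printf '%s\n' "$user_dir/cache" ;;
        state)    printf '%s\n' "$user_dir/state" ;;
        run)      printf '%s\n' "$user_dir/runtime" ;;
        data)     printf '%s\n' "$user_dir/appdata" ;;
        home)     printf '%s\n' "$user_dir/home" ;;
        *)        echo "pathvar: unknown path variable: $1" >&2; return 1 ;;
    esac
}
```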
We can also see a correspondence with the classic `${XDG_*_HOME}` variables. If a shell doesn't find these path variables in the environment, it should raise a warning to the user, but assume the defaults given above. Since many applications still use the `XDG_*_HOME` variables, it's okay for a shell to derive those variables from the given path variables, and export them when spawning a process.
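A login shell could perform that derivation with something like the following sketch. The function name `derive_xdg` is an assumption; the mapping mirrors the defaults table above.

```shell
# Hypothetical login-shell step: derive the legacy XDG variables from
# the proposed layout before spawning user processes. Existing values
# are respected, so an explicit override always wins.
derive_xdg() {
    USER_DIR="${USER_DIR:-/users/${USER:?USER is unset}}"
    export XDG_CONFIG_HOME="${XDG_CONFIG_HOME:-${USER_DIR}/config}"
    export XDG_CACHE_HOME="${XDG_CACHE_HOME:-${USER_DIR}/cache}"
    export XDG_STATE_HOME="${XDG_STATE_HOME:-${USER_DIR}/state}"
    export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-${USER_DIR}/runtime}"
    export XDG_DATA_HOME="${XDG_DATA_HOME:-${USER_DIR}/appdata}"
}
```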
Using curly braces around the path variable is optional.
We can see /etc has been renamed to /config. Much more readable. Furthermore, we have two main namespaces under it. Each package will have a directory for itself: /config/packages/${package_name}/, or otherwise fall back to /packages/defaults/config/${package_name}.
Here's a catch, though: if a program tries to read any file under /config/packages/${package_name}/ and it does not exist, it should be the operating system's responsibility to make a copy from /packages/defaults/config/${package_name}. Why? Because we don't want updates to the package defaults to modify the behaviour of the system after the user has got used to a specific configuration. If a package is entirely incompatible with the old configuration, the package manager may suggest or request that the user delete the old directory. This should not happen automatically, and should ideally back up the old configuration in case the user wants to inspect it or roll back the version.
We can also see /config/environment with two regular files in it. Each file in this directory represents a variable to be loaded. It is valid to create subdirectories here, in which case they'll be sorted and exported alphanumerically, with ascending values exported later (meaning .../99-env/ would be the final set exported).
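A sketch of that load order, assuming a helper (`load_environment`, my name) that prints export statements a login shell could then eval. The file name becomes the variable name, the file content its value; `find | sort` gives the ascending alphanumeric order, so later sets override earlier ones.

```shell
# Hypothetical loader for an environment directory: walk all regular
# files in ascending alphanumeric order and print export statements.
# The directory is a parameter, so the same logic serves both
# /config/environment and the per-user equivalent.
load_environment() {
    dir="$1"
    find "$dir" -type f | sort | while read -r file; do
        name=$(basename "$file")
        value=$(cat "$file")
        # Naive quoting; a real implementation would escape the value.
        printf 'export %s="%s"\n' "$name" "$value"
    done
}
```

Since the statements are printed in order, a duplicate variable in `99-env/` wins over one in `10-base/`.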
We can see /config/paths, which is similar to the environment variables; it is more thoroughly explained in the user directory section. Additionally, `${USER}` must correspond to the user logging in, and `${USER_DIR}` must be available. Both `${HOME}` and `~` must correspond to the user's home directory.
The export order is the same for /users/${username}/config/environment/. These values are local to the user, and should be exported on login after the system environment is inherited.
Next up are the executable directories: /exec, /packages/exec/system, and /packages/exec/user.
Under /packages/exec, we see two namespaces: system and user. The package manager is free to create more namespaces. All files under /packages/exec/${namespace} must be owned by the package manager's user and the usergroup ${namespace}. The permissions of the executables there must be 750 or 754.

As a consequence, users must be in the ${namespace} group to access the executables of said namespace, but only the package manager is able to modify the files.
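From the package manager's side, installing into a namespace could look like the sketch below. The function name `install_exec` and the optional prefix are assumptions; the chown to the package manager's user and the ${namespace} group needs privileges, so it is only noted in a comment here.

```shell
# Hypothetical install step for a namespaced executable directory.
install_exec() {
    namespace="$1"
    src="$2"
    prefix="${3:-}"   # optional prefix, illustrative (for sandboxed testing)
    dest_dir="$prefix/packages/exec/$namespace"
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/"
    # 750: owner rwx, group r-x, others nothing, per the rule above.
    chmod 750 "$dest_dir/$(basename "$src")"
    # A real package manager would additionally run, as a privileged step:
    #   chown <pkgmgr-user>:"$namespace" "$dest_dir/$(basename "$src")"
}
```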
Additionally, we have a single /exec at the root node. This directory must be available on boot, and must include, at least, a single init file. All files here must be statically linked, and should include the tools required to recover a broken system.
The /info directory is similar to /var. It mainly contains information about what has happened on the system.
Applications and daemons should deliver their logs to a system logger, which should store them in /info/logs/${application_name}/. The system logger may choose to keep all the backups under /info/logs/${application_name}/, or to create a directory /info/logs.old/ in which older copies are stored. The backup files should not be deleted automatically, unless explicitly requested by the system administrator. Furthermore, live copies must be stored as human-readable text.
Similar to the current Linux standard, we have /proc and /sys, except we store them under /volatile as processes and hardware. Additionally, we have moved /dev into the same directory, as devices. Furthermore, we have moved the system runtime directory into it too, as runtime instead of /run.
Many of the renames here involve typing out the whole word for what used to be 3 or 4 characters. Back in the old days, two things were true: storage was a limiting factor, and developers knew what they were working on, while simultaneously being lazy. If you already know dev is a short name for devices, that's perfectly fine; I just don't think it's pragmatic for everyone else.
If people don't want to type out the full name for a directory, make your shell do it for you. Set up auto-completion, tabulation, fuzzy matching, or similar approaches. We should not design our paths around laziness. Instead, I suggest using path variables: rather than typing `cd /config/environment`, we may just type `cd @sysenv` if we have this set up in our configuration.
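Even without shell support for `@` tokens, the idea can be approximated today with a tiny wrapper. Everything here (the name `cdp` and the hardcoded table) is an illustrative assumption.

```shell
# Hypothetical wrapper: expand a few @name tokens to their paths,
# and fall through to a normal cd for anything else.
cdp() {
    case "$1" in
        @sysenv) cd /config/environment ;;
        @config) cd "/users/${USER:?USER is unset}/config" ;;
        @home)   cd "/users/${USER:?USER is unset}/home" ;;
        *)       cd "$1" ;;
    esac
}
```

A real implementation would read the table from /config/paths and the user's own paths directory instead of hardcoding it.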
I may edit this document as I see fit.