Mazunki's lair

Stop using spaces, it's silly

The argument over tabs versus spaces has been going on for a long time. The fight is mostly about which character to use for indentation. I think part of the reason people can't agree on this topic is that nobody has offered an alternative solution. I will summarize the arguments each side uses, and then suggest a new set of characters altogether.

Regarding tabs...

The school of thought that defends the tab as the correct unit of indentation does so because it's one unit. One character per level of indentation feels natural. If you want to move some code out of a function, you can simply delete one character per line. You don't need to think about how many characters there are per level of nesting. Furthermore, you can simply count how many tab characters there are to see how deep the current code is nested.
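As a minimal sketch of that "one character per level" property, the following Python snippet counts leading tabs to get the nesting depth and removes one level by deleting a single character. The function names `nesting_depth` and `dedent_one_level` are made up for illustration.

```python
def nesting_depth(line: str) -> int:
    """Return the number of leading tab characters, i.e. the nesting depth."""
    return len(line) - len(line.lstrip("\t"))


def dedent_one_level(line: str) -> str:
    """Remove exactly one level of indentation by deleting one character."""
    return line[1:] if line.startswith("\t") else line


source = "def f():\n\tif x:\n\t\treturn 1\n"
depths = [nesting_depth(line) for line in source.splitlines()]
print(depths)  # [0, 1, 2]
```

With spaces, both functions would need to know how many spaces make up one level; with tabs, that number never enters the picture.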

I believe most people agree with those benefits, and wouldn't argue against them, per se.

Another benefit is that people can choose to display indentation however they want in their viewer, which matters when people have monitors of different sizes. On the flip side, that's also one of its downfalls. If a developer has configured their editor to display tabs as four columns, the code looks fine on their monitor, and they're happy with the outcome. When another developer opens the same file with a tab width of 8, it looks weird, and anything that was aligned with tabs past the start of the line no longer lines up.
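The effect is easy to reproduce with Python's built-in `str.expandtabs`, which renders tabs at a given width. This is only an illustration: the two sample lines are invented, and they are written the way someone with a tab width of 4 might align the `=` signs.

```python
# Two lines aligned by hand by someone whose editor shows tabs as 4 columns.
lines = [
    "x\t\t= 1",
    "longer\t= 2",
]


def equals_columns(lines, tab_width):
    """Return the column of '=' on each line when tabs are tab_width wide."""
    return [line.expandtabs(tab_width).index("=") for line in lines]


print(equals_columns(lines, 4))  # [8, 8]   aligned
print(equals_columns(lines, 8))  # [16, 8]  no longer aligned
```

Leading indentation survives the change of width just fine; it is the alignment after the first non-blank character that breaks.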

This issue can partially be circumvented in vim with a modeline specifying how the file should be formatted:

// vim: tabstop=4 shiftwidth=4 softtabstop=4 noexpandtab

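If this is placed within the first or last few lines of a file (vim checks the number of lines given by its 'modelines' option, 5 by default), vim will apply it as the rules for this file, provided modelines are enabled. Other editors may also support it.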

Of course, if your editor does not support a modeline, you're screwed, and the rules which the codebase uses won't apply to you... causing a +1238 -1234 diff when you request to merge your changes. Please don't be that guy.

Regarding spaces...

Some developers, armed to the teeth, will defend spaces as the go-to approach for formatting code. This is a much "simpler" route to consistent formatting across the codebase. Forget about editor choices causing formatting issues, forget about meaningful characters, forget about any of that weird whitespace which isn't displayed consistently across your printer and your terminal emulator.

A big issue with this is accessibility: not everyone has a large monitor, not everyone can count in units of four. Some people prefer indents 8 columns wide, since they can read them more easily. Other people prefer to display them 2 columns wide. Congratulations, you just enforced your own taste onto 8 million readers.

My own takes on all this

For the longest time, I have been a massive defender of the tab as the perfect indentation unit.

Over time, I have learned that there are many cases where the displayed width of a "tab" actually does make a difference, and it's important to preserve it. At the beginning of the line this generally doesn't matter (unless you're doing ASCII art, which is quite fun!), but it does matter when aligning symbol names of different lengths, such as in multiline arguments, or dictionaries/hashmaps.

The elastic tabstops solution suggested by Nick Gravgaard on his website is similar to how I maintain the formatted source code of my own keyboard layout.

By figuring out the length of the longest symbol, I can put that in my modeline as the tab width, and just use tabs everywhere. Of course, this causes the indentation at the beginning of the line to be of that width per nesting level too. That's a compromise I'm willing to make when there's only one tab unit there anyway. On an unrelated note, having more than two levels of nesting is often a code smell. Abstract those nestings away into a function!
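Here is a rough Python sketch of that workflow. It makes one assumption that goes beyond the text above: the tab width is set to the longest symbol plus one, since a tab stop exactly as wide as the longest symbol would push that symbol's tab on to the next stop. The helper names `tabstop_for` and `modeline` and the sample symbols are invented for illustration.

```python
def tabstop_for(symbols):
    """Smallest tab width at which one tab after any symbol hits the same column."""
    return max(len(s) for s in symbols) + 1


def modeline(symbols, comment_prefix="//"):
    """Build a vim modeline using that width for every tab-related option."""
    width = tabstop_for(symbols)
    return (
        f"{comment_prefix} vim: tabstop={width} shiftwidth={width} "
        f"softtabstop={width} noexpandtab"
    )


symbols = ["id", "username", "email"]
width = tabstop_for(symbols)

# One tab after each symbol, rendered at the computed width.
rows = [f"{s}\t= ..." for s in symbols]
columns = [row.expandtabs(width).index("=") for row in rows]

print(width)    # 9
print(columns)  # [9, 9, 9]
print(modeline(symbols))
```

The catch is the one already mentioned: every level of leading indentation is now 9 columns wide as well.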

Unicode Consortium, please help

Let's make one thing clear. All this fighting is about formatting changes. They do not affect the logic of your program (unless you're called Python or Whitespace). Even in the case of Python, part of the issue comes from using the same character for multiple purposes.

We are accepting all kinds of emoji into the standard, so why not add some new whitespace characters too? Based on elastic tabstops and several ~~fights~~discussions I've had about this topic, I have compiled a list of characters which could be useful:

Scope unit: A level of indentation. Should only be seen at the beginning of a line.
Local alignment unit: As in elastic tabstops, alignment units should be placed at the same column across several lines. The scope of a local alignment unit is limited to consecutive lines. If a line with no alignment units appears, it clears the column positions.
Global alignment unit: Same as the local alignment unit, but spans a whole file. If a local alignment scope contains any global alignment units in between, both must be respected.
List separator unit: To be used in function calls, function declarations, lists, dictionaries, etc. Each list separator unit marks a new symbol in the list. Traditionally this has been comma+space (and can be displayed as such by your editor). Can also be used in shells to mark a new argument. Why escape textual spaces when they don't need to be syntactically meaningful anyway? :)
Token separator unit: Since there is sometimes a need to add spaces, such as between types and symbol names, we need a character which doesn't carry any meaning except for separating tokens.
Decorative space unit: As a last resort, we may use a space just for decorating our source file. Must be displayed as exactly one character wide. This may be used for ASCII art, for instance, or in-code diagrams, after the corresponding local alignment units. May also be used in argument lists, or similar, if other methods are insufficient.
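To make the local alignment unit a bit more concrete, here is a simplified Python sketch of how an editor might render it. No such character exists, so the private-use code point U+E001 is used as a hypothetical stand-in, and `render` is an invented name. It follows the rule described above (consecutive lines share columns, a line without alignment units resets them) rather than the full elastic tabstops algorithm.

```python
ALIGN = "\ue001"  # hypothetical stand-in for a "local alignment unit"


def render(lines, padding=1):
    """Pad cells so alignment units line up within runs of consecutive lines."""
    out = []
    block = []  # cells of the current run of lines that contain ALIGN

    def flush():
        # Width of column i = widest cell i in the block, plus padding.
        widths = {}
        for cells in block:
            for i, cell in enumerate(cells[:-1]):
                widths[i] = max(widths.get(i, 0), len(cell) + padding)
        for cells in block:
            padded = [cell.ljust(widths[i]) for i, cell in enumerate(cells[:-1])]
            out.append("".join(padded) + cells[-1])
        block.clear()

    for line in lines:
        if ALIGN in line:
            block.append(line.split(ALIGN))
        else:
            flush()  # a line without alignment units clears the columns
            out.append(line)
    flush()
    return out


lines = [f"x{ALIGN}= 1", f"longer{ALIGN}= 2", "", f"ab{ALIGN}= 3"]
for rendered in render(lines):
    print(rendered)
# x      = 1
# longer = 2
#
# ab = 3
```

Note that the column width comes from the content, not from any editor setting, which is exactly what a plain tab cannot offer.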

That's quite a few different characters. You might quickly think «that's too many characters for my silly little keyboard!», and you'd be right. My answer to that is simply: get a better keyboard, get a better editor, or get a better input method on your system. You already use <Tab> to insert four spaces, anyway.

As a last resort, you can set up a pre-commit/pre-save hook to convert any "leftover spaces/tabs" into the appropriate meaningful whitespace, based on the Concrete Syntax Tree of your code. If a human can figure out what a space means by simply looking at the code, a decent language server should also be able to figure it out. In fact, using different types of spaces in the source code should, in theory, help the tokeniser recognise fields more quickly.
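A real hook would need the syntax tree to classify every space, which is well beyond a short example. The Python sketch below only handles the easiest case, leading indentation, using a plain heuristic instead of a CST. Everything in it is an assumption for illustration: the private-use code point U+E000 stands in for the scope unit, `convert_indentation` is an invented name, and four spaces per level is an arbitrary default.

```python
import re

SCOPE = "\ue000"  # hypothetical stand-in for a "scope unit"


def convert_indentation(text, spaces_per_level=4):
    """Replace leading tabs and groups of spaces with scope units.

    Heuristic only: each leading tab counts as one level, and every full
    group of spaces_per_level leading spaces counts as one level. Leftover
    spaces are kept, since they may be alignment rather than indentation.
    """

    def convert_line(line):
        leading = re.match(r"[ \t]*", line).group(0)
        rest = line[len(leading):]
        spaces = leading.count(" ")
        levels = leading.count("\t") + spaces // spaces_per_level
        leftover = " " * (spaces % spaces_per_level)
        return SCOPE * levels + leftover + rest

    return "\n".join(convert_line(line) for line in text.split("\n"))


before = "if x:\n    y()\n\t\tz()"
after = convert_indentation(before)
print(after == "if x:\n" + SCOPE + "y()\n" + SCOPE * 2 + "z()")  # True
```

Wired into a pre-commit or pre-save hook, a function like this would run over each staged file; deciding what the remaining spaces mean is where the syntax tree would have to come in.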

So what about U+0009?

Using the tab character for auto-completion, website navigation, and other such mechanisms seems alright. After all, many of the control characters in the ASCII table are used for similar things.