The Problem With GNU getopt; Or, On Standards

2019-10-28

GNU getopt(3) is broken. The term they would use is 'nonstandard', but nonstandard interfaces make software less robust, less portable, and less maintainable. Generally speaking, such interfaces exist to create vendor lock-in. No matter how much GNU "respects your freedom" or allows you to do anything with their software, they are still software vendors who are incentivized to make it harder for us to use other versions of software. To be clear, I prefer GNU's attempts at lock-in over proprietary lock-in, but the consequences are often the same anyway: non-portable and fragile programs.

The getopt(3) function is a POSIX system interface implementing argument parsing according to the Utility Syntax Guidelines. It's widely implemented across programming languages as a sane way to handle command line options like the -l in ls -l. Those guidelines require that POSIX-conformant utilities place all their options before their operands. A conformant getopt stops parsing at the first non-option argument.

The glibc implementation of getopt does not conform to the Utility Syntax Guidelines. In fact, when trying to write a command line C utility, it has some extremely surprising behavior. To illustrate, I present an example that I ran into while implementing the env(1) command for ctools¹. Here, my version of env is linked to glibc:

$ cat /dev/urandom | base64 | env -i PATH=/usr/bin head -n 15
env: invalid option -- 'n'
env [-i] [name=value]... [utility [argument...]]

What? The -n option was clearly for head. What happened? It turns out that glibc's getopt permutes the elements of argv as it scans. Why? To quote glibc's docs:

The default is to permute the contents of argv while scanning it so that eventually all the non-options are at the end. This allows options to be given in any order, even with programs that were not written to expect this.

POSIX demands the following behavior: the first non-option stops option processing. This mode is selected by either setting the environment variable POSIXLY_CORRECT or beginning the options argument string with a plus sign (‘+’).

To summarize: to write a utility that conforms to the syntax guidelines under GNU, with no dependencies other than the system interfaces, POSIXLY_CORRECT must be set; otherwise, glibc may break the program². Having set out to write portable and robust software, I instead have a broken program.

Why do we even try to standardize? We write standards because we disagree. In software, we disagree on what algorithm to use to sort a list, or what programming language to use to write a client to send or fetch and read an email. Standards mean that I don't have to care about what you use in order to know that my message will be readable, or the list I give you will be sorted. It doesn't matter if /bin/true is an empty file with the execute bit set or a C program the entire text of which is int main(void) { return 0; }. They both return a true value that my shell and I can rely on in a script. When I use getopt to implement option parsing for a utility, I expect it to behave, not break my program.

It's not in GNU's interest to break my program, but it's also not not in GNU's interest to break my program. GNU wants us to write programs for GNU. A proliferation of programs that only work under GNU makes it more attractive to install GNU, and less attractive to install alternatives. There may have been historical reasons for this, GNU's not UNIX after all, but our software should be better. We should expect to be able to use more programs in more environments without modifications. A free program is strictly better than a proprietary one as it can be modified to run in a different environment, but not everyone can do that, and those users deserve to be able to use software as they like as well.

Without standards, even de facto ones, alternatives proliferate. When programmers can't rely on the properties of the system, they implement their own. We end up with POSIX compatibility implementations for GNU and GNU-behaving implementations for POSIX systems. These all get fewer eyes on them than standard interfaces, and so are often buggier than their standard counterparts. Standard interfaces, by contrast, can be validated against a specification and against other implementations.

We need to both push on standards and work within them. They are both an artifact and a living process, frustratingly ossified and a rock solid base to build with. We should be collaborative in developing interfaces, actively working to prevent fragmentation in our ecosystems while always being open to innovation. If we do this we can make using standards obvious and freeing, rather than difficult and limiting. In making and using standards, we free ourselves and we free our users, as we and they can be confident that our utilities and interfaces will work robustly, predictably, and everywhere.

¹ ctools is an implementation of strictly conformant core POSIX utilities written in C. The self-contained nature of every utility makes it a really easy project to contribute to. Here is the source of my env implementation.
² I sincerely hope that no other utilities on my system rely on behavior that will change when POSIXLY_CORRECT is set, but who am I kidding? 100% chance that some other program would break.