Devs on Acid

Using sabotage packages on other distros

27 Jan 2025 15:19 UTC

Introduction to prefixes

With simplicity as a primary goal, sabotage adopted from the start a new old directory layout - the same one as used on early UNIX. That corresponds to what you think of as a "prefix" defined as "" or alternatively "/". Meaning that there's a /bin, but not /usr/bin. Same for /include, /lib, and so on.

Not only is it shorter to type, it's also easier to navigate a filesystem that only has a single bin directory, instead of 4 or more. I refer to /usr/bin, /usr/sbin, /bin/ and /sbin that exist on popular linux distributions and even on BSDs.

The practice of having a separate /usr stems from historic limitations on multi-user systems, whereas the typical linux user today has a dedicated home computer at his disposal where he is the only user. Yet most or all popular modern linux distros use /usr as the prefix for their packages.

The other common choice for the prefix option in software build systems is /usr/local, set as default in GNU autoconf. Its purpose is to allow the user to build custom packages from source without having them interfere with other libraries installed via the official package manager, typically as binary builds.

Since I anticipated from the start of my work on sabotage that some users might want or need a different "prefix", all packages were written using an overridable butch_prefix variable. Nobody ever used it though, so there were a couple of minor bugs that I fixed recently when I first needed it.

In order to properly integrate our packages into a foreign distro environment, this is the key ingredient.

Use case

My task was to set up a lightweight desktop environment on a raspberry pi using the Raspberry Pi OS (installed from "lite" version). There's only very limited amount of space on the SD card it uses, so I opted for using my favorite window manager openbox, combined with lxpanel. That's enough to get 90% of a desktop environment's functionality at a minimal resource cost.

At first I installed openbox from the distro repos, which had far too many, but still an acceptable amount of dependencies. Then i typed apt install lxpanel and was confronted with 50MB of dependencies, including gtk3 AND gtk4 in addition to the already installed gtk2, but more importantly, it would pull in and install pulseaudio!

The obligatory rant about pulseaudio

In case you missed it, pulseaudio is one of the first successes in Lennart Poettering's quest to ruin the Linux Desktop Experience. A Red Hat employee at the time, now on Microsoft's payroll, he developed a new sound system for Linux that works by injecting a pre-loaded DLL into processes and overriding library functions to use his pulesaudio daemon. What could possibly go wrong? The trojan horse was eagerly adopted by fedora in its infancy and then widely propagated to other distributions.

At that time, I happened to upgrade my fedora install to the version that first shipped it (fedora 9), and it took me almost a week to get sound working again!

I spare you the details of the hoops i had to jump, though the solution was to get rid of pulseaudio entirely.

Whenever you have issues with audio on Linux, look no further than pulseaudio. It came right in time as the Linux desktop gained considerable traction.

Other well known Poettering projects are dbus and systemd; rammed down our throats in a similar manner. Whenever Poettering announces a new project, you can be sure it will not take long to find its way into your distro. Unless you use sabotage linux.

One size fits all

The fundamental problem with binary distros like debian is that they need to provide a single source of truth for every library and application. If there's an editor with optional support to detect a hot-plugged printer, then that optional support will be built in, including all 20 dependencies needed for this feature. The feature wouldn't exist if there was nobody in the whole world wanting to use it.

The result of that "all-options-enabled" policy is that these distros are horribly bloated and will fill up your hard disk at break-neck speed.

"Not a problem, just buy more RAM and a bigger SSD!", you may say, to which I reply "Thanks for the offer. Here's my bank account number."

Build-from-source distros like Gentoo let you toggle optional dependencies on and off to your liking. Such a mechanism also exists for a select few packages in sabotage, even though we generally go for the "min deps, max usefulness" aka "best bang for the buck" configuration.

Back to my use case...

Anyway, it was clear to me that I will NOT install pulseaudio and let it fuck up the audio subsystem! And I certainly don't need 3 GTK versions installed just to orchestrate a couple of xterm windows.

What are the options to rectify the situation and get lxpanel without those dependencies?

Here's what I could think of:

figure out which dependency pulls in the undesired ones, then get the source packages for those, patch them or modify their configuration, and install them using the Debian-provided tools so the replacements appears like regular upstream packages. Turns out that this is a lot of work, involves a lot of reading, and it's not even simple to find out where the dependencies come from! For example it was suggested to use apt-rdepends, but this doesn't even list pulseaudio.
Try to compile lxpanel by hand, and whenever it fails to find a dependency, either get the dependency from apt (if reasonable) or compile it as well. This also involves downloading all required source tarballs manually. Following this route involves a lot typing, trial and error and finally results in a one-off success. If you ever need executing the process again it's the whole work all over.
use the existing recipes and patches from sabotage which are specifically tuned to my preferences, using the existing build manager for automation.

It should come as no surprise that I went for the third option.

Getting our hands dirty

The first thing we need is obviously the sabotage package recipe repository and the build manager butch. Since butch is written in portable shell and lives in the sabotage repo, both are only a single git clone away.

git clone https://github.com/sabotage-linux/sabotage

After I cloned the repo, and since we will need root powers anyway, I moved it to /src - basically mv sabotage /src.

butch lives in KEEP/bin and can be readily executed from there. The only thing it needs is a config file. The repo has 2 templates one could use, KEEP/config.stage1 and KEEP/config.cross. I chose the former because half of the latter is boilerplate for cross-compiler setup. Since we compile natively on and for debian, we can use the existing system toolchain.

cp KEEP/config.stage1 config

after a few edited lines, my config looks like this:

export A=aarch64
export K=/src/KEEP
export S=/src
export R=/
export C=/src/tarballs
export LOGPATH=/src/logs
export MAKE_THREADS=4
[ -z "$CC" ] && export CC="gcc -I/usr/local"
export HOSTCC="$CC"
[ -z "$CXX" ] && export CXX="g++ -I/usr/local"
export HOSTCXX="$CXX"
export BUTCH_BUILD_TEMPLATE="$K"/butch_build_template.txt
export BUTCH_DOWNLOAD_TEMPLATE="$K"/butch_download_template.txt
export STAGE=1
export butch_prefix=/usr/local

That's it. the STAGE=1 thing is needed so butch is aware that we're past the bootstrapping stage. The C and C++ compiler's env variables are exported with -I/usr/local/lib added, so they will find the headers of things we compile in addition to those in the system header directory.

The butch_prefix is set to /usr/local so stuff we compile gets installed there and doesn't interfere with system libraries. Actually things will be installed in /opt/packagename and then linked into the root directory. For lxpanel, the binary will live in /opt/lxpanel/usr/local/bin/lxpanel and be available as a symlink /usr/local/bin/lxpanel pointing to the former.

As I don't like to type out the full path to the butch script, every time I use it, i will install it into a directory in $PATH:

cp KEEP/bin/butch* /usr/bin

If we would execute butch at this point, it would assume nothing is installed and build the entire dependency tree for lxpanel, including a C compiler. The reason for that is that its package installation database /var/lib/butch.db is still empty. In fact it doesn't even exist yet. Let's fix that.

butch deps lxpanel | awk '{print $0 " 1"}' > /var/lib/butch.db

The strategy here is that we will make butch believe everything is already installed and then chip away dependency after dependency until the db mirrors adequately what's already installed by means of distro packages, and what's yet needed to be built.

/var/lib/butch.db is a plain text file consisting of "packagename packageversion" tuples on each line. The command above created the file populated with all dependencies of lxpanel, including itself, and the respective package version of 1.

Not having the latest version here is fine as long as you don't call butch update which would naturally try to update all installed to packages to the latest package version.

Next step is to remove the line with lxpanel in butch.db, so we can try to build it and see which dependencies aren't there.

sed '/^lxpanel .*/d' -i /var/lib/butch.db

With that done,

butch install lxpanel

the output looks roughly like:

...
2025-01-26 22:43:37 building lxpanel (/src/build/build_lxpanel.sh) -> /src/logs/build_lxpanel.log
2025-01-26 22:43:38 WARNING: lxpanel failed to build! wait for other jobs to finish.
2025-01-26 22:43:38 done.
got 0 download errors and 1 build errors.
failed to build: lxpanel
time spent: 00:00:01

As expected the build fails and the log file contains the hint that libfm is missing. So we remove the libfm line in butch.db (so butch will try to build it) and try again.

sed '/^libfm .*/d' -i /var/lib/butch.db
butch install lxpanel

this time, libfm fails to find GTK+2. Since GTK+2 is already installed on the host debian, this means it's lacking the development header files we need for source builds. So let's install those headers.

apt install libgtk2.0-dev

You need to draw the line somewhere which packages you use from the distro repo or which ones you build yourself. For me, that's:

for everything that's already installed, I'll install -dev packages as needed
stuff that's very common and highly likely to be needed in the future for other parts of the system that I'd rather not touch. an example is libxml2.
parts of complex systems such as the plethora of X11 components. I'm not gonna install my own version of libxcb or libx11 and mess the entire stack up.

Continuing with our build:

butch install lxpanel

libfm package fails to build, because it wants libfm-extra installed first and we only pretended it's there. So we remove the libfm-extra line from butch.db. Rinse and repeat.

At some point it can't find libxml2, and as I mentioned earlier, this one we're gonna take from upstream.

apt install libxml2-dev

After half a dozen build failures, log checks to figure out which dependencies we have yet to build and removing their corresponding lines from butch.db, the lxpanel build finally succeeded.

And works; except that on first start the GLIBC dynlinker complained that it couldn't find the libraries we built. Since we're in GNU land, the solution is to run ldconfig. Optionally, or on a non-glibc system, one could export LD_LIBRARY_PATH=/usr/local/lib before running the binary, or by making a wrapper that looks like

#!/bin/sh
LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/real-lxpanel "$@"

The only thing left to do is to wire up lxpanel in the openbox config so it's started automatically when you startx.

Post or read comments...

How to create a minimal debian rootfs

20 Nov 2021 05:09 UTC

Because Ubuntu has removed support for the i386 arch, I was exploring to use a Debian rootfs in order to install wine for temporary usage (wine doesn't work well with musl libc because it depends on non-portable glibc dlclose() semantics) and last time I tried to compile it it was a huge shitfest. If I was to use a 64-bit Ubuntu rootfs for this, I'd have to install many libs that are already installed for 64bit in a 32bit version, i.e. double- bloat.

Debian, unlike Ubuntu, doesn't ship a minimal base rootfs. However, one can quite easily create his own using the debootstrap tool, which consists of a single portable shell script and a directory with some shared data.

1) acquire debootstrap.

2) run the following command as root in your host distro:

DEBOOTSTRAP_DIR=XXX/usr/share/debootstrap/ XXX/usr/sbin/debootstrap --arch=i386 --variant=minbase sid DIRECTORY http://deb.debian.org/debian/

where XXX is the prefix you installed deboot and DIRECTORY where stuff is being installed to. The resulting rootfs will be around 220MB in size. Output from the above command

3) trim the fat part1, installation leftovers:

rm DIRECTORY/var/cache/apt/archives/*.deb
rm DIRECTORY/var/cache/apt/*cache.bin
rm DIRECTORY/var/cache/debconf/*.dat-old
rm -rf DIRECTORY/var/lib/apt/lists/*

4) trim the fat part2, unneeded documentation and translation:

rm -rf DIRECTORY/usr/share/doc/*
rm -rf DIRECTORY/usr/share/locale/*
rm -rf DIRECTORY/usr/share/man/*

Now your rootfs is tidied up and should be around 90MB.

6) edit DIRECTORY/etc/dpkg/dpkg.cfg and add the following 3 lines:

path-exclude=/usr/share/doc/*
path-exclude=/usr/share/locale/*
path-exclude=/usr/share/man/*

this will prevent future package installs to install unneeded things.

I have manually diffed the contents of ubuntu-base rootfs and the one created using these instructions (by looking at /var/lib/dpkg/status), and the following packages are only in the debian rootfs:

gcc-11-base gcc-9-base libcap2 libext2fs2 libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 libnsl2 libssl1.1 libtirpc-common libtirpc3 libxxhash0 tzdata.

(full comparison)

These might not actually be needed and you may want to try to remove these on first usage of the rootfs and see whether something breaks. At least removing tzdata seems to be safe.

Prepping the rootfs for chroot use

In order to use the rootfs with a tool like bubblewrap, the following additional steps are necessary:

remove bogus _apt user that gets into the way when running inside a container by editing DIRECTORY/etc/passwd and removing the last line.
patch gnu tar to prevent it from trying to chown files.

process is documented here. In case of a i386 rootfs at the time of this writing, you gotta replace e81decfeff8d93782400008dbb7424000085c00f94c0 with e81decfeff8d93782400008dbb7424000031c0909090 in the /usr/bin/tar binary using a hexeditor (recommended: hexedit).

If you have chown-related problems installing a specific program with apt, use my tool idfake (link in the above article).

Post or read comments...

10 years Sabotage Linux - The history of the first musl-based distro

9 Apr 2021 12:02 UTC

On Monday, 5th April 2021, Sabotage Linux had its 10 years anniversary since its first commit by Christian Neukirchen aka chris2, nowadays known as Leah Neukirchen.

It consisted of the following packages:

linux 2.6.38.2
binutils 2.21
busybox 1.18.4
gnu make 3.82
gcc 3.4.6 (used only for stage0 bootstrap)
gcc 4.5.2 (stage1 compiler) plus its dependencies gmp, mpfr, mpc
perl 5.12.3
gawk 3.1.8
gnu sed 4.2.1
gnu m4 1.4.16 (plus a patch to disable GNUlib fallout)
musl from a forked git repository, roughly equivalent to musl 0.7.6

The build procedure consisted of building a stage0 rootfs, containing a musl targeting C compiler toolchain and a stripped down busybox binary that was barely sufficient to make it possible to chroot into the rootfs and build the rest of the packages without leaks from the host environment. GCC 3.4.6 was chosen as the stage0 compiler because it doesn't require 3rd party libraries like mpc, mpfr, gmp (these were added as a hard dependency to gcc >= 4.3), and because it is much slimmer than GCC 4+ and therefore faster to build.

Once inside the rootfs, GCC 4.5.2 was built (because the linux kernel required extensions only available in recent gccs), GNU m4 (required as a prerequisite for almost every package using GNU autotools), and GNU awk/sed because the sed and awk offered by busybox were too buggy or didn't support extensions used by other package build scripts. Perl was built because the kernel build system required to execute some perl scripts. Also busybox got built a second time with a bigger set of applets.

Everything was built with a set of shell scripts, and there wasn't even an init system yet, so sabotage could only be used inside the chrooted rootfs.

The next day with commit 6fc138a the init system based on busybox runit and the necessary /etc infrastructure was added to make sabotage almost bootable on bare metal. The boot loader extlinux was added a day later, in commit b57b0f1, marking the first version that could be booted. During the next days a couple of tweaks and packages were added (zlib, openssl, git) and on April, 9th the first bootable binary distribution was released.

Shortly after that release some people that tried out sabotage suggested improvements and Chris switched to a build system based on plan9's mk. The editor vim was added, replacing busybox's vi, as well as a couple of other packages including ncurses, bsdtar, xz, automake, and openssh. On 13th of April, another binary release was shipped that was already much more usable, one could even SSH in!

The following days saw addition of python 2.7.1, expat, and a basic set of packages to support Xorg display server. the musl package recipe was switched to the official git repo.

Sabotage was the first distro built upon musl libc and was crucial for getting musl to a point where it could be used with opensource packages from a variety of sources. During the early development chris2 was communicating issues he encountered on a hourly basis to dalias, the musl libc author and the issues got fixed almost immediately. Getting mainstream packages to build usually required hacks and patches, because back then GNU/Linux was a GLIBC-centric monoculture, and many packages used either GLIBC-specific extensions, or worse: private __ prefixed symbols and types that were never meant to be used outside of internal libc code.

Release 2011-04-18 was the first sabotage release that could be used with basic X11 windowing, and when I first tried it out. I've been idling in the #musl IRC channel (at that time consisting of 8 regulars) since a couple weeks, following its development, mainly interested in it due to its ability to create static linked binaries with minimal footprint.

The musl C library

musl is a libc, that means the C standard library. It provides the functions and headers dictated by the C standard, as well as those mandated by POSIX. It's the most important component in UNIX userspace, as almost all software interfaces with it. By doing syscalls to the Linux kernel, it's the interface between userspace and kernel space. The standard in 2011 was the GLIBC libc.

Rich Felker, musl's author, was frustrated with GLIBC because it was designed around a central idiom: dynamic linking. Static linking was only possible to a limited extent (no use of network code involving DNS lookup, as that would pull in the dlopen()ed GLIBC framework that allows to use different name lookup backends), and with a giant footprint: even a simple "hello world" resulted in a 500 KB binary. Another thing that frustrated him was the unreadability of the GLIBC source code, where often the real definition of a function is hidden behind numerous layers of abstractions.

Rich was writing his own libc on an as-needed basis to get the applications he's been interested in into a tiny static executable that could be transfered from one PC to the other and executed without having to ship a bunch of dynamic 3rd party libraries. He's been working on this for a couple years, when he decided in February 2011 to publish the first release 0.5.0 for the i386 architecture.

Musl was already of such high quality back then that he immediately attracted contributors, the first one being Nicholas J. Cain who contributed support for x86_64 already within the first week of musl's public release. Other early adopters tried it out on their favorite programs and reported issues, which were fixed almost in real-time. From 0.5.0 up to 0.7.9, a new version was released typically in less than one week.

Musl 0.7.11, released on June 28, 2011, was the first version to feature a dynamic linker.

Before that, only static linking was available.

My involvement

I started using sabotage after the 2011-04-18 release, mainly to have an isolated rootfs environment where I had a compiler toolchain targeting musl to build my own programs with it.

The alternative was to use the musl-gcc wrapper script, part of the musl release. It was a shell script starting gcc with the right options to pick up musl's include and library directories instead of the host GLIBC ones.

Chris2 continued work mainly to add support for 32bit x86, as sabotage up to that point targeted only the x86_64 architecture, culminating in the 2011-04-30 release. Then he stopped doing any work on sabotage for about 2 months.

At that point it had become clear that we would need dynamic linking at some point, because of the following reasons:

The Xorg stack consisted of dozens of libraries designed for dynamic linking. Even though Chris2 managed to get the X server statically linked, it resulted in a huge monolithic binary, but it couldnt load additional xf86 drivers.
Python and perl depend on dynamic linking to load extensions/packages written in C via dlopen(). perl offers a way to statically link those extensions into the perl binary, but that requires to relink the interpreter whenever a new extension has to be added. AFAIK, python doesn't offer anything equivalent.
Some applications, like e.g. weechat, are designed around a dynamically-linked plugin system that are loaded on an as-needed basis via dlopen().

I suspect Chris2 had never intended for sabotage to support dynamic linking, and was frustrated with this situation, causing him to abandon the project. Meanwhile the sabotage users Josiah Worcester aka pikhq (nowadays known as Ada Worcester) and myself were pushing Rich to write a dynamic linker so the above could be addressed, resulting in the 0.7.11 release of musl. It was pikhq who then added basic dynamic linking support to his sabotage fork, which I ultimately picked up to start hacking on sabotage on my own, after a PR I filed in early may was ignored for almost 2 months, and nothing else happened on Chris2's repo. Meanwhile pikhq started to work on his own distro project called bootstrap linux.

It was on July 19th that I decided to do my own thing, as there were a couple issues I had with the upstream way of building packages and reckoning that upstream seemed pretty much dead anyway. First I fixed a couple things that were buggy, and added a couple of packages. 2 days later pikhq merged my changes into his master, and another 2 days later Chris2 suddenly started hacking on sabotage again, yet he merged only the changes made by pikhq but none of mine. He bumped a couple of package revisions, but his activity on sabotage stalled for a second time on July 29th, this time for good.

The major problem I had with the old build system was that it didn't pick up from where it left off when things went wrong. One had to compile the whole set of packages over and over.

I had previous experience with a build-from-source package manager on MacOS, MacPorts, and was frustrated that it did things in a strictly serial way. For example, if one was to build a package with 10 dependencies, the first dependency would be downloaded, then built, then the next dependency downloaded, and so on. Clearly, if one has a slow internet connection, it's much preferable to download package2 while package1 is building, or even better, completely detach the download and build steps, and try to download several packages at once to saturate the available bandwidth, and start building as soon as the first download is complete.

This was the design I had in mind for my new package manager called butch.

Mid September, when it became clear that Chris2's sabotage was abandoned for good, I started hacking on butch and it was finished the next day, on 19th.

The package manager had a new package format, that was composed of ini-style sections, with a build section containing the build instructions, and another section listing download mirrors for source code release tarballs and their checksum for integrity checks.

For greater flexibility, the build section was merged with a shell-script template and executed once the tarball was available. That meant one could adjust things like CFLAGS and other things from a single template rather than hardcoding them into every recipe. It also allowed me to experiment with the location things get installed to.

Another major change that my package manager introduced was per-package installation directories. Ever since I started using Linux I was confused by which file belongs to which package, so my design was to create one directory per package in /opt, .e.g /opt/ncurses, and then symlink the files in there into the main FS root via a relative symlink. This allows one to do a simple ls -la on a file and immediately know which package it belongs too:

$ ls -la /bin/tic
lrwxrwxrwx    1 root     root            28 Jan 18 21:42 /bin/tic -> ../opt/netbsd-curses/bin/tic

But also, to remove a package by simply removing its directory in /opt.

Meanwhile, the butch package manager has been rewritten in POSIX sh in lieu of C, which makes it much more hackable.

Apart from that, sabotage to this day still follows the initial philosophy and file system layout and init system created by Chris2 during the first week of sabotage's existence.

Evolution of sabotage and musl

Sabotage Linux was the only major distro based on musl libc for several years, only in 2014 alpine linux joined the ranks. Until that happened, it's been mainly my feedback to Rich about issues I encountered that turned musl into a libc ready for prime-time. I filed uncountable bug reports to 3rd party packages that relied on buggy behaviour of GLIBC or used GLIBC-only extensions or internal types/data structures/functions, and got quite some of them to use a different, more portable approach. When Alpine Linux joined the ranks, most of the pioneer work was already done, including making musl compatible to GCC's libstdc++. Before that, sabotage was strictly C-only and one of the major issues I faced was that some required C libraries required the build system CMake written in C++. I ended up writing custom Makefiles for some projects only to make them buildable in my C++-less distro.

Fortunately, back in the day almost the entire Linux FOSS infrastructure was based on C, so it was relatively easy to bootstrap most things from source. This is quite different to today with Rust zealots starting to rewrite critical library components in Rust, which is basically almost impossible to bootstrap from source, and only supports a small subset of the architectures supported by sabotage.

I refuse to add Rust to sabotage, and am asking myself whether Rust and the accompanying security theater was created to fragment the FOSS ecosystem, and weaken the status of the C programming language, which is the underlying cause for the huge success and performance, stability and resource-efficiency of the UNIX operating system. The leaked halloween documents prove without a shadow of a doubt that M$ saw Linux/FOSS (already in 1998) as a huge threat to their market monopoly and sought ways to undermine it. Certainly they didn't stop after the leak and were seeking ever new methods to achieve their goal of weakening the FOSS movement. A collaboration with Mozilla and Google (Go with its online micro-dependency concept) seems possible. Just make it too hard to build stuff from source and FOSS will exist only in name.

Meanwhile even GCC switched to C++ as its implementation language as of GCC 4.8. Had that been the case in 2011, it's easy to imagine that a distro based on musl would've given up already during infancy.

During the years, I made sabotage compile on a big variety of architectures, at first using QEMU to build in a native environment, later by adding support for cross-compilation. I even contributed support for powerpc and x32 architectures to musl.

Once Alpine Linux joined the ranks of distros using musl, I back-pedaled my involvement quite a bit, figuring that alpine with its big number of contributors could take over the job of filing upstream bug reports and playing guinea pig for new musl releases.

Alpine Linux got hugely successful once it was chosen as the standard distro for Docker images due to its small footprint and binary package manager, attracting even more users to musl. Other distros like Void Linux joined.

Even though many projects and desktop linux distros still target GLIBC only, musl has become a serious competitor and willingness of upstreams to support it has considerably increased. It is meanwhile used by many projects, even the WebAssembly workgroup has chosen it as its C library implementation as it was already adopted by emscripten.

Sabotage itself always stayed a niche project, since I didn't spend any effort on advertising it or creating a polished website to attract new users, therefore it was most of the time a one-man-show, even though many contributors appeared and disappeared over time. Apart from myself, only AequoreaVictoria who also happens to provide the build server hosting has been with the project since 2012 with regular contributions.

Yet, it still is one of the most stable, mature and versatile musl distros available, and probably the easiest way to get a usable and slim distro cross-compiled for any new architecture or embedded hardware project.

During its development a number of side-projects were released that allowed to side-step the need of bloated dependencies, most notably gettext-tiny and netbsd-curses, which are now used by a number of other distros, but also things like atk-bridge-fake which allows to build GTK+3 without dbus dependency.

Post or read comments...

Modern autotools

9 Mar 2021 14:28 UTC

GNU autotools aka the GNU Build System is a build system designed to produce a portable source code package that can be compiled about everywhere.

The intentions are good: when properly used, a configure script is generated that runs everywhere a POSIX compatible shell is available, and a Makefile that can be used everywhere a make program is available. No further dependencies are required for the user, and the process to build source with ./configure, make and make install is well-established and understood.

From the developer's perspective though, things look a bit different. In order to create the mentioned configure script and Makefile, autotools uses the following 3 main components:

autoconf
automake
libtool

To use them, the developer needs perl and gnu m4 installed in addition to the tools themselves, as well as a basic understanding of m4, shell scripting, Makefiles, and the complex interaction between autoconf and automake.

He also needs a lot of time and patience, because each change to the input files requires execution of the slow autoreconf to rebuild the generated sources, and running ./configure and make for testing.

Libtool is a shell script wrapper around the compiler and the linker with >9000 lines, which makes every compiler invocation about a 100 times slower. It is notorious for breaking static linking of libraries and cross-compilation due to replacing e.g. "-lz" with "/usr/lib/libz.so" in the linker command. Apart from being buggy and full of wrong assumptions, it's basically unmaintained (last release was 6 years ago).

While there's reasonably complete documentation for autoconf and automake available, it is seriously lacking in code examples, and so many supposedly simple tasks become a continuous game of trial and error.

Due to all of the above and more, many developers are overwhelmed and frustrated and rightfully call autotools "autocrap" and switch to other solutions like CMake or meson.

But those replacements are even worse: they trade the complexity of autotools on the developer side against heavy dependencies on the user side.

meson requires a bleeding edge python install, and CMake is a huge C++ clusterfuck consisting of millions of LOC, which takes up >400 MB disk space when built with debug info. Additionally meson and cmake invented their own build procedure which is fundamentally different from the well-known configure/make/make install trinity, so the user has to learn how to deal with yet another build system.

Therefore, in my opinion, the best option is not to switch to another build system, but simply to only use the good parts of autotools.

Getting rid of libtool

It's kinda hard to figure out what libtool is actually good for, apart from breaking one's build and making everything 100x slower. The only legitimate usecase I can see is executing dynamically linked programs during the build, without having to fiddle around with LD_LIBRARY_PATH. That's probably useful to run testcases when doing a native build (as opposed to a cross-compile), but that can also be achieved by simply statically linking to the list of objects that need to be defined in the Makefile anyhow. Libtool being invoked for every source file is the main reason for GNU make's reputation of being slow. If GNU make is properly used, one would need to compile thousands of files for a noticeable difference in speed to the oh-so-fast ninja.

Getting rid of automake

Automake is a major pain in the ass.

Makefiles are generated by the configure script by doing a set of variable replacements on Makefile.in, which in turn is generated by automake from Makefile.am on the developer's end.

Makefiles are auto-generated, huge, and unreadable.
Any change to a Makefile is overwritten as soon as configure is re-run. In order to change something permanently in the Makefile, one has to call autoreconf after each change to Makefile.am, which runs the entire autotools stack on all the input files which (a) requires that those tools are available and (b) waiting for a looooong time for these slow tools.
The interaction with configure is complex and underdocumented.
Automake is designed for a recursive build. Each directory has its own Makefile, which is unaware of the other dependencies and defeats parallel compilation. See the link for the many issues caused by recursive make.
There are some implicit rules additional to those already implied when using make, which can get into your way, e.g. when using the file extension .c for generated files.

The only real advantage that automake offers over a handwritten Makefile is that conditionals can be used that work on any POSIX compatible make implementation, since those are still not standardized to this day, and that dependency information on headers is generated automatically. The latter can be implemented manually using the -M options to gcc, like -MMD.

The Good Part(s)

The only good part of autotools is the generated portable configure script, and the standard way of using the previously mentioned trinity to build. The configure script is valuable for many reasons:

It can check whether all necessary prerequisites are available.
It can check whether unportable or recent headers or features are available on the system that's targeted by correctly doing a compile test rather than hardcoding an ugly and most-of-the-times wrong ifdef-jungle into the source code.
It allows to enable or disable optional features and behaviours.
It creates a single reusable build configuration, consisting of build options, user-provided or default CC, CFLAGS, LDFLAGS... and results for mentioned system-specific test results.

Additionally to the above, autoconf-generated configure scripts have some useful features built in:

config.log for figuring out why a certain test failed
config.cache to store test results, so the same things aren't tested over and over again. When using standard tests, one can even provide a pre-generated config.cache that specifies the system properties and can be used to speed up all programs that use autotools.
provides a standard set of options to configure destination paths, like --prefix, --libdir, --bindir etc.
./configure --help output, explaining nicely all the available options, influential environment vars and so on.

On the other hand generated configure scripts tend to be quite big, and, as they are executed serially on a single CPU core, rather slow.

Fortunately this can be fixed by removing the vast majority of checks, just assume C99 support as a given and move on. While you're at it, throw out that check for a 20 year old HP-UX bug, please.

The Solution

A modern project using autotools should only use autoconf to generate the configure script, and a single top-level Makefile that's handwritten. Build configuration can be passed to the Makefile using a single file that's included from the Makefile. The number of configure checks should be reduced to the bare minimum, there's little point in testing e.g. for the existence of stdio.h which is standardized since at least C89, especially if then later the preprocessor macro HAVE_STDIO_H isn't even used and stdio.h is included unconditionally. Used this way, the configure script will be quick in execution and small enough to be included in the VCS which allows the users to checkout and build any commit without having to do the autotools dance. A good guide for writing concise configure scripts is available here.

As for conditionals in make, I'm pretty much in favor of simply assuming GNU make as a given and using its way to do conditionals.

It's in widespread use (default make implementation on any Linux distro) and therefore available as gmake even on BSD installations that usually prefer their own make implementation. Apart from that it's lightweight (my statically linked make 3.82 binary has a mere 176 KB) and one of the most portable programs around.

The alternative is to target POSIX make and do the conditionals using automake-style text substitutions in the configuration file produced by the configure run.

The Ingredients

Our example project uses the following files: configure.ac, Makefile, config.mak.in, main.c, foo1.c and foo42.c with the following contents respectively.

configure.ac:

AC_INIT([my project], 1.0.0, maintainer@foomail.com, myproject)
AC_CONFIG_FILES([config.mak])
AC_PROG_CC
AC_LANG(C)

AC_ARG_WITH(foo,
    AS_HELP_STRING([--with-foo=1,42], [return 1 or 42 [1]]),
    [foo=$withval],
    [foo=1])

AC_SUBST(FOO_SOURCE, $foo)
AC_OUTPUT()

Makefile:

include config.mak

OBJS=main.o $(FOO).o
EXE=foo-app

all: $(EXE)

$(OBJS): config.mak

$(EXE): $(OBJS)
    $(CC) -o $@ $(OBJS) $(LDFLAGS)
clean:
    rm -f $(EXE) $(OBJS)

install:
    install -Dm 755 $(EXE) $(DESTDIR)$(BINDIR)/$(EXE)

.PHONY: all clean install

config.mak.in:

# whether to build foo1 or foo42
FOO=foo@FOO_SOURCE@

PREFIX=@prefix@
BINDIR=@bindir@

CFLAGS=@CFLAGS@
CPPFLAGS=@CPPFLAGS@
LDFLAGS=@LDFLAGS@

main.c:

#include <stdio.h>
extern int foo();
int main() {
    printf("%d\n", foo());
}

foo1.c:

int foo() { return 1; }

foo42.c:

int foo() { return 42; }

You can get these files here.

After having the files in place, run autoreconf -i to generate the configure script. You'll notice that it runs unusually quickly, about 1 second, as opposed to projects using automake where one often has to wait for a full minute.

The configure script provides the usual options like --prefix, --bindir, processes the CC, CFLAGS, etc variables exported in your shell or passed like

CFLAGS="-g3 -O0" ./configure

just as you'd expect it to, and provides the option --with-foo=[42,1] to let the user select whether he wants the foo42 or foo1 option.

How it works

configure.ac:

AC_CONFIG_FILES([config.mak])

Here we instruct autoconf that config.mak is to be generated from config.mak.in when it hits AC_OUTPUT() (which causes config.status to be executed). It will replace all values that we either specified with AC_SUBST(), or the built-in defaults like prefix and bindir (see config.status for the full range) with those specified by the user.

AC_ARG_WITH(...)

This implements our --with-foo multiple choice option. You can read about how it works in the usual autoconf documentation. Other autoconf macros that you will find handy include AC_CHECK_FUNCS, AC_CHECK_HEADERS, AC_CHECK_LIB, AC_COMPILE_IFELSE to implement the various checks that autoconf offers, as well as AC_ARG_ENABLE to implement the typical --enable/--disable switches.

AC_SUBST(FOO_SOURCE, $foo)

This replaces the string @FOO_SOURCE@ in config.mak.in with the value assigned by the user via the AC_ARG_WITH() statement, when config.mak is written.

The rest of the contents in configure.ac are the standard boilerplate for C programs.

Makefile:

include config.mak

This statement in Makefile includes the config.mak generated by configure. If it is missing, running make will fail as it should. config.mak will provide us with all the values in config.mak.in, where each occurence of @var@ is replaced with the results of the configure process.

OBJS=main.o $(FOO).o

This sets OBJS to either main.o foo1.o or main.o foo42.o, depending on the choice of the user via the --with-foo switch. We didn't even have to use conditionals for it.

$(OBJS): config.mak

We let $(OBJS) depend on config.mak, so they're scheduled for rebuild when the configuration was changed with another ./configure execution, as the user might have changed his CFLAGS or --with-foo setting to something else. For bonus points, you could put all build-relevant settings into e.g. config-build.mak, and directory-related stuff stuff into config-install.mak (unless you hardcode directory names into the binary) and make the dependency to config-build.mak only.

    install -Dm 755 $(EXE) $(DESTDIR)$(BINDIR)/$(EXE)

Two important things about this line:

In the install target, always use $(DESTDIR) in front of $(BINDIR), $(PREFIX) etc, so the user can install stuff to a staging directory as is custom.
The used install command -Dm mode option isn't compatible with BSD's install program. The BSD guys really should get their act together and finally support this handy option, until then you can use the portable script here that emulates it.

The rest of the Makefile contents are pretty standard. You might notice the absence of a specific rule to build .o files from .c, we use the implicit rule of make for this purpose.

config.mak.in:

FOO=foo@FOO_SOURCE@

config.status will replace the string @FOO_SOURCE@ with either 1 or 42, depending on which --with-foo option was used (1 being the default), shortly before configure terminates and writes config.mak. The values for @CFLAGS@ and the other variables will be replaced with the settings the configure scripts defaults to or those set by the user.

foo1.c, foo42.c and main.c:

... should be self-explanatory.

Testing

You can run ./configure && make now and see that it works - foo-app is created, and make DESTDIR=/tmp/foobar install installs foo-app into /tmp/foobar/bin/foo-app. ./configure --with-foo=42 && make should cause the foo-app binary to print 42 instead of 1.

Lua vs QuickJS

17 Nov 2020 23:17 UTC

Since I didn't find any performance comparison of QuickJS and Lua, and having a lot of spare time due to being locked into my flat due to the WHO plandemic, I decided to do my own little benchmark.

As there's already a benchmark comparing QuickJS and the V8 JIT, this also let's us reason how Lua would fare in comparison to a heavily tuned JIT.

We compare the performance for "fannkuch redux", a well-established benchmark program which originally featured on debian's shootout game, which was later redacted because it was not seen as "politically correct" to compare speed. Implementors of slow languages felt offended, and successfully lobbied to take it down.

javascript version to test QuickJS requires a small modication from arguments[0] to scriptArgs[1] for quickjs.

lua version

Since the js version is a transliteration of the lua code, this test couldn't be any fairer.

Results for QuickJS and Lua compiled with -Os:

$ time qjs fannkuch.js 11
556355
Pfannkuchen(11) = 51

real    0m41.560s
user    0m41.558s
sys     0m0.002s

$ time lua5.3 fannkuch.lua 11
556355
Pfannkuchen(11) = 51

real    0m40.585s
user    0m40.584s
sys     0m0.000s

Results for QuickJS and Lua compiled with -O3 each:

$ time qjs fannkuch.js 11
556355
Pfannkuchen(11) = 51

real    0m40.950s
user    0m40.946s
sys     0m0.000s

$ time lua5.3 fannkuch.lua 11
556355
Pfannkuchen(11) = 51

real    0m35.973s
user    0m35.973s
sys     0m0.000s

When optimized for size, Lua and QuickJS perform almost identically, whereas Lua is slightly faster (about 15%) when compiled for speed. That's a quite impressive result for QuickJS, as Lua is known to be one of the fastest non-jitted programming languages out there and has been around for a while, whereas QuickJS is the new kid on the block. QuickJS does spend a lot more time building though: using make -j 16 it takes 16 seconds, Lua compiles in one second.

Of course this little benchmark tests only a small subset of the functionality of those language interpreters, and is by no means conclusive, but IMO it's sufficient to give a good picture of the ballpark they're in.

An interesting addition would be to compare against the performance of the Dino language which is the fastest scripting language I'm aware of (with JIT turned off).

Post or read comments...

When "progress" is backwards

20 Oct 2020 15:58 UTC

Lately I see many developments in the linux FOSS world that sell themselves as progress, but are actually hugely annoying and counter-productive.

Counter-productive to a point where they actually cause major regressions, costs, and as in the case of GTK+3 ruin user experience and the possibility that we'll ever enjoy "The year of the Linux desktop".

Showcase 1: GTK+3

GTK+2 used to be the GUI toolkit for Linux desktop applications. It is highly customizable, reasonably lightweight and programmable from C, which means almost any scripting language can interface to it too.

Rather than improving the existing toolkit code in a backwards-compatible manner, its developers decided to introduce many breaking API changes which require a major porting effort to make an existing codebase compatible with the successor GTK+3, and keeping support for GTK+2 while supporting GTK+3 at the same time typically involves a lot of #ifdef clutter in the source base which not many developers are willing to maintain.

Additionally GTK+3 made away with a lot of user-customizable themeing options, effectively rendering useless most of the existing themes that took considerable developer effort for their creation. Here's a list of issues users are complaining about.

Due to the effort required to port a GTK+2 application to use GTK+3, many finished GUI application projects will never be ported due to lack of manpower, lost interest of the main developer or his untimely demise. An example of such a program is the excellent audio editor sweep which has seen its last release in 2008. With Linux distros removing support for GTK+2, these apps are basically lost in the void of time.

The other option for distros is to keep both the (unmaintained) GTK+2 and GTK+3 in their repositories so GTK+2-only apps can still be used, however that causes the user of these apps to require basically the double amount of disk and RAM space as both toolkits need to live next to each other. Also this will only work as long as there are no breaking changes in the Glib library which both toolkits are built upon.

Even worse, due to the irritation the GTK+3 move caused to developers, many switched to QT4 or QT5, which requires use of C++, so a typical linux distro now has a mix of GTK+2, GTK+3, GTK+4, QT4 and QT5 applications, where each toolkit consumes considerable resources.

Microsoft (TM) knows better and sees backwards compatibility as the holy grail and underlying root cause of its success and market position. Any 25 year old Win32 GUI application from the Win95 era still works without issues on the latest Windows (TM) release. They even still support 16bit MS-DOS apps using some built-in emulator.

From MS' perspective, the freedesktop.org decision makers played into their hands when they decided to make GTK+3 a completely different beast. Of course, we are taught to never believe in malice but in stupidity, so it is unthinkable that there was actually a real conspiracy and monetary compensations behind this move. Otherwise we would be conspiracy theorist nuts, right ?

Showcase 2: python3

Python is a hugely successful programming/scripting language used by probably millions of programmers.

Whereas python2 development has been very stable for many years, python3 changes at the blink of an eye. It's not uncommon to find that after an update of python3 to the next release, existing code no longer works as expected.

Many developers such as myself prefer to use a stable development environment over one that is as volatile as python3.

With the decision to EOL python2 thousands of py2-based applications will experience the same fate as GTK+2 applications without maintainer: they will be rendered obsolete and disappear from the distro repositories. This may happen quicker than one would expect, as python by default provides bindings to the system's OpenSSL library, which has a history of making backwards-incompatible changes. At the very least, once the web agrees on a new TLS standard, python2 will be rendered completely useless.

Porting python2 to python3 isn't usually as involving as GTK+2 to GTK+3, but due to the dynamic nature of python the syntax checker can't catch all code issues automatically so many issues will be experienced at runtime in cornercases, causing the ported application to throw a backtrace and stopping execution, which can have grave consequences.

Many companies have millions of line of code still in python2 and will have to produce quite some sweat and expenses to make it compatible to python3.

Showcase 3: ip vs ifconfig

Once one had learned his handful of ifconfig and route commands to configure a Linux' box network connections, one could comfortably manage this aspect across all distros. Not any longer, someone had the glorious idea to declare ifconfig and friends obsolete and provide a new, more "powerful" tool to do its job: ip.

The command for bringing up a network device is now ip link set dev eth1 up vs the older ifconfig eth1 up. Does this really look like progress? Worst, the documentation of the tool is non-intuitive so one basically has to google for examples that show the translation from one command to the other.

The same critics apply to iw vs iwconfig.

Showcase 4: ethernet adapter renaming by systemd/udev

Latest systemd-based distros come up with network interface names such as enx78e7d1ea46da or vethb817d6a, instead of the traditional eth0. The interface names assigned by default on Ubuntu 20 are so long a regular human can't even remember them, any configuration attempt requires one to copy/paste the name from ip a output. Yet almost every distro goes along with this Poettering/freedesktop.org-dictated nonsense.

Showcase 5: CMake, meson, and $BUILDSYSTEMOFTHEDAY

While the traditional buildsystem used on UNIX, autoconf, has its warts, it was designed in such a way that only the application developer required the full set of tools, whereas the consumer requires only a POSIX compatible shell environment and a make program.

More "modern" build systems like cmake and meson don't give a damn about the dependencies a user has to install, in fact according to this, meson authors claimed it to be one of their goals to force users to have a bleeding edge version of python3 installed so it can be universally assumed as a given.

CMake is written in C++, consists of 70+ MB of extracted sources and requires an impressive amount of time to build from source. Built with debug information, it takes up 434 MB of my harddisk space as of version 3.9.3. It's primary raison-d'etre is its support for the Microsoft (TM) Visual Studio (R) (TM) solution files, so Windows (TM) people can compile stuff from source with a few clicks.

The two of them have in common that they threw over board the well-known user interface to configure and make and invented their own NIH solution, which requires the user to learn yet another way to build his applications.

Both of these build systems seem to have either acquired a cult following just like systemd, or someone is paying trolls to show up on github with pull requests to replace GNU autoconf with either of those, for example 1 2 . Interestingly also, GNOME, which is tightly connected to freedesktop.org, has made it one of its goals to switch all components to meson. Their porting effort involves almost every key component in the Linux desktop stack, including cairo, pango, fontconfig, freetype, and dozens of others. What might be the agenda behind this effort?

Conclusion

We live in an era where in the FOSS world one constantly has to relearn things, switch to new, supposedly "better", but more bloated solutions, and is generally left with the impression that someone is pulling the rug from below one's feet. Many of the key changes in this area have been rammed through by a small set of decision makers, often closely related to Red Hat/Gnome/freedesktop.org. We're buying this "progress" at a high cost, and one can't avoid asking oneself whether there's more to the story than meets the eye. Never forget, Red Hat and Microsoft (TM) are partners and might even have the same shareholders.

Post or read comments...

Speeding up static regexes in C using re2r and ragel

16 Oct 2020 00:16 UTC

While working on tinyproxy I noticed that its config file parser got notoriously slow when processing big config files with several thousand lines (for example Allow/Deny directives).

The config parser uses a set of static POSIX ERE regexes which are compiled once using regcomp(3p) and then executed on every single line via regexec(3p).

For example, the regex for the "Allow" directive is

(((([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)(/[0-9]+)?)|(((([0-9a-fA-F:]{2,39}))|(([0-9a-fA-F:]{0,29}:([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+))))(/[0-9]+)?))|([-A-Za-z0-9._]+))

which consists of the more readable parts

"(" "(" IPMASK "|" IPV6MASK ")" "|" ALNUM ")"

as defined using some CPP macros in the source code.

So basically the regex matches either an ipv4 address with a netmask like 10.0.0.0/8, an ipv6 with a netmask, or an alphanumeric domain name.

Parsing 32K lines with Allow statements using the libc's regexec function took about 2.5 seconds, which made me wonder whether we could get this a little bit faster.

POSIX regexec() has the following signature:

int regexec(const regex_t *restrict preg, const char *restrict string,
    size_t nmatch, regmatch_t pmatch[restrict], int eflags);

preg is the compiled regex, string the string to match, nmatch the maximum number of matching groups, and pmatch an array of end/start indices into the string, corresponding to matching groups. Matching groups are the parts enclosed inside parens in the regex. This is a very practical feature as it allows to easily extract submatches.

My idea was to write a wrapper around re2c or ragel (both of which compile a fast finite state automaton), which automatically turns a POSIX-compatible ERE expression into the expected format and generates a regexec()-like wrapper function that provides the same convenient submatch array.

For evaluation, I first created a manual re2c conversion of (a predecessor of) the above "Allow" regex, however that resulted in almost 10K (!) lines of C code emitted. Re2c input

Next I tried the same thing with ragel, and to my pleasant surprise the resulting C code was only a little over 900 lines, i.e. 10% of re2c. Ragel input

This made it quite clear that ragel was the winner of the competition.

After spending some more effort, the product was named re2r (regex to ragel) and is available here.

re2r accepts input on stdin, a machine name followed by a space and a regex per line. For example (from tinyproxy):

logfile "([^"]+)"
pidfile "([^"]+)"
port ([0-9]+)
maxclients ([0-9]+)

which generates the following code:

re2r helpfully prints the message:

 diagnostics: maximum number of match groups: 2

more about that in a minute.

As a size optimization, for multiple identical regexes, the wrapper for that machine simply calls the wrapper for the machine with the identical regex, e.g. re2r_match_pidfile() calls re2r_match_logfile().

The prototype for our regexec()-like match functions looks like:

RE2R_EXPORT int re2r_match_logfile(const char *p, const char* pe, size_t nmatch, regmatch_t matches[]);

RE2R_EXPORT needs to be defined by the user to either "static" or "extern", depending on how he needs the visibility of the function. re2r_match_logfile is the function name generated for the named regex "logfile".

p is a pointer to the start of the string to be matched, and pe to the end (usually it can be defined as as p+strlen(p)). nmatches is just like in the POSIX regexec() signature the maximum number of items that can be stored in the matches array, which is optimally of the size that our diagnostic line earlier notified us about (here: 2). The matches array is of type regmatch_t[] (thus we need to include the header regex.h to get the definition) and it must consist of nmatches items.

Now we only need to run ragel on the re2r output to get a heavily optimized matcher function that returns almost identical results to using the same regex/ string with POSIX regcomp()/regexec(), while having an almost identical function signature, so it's straightforward to replace existing code.

As a trick, the plain output of re2r can be directly compiled using gcc -include regex.h -DRE2R_EXPORT=extern -c foo.c after running ragel on it, without having to embed/include it in other source files.

In the case of tinyproxy, parsing the 32K allow statements using the re2r/ragel reduced the runtime from 2.5 seconds to a mere 236 milliseconds.

re2r also ships a testing tool called re2r_test which can be used as follows:

re2r_test -r "((foo)|bar(baz))"

which then waits for test input on stdin. upon entering "foo", we get the following output:

---------- RE2R  ----------
0: foo
1: foo
2: foo
((foo)|bar(baz))
12   2    3   31
12   2         1
---------- POSIX ----------
0: foo
1: foo
2: foo
((foo)|bar(baz))
12   2    3   31
12   2         1

The first block is the output from the re2r matcher function, the other from POSIX regexec(). The 0, 1, 2 positions show the extracted match groups, then the regex is displayed followed by 2 lines that show

1) the offsets of all possible matching groups, and 2) the matching groups that actually matched.

In this case only the matching group 1 (outer parens pair) and 2 (foo) matched.

Note that POSIX always makes a matching group 0 available, which has start and end offsets of the entire string if it was successfully matched.

If we now enter "barbaz", we get:

---------- RE2R  ----------
0: barbaz
1: barbaz
3: baz
((foo)|bar(baz))
12   2    3   31
1         3   31
---------- POSIX ----------
0: barbaz
1: barbaz
3: baz
((foo)|bar(baz))
12   2    3   31
1         3   31

In this case, we don't have a match for matching group 2, but one for 3. Group 1 matches again, as it surrounds the entire expression.

Note that while re2r itself is GPL licensed, the code it emits is public domain.

I hope that re2r will be helpful in the adoption of fast ragel parsers into C projects, and believe that re2r_test can be a generally useful tool to visualize regexes and matching groups on the terminal.

The result of the re2r/ragel work on tinyproxy can be evaluated in the ragel branch.

Post or read comments...

Restoring accidentally deleted files on Linux

02 May 2019 22:27 UTC

Doh. Through a tiny bug in a Makefile auto-generated by my build system rcb2, I accidentally deleted the C source file I've been working on for almost an hour, and which wasn't checked into git yet.

Fortunately, I know the basic steps to restore a file* in a filesystem-agnostic way.

These are:

open the raw blockdevice (e.g. /dev/sda1/) the file was on,
search for a string known to be (only) inside the deleted file,
from that position on, find the start and end offsets of the file,
restore it using these offsets with the dd utility to another blockdevice.

First of all though, I sent a SIGSTOP signal to firefox, the most volatile process on my desktop, to prevent it from writing any files onto my harddisk while the restoration was in process, potentially overwriting the blocks occupied by the deleted file. I did this via an extension I wrote for my Window Manager openbox, which adds a menu item "Pause and iconify" to the popup menu on the titlebar of all windows. I usually use this to prevent Firefox from consuming CPU and draining my laptop's battery, while I'm traveling. Other than that, there's almost nothing running on a typical sabotage linux box which could interfere via constant disk writes, unlike GNU/Linux systems with systemd and a gazillion of background daemons installed.

Then I opened /dev/mapper/crypt_home, the blockdevice containing my /home filesystem in my favorite hexeditor, went on the ascii tab on the right side, and started a search for a string I knew was only in that new C file, which was <openDOW/vec2f.h> since I used that file in a hackish way via an include statement.

After hitting ENTER in hexedit's search dialog, CPU usage went to 100%, and it slowly crunched its way through the encrypted harddisk's blockdevice mapper. I left my computer to brew a coffee, and came back after about 5 minutes. From the current offset displayed, I figured that the search was currently only 40GB into the blockdevice. Many more GBs to go, since the file could be at the very end of the SSD. After another break of about 10 mins, I was lucky enough and the string was found at offset 0x13c6ffa0ab, at about 84 GB into the blockdevice.

Using pageup/pagedown in hexedit, the beginning and end offsets of the source file were quickly found. They were 0x13C6FF1FFC and 0x13C6FFB472, respectively.

dd if=/dev/mapper/crypt_home of=/dev/shm/dump bs=1 skip=$((0x13C6FF1FFC)) count=$((0x13C6FFB472 - 0x13C6FF1FFC))

did the rest to restore the file onto /dev/shm, the ramdrive.

Since my SSD is usually a lot faster than this, I decided to write a program to speed up future searches. The plan is simple, one needs to read from the filesys in large chunks, so the time spent in syscalls is neglible, and then search over the memory chunks using an optimized algorithm that compares word-at-a-time, just like musl's memmem() function does. Plus some more logic to find the searchterm even across chunk boundaries. The result can be found here in a small C program.

And indeed, it is a lot faster than hexedit.

# time ./fastfind /dev/mapper/crypt_home '<openDOW/vec2f.h>'
curr: 0x13498f0000
bingo: 0x13c6ffa0ab
^CCommand terminated by signal 2
real    1m 4.26s
user    0m 20.35s
sys     0m 19.38s

at 64 seconds total, it crunched through the blockdevice at a rate of 1.2GB/sec, at least 10x faster than hexedit.

So for future undelete tasks, my fastfind utility will become the first stop, to find an offset, which will then be followed by my good old friend hexedit to find beginning and end position in the neighbourhood of that offset, and to be finished off with dd.

*: This approach works well for smaller files, whereas bigger ones are usually spread over several non-adjacent blocks.

Post or read comments...

Mastering and designing C/C++ build systems

19 Apr 2019 10:36 UTC

A Primer for build system developers and users

As the maintainer of sabotage linux, a distro compiled from source, with >1000 packages, and being involved in the development of musl libc, I've seen a wide variety of odd build systems, or regular build systems used in an odd way. Which resulted in lots of issues trying to get other people's packages building.

The vast majority of build system coders and developers using these build systems for their packages do not understand in detail how their toolchains are supposed to be used, and especially cross-compilation is a topic the majority of people knows nothing about. The intent of this blog post is to explain the basic mechanisms, to change this situation.

But first, let's establish the meaning of some terms. From here on, the term user will be used to mean the person trying to compile your software package from source. We're not concerned here about people using the compilation result via a binary package.

Now we will first take a quick look at the basic concepts involved in compilation, followed by the typical 3 stages of a build process, which are: Configuration, Compilation, Installation.

Basic Compilation Concepts

So in order to get your program compiled on a variety of different hosts, you typically need to interface with the following components:

The compiler.

For the C programming language, the convention is that on the user's system there's a C compiler installed in the default search PATH with the name cc. It can be overridden with the environment variable CC.

so if CC is set to clang, the build system should use clang instead of cc.

A sanely designed build system does something along the lines of:

if is_not_set($CC): CC = cc

For C++, the default binary name is c++ and the environment variable CXX.

Note that the user may choose to set CC or CXX to something that includes multiple items, for example CC=powerpc-gcc -I/tmp/powerpc/include.

Therefore, in a shell script, when you want to use the CC command to compile something, the $CC variable needs to be be used unquoted, i.e. $CC and not "$CC" since the latter would force the shell to look for a binary with the spaces inside the filename.

(For the record, the compiler is the program that turns your sourcecode into an object file, e.g. cc foo.c -c -o foo.o)

The linker.

Fortunately with C and C++, unless you do highly unusual things, you will never have to invoke the linker directly. instead you can simply use CC or CXX and they will know from the context that a linker is needed, and call the linker themselves. (For the record, the linker is what takes a couple .o files and turns them into an executable or a shared library, e.g.: cc foo.o bar.o -o mybinary.elf)

Compiler and linker options.

There will be a couple options you will have to use so the compilation works in a certain way. For example, your code may require the flag -std=c99 if you use C99 features.

Additionally, the user will want or need to use certain flags. For this purpose, the environment variable CFLAGS is used.

If the user didn't specify any CFLAGS himself, you may decide to set some sane default optimization flags (the default for GNU autoconf packages is -O2 -g -Wall). The CFLAGS used for the compilation should always put the user-set CFLAGS last in the command line, so the user has the ability to override some defaults he doesn't like. The following logic describes this:

REQUIRED_CFLAGS=-std=c99
CFLAGS_FOR_COMPILE=$(REQUIRED_CFLAGS) $(CFLAGS)

For C++, these flags are called CXXFLAGS, and the logic is precisely the same.

There's also CPPFLAGS, which is used for preprocessor directives such as -DUSE_THIS_FEATURE -DHAVE_OPENGL and include directories for header lookup. More about headers soon. Again, user-supplied CPPFLAGS need to be respected and used after the CPPFLAGS the build system requires.

Last but not least we have LDFLAGS, these are flags used at link time. It contains things such as -L linker library search path directives, -lxxx directives that specify which libraries to link against, and other linker options such as -s (which means "strip the resulting binary"). Here, again, the rule is to respect user-provided LDFLAGS and put them after your own in the linker command.

From here on, whenever we talk about cc or CC or CFLAGS, the exact same applies to c++, CXX and CXXFLAGS for C++.

Libraries and their headers

When writing code in C or C++, you necessarily need to use libraries installed on the end users machine. At least, you would need to use the C or C++ standard library implementation. The former is known as libc, the latter as libstdc++ or libc++. Optionally some other libraries, such as libpng may be needed.

In compiled form, these libraries consist of header files, and the library itself, as either static (.a archive) or dynamic library (.so, .dynlib, .dll). These headers and libs are stored in a location on your user's machine, which is typically /usr/include for headers and /usr/lib for libraries, but this is none of your concern. It's the job of the user to configure his compiler in such a way that when you e.g. #include <stdio.h> it works (usually the user uses his distro-provided toolchain which is properly set up).

Cross-compilation

Cross-compilation means that you compile for a different platform than the one you're using, for example if you want to compile ARM binaries for your raspberry pi from your x86_64 desktop.

It's not really much different than regular compilation, you pass your compiler name as CC, e.g. CC=armv7l-linux-musl-gcc and set your C and CPP flags such that they point into the lib/ and include/ dirs with your other ARM stuff in it. For example, if you prepare a rootfs for your raspberry pi in /tmp/piroot, you'd probably set up your compiler-related environment vars as following:

CC=armv7l-linux-musl-gcc
CPPFLAGS=-isystem /tmp/piroot/include
LDFLAGS=-L/tmp/piroot/lib

In compiler jargon, the armv7l-linux-musl prefix to your toolchain name is the so-called triplet. All components of your toolchain are prefixed with it, for example the ar archiver is called armv7l-linux-musl-ar, the same applies for as, ld, ranlib, strip, objdump, etc.

In Autoconf-based build systems, you pass the triplet as --host=armv7l-linux-musl to ./configure, whereas Makefile-only based systems usually use a CROSS_COMPILE environment variable, which is set to triplet plus a trailing dash, e.g. CROSS_COMPILE=armv7l-linux-musl-. In your own build system, you should follow the GNU autoconf convention though.

What makes cross-compilation a bit tricky is

you can't execute binaries that you created with the crosscompiler toolchain on your build machine.

Some packages require to compile tools that are then executed, for example to generate some headers and similar things. These tools need to be compiled with a different compiler, targeting your host's OS, arch, etc, which in autoconf jargon is confusingly called the "build" compiler, whereas Makefile-only based build systems call it HOSTCC. Note that when such programs, to be executed on the host, require other additional object files that are needed for the target build too, these object files need to be compiled twice, once for the host and once for the target. In such a case it is advisable to give the object files different extensions, maybe .ho for object files meant for the host, and .o for those intended for the target.

This also means your configuration process can't use checks that require to build and then run a binary.
There's a variety of reasons host include/ and lib/ directories can leak into the build process, causing havoc. In the worst case, you only have an include directory leak which has wrong sizes for types in them, which can result in a successfully linked binary that will crash or corrupt memory. If host library directories leak into the build, you will get link errors when the linker tries to link ARM and x86_64 binaries together. One of the worst offenders in this regard is libtool, which should be avoided at all costs.

The Build Process

If you design a build system from scratch, keep in mind that your users probably don't want to spend a lot of time learning about your system. They simply want to get the process done as painlessly and quickly as possible (which implies that the build system itself should have as little external dependencies as possible).

Please do respect existing conventions, and try to model your build system's user interface after the well-established GNU autoconf standards, because it's what's been around for 20+ years and what the majority of packages use, so it's very likely that the user of your package is familiar with its usage. Also, unlike more hip build tools of the day, their user interface is the result of a long evolutionary process. Autoconf does have a lot of ugly sides to it, but from a user perspective it is pretty decent and has a streamlined way to configure the build.

Step1: Configuration

Before we can start building, we need to figure out a few things. If the package has optional functionality, the user needs to be able to specify whether he wants it or not. Some functionality might require additional libraries, etc. This stage in the build process is traditionally done via a script called configure.

Enabling optional functionality

Your package may have some non-essential code or feature, that might pull in a big external library, or may be undesirable for some people for other reasons.

Traditionally, this is achieved by passing a flag such as --disable-libxy or --without-feature, or conversely --with-feature or --enable-libxy.

If such a flag is passed, the script can then write for example a configuration header that has some preprocessor directive to disable the code at compile time. Or such a directive is added to the CPPFLAGS used during the build.

These flags should be documented when the configure script is being run with the --help switch.

System- or Version-specific behaviour

Sometimes one needs to use functionality that differs from system to system, so we need to figure out in which way the user's system provides it.

The wrong way to go about this is to hardcode assumptions about specific platforms (OS/compiler/C standard library/library combinations) with ifdefs like this:

#if OPENSSL_VERSION_NUMBER >= 0x10100000
/* OpenSSL >= 1.1 added DSA_get0_pqg() */
    DSA_get0_pqg(dsa, &p, &q, &g);
#else
    ...
#endif

This is wrong for several reasons:

Linux distros sometimes backport functionality from newer library versions to the older library version they ship. Conversely a newer library version that supposedly should have the functionality, could have been selectively downgraded with a patch fixing a specific bug, which might require to undo a new feature.
For some libraries, as in this case OpenSSL, API-compatible replacements exist (here libressl or Apple's fork boringssl).

The proper way to figure out whether DSA_get0_pqg() exists, is... to actually check whether it exists, by compiling a small testcase using it (more below), and pass a preprocessor flag such as HAVE_DSA_GET0_PQG to the code in question.

Even worse than the above hardcoded version number check is when people assume that a certain C library implementation, for example musl, have a certain bug or behaviour or lack a certain function, because at the time they tested it that was the case. If a __MUSL__ macro would exist , they would just hardcode their assumption into the code, even though the very next version of musl might have fixed the bug or added the function in question, which would then result in compile errors or even worse, bogus behaviour at runtime.

Checking for headers

You should NEVER hardcode any absolute paths for headers or libraries into your build system, nor should you start searching in the user's filesystem for them. This would make it impossible to use your package on systems with a non-standard directory layout, or for people that need to crosscompile it (more on cross-compilation just a little further down).

The majority of third-party libraries install their headers either into a separate sub-directory in the compiler's default include path (for example /usr/include/SDL/*.h), or if there's only one or two headers directly into the include dir (for example /usr/include/png.h). Now when you want to test for whether the user's system has the libpng headers installed, you simply create a temporary .c file with the following contents:

#include <png.h>
typedef int foo;

and then use $CC $CPPFLAGS $CFLAGS -c temp.c and check whether the command succeeded. If it did, then the png.h is available through either the compiler's default include directory search paths, or via a user-supplied -I incdir statement which he can provide if his libpng is installed in a non-standard location such as $HOME/include.

Note that this approach is cross-compile safe, because we didn't need to execute any binary.

If you want to use headers of a library such as SDL that installs a number of headers into a subdir, you should reference them in your code via #include <SDL/SDL.h> and not #include <SDL.h>, because the latter will require the addition of -I path include search path directives.

Checking for functions in libraries

After you've established that the user has libpng's headers installed, you might want to check whether it links correctly and whether it provides a certain function you're using (though testing for this only makes sense if the function is a recent addition).

Again, you check this by writing a temporary .c file, that looks roughly like:

#include <png.h>
int main() {png_set_compression_buffer_size(0, 0);}

the command to test it is: $CC $CPPFLAGS $CFLAGS temp.c -lpng $LDFLAGS.

If the command succeeds, it means that one of libpng.a/.so is available in the compiler's default library search path, (or in some -L directive the user added to his LDFLAGS) and that it contains the function png_set_compression_buffer_size. The latter is established by using a main() function, which forces the linker to fail on missing symbols (also note the omission of -c).

If your aim is only to test whether the libpng library is installed, the test can be written as:

#include <png.h>
int main() {return 0;}

and compiled exactly as the previous. Note that this test actually checks that both the header exists AND the library, so by using this kind of test you don't actually need to test for header and library separately. Again, we merely compiled the testcase and didn't need to execute it.

Pkg-config and derivates

For simple libraries such as zlib you should always try first whether you can simply link to e.g. -lz. If that doesn't work, you can fall back to a tool called pkg-config or one of its clones such as pkgconf, which is widely used. The path to the tool is user provided via the environment variable PKG_CONFIG. If not set, the fall-back is to use pkg-config instead. It can be used like this:

$PKG_CONFIG --cflags gtk+-2.0

This will print a couple of -I include directives that are required to find the headers of gtk+2.

Likewise

$PKG_CONFIG --libs gtk+-2.0

can be used to query the LDFLAGS required for linking gtk+2. Note that by default, pkg-config looks into $(prefix)/lib/pkgconfig, which is not compatible with crosscompilation.

2 solutions exist to make pkg-config compatible with cross-compilation:

the environment variable PKG_CONFIG_SYSROOT_DIR can be set to the crosscompile rootfs root directory, e.g. /tmp/piroot and PKG_CONFIG_LIBDIR to /tmp/piroot/lib/pkgconfig, or
if PKG_CONFIG is not set, but a cross-compile triplet was passed to the configuration process, and a triplet-prefixed pkg-config exists in the PATH, this is being used instead of the host's pkg-config, e.g. armv7l-linux-musl-pkg-config.

Now comes the bummer:

The authors of some packages wrote their own package specific pkg-config replacement, reasoning unknown. For example, on my machine the following proprietary -config programs exist: allegro-config, croco-config, curl-config,freetype-config, gpg-error-config, icu-config, libpng-config, pcap-config, pcre-config, python-config, sdl-config, xml2-config ...

What they all have in common is that they do things differently and they are not cross-compile compatible. Usually, whenever one of them is being used by a build system, cross-compilation breakage follows. Because these tools simply return the include and library directories of the host.

Unfortunately, the authors of some of these programs refuse to write portable pkg-config files instead. OTOH, most of them require no special include dirs, and their --libs invocation simply returns -lfoo. For those few that don't (the worst offenders are apr-1-config tools from Apache Foundation), as a build system author, I suppose, the only correct way to deal with them is to not use them at all, but instead force the user to specify the include and library paths for these libraries with some configuration parameters. Example: --apr-1-cflags=-I/include/apr-1

Checking for sizes of things

In some rare cases, one needs to know e.g. the size of long of the toolchain target at compile time. Since we cannot execute any testbinaries that would run e.g.

printf("%zu\n", sizeof(long));

and then parse their output because we need to stay compatible with cross-compilers, the proper way to do it is by using a "static assertion" trick like here:

/* gives compile error if sizeof(long) is not 8 */
int arr[sizeof(long) == 8 ? 1 : -1];

Compile the testcase with $CC $CPPFLAGS $CFLAGS -c temp.c.

Another way is to run e.g.

$CC $CPPFLAGS -dM -E - </dev/null | grep __SIZEOF_LONG__

This command (without the piped grep) makes GCC and derivates spit out a list of built-in macros. Only GCC and Clang based toolchains that came out during the last couple years support this though, so the static assert method should be prefered.

Checking for endianness

Unfortunately, varying platforms have provided endianness test macros in different headers. Because of that, many build system authors resorted to compiling and running a binary that does some bit tricks to determine the endianness and print a result.

However since we cannot run a binary as we want to stay cross-compile compatible , we need to find another way to get the definition. I've actually spent a lot of effort by trying dozens of compiler versions and target architectures and came up with a public domain single-header solution, that has portable fallback functions that can do endian conversions even if the detection failed, although at a slight runtime cost.

I would advise its usage, rather than trying to hack together a custom thing.

Checking for bugs and similar things

I've also come across a number of checks that required to run a testcase and therefore prevented crosscompilation from working. Mostly, these are tests for a certain bug or odd behaviour. However, it is wrong to assume that because the system the test binary currently runs on has a certain bug, the end user's system will have the same bug. The binary might for example be distributed as a package, and might suddenly start misbehaving if another component that fixes the bug is updated. Therefore the only safe and correct way to deal with this situation is to write a check that's executed when the binary is used at runtime, and then sets a flag like bug=1; and then have two different codepaths, one for a system with the bug and one for a system without it.

Cross-compile specific configuration

In GNU Autoconf, the way to tell it that you're cross-compiling is by setting a --host=triplet parameter with the triplet of the target toolchain, additional to putting the crosscompiler name into the CC environment variable. The triplet is then used to prefix all parts of the toolchain, like

RANLIB=$(triplet)-ranlib
STRIP=$(triplet)-strip

etc. For the build host, there's also a parameter called --build=triplet . If not set, the configure process will try whether gcc or cc is available, and then use that. If set, all toolchain components targeting the host you're on will be prefixed with this triplet. It can be queried by running $CC -dumpmachine. Usually, it is not necessary to set it.

Checking for the target OS

As mentioned it's hugely preferable to test for functionality rather than platform. But if you really think it's necessary to figure out the target OS, do not use uname which is totally bogus. It simply returns the OS of the compiler user, who might use an Apple computer but cross-compile for NetBSD.

You can instead derive the target OS via $CC -dumpmachine, which returns the toolchain target triplet, or by parsing the output of

$CC $CPPFLAGS -dM -E - </dev/null

Configuring paths

Knowledge about system paths is required for 2 reasons. One is that during the Installation stage we need to know where files like the compiled program binary need to be installed in. The other is that our program or library might require some external data files. For example, the program might require a database at runtime.

For this reason, a --prefix variable is passed to the configure step. On most typical linux installations --prefix=/usr would be used for a system install, whereas --prefix=/usr/local is typically used for an alternate installation from source of a package the distribution provides but for some reason is not sufficient for the user. Sabotage Linux and others use an empty prefix, i.e. --prefix=, which means that for example binaries go straight into /bin and not /usr/bin, etc. Many hand-written configure scripts get this wrong and treat --prefix= as if the user hadn't passed --prefix at all, and fall back to the default. The default, btw is traditionally /usr/local.

So in case your program needs a database, let's say leetpackage.sqlite, you would probably hardcode the following db path into your binary:

#define DB_PATH PREFIX "/share/leetpackage/leetpackage.sqlite"

where PREFIX would be set as part of CPPFLAGS or similar according to the user's selection. For more fine-grained control, traditional configure scripts also add options like --bindir, --libdir, --includedir, --mandir, --sysconfigdir, etc additional to --prefix, which, if not set, default to ${prefix}/bin, ${prefix}/lib, ${prefix}/include etc.

More on paths in the Installation chapter.

Step 2: The build

After the configuration step finished, it should have written the configuration data in some form, either a header, or a Makefile include file, which is then included by the actual Makefile (or equivalent). This should include any previously mentioned environment variables, so it is possible to login in a different shell session without any of them set, yet getting the same result when running make. Some users of GNU autotools create the Makefile from a template (usually called Makefile.in) at the end of the configure run, but I personally found this to be really impractical, because when making changes to the Makefile template, configure has to be re-run every single time. Therefore I recommend writing the settings into a file called config.mak, which is included by the Makefile.

The actual compilation is typically run by executing make, which on most systems defaults to GNU make, which is a lot more powerful than the traditional BSD makes. Its code is small and written in portable C, so it's easy to get it bootstrapped quickly on systems that don't have it yet, unlike competitors such as CMake, which is 1) written in C++ which takes a lot longer to parse than C, and 2) consists of > 1 million lines of code and 3) occupies a considerable amount of HDD space once installed. Anyway, GNU make can even be found pre-installed on the BSDs, it's called gmake there.

Here, the following conventions apply:

In order to enable a verbose build showing the full compiler command line rather than just CC foo.c for debugging purposes, a flag V=1, short for verbose, can be passed as in make V=1.
To use a parallel build, for example to compile one C or C++ file per available CPU core, the -jN flag is used, such as in make -j8 if you want to use 8 parallel processes.

If a Makefile is used for building, the build process should be tested using several parallel processes, because failure to document dependencies of files properly often results in broken parallel builds, even though they seem to work perfectly with -j1.

Do note that you should not strip binaries, ever. If the user wants his binaries stripped, he will pass -s as part of his LDFLAGS.

Step 3: Installation

The Installation is typically done using the make install command. Additionally there's an important variable that distro maintainers use for packaging: DESTDIR.

If for example, at configure time, --prefix=/usr was set, then make install DESTDIR=/tmp/foo should cause stuff to be installed into /tmp/foo/usr, so if your package compiles a binary called myprog, it should end up in /tmp/foo/usr/bin/myprog. A typical install rule would look like this:

bindir ?= $(prefix)/bin

...

install: myprog
    install -Dm 755 myprog $(DESTDIR)$(bindir)/myprog

here we use the install program to install the binary myprog to its destination with mode 755 (-m 755) and create all path compontens along the way (-D). Unfortunately, the install program shipped with some BSDs and Mac OS X refuse to implement these practical options, therefore this portable replacement implementation can be used instead.

It is a good idea and the common practice to explicitly set the permissions during the install step, because the user doing the installation might unwittingly have some restrictive umask set, which can lead to odd issues later on.

Even if the build system you intend to write does not use Makefiles, you should respect the existing conventions (unlike CMake & co which NIH'd everything) like V=1, -j8, DESTDIR, --prefix, etc.

Closing thoughts

One of the big advantages of GNU's autotools system is that, from a user's perspective, they require nothing more than a POSIX-compatible shell to execute configure scripts, and GNU make, which as already mentioned is really slim, written in portable C, and widely available while requiring less than one MB of HDD space (my GNU make 3.82 install takes 750KB total including docs).

So in my opinion, the build system of the future, in whatever language it's written in, and how many millions of lines of code it consists of, should do precisely the same: it should at least have the option to generate a configure script and a stand-alone GNU Makefile, which is shipped in release tarballs. That way only the developers of the package need the build toolkit and its dependencies installed on their machine, while the user can use the tools he already has installed, and can interface with the build system in a way he's already familiar with.

Update

19 Apr 2019 19:34 UTC - Added paragraph "Checking for the target OS"

Post or read comments...

Devs on Acid

Introduction to prefixes

Use case

The obligatory rant about pulseaudio

One size fits all

Back to my use case...

Getting our hands dirty

Prepping the rootfs for chroot use

The musl C library

My involvement

Evolution of sabotage and musl

Getting rid of libtool

Getting rid of automake

The Good Part(s)

The Solution

The Ingredients

How it works

configure.ac:

Makefile:

config.mak.in:

foo1.c, foo42.c and main.c:

Testing

Further reading

Showcase 1: GTK+3

Showcase 2: python3

Showcase 3: ip vs ifconfig

Showcase 4: ethernet adapter renaming by systemd/udev

Showcase 5: CMake, meson, and $BUILDSYSTEMOFTHEDAY

Conclusion

A Primer for build system developers and users

Basic Compilation Concepts

The compiler.

The linker.

Compiler and linker options.

Libraries and their headers

Cross-compilation

The Build Process

Step1: Configuration

Enabling optional functionality

System- or Version-specific behaviour

Checking for headers

Checking for functions in libraries

Pkg-config and derivates

2 solutions exist to make pkg-config compatible with cross-compilation:

Now comes the bummer:

Checking for sizes of things

Checking for endianness

Checking for bugs and similar things

Cross-compile specific configuration

Checking for the target OS

Configuring paths

Step 2: The build

Step 3: Installation

Closing thoughts

Update

Earlier posts