Devs on Acid

How to create a minimal debian rootfs

20 Nov 2021 05:09 UTC

Because Ubuntu has removed support for the i386 arch, I was looking into using a Debian rootfs to install wine for temporary usage (wine doesn't work well with musl libc because it depends on non-portable glibc dlclose() semantics, and the last time I tried to compile it, it was a huge shitfest). If I were to use a 64-bit Ubuntu rootfs for this, I'd have to install 32-bit versions of many libs that are already installed for 64-bit, i.e. double the bloat.

Debian, unlike Ubuntu, doesn't ship a minimal base rootfs. However, one can quite easily create his own using the debootstrap tool, which consists of a single portable shell script and a directory with some shared data.

1) acquire debootstrap.

2) run the following command as root in your host distro:

DEBOOTSTRAP_DIR=XXX/usr/share/debootstrap/ XXX/usr/sbin/debootstrap --arch=i386 --variant=minbase sid DIRECTORY http://deb.debian.org/debian/

where XXX is the prefix where you installed debootstrap and DIRECTORY the directory the rootfs gets installed to. The resulting rootfs will be around 220MB in size.

3) trim the fat, part 1: installation leftovers:

rm DIRECTORY/var/cache/apt/archives/*.deb
rm DIRECTORY/var/cache/apt/*cache.bin
rm DIRECTORY/var/cache/debconf/*.dat-old
rm -rf DIRECTORY/var/lib/apt/lists/*

4) trim the fat, part 2: unneeded documentation and translations:

rm -rf DIRECTORY/usr/share/doc/*
rm -rf DIRECTORY/usr/share/locale/*
rm -rf DIRECTORY/usr/share/man/*

Now your rootfs is tidied up and should be around 90MB.

5) edit DIRECTORY/etc/dpkg/dpkg.cfg and add the following 3 lines:

path-exclude=/usr/share/doc/*
path-exclude=/usr/share/locale/*
path-exclude=/usr/share/man/*

This will prevent future package installations from installing these unneeded files.

I have manually diffed the contents of the ubuntu-base rootfs and the one created using these instructions (by looking at /var/lib/dpkg/status), and the following packages are only in the debian rootfs:

gcc-11-base gcc-9-base libcap2 libext2fs2 libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 libnsl2 libssl1.1 libtirpc-common libtirpc3 libxxhash0 tzdata.

(full comparison)

These might not actually be needed, and you may want to try removing them on first use of the rootfs to see whether something breaks. At least removing tzdata seems to be safe.
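For example, a quick way to try that (a sketch; needs root and a kernel able to execute the rootfs' binaries):

# remove tzdata from inside the rootfs; reinstall with apt if something complains later
chroot DIRECTORY dpkg --purge tzdata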

Prepping the rootfs for chroot use

In order to use the rootfs with a tool like bubblewrap, the following additional steps are necessary:

The process is documented here. In the case of an i386 rootfs, at the time of this writing you gotta replace e81decfeff8d93782400008dbb7424000085c00f94c0 with e81decfeff8d93782400008dbb7424000031c0909090 in the /usr/bin/tar binary using a hex editor (recommended: hexedit).

If you have chown-related problems installing a specific program with apt, use my tool idfake (link in the above article).
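For reference, a minimal bubblewrap invocation to enter the rootfs could look roughly like this (a sketch; the exact set of flags depends on what you need inside the container):

# give the container a private /dev, /proc and /tmp and its own PID namespace
bwrap --bind DIRECTORY / \
      --dev /dev --proc /proc --tmpfs /tmp \
      --unshare-pid \
      /bin/bash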


10 years Sabotage Linux - The history of the first musl-based distro

9 Apr 2021 12:02 UTC

On Monday, 5th April 2021, Sabotage Linux celebrated the 10-year anniversary of its first commit by Christian Neukirchen aka chris2, nowadays known as Leah Neukirchen.

It consisted of the following packages:

The build procedure consisted of building a stage0 rootfs, containing a musl-targeting C compiler toolchain and a stripped-down busybox binary that was barely sufficient to make it possible to chroot into the rootfs and build the rest of the packages without leaks from the host environment. GCC 3.4.6 was chosen as the stage0 compiler because it doesn't require 3rd party libraries like mpc, mpfr and gmp (these were added as a hard dependency in gcc >= 4.3), and because it is much slimmer than GCC 4+ and therefore faster to build.

Once inside the rootfs, GCC 4.5.2 was built (because the linux kernel required extensions only available in recent gccs), GNU m4 (required as a prerequisite for almost every package using GNU autotools), and GNU awk/sed, because the sed and awk offered by busybox were too buggy or didn't support extensions used by other packages' build scripts. Perl was built because the kernel build system required executing some perl scripts. Also, busybox got built a second time with a bigger set of applets.

Everything was built with a set of shell scripts, and there wasn't even an init system yet, so sabotage could only be used inside the chrooted rootfs.

The next day, with commit 6fc138a, the init system based on busybox runit and the necessary /etc infrastructure was added, making sabotage almost bootable on bare metal. The boot loader extlinux was added a day later, in commit b57b0f1, marking the first version that could be booted. During the next days a couple of tweaks and packages were added (zlib, openssl, git), and on April 9th the first bootable binary distribution was released.

Shortly after that release, some people that tried out sabotage suggested improvements, and Chris switched to a build system based on plan9's mk. The editor vim was added, replacing busybox's vi, as well as a couple of other packages including ncurses, bsdtar, xz, automake, and openssh. On the 13th of April, another binary release was shipped that was already much more usable; one could even SSH in!

The following days saw the addition of python 2.7.1, expat, and a basic set of packages to support the Xorg display server. The musl package recipe was switched to the official git repo.

Sabotage was the first distro built upon musl libc and was crucial for getting musl to a point where it could be used with open-source packages from a variety of sources. During the early development chris2 was communicating issues he encountered on an hourly basis to dalias, the musl libc author, and the issues got fixed almost immediately. Getting mainstream packages to build usually required hacks and patches, because back then GNU/Linux was a GLIBC-centric monoculture, and many packages used either GLIBC-specific extensions, or worse: private __-prefixed symbols and types that were never meant to be used outside of internal libc code.

Release 2011-04-18 was the first sabotage release that could be used with basic X11 windowing, and the one I first tried out. I had been idling in the #musl IRC channel (at that time consisting of 8 regulars) for a couple of weeks, following its development, mainly interested in it due to its ability to create static linked binaries with minimal footprint.

The musl C library

musl is a libc, i.e. an implementation of the C standard library. It provides the functions and headers dictated by the C standard, as well as those mandated by POSIX. It's the most important component in UNIX userspace, as almost all software interfaces with it; by doing syscalls to the Linux kernel, it acts as the interface between userspace and kernel space. The de-facto standard in 2011 was GLIBC.

Rich Felker, musl's author, was frustrated with GLIBC because it was designed around a central idiom: dynamic linking. Static linking was only possible to a limited extent (no use of network code involving DNS lookup, as that would pull in the dlopen()ed GLIBC framework that allows using different name lookup backends), and with a giant footprint: even a simple "hello world" resulted in a 500 KB binary. Another thing that frustrated him was the unreadability of the GLIBC source code, where the real definition of a function is often hidden behind numerous layers of abstraction.

Rich had been writing his own libc on an as-needed basis to get the applications he was interested in into tiny static executables that could be transferred from one PC to another and executed without having to ship a bunch of dynamic 3rd party libraries. He had been working on this for a couple of years when he decided, in February 2011, to publish the first release, 0.5.0, for the i386 architecture.

Musl was already of such high quality back then that it immediately attracted contributors, the first one being Nicholas J. Cain, who contributed support for x86_64 within the first week of musl's public release. Other early adopters tried it out on their favorite programs and reported issues, which were fixed almost in real time. From 0.5.0 up to 0.7.9, a new version was released typically in less than one week.

Musl 0.7.11, released on June 28, 2011, was the first version to feature a dynamic linker.

Before that, only static linking was available.

My involvement

I started using sabotage after the 2011-04-18 release, mainly to have an isolated rootfs environment with a compiler toolchain targeting musl to build my own programs with.

The alternative was to use the musl-gcc wrapper script, part of the musl release. It was a shell script starting gcc with the right options to pick up musl's include and library directories instead of the host GLIBC ones.

Chris2 continued work mainly to add support for 32bit x86, as sabotage up to that point targeted only the x86_64 architecture, culminating in the 2011-04-30 release. Then he stopped doing any work on sabotage for about 2 months.

By that point it had become clear that we would need dynamic linking eventually, for the following reasons:

I suspect Chris2 had never intended for sabotage to support dynamic linking and was frustrated with this situation, causing him to abandon the project. Meanwhile the sabotage users Josiah Worcester aka pikhq (nowadays known as Ada Worcester) and myself were pushing Rich to write a dynamic linker so the above could be addressed, resulting in the 0.7.11 release of musl. It was pikhq who then added basic dynamic linking support to his sabotage fork, which I ultimately picked up to start hacking on sabotage on my own, after a PR I filed in early May had been ignored for almost 2 months and nothing else happened on Chris2's repo. Meanwhile pikhq started to work on his own distro project called bootstrap linux.

It was on July 19th that I decided to do my own thing, as there were a couple of issues I had with the upstream way of building packages, and upstream seemed pretty much dead anyway. First I fixed a couple of things that were buggy and added a couple of packages. 2 days later pikhq merged my changes into his master, and another 2 days later Chris2 suddenly started hacking on sabotage again, yet he merged only the changes made by pikhq and none of mine. He bumped a couple of package revisions, but his activity on sabotage stalled for a second time on July 29th, this time for good.

The major problem I had with the old build system was that it didn't pick up from where it left off when things went wrong. One had to compile the whole set of packages over and over.

I had previous experience with a build-from-source package manager on MacOS, MacPorts, and was frustrated that it did things in a strictly serial way. For example, if one was to build a package with 10 dependencies, the first dependency would be downloaded, then built, then the next dependency downloaded, and so on. Clearly, if one has a slow internet connection, it's much preferable to download package2 while package1 is building, or even better, completely detach the download and build steps, and try to download several packages at once to saturate the available bandwidth, and start building as soon as the first download is complete.

This was the design I had in mind for my new package manager called butch.

In mid-September, when it became clear that Chris2's sabotage was abandoned for good, I started hacking on butch, and it was finished the next day, on the 19th.

The package manager used a new package format composed of ini-style sections, with a build section containing the build instructions and another section listing download mirrors for the source code release tarballs together with their checksums for integrity checks.

For greater flexibility, the build section was merged with a shell-script template and executed once the tarball was available. That meant one could adjust things like CFLAGS from a single template rather than hardcoding them into every recipe. It also allowed me to experiment with the location things get installed to.

Another major change that my package manager introduced was per-package installation directories. Ever since I started using Linux I was confused about which file belongs to which package, so my design was to create one directory per package in /opt, e.g. /opt/ncurses, and then symlink the files in there into the main FS root via relative symlinks. This allows one to do a simple ls -la on a file and immediately know which package it belongs to:

$ ls -la /bin/tic
lrwxrwxrwx    1 root     root            28 Jan 18 21:42 /bin/tic -> ../opt/netbsd-curses/bin/tic

It also allows one to remove a package by simply removing its directory in /opt.

Meanwhile, the butch package manager has been rewritten in POSIX sh in lieu of C, which makes it much more hackable.

Apart from that, sabotage to this day still follows the initial philosophy and file system layout and init system created by Chris2 during the first week of sabotage's existence.

Evolution of sabotage and musl

Sabotage Linux was the only major distro based on musl libc for several years; only in 2014 did Alpine Linux join the ranks. Until that happened, it was mainly my feedback to Rich about issues I encountered that turned musl into a libc ready for prime time. I filed countless bug reports against 3rd party packages that relied on buggy behaviour of GLIBC or used GLIBC-only extensions or internal types/data structures/functions, and got quite a few of them to use a different, more portable approach. When Alpine Linux joined the ranks, most of the pioneer work was already done, including making musl compatible with GCC's libstdc++. Before that, sabotage was strictly C-only, and one of the major issues I faced was that some required C libraries used the CMake build system, which is written in C++. I ended up writing custom Makefiles for some projects only to make them buildable in my C++-less distro.

Fortunately, back in the day almost the entire Linux FOSS infrastructure was based on C, so it was relatively easy to bootstrap most things from source. This is quite different from today, with Rust zealots starting to rewrite critical library components in Rust, which is almost impossible to bootstrap from source and only supports a small subset of the architectures supported by sabotage.

I refuse to add Rust to sabotage, and am asking myself whether Rust and the accompanying security theater was created to fragment the FOSS ecosystem, and weaken the status of the C programming language, which is the underlying cause for the huge success and performance, stability and resource-efficiency of the UNIX operating system. The leaked halloween documents prove without a shadow of a doubt that M$ saw Linux/FOSS (already in 1998) as a huge threat to their market monopoly and sought ways to undermine it. Certainly they didn't stop after the leak and were seeking ever new methods to achieve their goal of weakening the FOSS movement. A collaboration with Mozilla and Google (Go with its online micro-dependency concept) seems possible. Just make it too hard to build stuff from source and FOSS will exist only in name.

Meanwhile even GCC switched to C++ as its implementation language as of GCC 4.8. Had that been the case in 2011, it's easy to imagine that a distro based on musl would've given up already during infancy.

During the years, I made sabotage compile on a big variety of architectures, at first using QEMU to build in a native environment, later by adding support for cross-compilation. I even contributed support for powerpc and x32 architectures to musl.

Once Alpine Linux joined the ranks of distros using musl, I back-pedaled my involvement quite a bit, figuring that alpine with its big number of contributors could take over the job of filing upstream bug reports and playing guinea pig for new musl releases.

Alpine Linux got hugely successful once it was chosen as the standard distro for Docker images due to its small footprint and binary package manager, attracting even more users to musl. Other distros like Void Linux joined.

Even though many projects and desktop linux distros still target GLIBC only, musl has become a serious competitor, and the willingness of upstreams to support it has considerably increased. It is meanwhile used by many projects; even the WebAssembly working group has chosen it as its C library implementation, as it was already adopted by emscripten.

Sabotage itself always stayed a niche project, since I didn't spend any effort on advertising it or creating a polished website to attract new users; therefore it was most of the time a one-man show, even though many contributors appeared and disappeared over time. Apart from myself, only AequoreaVictoria, who also happens to provide the build server hosting, has been with the project since 2012 with regular contributions.

Yet, it still is one of the most stable, mature and versatile musl distros available, and probably the easiest way to get a usable and slim distro cross-compiled for any new architecture or embedded hardware project.

During its development a number of side projects were released that allowed side-stepping the need for bloated dependencies, most notably gettext-tiny and netbsd-curses, which are now used by a number of other distros, but also things like atk-bridge-fake, which allows building GTK+3 without a dbus dependency.


Modern autotools

9 Mar 2021 14:28 UTC

GNU autotools, aka the GNU Build System, is a build system designed to produce a portable source code package that can be compiled just about everywhere.

The intentions are good: when properly used, a configure script is generated that runs everywhere a POSIX compatible shell is available, and a Makefile that can be used everywhere a make program is available. No further dependencies are required for the user, and the process to build source with ./configure, make and make install is well-established and understood.

From the developer's perspective though, things look a bit different. In order to create the mentioned configure script and Makefile, autotools uses the following 3 main components: autoconf, automake, and libtool.

To use them, the developer needs perl and GNU m4 installed in addition to the tools themselves, as well as a basic understanding of m4, shell scripting, Makefiles, and the complex interaction between autoconf and automake.

He also needs a lot of time and patience, because each change to the input files requires execution of the slow autoreconf to rebuild the generated sources, and running ./configure and make for testing.

Libtool is a shell script wrapper around the compiler and the linker with >9000 lines, which makes every compiler invocation about 100 times slower. It is notorious for breaking static linking of libraries and cross-compilation due to replacing e.g. "-lz" with "/usr/lib/libz.so" in the linker command. Apart from being buggy and full of wrong assumptions, it's basically unmaintained (the last release was 6 years ago).

While there's reasonably complete documentation for autoconf and automake available, it is seriously lacking in code examples, and so many supposedly simple tasks become a continuous game of trial and error.

Due to all of the above and more, many developers are overwhelmed and frustrated and rightfully call autotools "autocrap" and switch to other solutions like CMake or meson.

But those replacements are even worse: they trade the complexity of autotools on the developer side against heavy dependencies on the user side.

meson requires a bleeding edge python install, and CMake is a huge C++ clusterfuck consisting of millions of LOC, which takes up >400 MB disk space when built with debug info. Additionally meson and cmake invented their own build procedure which is fundamentally different from the well-known configure/make/make install trinity, so the user has to learn how to deal with yet another build system.

Therefore, in my opinion, the best option is not to switch to another build system, but simply to only use the good parts of autotools.

Getting rid of libtool

It's kinda hard to figure out what libtool is actually good for, apart from breaking one's build and making everything 100x slower. The only legitimate use case I can see is executing dynamically linked programs during the build, without having to fiddle around with LD_LIBRARY_PATH. That's probably useful to run testcases when doing a native build (as opposed to a cross-compile), but it can also be achieved by simply statically linking against the list of objects, which needs to be defined in the Makefile anyhow. Libtool being invoked for every source file is the main reason for GNU make's reputation of being slow. If GNU make is properly used, one would need to compile thousands of files to see a noticeable difference in speed compared to the oh-so-fast ninja.

Getting rid of automake

Automake is a major pain in the ass.

Makefiles are generated by the configure script by doing a set of variable replacements on Makefile.in, which in turn is generated by automake from Makefile.am on the developer's end.

The only real advantage that automake offers over a handwritten Makefile is that it provides conditionals which work with any POSIX-compatible make implementation (those are still not standardized to this day), and that dependency information on headers is generated automatically. The latter can be implemented manually using the -M family of gcc options, e.g. -MMD, as sketched below.
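For illustration, a quick sketch of how that looks with gcc or clang (-MMD and -MP are real compiler flags; the include line is one common way to consume the generated files):

# emit foo.d alongside foo.o, listing the headers foo.c includes
cc -MMD -MP -c foo.c -o foo.o
# in the Makefile, pull the generated dependency files in with a line like:
#   -include $(OBJS:.o=.d)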

The Good Part(s)

The only good part of autotools is the generated portable configure script, and the standard way of using the previously mentioned trinity to build. The configure script is valuable for many reasons:

In addition to the above, autoconf-generated configure scripts have some useful features built in:

On the other hand generated configure scripts tend to be quite big, and, as they are executed serially on a single CPU core, rather slow.

Fortunately this can be fixed by removing the vast majority of checks: just assume C99 support as a given and move on. While you're at it, throw out that check for a 20-year-old HP-UX bug, please.

The Solution

A modern project using autotools should only use autoconf to generate the configure script, plus a single handwritten top-level Makefile. Build configuration can be passed to the Makefile using a single file that's included from the Makefile. The number of configure checks should be reduced to the bare minimum; there's little point in testing e.g. for the existence of stdio.h, which has been standardized since at least C89, especially if the preprocessor macro HAVE_STDIO_H isn't even used later and stdio.h is included unconditionally. Used this way, the configure script will be quick in execution and small enough to be included in the VCS, which allows users to check out and build any commit without having to do the autotools dance. A good guide for writing concise configure scripts is available here.

As for conditionals in make, I'm pretty much in favor of simply assuming GNU make as a given and using its way to do conditionals.

It's in widespread use (it's the default make implementation on any Linux distro) and therefore available as gmake even on BSD installations that usually prefer their own make implementation. Apart from that, it's lightweight (my statically linked make 3.82 binary is a mere 176 KB) and one of the most portable programs around.

The alternative is to target POSIX make and do the conditionals using automake-style text substitutions in the configuration file produced by the configure run.

The Ingredients

Our example project uses the following files: configure.ac, Makefile, config.mak.in, main.c, foo1.c and foo42.c with the following contents respectively.

configure.ac:

AC_INIT([my project], 1.0.0, maintainer@foomail.com, myproject)
AC_CONFIG_FILES([config.mak])
AC_PROG_CC
AC_LANG(C)

AC_ARG_WITH(foo,
    AS_HELP_STRING([--with-foo=1,42], [return 1 or 42 [1]]),
    [foo=$withval],
    [foo=1])

AC_SUBST(FOO_SOURCE, $foo)
AC_OUTPUT()

Makefile:

include config.mak

OBJS=main.o $(FOO).o
EXE=foo-app

all: $(EXE)

$(OBJS): config.mak

$(EXE): $(OBJS)
    $(CC) -o $@ $(OBJS) $(LDFLAGS)
clean:
    rm -f $(EXE) $(OBJS)

install:
    install -Dm 755 $(EXE) $(DESTDIR)$(BINDIR)/$(EXE)

.PHONY: all clean install

config.mak.in:

# whether to build foo1 or foo42
FOO=foo@FOO_SOURCE@

PREFIX=@prefix@
BINDIR=@bindir@

CFLAGS=@CFLAGS@
CPPFLAGS=@CPPFLAGS@
LDFLAGS=@LDFLAGS@

main.c:

#include <stdio.h>
extern int foo();
int main() {
    printf("%d\n", foo());
}

foo1.c:

int foo() { return 1; }

foo42.c:

int foo() { return 42; }

You can get these files here.

After having the files in place, run autoreconf -i to generate the configure script. You'll notice that it runs unusually quickly, about 1 second, as opposed to projects using automake where one often has to wait for a full minute.

The configure script provides the usual options like --prefix, --bindir, processes the CC, CFLAGS, etc variables exported in your shell or passed like

CFLAGS="-g3 -O0" ./configure

just as you'd expect it to, and provides the option --with-foo=[42,1] to let the user select whether he wants the foo42 or foo1 option.

How it works

configure.ac:

AC_CONFIG_FILES([config.mak])

Here we instruct autoconf that config.mak is to be generated from config.mak.in when it hits AC_OUTPUT() (which causes config.status to be executed). It will replace all values that we either specified with AC_SUBST(), or the built-in defaults like prefix and bindir (see config.status for the full range) with those specified by the user.

AC_ARG_WITH(...)

This implements our --with-foo multiple choice option. You can read about how it works in the usual autoconf documentation. Other autoconf macros that you will find handy include AC_CHECK_FUNCS, AC_CHECK_HEADERS, AC_CHECK_LIB, AC_COMPILE_IFELSE to implement the various checks that autoconf offers, as well as AC_ARG_ENABLE to implement the typical --enable/--disable switches.

AC_SUBST(FOO_SOURCE, $foo)

This replaces the string @FOO_SOURCE@ in config.mak.in with the value assigned by the user via the AC_ARG_WITH() statement, when config.mak is written.

The rest of the contents in configure.ac are the standard boilerplate for C programs.

Makefile:

include config.mak

This statement in Makefile includes the config.mak generated by configure. If it is missing, running make will fail, as it should. config.mak provides us with all the values in config.mak.in, where each occurrence of @var@ is replaced with the results of the configure process.

OBJS=main.o $(FOO).o

This sets OBJS to either main.o foo1.o or main.o foo42.o, depending on the choice of the user via the --with-foo switch. We didn't even have to use conditionals for it.

$(OBJS): config.mak

We let $(OBJS) depend on config.mak, so they're scheduled for rebuild when the configuration is changed by another ./configure execution, as the user might have changed his CFLAGS or --with-foo setting to something else. For bonus points, you could put all build-relevant settings into e.g. config-build.mak, directory-related stuff into config-install.mak (unless you hardcode directory names into the binary), and make the dependency on config-build.mak only.

    install -Dm 755 $(EXE) $(DESTDIR)$(BINDIR)/$(EXE)

Two important things about this line:

The rest of the Makefile contents are pretty standard. You might notice the absence of a specific rule to build .o files from .c, we use the implicit rule of make for this purpose.

config.mak.in:

FOO=foo@FOO_SOURCE@

config.status will replace the string @FOO_SOURCE@ with either 1 or 42, depending on which --with-foo option was used (1 being the default), shortly before configure terminates and writes config.mak. The values for @CFLAGS@ and the other variables will be replaced with the settings the configure scripts defaults to or those set by the user.

foo1.c, foo42.c and main.c:

... should be self-explanatory.

Testing

You can run ./configure && make now and see that it works - foo-app is created, and make DESTDIR=/tmp/foobar install installs foo-app into /tmp/foobar/bin/foo-app. ./configure --with-foo=42 && make should cause the foo-app binary to print 42 instead of 1.
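For reference, the whole test sequence as shell commands:

./configure && make
./foo-app                          # prints 1
make DESTDIR=/tmp/foobar install   # installs /tmp/foobar/bin/foo-app
./configure --with-foo=42 && make
./foo-app                          # prints 42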

Further reading

If you want to learn more about the build process and especially how it works in regard to cross-compilation, you can check out my article Mastering and designing C/C++ build systems.


Lua vs QuickJS

17 Nov 2020 23:17 UTC

Since I didn't find any performance comparison of QuickJS and Lua, and having a lot of spare time due to being locked into my flat due to the WHO plandemic, I decided to do my own little benchmark.

As there's already a benchmark comparing QuickJS and the V8 JIT, this also lets us reason about how Lua would fare in comparison to a heavily tuned JIT.

We compare the performance for "fannkuch redux", a well-established benchmark program which originally featured in Debian's shootout game, which was later taken down because it was not seen as "politically correct" to compare speed. Implementors of slow languages felt offended, and successfully lobbied to take it down.

The javascript version used to test QuickJS requires a small modification from arguments[0] to scriptArgs[1] for quickjs.

lua version

Since the js version is a transliteration of the lua code, this test couldn't be any fairer.

Tested versions are QuickJS version 2020-11-08 and Lua 5.3.6 Copyright (C) 1994-2020 Lua.org, PUC-Rio, running on an AMD Ryzen 7 3700X 8-Core Processor, but on only one core.

Results for QuickJS and Lua compiled with -Os:

$ time qjs fannkuch.js 11
556355
Pfannkuchen(11) = 51

real    0m41.560s
user    0m41.558s
sys     0m0.002s

$ time lua5.3 fannkuch.lua 11
556355
Pfannkuchen(11) = 51

real    0m40.585s
user    0m40.584s
sys     0m0.000s

Results for QuickJS and Lua compiled with -O3 each:

$ time qjs fannkuch.js 11
556355
Pfannkuchen(11) = 51

real    0m40.950s
user    0m40.946s
sys     0m0.000s

$ time lua5.3 fannkuch.lua 11
556355
Pfannkuchen(11) = 51

real    0m35.973s
user    0m35.973s
sys     0m0.000s

When optimized for size, Lua and QuickJS perform almost identically, whereas Lua is slightly faster (about 15%) when compiled for speed. That's a quite impressive result for QuickJS, as Lua is known to be one of the fastest non-jitted programming languages out there and has been around for a while, whereas QuickJS is the new kid on the block. QuickJS does spend a lot more time building though: using make -j 16 it takes 16 seconds, Lua compiles in one second.

Of course this little benchmark tests only a small subset of the functionality of those language interpreters, and is by no means conclusive, but IMO it's sufficient to give a good picture of the ballpark they're in.

An interesting addition would be to compare against the performance of the Dino language which is the fastest scripting language I'm aware of (with JIT turned off).


When "progress" is backwards

20 Oct 2020 15:58 UTC

Lately I see many developments in the linux FOSS world that sell themselves as progress, but are actually hugely annoying and counter-productive.

Counter-productive to the point where they actually cause major regressions and costs, and, as in the case of GTK+3, ruin the user experience and the possibility that we'll ever enjoy "The year of the Linux desktop".

Showcase 1: GTK+3

GTK+2 used to be the GUI toolkit for Linux desktop applications. It is highly customizable, reasonably lightweight and programmable from C, which means almost any scripting language can interface to it too.

Rather than improving the existing toolkit code in a backwards-compatible manner, its developers decided to introduce many breaking API changes which require a major porting effort to make an existing codebase compatible with the successor GTK+3, and keeping support for GTK+2 while supporting GTK+3 at the same time typically involves a lot of #ifdef clutter in the source base which not many developers are willing to maintain.

Additionally, GTK+3 did away with a lot of user-customizable theming options, effectively rendering useless most of the existing themes that took considerable developer effort to create. Here's a list of issues users are complaining about.

Due to the effort required to port a GTK+2 application to GTK+3, many finished GUI application projects will never be ported, due to lack of manpower, lost interest of the main developer, or his untimely demise. An example of such a program is the excellent audio editor sweep, which saw its last release in 2008. With Linux distros removing support for GTK+2, these apps are basically lost in the void of time.

The other option for distros is to keep both the (unmaintained) GTK+2 and GTK+3 in their repositories so GTK+2-only apps can still be used; however, that requires users of these apps to spend basically double the amount of disk and RAM space, as both toolkits need to live next to each other. Also, this will only work as long as there are no breaking changes in the Glib library, which both toolkits are built upon.

Even worse, due to the irritation the GTK+3 move caused to developers, many switched to QT4 or QT5, which requires use of C++, so a typical linux distro now has a mix of GTK+2, GTK+3, GTK+4, QT4 and QT5 applications, where each toolkit consumes considerable resources.

Microsoft (TM) knows better and sees backwards compatibility as the holy grail and underlying root cause of its success and market position. Any 25 year old Win32 GUI application from the Win95 era still works without issues on the latest Windows (TM) release. They even still support 16bit MS-DOS apps using some built-in emulator.

From MS' perspective, the freedesktop.org decision makers played into their hands when they decided to make GTK+3 a completely different beast. Of course, we are taught to never believe in malice but in stupidity, so it is unthinkable that there was actually a real conspiracy and monetary compensations behind this move. Otherwise we would be conspiracy theorist nuts, right ?

Showcase 2: python3

Python is a hugely successful programming/scripting language used by probably millions of programmers.

Whereas python2 development has been very stable for many years, python3 changes in the blink of an eye. It's not uncommon to find that after an update of python3 to the next release, existing code no longer works as expected.

Many developers such as myself prefer to use a stable development environment over one that is as volatile as python3.

With the decision to EOL python2, thousands of py2-based applications will experience the same fate as GTK+2 applications without a maintainer: they will be rendered obsolete and disappear from the distro repositories. This may happen quicker than one would expect, as python by default provides bindings to the system's OpenSSL library, which has a history of making backwards-incompatible changes. At the very least, once the web agrees on a new TLS standard, python2 will be rendered completely useless.

Porting python2 code to python3 isn't usually as involved as porting GTK+2 to GTK+3, but due to the dynamic nature of python the syntax checker can't catch all code issues automatically, so many issues will only show up at runtime in corner cases, causing the ported application to throw a backtrace and stop execution, which can have grave consequences.

Many companies have millions of lines of code still in python2 and will have to produce quite some sweat and expense to make it compatible with python3.

Showcase 3: ip vs ifconfig

Once one had learned his handful of ifconfig and route commands to configure a Linux box's network connections, one could comfortably manage this aspect across all distros. Not any longer: someone had the glorious idea to declare ifconfig and friends obsolete and provide a new, more "powerful" tool to do their job: ip.

The command for bringing up a network device is now ip link set dev eth1 up vs the older ifconfig eth1 up. Does this really look like progress? Worse, the documentation of the tool is non-intuitive, so one basically has to google for examples that show the translation from one command to the other.
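To illustrate, here are a few of the translations one inevitably ends up looking up (standard net-tools and iproute2 commands; interface name and addresses are just examples):

# bring an interface up
ifconfig eth1 up
ip link set dev eth1 up
# assign an address
ifconfig eth1 192.168.1.2 netmask 255.255.255.0
ip addr add 192.168.1.2/24 dev eth1
# set the default gateway
route add default gw 192.168.1.1
ip route add default via 192.168.1.1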

The same criticism applies to iw vs iwconfig.

Showcase 4: ethernet adapter renaming by systemd/udev

The latest systemd-based distros come up with network interface names such as enx78e7d1ea46da or vethb817d6a, instead of the traditional eth0. The interface names assigned by default on Ubuntu 20 are so long that a regular human can't even remember them; any configuration attempt requires one to copy/paste the name from the ip a output. Yet almost every distro goes along with this Poettering/freedesktop.org-dictated nonsense.

Showcase 5: CMake, meson, and $BUILDSYSTEMOFTHEDAY

While the traditional buildsystem used on UNIX, autoconf, has its warts, it was designed in such a way that only the application developer required the full set of tools, whereas the consumer requires only a POSIX compatible shell environment and a make program.

More "modern" build systems like cmake and meson don't give a damn about the dependencies a user has to install, in fact according to this, meson authors claimed it to be one of their goals to force users to have a bleeding edge version of python3 installed so it can be universally assumed as a given.

CMake is written in C++, consists of 70+ MB of extracted sources, and requires an impressive amount of time to build from source. Built with debug information, it takes up 434 MB of my harddisk space as of version 3.9.3. Its primary raison d'être is its support for Microsoft (TM) Visual Studio (R) (TM) solution files, so Windows (TM) people can compile stuff from source with a few clicks.

The two of them have in common that they threw overboard the well-known user interface of configure and make and invented their own NIH solution, which requires the user to learn yet another way to build his applications.

Both of these build systems seem to have either acquired a cult following just like systemd, or someone is paying trolls to show up on github with pull requests to replace GNU autoconf with either of those, for example 1 2 . Interestingly also, GNOME, which is tightly connected to freedesktop.org, has made it one of its goals to switch all components to meson. Their porting effort involves almost every key component in the Linux desktop stack, including cairo, pango, fontconfig, freetype, and dozens of others. What might be the agenda behind this effort?

Conclusion

We live in an era where in the FOSS world one constantly has to relearn things, switch to new, supposedly "better", but more bloated solutions, and is generally left with the impression that someone is pulling the rug from below one's feet. Many of the key changes in this area have been rammed through by a small set of decision makers, often closely related to Red Hat/Gnome/freedesktop.org. We're buying this "progress" at a high cost, and one can't avoid asking oneself whether there's more to the story than meets the eye. Never forget, Red Hat and Microsoft (TM) are partners and might even have the same shareholders.


Speeding up static regexes in C using re2r and ragel

16 Oct 2020 00:16 UTC

While working on tinyproxy I noticed that its config file parser got notoriously slow when processing big config files with several thousand lines (for example Allow/Deny directives).

The config parser uses a set of static POSIX ERE regexes which are compiled once using regcomp(3p) and then executed on every single line via regexec(3p).

For example, the regex for the "Allow" directive is

(((([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)(/[0-9]+)?)|(((([0-9a-fA-F:]{2,39}))|(([0-9a-fA-F:]{0,29}:([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+))))(/[0-9]+)?))|([-A-Za-z0-9._]+))

which consists of the more readable parts

"(" "(" IPMASK "|" IPV6MASK ")" "|" ALNUM ")"

as defined using some CPP macros in the source code.

So basically the regex matches either an ipv4 address with a netmask like 10.0.0.0/8, an ipv6 with a netmask, or an alphanumeric domain name.

Parsing 32K lines with Allow statements using the libc's regexec function took about 2.5 seconds, which made me wonder whether we could get this a little bit faster.

POSIX regexec() has the following signature:

int regexec(const regex_t *restrict preg, const char *restrict string,
    size_t nmatch, regmatch_t pmatch[restrict], int eflags);

preg is the compiled regex, string the string to match, nmatch the maximum number of matching groups, and pmatch an array of start/end indices into the string, corresponding to matching groups. Matching groups are the parts enclosed in parens in the regex. This is a very practical feature, as it allows one to easily extract submatches.

My idea was to write a wrapper around re2c or ragel (both of which compile a fast finite state automaton), which automatically turns a POSIX-compatible ERE expression into the expected format and generates a regexec()-like wrapper function that provides the same convenient submatch array.

For evaluation, I first created a manual re2c conversion of (a predecessor of) the above "Allow" regex, however that resulted in almost 10K (!) lines of C code emitted. Re2c input

Next I tried the same thing with ragel, and to my pleasant surprise the resulting C code was only a little over 900 lines, i.e. 10% of re2c. Ragel input

This made it quite clear that ragel was the winner of the competition.

After spending some more effort, the product was named re2r (regex to ragel) and is available here.

re2r accepts input on stdin, a machine name followed by a space and a regex per line. For example (from tinyproxy):

logfile "([^"]+)"
pidfile "([^"]+)"
port ([0-9]+)
maxclients ([0-9]+)

which generates the following code:

re2r helpfully prints the message:

 diagnostics: maximum number of match groups: 2

more about that in a minute.

As a size optimization, for multiple identical regexes, the wrapper for that machine simply calls the wrapper for the machine with the identical regex, e.g. re2r_match_pidfile() calls re2r_match_logfile().

The prototype for our regexec()-like match functions looks like:

RE2R_EXPORT int re2r_match_logfile(const char *p, const char* pe, size_t nmatch, regmatch_t matches[]);

RE2R_EXPORT needs to be defined by the user to either "static" or "extern", depending on how he needs the visibility of the function. re2r_match_logfile is the function name generated for the named regex "logfile".

p is a pointer to the start of the string to be matched, and pe to the end (usually it can be defined as p+strlen(p)). nmatch is, just like in the POSIX regexec() signature, the maximum number of items that can be stored in the matches array, which is optimally of the size that our diagnostic line earlier notified us about (here: 2). The matches array is of type regmatch_t[] (thus we need to include the header regex.h to get the definition) and it must consist of nmatch items.

Now we only need to run ragel on the re2r output to get a heavily optimized matcher function that returns almost identical results to using the same regex/string with POSIX regcomp()/regexec(), while having an almost identical function signature, so it's straightforward to replace existing code.

As a trick, the plain output of re2r can be directly compiled using gcc -include regex.h -DRE2R_EXPORT=extern -c foo.c after running ragel on it, without having to embed/include it in other source files.
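Put together, the whole pipeline looks roughly like this (a sketch; the filenames are placeholders, and I'm assuming re2r writes its output to stdout):

# regexes.txt: one "name regex" pair per line, as shown above
re2r < regexes.txt > matchers.rl
# turn the generated ragel machines into C
ragel -o matchers.c matchers.rl
# compile the matchers into an object file
gcc -include regex.h -DRE2R_EXPORT=extern -c matchers.c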

In the case of tinyproxy, parsing the 32K allow statements using the re2r/ragel reduced the runtime from 2.5 seconds to a mere 236 milliseconds.

re2r also ships a testing tool called re2r_test which can be used as follows:

re2r_test -r "((foo)|bar(baz))"

which then waits for test input on stdin. Upon entering "foo", we get the following output:

---------- RE2R  ----------
0: foo
1: foo
2: foo
((foo)|bar(baz))
12   2    3   31
12   2         1
---------- POSIX ----------
0: foo
1: foo
2: foo
((foo)|bar(baz))
12   2    3   31
12   2         1

The first block is the output from the re2r matcher function, the other from POSIX regexec(). The 0, 1, 2 positions show the extracted match groups, then the regex is displayed followed by 2 lines that show

1) the offsets of all possible matching groups, and 2) the matching groups that actually matched.

In this case only the matching group 1 (outer parens pair) and 2 (foo) matched.

Note that POSIX always makes a matching group 0 available, which holds the start and end offsets of the entire match if the string was successfully matched.

If we now enter "barbaz", we get:

---------- RE2R  ----------
0: barbaz
1: barbaz
3: baz
((foo)|bar(baz))
12   2    3   31
1         3   31
---------- POSIX ----------
0: barbaz
1: barbaz
3: baz
((foo)|bar(baz))
12   2    3   31
1         3   31

In this case, we don't have a match for matching group 2, but one for 3. Group 1 matches again, as it surrounds the entire expression.

Note that while re2r itself is GPL licensed, the code it emits is public domain.

I hope that re2r will be helpful in the adoption of fast ragel parsers into C projects, and believe that re2r_test can be a generally useful tool to visualize regexes and matching groups on the terminal.

The result of the re2r/ragel work on tinyproxy can be evaluated in the ragel branch.


Restoring accidentally deleted files on Linux

02 May 2019 22:27 UTC

Doh. Through a tiny bug in a Makefile auto-generated by my build system rcb2, I accidentally deleted the C source file I had been working on for almost an hour, and which wasn't checked into git yet.

Fortunately, I know the basic steps to restore a file* in a filesystem-agnostic way.

These are: search the raw blockdevice for a string that occurs only in the deleted file, locate the file's start and end offsets around that hit, and extract that byte range with dd.

First of all though, I sent a SIGSTOP signal to firefox, the most volatile process on my desktop, to prevent it from writing any files onto my harddisk while the restoration was in progress, potentially overwriting the blocks occupied by the deleted file. I did this via an extension I wrote for my window manager openbox, which adds a menu item "Pause and iconify" to the popup menu on the titlebar of all windows. I usually use this to prevent Firefox from consuming CPU and draining my laptop's battery while I'm traveling. Other than that, there's almost nothing running on a typical sabotage linux box which could interfere via constant disk writes, unlike GNU/Linux systems with systemd and a gazillion background daemons installed.
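The same can also be done from a shell, without the window manager extension:

# freeze firefox so it stops touching the disk
kill -STOP $(pidof firefox)
# ... do the restoration work ...
# resume it afterwards
kill -CONT $(pidof firefox)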

Then I opened /dev/mapper/crypt_home, the blockdevice containing my /home filesystem, in my favorite hexeditor, went to the ascii tab on the right side, and started a search for a string I knew was only in that new C file, which was <openDOW/vec2f.h>, since I used that file in a hackish way via an include statement.

After hitting ENTER in hexedit's search dialog, CPU usage went to 100%, and it slowly crunched its way through the encrypted harddisk's blockdevice mapper. I left my computer to brew a coffee, and came back after about 5 minutes. From the current offset displayed, I figured that the search was currently only 40GB into the blockdevice. Many more GBs to go, since the file could be at the very end of the SSD. After another break of about 10 mins, I was lucky enough and the string was found at offset 0x13c6ffa0ab, at about 84 GB into the blockdevice.

Using pageup/pagedown in hexedit, the beginning and end offsets of the source file were quickly found. They were 0x13C6FF1FFC and 0x13C6FFB472, respectively.

dd if=/dev/mapper/crypt_home of=/dev/shm/dump bs=1 skip=$((0x13C6FF1FFC)) count=$((0x13C6FFB472 - 0x13C6FF1FFC))

did the rest to restore the file onto /dev/shm, the ramdrive.

Since my SSD is usually a lot faster than this, I decided to write a program to speed up future searches. The plan is simple: read from the blockdevice in large chunks, so the time spent in syscalls is negligible, and then search over the memory chunks using an optimized algorithm that compares word-at-a-time, just like musl's memmem() function does. Plus some more logic to find the search term even across chunk boundaries. The result can be found here in a small C program.

And indeed, it is a lot faster than hexedit.

# time ./fastfind /dev/mapper/crypt_home '<openDOW/vec2f.h>'
curr: 0x13498f0000
bingo: 0x13c6ffa0ab
^CCommand terminated by signal 2
real    1m 4.26s
user    0m 20.35s
sys     0m 19.38s

at 64 seconds total, it crunched through the blockdevice at a rate of 1.2GB/sec, at least 10x faster than hexedit.

So for future undelete tasks, my fastfind utility will become the first stop, to find an offset, which will then be followed by my good old friend hexedit to find beginning and end position in the neighbourhood of that offset, and to be finished off with dd.

*: This approach works well for smaller files, whereas bigger ones are usually spread over several non-adjacent blocks.


Mastering and designing C/C++ build systems

19 Apr 2019 10:36 UTC

A Primer for build system developers and users

As the maintainer of sabotage linux, a distro compiled from source with >1000 packages, and being involved in the development of musl libc, I've seen a wide variety of odd build systems, or regular build systems used in an odd way, which resulted in lots of issues trying to get other people's packages to build.

The vast majority of build system coders, and of developers using these build systems for their packages, do not understand in detail how their toolchains are supposed to be used, and especially cross-compilation is a topic the majority of people know nothing about. The intent of this blog post is to explain the basic mechanisms, in order to change this situation.

But first, let's establish the meaning of some terms. From here on, the term user will be used to mean the person trying to compile your software package from source. We're not concerned here about people using the compilation result via a binary package.

Now we will first take a quick look at the basic concepts involved in compilation, followed by the typical 3 stages of a build process, which are: Configuration, Compilation, Installation.

Basic Compilation Concepts

So in order to get your program compiled on a variety of different hosts, you typically need to interface with the following components:

The compiler.

For the C programming language, the convention is that on the user's system there's a C compiler installed in the default search PATH with the name cc. It can be overridden with the environment variable CC.

So if CC is set to clang, the build system should use clang instead of cc.

A sanely designed build system does something along the lines of:

if is_not_set($CC): CC = cc
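In a configure script written in POSIX sh, that logic is a one-liner (a sketch):

# fall back to the conventional default if the user didn't set CC
: "${CC:=cc}"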

For C++, the default binary name is c++ and the environment variable CXX.

Note that the user may choose to set CC or CXX to something that includes multiple items, for example CC=powerpc-gcc -I/tmp/powerpc/include.

Therefore, in a shell script, when you want to use the CC command to compile something, the $CC variable needs to be used unquoted, i.e. $CC and not "$CC", since the latter would force the shell to look for a binary with the spaces inside the filename.

(For the record, the compiler is the program that turns your sourcecode into an object file, e.g. cc foo.c -c -o foo.o)

The linker.

Fortunately with C and C++, unless you do highly unusual things, you will never have to invoke the linker directly. Instead you can simply use CC or CXX and they will know from the context that a linker is needed, and call the linker themselves. (For the record, the linker is what takes a couple of .o files and turns them into an executable or a shared library, e.g.: cc foo.o bar.o -o mybinary.elf)

Compiler and linker options.

There will be a couple options you will have to use so the compilation works in a certain way. For example, your code may require the flag -std=c99 if you use C99 features.

Additionally, the user will want or need to use certain flags. For this purpose, the environment variable CFLAGS is used.

If the user didn't specify any CFLAGS himself, you may decide to set some sane default optimization flags (the default for GNU autoconf packages is -O2 -g -Wall). The CFLAGS used for the compilation should always put the user-set CFLAGS last in the command line, so the user has the ability to override some defaults he doesn't like. The following logic describes this:

REQUIRED_CFLAGS=-std=c99
CFLAGS_FOR_COMPILE=$(REQUIRED_CFLAGS) $(CFLAGS)

For C++, these flags are called CXXFLAGS, and the logic is precisely the same.

There's also CPPFLAGS, which is used for preprocessor directives such as -DUSE_THIS_FEATURE -DHAVE_OPENGL and include directories for header lookup. More about headers soon. Again, user-supplied CPPFLAGS need to be respected and used after the CPPFLAGS the build system requires.

Last but not least we have LDFLAGS, these are flags used at link time. It contains things such as -L linker library search path directives, -lxxx directives that specify which libraries to link against, and other linker options such as -s (which means "strip the resulting binary"). Here, again, the rule is to respect user-provided LDFLAGS and put them after your own in the linker command.

From here on, whenever we talk about cc or CC or CFLAGS, the exact same applies to c++, CXX and CXXFLAGS for C++.

Libraries and their headers

When writing code in C or C++, you necessarily need to use libraries installed on the end user's machine. At a minimum, you need the C or C++ standard library implementation. The former is known as libc, the latter as libstdc++ or libc++. Optionally some other libraries, such as libpng, may be needed.

In compiled form, these libraries consist of header files and the library itself, as either a static (.a archive) or dynamic library (.so, .dylib, .dll). These headers and libs are stored in a location on your user's machine, which is typically /usr/include for headers and /usr/lib for libraries, but this is none of your concern. It's the job of the user to configure his compiler in such a way that when you e.g. #include <stdio.h> it works (usually the user uses his distro-provided toolchain, which is properly set up).

Cross-compilation

Cross-compilation means that you compile for a different platform than the one you're using, for example if you want to compile ARM binaries for your raspberry pi from your x86_64 desktop.

It's not really much different than regular compilation, you pass your compiler name as CC, e.g. CC=armv7l-linux-musl-gcc and set your C and CPP flags such that they point into the lib/ and include/ dirs with your other ARM stuff in it. For example, if you prepare a rootfs for your raspberry pi in /tmp/piroot, you'd probably set up your compiler-related environment vars as following:

CC=armv7l-linux-musl-gcc
CPPFLAGS=-isystem /tmp/piroot/include
LDFLAGS=-L/tmp/piroot/lib

In compiler jargon, the armv7l-linux-musl prefix to your toolchain name is the so-called triplet. All components of your toolchain are prefixed with it, for example the ar archiver is called armv7l-linux-musl-ar, the same applies for as, ld, ranlib, strip, objdump, etc.

In Autoconf-based build systems, you pass the triplet as --host=armv7l-linux-musl to ./configure, whereas Makefile-only based systems usually use a CROSS_COMPILE environment variable, which is set to triplet plus a trailing dash, e.g. CROSS_COMPILE=armv7l-linux-musl-. In your own build system, you should follow the GNU autoconf convention though.
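Concretely, the two conventions look like this (using the example triplet from above):

# autoconf-based package
./configure --host=armv7l-linux-musl
# plain-Makefile package
make CROSS_COMPILE=armv7l-linux-musl-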

What makes cross-compilation a bit tricky is

The Build Process

If you design a build system from scratch, keep in mind that your users probably don't want to spend a lot of time learning about your system. They simply want to get the process done as painlessly and quickly as possible (which implies that the build system itself should have as few external dependencies as possible).

Please do respect existing conventions, and try to model your build system's user interface after the well-established GNU autoconf standards, because it's what's been around for 20+ years and what the majority of packages use, so it's very likely that the user of your package is familiar with its usage. Also, unlike more hip build tools of the day, their user interface is the result of a long evolutionary process. Autoconf does have a lot of ugly sides to it, but from a user perspective it is pretty decent and has a streamlined way to configure the build.

Step1: Configuration

Before we can start building, we need to figure out a few things. If the package has optional functionality, the user needs to be able to specify whether he wants it or not. Some functionality might require additional libraries, etc. This stage in the build process is traditionally done via a script called configure.

Enabling optional functionality

Your package may have some non-essential code or feature that pulls in a big external library, or that may be undesirable for some people for other reasons.

Traditionally, this is achieved by passing a flag such as --disable-libxy or --without-feature, or conversely --with-feature or --enable-libxy.

If such a flag is passed, the script can for example write a configuration header containing a preprocessor directive that disables the code at compile time, or add such a directive to the CPPFLAGS used during the build.

These flags should be documented when the configure script is being run with the --help switch.
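
A minimal hand-rolled configure script might handle such a switch roughly like this; the option is the --disable-libxy example from above, while the HAVE_LIBXY/DISABLE_LIBXY macro names and config.h are made up for illustration:

# inside ./configure:
use_libxy=yes

for arg in "$@" ; do
  case "$arg" in
  --enable-libxy)  use_libxy=yes ;;
  --disable-libxy) use_libxy=no ;;
  --help) echo "  --disable-libxy    build without libxy support" ; exit 0 ;;
  esac
done

if [ "$use_libxy" = yes ] ; then
  # record the choice in a config header ...
  echo "#define HAVE_LIBXY 1" >> config.h
else
  # ... or disable the code via the preprocessor flags used for the build
  CPPFLAGS="$CPPFLAGS -DDISABLE_LIBXY"
fi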

System- or Version-specific behaviour

Sometimes one needs to use functionality that differs from system to system, so we need to figure out in which way the user's system provides it.

The wrong way to go about this is to hardcode assumptions about specific platforms (OS/compiler/C standard library/library combinations) with ifdefs like this:

#if OPENSSL_VERSION_NUMBER >= 0x10100000
/* OpenSSL >= 1.1 added DSA_get0_pqg() */
    DSA_get0_pqg(dsa, &p, &q, &g);
#else
    ...
#endif

This is wrong for several reasons: the version number is only a proxy for the actual functionality, distributions may have backported the function to older releases or patched it out entirely, forks of the library may report completely different version numbers, and the next release may change the API again, silently invalidating the assumption.

The proper way to figure out whether DSA_get0_pqg() exists, is... to actually check whether it exists, by compiling a small testcase using it (more below), and pass a preprocessor flag such as HAVE_DSA_GET0_PQG to the code in question.

Even worse than the above hardcoded version number check is when people assume that a certain C library implementation, for example musl, has a certain bug or behaviour or lacks a certain function, because that was the case at the time they tested it. If a __MUSL__ macro existed, they would just hardcode their assumption into the code, even though the very next version of musl might fix the bug or add the function in question, which would then result in compile errors or, even worse, bogus behaviour at runtime.

Checking for headers

You should NEVER hardcode any absolute paths for headers or libraries into your build system, nor should you start searching in the user's filesystem for them. This would make it impossible to use your package on systems with a non-standard directory layout, or for people that need to crosscompile it (more on cross-compilation just a little further down).

The majority of third-party libraries install their headers either into a separate sub-directory in the compiler's default include path (for example /usr/include/SDL/*.h), or if there's only one or two headers directly into the include dir (for example /usr/include/png.h). Now when you want to test for whether the user's system has the libpng headers installed, you simply create a temporary .c file with the following contents:

#include <png.h>
typedef int foo;

and then use $CC $CPPFLAGS $CFLAGS -c temp.c and check whether the command succeeded. If it did, then png.h is available either through the compiler's default include search paths, or via a user-supplied -I incdir which he can provide if his libpng is installed in a non-standard location such as $HOME/include.

Note that this approach is cross-compile safe, because we didn't need to execute any binary.
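
Expressed as a shell fragment inside configure, the check might look like this; the helper name, the temp file and config.h are illustrative, not prescribed:

check_header() {
  printf '#include <%s>\ntypedef int foo;\n' "$1" > temp.c
  if $CC $CPPFLAGS $CFLAGS -c temp.c -o temp.o 2>/dev/null ; then
    result=yes
  else
    result=no
  fi
  rm -f temp.c temp.o
  echo "checking for $1 ... $result"
  test "$result" = yes
}

check_header png.h && echo "#define HAVE_PNG_H 1" >> config.h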

If you want to use headers of a library such as SDL that installs a number of headers into a subdir, you should reference them in your code via #include <SDL/SDL.h> and not #include <SDL.h>, because the latter will require the addition of -I path include search path directives.

Checking for functions in libraries

After you've established that the user has libpng's headers installed, you might want to check whether it links correctly and whether it provides a certain function you're using (though testing for this only makes sense if the function is a recent addition).

Again, you check this by writing a temporary .c file, that looks roughly like:

#include <png.h>
int main() {png_set_compression_buffer_size(0, 0);}

the command to test it is: $CC $CPPFLAGS $CFLAGS temp.c -lpng $LDFLAGS.

If the command succeeds, it means that one of libpng.a/.so is available in the compiler's default library search path, (or in some -L directive the user added to his LDFLAGS) and that it contains the function png_set_compression_buffer_size. The latter is established by using a main() function, which forces the linker to fail on missing symbols (also note the omission of -c).

If your aim is only to test whether the libpng library is installed, the test can be written as:

#include <png.h>
int main() {return 0;}

and compiled exactly as the previous. Note that this test actually checks that both the header exists AND the library, so by using this kind of test you don't actually need to test for header and library separately. Again, we merely compiled the testcase and didn't need to execute it.
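
The corresponding link test, again only as an illustrative sketch with made-up helper and macro names:

check_lib() { # $1 = header, $2 = linker flags, $3 = test statement
  printf '#include <%s>\nint main() {%s; return 0;}\n' "$1" "$3" > temp.c
  $CC $CPPFLAGS $CFLAGS temp.c $2 $LDFLAGS -o temp.bin 2>/dev/null
  result=$?
  rm -f temp.c temp.bin
  return $result
}

# does libpng link, and does it provide png_set_compression_buffer_size() ?
if check_lib png.h -lpng 'png_set_compression_buffer_size(0, 0)' ; then
  echo "#define HAVE_PNG_SET_COMPRESSION_BUFFER_SIZE 1" >> config.h
fi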

Pkg-config and derivatives

For simple libraries such as zlib, you should always first try whether you can simply link to e.g. -lz. If that doesn't work, you can fall back to a tool called pkg-config, or one of its clones such as pkgconf, which is widely used. The path to the tool is provided by the user via the environment variable PKG_CONFIG; if that is not set, the fall-back is to use plain pkg-config. It can be used like this:

$PKG_CONFIG --cflags gtk+-2.0

This will print a couple of -I include directives that are required to find the headers of gtk+2.

Likewise

$PKG_CONFIG --libs gtk+-2.0

can be used to query the LDFLAGS required for linking gtk+2. Note that by default, pkg-config looks into $(prefix)/lib/pkgconfig, which is not compatible with crosscompilation.

2 solutions exist to make pkg-config compatible with cross-compilation: either the user points the PKG_CONFIG environment variable at a pkg-config that is set up for the target (for example a triplet-prefixed wrapper), or the environment variables PKG_CONFIG_LIBDIR and PKG_CONFIG_SYSROOT_DIR are set so that only the .pc files inside the target rootfs are consulted and the returned paths get the sysroot prepended.
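
For the second approach, a cross build against the raspberry pi rootfs from earlier might set things up roughly like this (the paths are just the example values from above):

# use only the .pc files of the target rootfs
export PKG_CONFIG_LIBDIR=/tmp/piroot/lib/pkgconfig
# prepend the rootfs to the -I/-L paths pkg-config prints
export PKG_CONFIG_SYSROOT_DIR=/tmp/piroot

${PKG_CONFIG:-pkg-config} --cflags --libs gtk+-2.0
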
Now comes the bummer:

The authors of some packages wrote their own package-specific pkg-config replacements, reasoning unknown. For example, on my machine the following proprietary -config programs exist: allegro-config, croco-config, curl-config, freetype-config, gpg-error-config, icu-config, libpng-config, pcap-config, pcre-config, python-config, sdl-config, xml2-config ...

What they all have in common is that they do things differently and are not cross-compile compatible: whenever one of them is used by a build system, cross-compilation breakage usually follows, because these tools simply return the include and library directories of the host.

Unfortunately, the authors of some of these programs refuse to provide portable pkg-config files instead. OTOH, most of them require no special include dirs, and their --libs invocation simply returns -lfoo. For the few that don't (the worst offenders are the apr-1-config tools from the Apache Foundation), the only correct way to deal with them as a build system author is, I suppose, to not use them at all, and instead force the user to specify the include and library paths for these libraries with dedicated configuration parameters. Example: --apr-1-cflags=-I/include/apr-1

Checking for sizes of things

In some rare cases, one needs to know e.g. the size of long on the toolchain's target at compile time. Since we cannot execute test binaries that would run e.g.

printf("%zu\n", sizeof(long));

and then parse their output (because we need to stay compatible with cross-compilers), the proper way to do it is to use a "static assertion" trick like this:

/* gives compile error if sizeof(long) is not 8 */
int arr[sizeof(long) == 8 ? 1 : -1];

Compile the testcase with $CC $CPPFLAGS $CFLAGS -c temp.c.
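
Wrapped into a configure check, the trick can probe a few candidate sizes until one compiles; this is a sketch, and the temp file and macro names are illustrative:

sizeof_long=unknown
for size in 4 8 16 ; do
  printf 'int arr[sizeof(long) == %s ? 1 : -1];\n' "$size" > temp.c
  if $CC $CPPFLAGS $CFLAGS -c temp.c -o temp.o 2>/dev/null ; then
    sizeof_long=$size ; break
  fi
done
rm -f temp.c temp.o
echo "#define SIZEOF_LONG $sizeof_long" >> config.h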

Another way is to run e.g.

$CC $CPPFLAGS -dM -E - </dev/null | grep __SIZEOF_LONG__

This command (without the piped grep) makes GCC and derivatives spit out a list of built-in macros. Only GCC- and Clang-based toolchains released during the last couple of years support this though, so the static assert method should be preferred.

Checking for endianness

Unfortunately, different platforms provide their endianness test macros in different headers. Because of that, many build system authors resorted to compiling and running a binary that does some bit tricks to determine the endianness and prints the result.

However, since we cannot run a binary if we want to stay cross-compile compatible, we need to find another way to get the definition. I've actually spent a lot of effort trying dozens of compiler versions and target architectures and came up with a public domain single-header solution, which has portable fallback functions that can do endian conversions even if the detection failed, although at a slight runtime cost.

I would advise its usage, rather than trying to hack together a custom thing.

Checking for bugs and similar things

I've also come across a number of checks that required running a testcase and therefore prevented cross-compilation from working. Mostly, these are tests for a certain bug or odd behaviour. However, it is wrong to assume that just because the system the test binary currently runs on has a certain bug, the end user's system will have the same bug. The binary might for example be distributed as a package, and might suddenly start misbehaving once another component that fixes the bug is updated. Therefore the only safe and correct way to deal with this situation is to do the check at runtime, when the binary is actually used, set a flag like bug=1, and provide two codepaths: one for a system with the bug and one for a system without it.

Cross-compile specific configuration

In GNU Autoconf, the way to tell it that you're cross-compiling is to pass a --host=triplet parameter with the triplet of the target toolchain, in addition to putting the cross-compiler name into the CC environment variable. The triplet is then used to prefix all parts of the toolchain, like

RANLIB=$(triplet)-ranlib
STRIP=$(triplet)-strip

etc. For the build host, there's also a parameter called --build=triplet. If not set, the configure process will check whether gcc or cc is available, and then use that. If set, all toolchain components targeting the host you're on will be prefixed with this triplet. It can be queried by running $CC -dumpmachine. Usually, it is not necessary to set it.

Checking for the target OS

As mentioned, it's hugely preferable to test for functionality rather than platform. But if you really think it's necessary to figure out the target OS, do not use uname, which is totally bogus: it simply returns the OS of the person running the compiler, who might use an Apple computer but cross-compile for NetBSD.

You can instead derive the target OS via $CC -dumpmachine, which returns the toolchain target triplet, or by parsing the output of

$CC $CPPFLAGS -dM -E - </dev/null
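
A sketch of how the triplet could be mapped to an OS name; the pattern list is intentionally incomplete and the variable names are made up:

target=$($CC -dumpmachine)
case "$target" in
  *-linux-*)  target_os=linux ;;
  *-freebsd*) target_os=freebsd ;;
  *-netbsd*)  target_os=netbsd ;;
  *-darwin*)  target_os=macos ;;
  *)          target_os=unknown ;;
esac
echo "target OS: $target_os"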

Configuring paths

Knowledge about system paths is required for 2 reasons. One is that during the installation stage we need to know where files like the compiled program binary need to be installed to. The other is that our program or library might require some external data files at runtime, for example a database.

For this reason, a --prefix variable is passed to the configure step. On most typical linux installations --prefix=/usr would be used for a system install, whereas --prefix=/usr/local is typically used for an alternate installation from source of a package the distribution provides but which for some reason is not sufficient for the user. Sabotage Linux and others use an empty prefix, i.e. --prefix=, which means that for example binaries go straight into /bin and not /usr/bin, etc. Many hand-written configure scripts get this wrong and treat --prefix= as if the user hadn't passed --prefix at all, falling back to the default (which, btw, is traditionally /usr/local).
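
Getting this right means distinguishing "--prefix was not passed" from "--prefix= was passed with an empty value", for example like this (a sketch; parsing of --bindir & friends is omitted):

prefix=/usr/local                        # traditional default if --prefix is not passed
for arg in "$@" ; do
  case "$arg" in
  --prefix=*) prefix=${arg#--prefix=} ;;  # may legitimately be empty
  esac
done
bindir=${bindir:-$prefix/bin}
libdir=${libdir:-$prefix/lib}
includedir=${includedir:-$prefix/include}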

So in case your program needs a database, let's say leetpackage.sqlite, you would probably hardcode the following db path into your binary:

#define DB_PATH PREFIX "/share/leetpackage/leetpackage.sqlite"

where PREFIX would be set as part of CPPFLAGS or similar, according to the user's selection. For more fine-grained control, traditional configure scripts also offer options like --bindir, --libdir, --includedir, --mandir, --sysconfdir, etc. in addition to --prefix, which, if not set, default to ${prefix}/bin, ${prefix}/lib, ${prefix}/include and so on.

More on paths in the Installation chapter.

Step 2: The build

After the configuration step has finished, it should have written the configuration data in some form, either a header or a Makefile include file, which is then included by the actual Makefile (or equivalent). This should include all the previously mentioned environment variables, so it is possible to log in in a different shell session without any of them set and still get the same result when running make. Some users of GNU autotools create the Makefile from a template (usually called Makefile.in) at the end of the configure run, but I personally found this to be really impractical, because whenever you make changes to the Makefile template, configure has to be re-run. Therefore I recommend writing the settings into a file called config.mak, which is included by the Makefile.
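
As a sketch, the tail end of such a configure script could simply dump the results like this (variable names as used in the examples above):

# at the end of ./configure:
cat > config.mak <<EOF
CC = $CC
CFLAGS = $CFLAGS
CPPFLAGS = $CPPFLAGS
LDFLAGS = $LDFLAGS
prefix = $prefix
bindir = $bindir
EOF

The hand-written Makefile then contains an include config.mak line near the top, and every setting the user configured is available to make without polluting his environment.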

The actual compilation is typically run by executing make, which on most systems defaults to GNU make, a lot more powerful than the traditional BSD makes. Its code is small and written in portable C, so it's easy to bootstrap quickly on systems that don't have it yet, unlike competitors such as CMake, which 1) is written in C++, which takes a lot longer to parse than C, 2) consists of > 1 million lines of code and 3) occupies a considerable amount of HDD space once installed. Anyway, GNU make can even be found pre-installed on the BSDs, where it's called gmake.

Here, the following conventions apply: the build is started by simply running make (with an optional -jN for N parallel jobs), it must respect the CC, CFLAGS, CPPFLAGS and LDFLAGS that were configured, and it should offer a way to show the full commands being executed (e.g. via V=1) so build problems can be diagnosed.

If a Makefile is used for building, the build process should be tested using several parallel processes, because failure to document dependencies of files properly often results in broken parallel builds, even though they seem to work perfectly with -j1.

Do note that you should not strip binaries, ever. If the user wants his binaries stripped, he will pass -s as part of his LDFLAGS.

Step 3: Installation

The Installation is typically done using the make install command. Additionally there's an important variable that distro maintainers use for packaging: DESTDIR.

If for example, at configure time, --prefix=/usr was set, then make install DESTDIR=/tmp/foo should cause stuff to be installed into /tmp/foo/usr, so if your package compiles a binary called myprog, it should end up in /tmp/foo/usr/bin/myprog. A typical install rule would look like this:

bindir ?= $(prefix)/bin

...

install: myprog
    install -Dm 755 myprog $(DESTDIR)$(bindir)/myprog

here we use the install program to install the binary myprog to its destination with mode 755 (-m 755) and create all path components along the way (-D). Unfortunately, the install programs shipped with some BSDs and Mac OS X refuse to implement these practical options, therefore this portable replacement implementation can be used instead.

It is a good idea and the common practice to explicitly set the permissions during the install step, because the user doing the installation might unwittingly have some restrictive umask set, which can lead to odd issues later on.
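
From the packager's perspective, the whole sequence then typically looks like this (the paths are just an example):

./configure --prefix=/usr
make -j8
make install DESTDIR=/tmp/pkgroot
# /tmp/pkgroot now contains usr/bin/myprog etc. and can be packaged up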

Even if the build system you intend to write does not use Makefiles, you should respect the existing conventions (unlike CMake & co which NIH'd everything) like V=1, -j8, DESTDIR, --prefix, etc.

Closing thoughts

One of the big advantages of GNU's autotools system is that, from a user's perspective, they require nothing more than a POSIX-compatible shell to execute configure scripts, and GNU make, which as already mentioned is really slim, written in portable C, and widely available while requiring less than one MB of HDD space (my GNU make 3.82 install takes 750KB total including docs).

So in my opinion, the build system of the future, in whatever language it's written and however many millions of lines of code it consists of, should do precisely the same: it should at least have the option to generate a configure script and a stand-alone GNU Makefile to be shipped in release tarballs. That way only the developers of the package need the build toolkit and its dependencies installed on their machines, while the user can use the tools he already has installed and can interface with the build system in a way he's already familiar with.

Update

19 Apr 2019 19:34 UTC - Added paragraph "Checking for the target OS"

Post or read comments...

benchmarking python bytecode vs interpreter speed and bazaar vs git

07 Apr 2019 00:39 UTC

A couple weeks ago, after an upgrade of libffi, we experienced odd build errors of python only on systems where python had previously been installed with an older libffi version:

error: [Errno 2] No such file or directory: '/lib/libffi-3.0.13/include/ffi.h'

There was no reference to libffi-3.0.13 anywhere in the python source, and it turned out that it was contained in old python .pyc/.pyo bytecode files that had survived a rebuild due to a packaging bug, and apparently were treated as authoritative during the python build.

/lib/python2.7/_sysconfigdata.pyc:/lib/libffi-3.0.13/include
/lib/python2.7/_sysconfigdata.pyo:/lib/libffi-3.0.13/include

The packaging bug was that we didn't pre-generate .pyc/.pyo files just after the build of python, so they would become part of the package directory in /opt/python, but instead they were created on first access directly in /lib/python2.7, resulting in the following layout:

$ la /lib/python2.7/ | grep sysconfigdata
lrwxrwxrwx    1 root     root            48 Mar  4 03:11 _sysconfigdata.py -> ../../opt/python/lib/python2.7/_sysconfigdata.py
-rw-r--r--    1 root     root         19250 Mar  4 03:20 _sysconfigdata.pyc
-rw-r--r--    1 root     root         19214 Jun 30  2018 _sysconfigdata.pyo

So on a rebuild of python, only the symlinks pointing to /opt/python were removed, while the generated-on-first-use .pyc/.pyo files survived.

Annoyed by this occurrence, I started researching how the generation of these bytecode files could be suppressed, and it turned out that it can be controlled via a sys.dont_write_bytecode variable, which in turn is set from the python C code. Here's a patch doing that.

However, before turning off a feature that can potentially be a huge performance boost, a responsible distro maintainer needs to do a proper benchmarking study so he can make an educated decision.

So I developed a benchmark that runs a couple of tasks using the bazaar VCS, which is written in python and consists of a large number of small files, so the startup overhead should be significant. The task is executed 50 times, so small differences in the host's CPU load due to other tasks should be evened out.

The task is to generate a new bazaar repo, check 2 files and a directory into bazaar in 3 commits, and print a log at the end.
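
The original script isn't reproduced here, but the measured task boils down to something along these lines; this is a reconstruction, the exact bzr invocations and file names are my own invention:

# the whole loop is run under time(1)
i=0
while [ $i -lt 50 ] ; do
  rm -rf repo ; mkdir repo
  (
    cd repo
    bzr init > /dev/null
    echo foo > file1 ; bzr add file1 > /dev/null ; bzr commit -m c1 > /dev/null 2>&1
    echo bar > file2 ; bzr add file2 > /dev/null ; bzr commit -m c2 > /dev/null 2>&1
    mkdir dir ; echo baz > dir/f ; bzr add dir > /dev/null ; bzr commit -m c3 > /dev/null 2>&1
    bzr log > /dev/null
  )
  i=$((i+1))
done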

With bytecode generation disabled, the benchmark produced the following results:

real    3m 15.75s
user    2m 15.40s
sys     0m 4.12s

With pregenerated bytecode, the following results were measured:

real    1m 24.25s
user    0m 20.26s
sys     0m 2.55s

We can see that in the case of a fairly big application like bazaar, with hundreds of python files, the precompilation does indeed make a quite noticeable difference. It is more than twice as fast.

What's also becoming apparent is that bazaar is slow as hell. For the lulz, I replaced the bzr command in the above benchmark with git and exported PAGER=cat so git log wouldn't interrupt the benchmark. As expected, git is orders of magnitude faster:

real    0m 0.48s
user    0m 0.02s
sys     0m 0.05s

Out of curiosity, I fiddled some more with python and added a patch that builds python so its optimization switch -O is always active, and rebuilt both python and bazaar to produce only .pyo files instead of .pyc. Here are the results:

real    1m 23.88s
user    0m 20.18s
sys     0m 2.54s

We can see that the optimization flag is next to useless. The difference is so small it's almost not measurable.

Now this benchmark was tailored to measure startup compilation cost for a big project, what about a mostly CPU-bound task using only a few python modules?

For this purpose I modified a password bruteforcer to exit after a couple thousand rounds, and ran it 30 times each: without bytecode, with .pyc, and with .pyo.

Here are the results:

No bytecode:

real    3m 50.42s
user    3m 50.25s
sys     0m 0.03s

.pyc bytecode:

real    3m 48.68s
user    3m 48.60s
sys     0m 0.01s

.pyo bytecode:

real    3m 49.14s
user    3m 49.06s
sys     0m 0.01s

As expected, there's almost no difference between the 3. Funnily enough, the optimized bytecode is even slower than the non-optimized bytecode in this case.

From my reading of this stackoverflow question it appears to me as if the .pyo bytecode differs from regular bytecode only in that it lacks instructions for the omitted assert() calls, and possibly some debug facilities.

Which brings us back to the original problem: In order to have the .pyc files contained in the package directory, they need to be generated manually during the build, because apparently they're not installed as part of make install. This can be achieved by calling

./python -E Lib/compileall.py "$dest"/lib/python2.7

after make install has finished. With that done, I compared the size of the previous /opt/python directory without .pyc files to the new one.

It's 22.2 MB vs 31.1 MB, so the .pyc files add roughly 9 MB and make the package almost 50% bigger.

Now it happens that some python packages, build scripts and the like call python with the optimization flag -O. This causes our previous problem to re-appear: now we will have stray .pyo files in /lib/python2.7.

So we need to pregenerate not only .pyc, but also .pyo for all python modules. This will add another 9MB to the python package directory.

OR... we could simply turn off the ability to activate the optimised mode, which, as we saw, is 99.99% useless. This seems to be the most reasonable thing to do, and therefore this is precisely what I have now implemented in sabotage linux.

Post or read comments...

the rusty browser trap

06 Apr 2019 11:55 UTC

If you're following sabotage linux development, you may have noticed that we're stuck on Firefox 52esr, which was released over a year ago. This is because non-optional parts of Firefox were rewritten in the "Rust" programming language, and all newer versions now require a Rust compiler to be installed.

And that is a real problem.

The Rust compiler is written in Rust itself, which poses the typical chicken-and-egg problem. Its developers have used previous releases in binary form along the path of the evolution of the language and its compiler. In practice this means that one can only build a rust compiler by using a binary build supplied by a third party, which in turn basically means that one has to trust this third party. That is assuming the binary actually works on one's own system.

As sabotage linux is based on musl, the latter is not a given.

Traditionally, the only binary thing required to bootstrap sabotage linux was a C compiler. It was used to build the stage0 C compiler, which was then used to build the entire system. A sabotage user can have high confidence that his OS does not contain any backdoors in the userland stack. Of course, it's impossible to read all the millions of lines of code of the linux kernel, nor is it possible to know the backdoors inside the CPU silicon or in the software stack that runs on the BIOS level or below. Still, it is a pretty good feeling to have at least a trustworthy userland.

So Rust developers want you to slap a binary containing megabytes of machine instructions on your PC and execute it.

If we assume for one moment that we are OK with that, the next problem is that we now need a different binary for every architecture we support. There's no mechanism in sabotage that allows downloading a different thing per architecture. All existing packages are recipes on how to build a piece of software from source, and that's done with identical sources for all platforms.

Additionally, Rust doesn't actually support all architectures we support. It's a hipster thing, and not a professional product. And the hipsters decided to support only a very small number of popular architectures, such as AMD64 and x86. Others are either not supported at all, or supported without any guarantee that they'll work.

So even if we embrace Rust, there will be some architectures that can't have a working Firefox - ever?

Now somebody who probably likes Rust decided he wanted to write a compiler for it in C++, so people can use it to bootstrap from source. However, he targets a pretty old version of the language, so in order to get a version compiled that's recent enough to build Firefox's sources, one needs to build a chain of 12+ Rust versions. A member of our team actually embarked on this voyage, but the result was pretty disillusioning.

After our team member had spent about 3 nights on this endeavour, he gave up, even though we had support from somebody from "adelie linux" who had gone through the entire process already. Unfortunately, that person didn't take any step-by-step notes; there's only a repository of mostly unsorted patches and other files and a patched version of rust 1.19.0 to start with. (Here's a blog post from the adelie linux authors about rust, btw.)

So could it be done? Most likely yes, but it would require me to spend about 2 estimated weeks of work, digging in the C++ turd of LLVM and Rust. Certainly not anything I would like to spend my time on. Unlike the people from adelie linux, my goal is not to create a single set of bootstrap binaries to be used in the future, but package recipes, so a user can build the entire set of rust versions from source. Building them all will probably require almost two full days of CPU time on a very fast box, so this is something not everybody can even afford to do.

So from my point of view, it looks pretty much as if Firefox is dead. By choosing to make it exclusive to owners of a Rust compiler, mozilla chose to make it hard-to-impossible for hobbyists and source code enthusiasts like myself to compile their browser themselves.

Not that it was easy in the past either: every version bump required about half a day of effort to fix new issues introduced in this giant pile of C++, copy-pasted from dozens of different projects and held together by a fragile build system mix of python, shell, perl, ancient autoconf etc etc...

None of those upstream sources were ever tested on musl-based linux systems by their developers, and sabotage's unconventional filesystem layout adds yet another layer of possible breakage especially regarding the python virtualenv based build system.

So, Firefox is dead. What's the alternative?

Chromium? Possibly, but it's a clusterfuck itself. The source tarball is about 0.5 GB compressed, requires 2+ GB of hdd space just to unpack the sources, and probably another 5 GB for temporary object files during the build. And it will take hours and hours to build, if you even have enough RAM. That's not really compatible with a hobbyist project, besides the numerous privacy issues with this browser.

The only viable option left might be a webkit based browser or palemoon, a fork of firefox without rust.

I even considered for a while running a QEMU VM with ReactOS and a precompiled windows binary browser, but funnily enough, around the same time mozilla started giving the boot to open-source enthusiasts by requiring Rust, they also removed support for Windows XP, and subsequently for ReactOS, since it is based on the Win2K3 API.

So the future looks pretty grim. We need to invest a lot of work trying to get Palemoon to compile, and hopefully it will stay rust-free and usable for a couple more years. If not, we will be forced to run a VM with a bloated GLIBC-based linux distro and the full X11 stack, just to run a browser.

Because unfortunately, without an up-to-date browser, a desktop system is almost worthless.

Post or read comments...

Earlier posts