Devs on Acid

sqlite's anal gamation

25 Sep 2013

sqlite's slogan: "Small. Fast. Reliable. Choose any three."

i always wondered though, how such a small or "lite" package can take such a considerable amount of time to build.

as the main author of the sabotage linux distribution, building software is my daily bread, so i own a pretty fast build box. it's an 8 core machine with 3.1 GHz, which builds a complete 3.11 linux kernel in less than 5 minutes, making use of all 8 cores via the nice parallel build feature of GNU make.

make -j8

when invoking make like this, it first determines the dependencies between the translation units, and then runs up to 8 build processes, one per cpu core, each one building a different .c file.

GCC 3.4.6, a C compiler with full C99 support builds in 43 sec:

$ time butch rebuild gcc3
2013.09.25 12:13:50 building gcc3 (/src/build/build_gcc3.sh) -> /src/logs/build_gcc3.log
2013.09.25 12:14:33 done.
real 0m 43.97s
user 1m 36.66s
sys 0m 13.74s

however, for sqlite, a supposedly small package, build times are comparatively huge:

$ time butch rebuild sqlite
2013.09.25 12:18:27 building sqlite (/src/build/build_sqlite.sh) -> /src/logs/build_sqlite.log
2013.09.25 12:19:21 done.
real 0m 54.03s
user 0m 52.02s
sys 0m 1.51s

nearly one minute, a fifth of the time used to build the linux kernel and 10 seconds more than the gcc compiler.

the full-blown postgresql database server package, takes less time to build as well:

$ time butch rebuild postgresql
2013.09.25 12:19:21 building postgresql (/src/build/build_postgresql.sh) -> /src/logs/build_postgresql.log
2013.09.25 12:19:57 done.
real 0m 36.63s
user 1m 53.34s
sys 0m 12.03s

how is it possible that postgresql, shipping 16 MB of compressed sources, as opposed to 1.8MB of sqlite, builds 33% faster ?

if you look at the user times above, you start getting an idea. the user time (i.e. the entire cpu time burnt in userspace) for postgresql is 1m53, while the total time that actually passed was only 36s.

that means that the total work of 113 seconds was distributed among multiple cpu cores. dividing the user time through the real time gives us a concurrency factor of 3.13. not perfect, given that make was invoked with -j8, but much better than sqlite, which apparently only used a single core.

let's take a look at sqlite's builddir

$ find . -name '*.c'
./sqlite3.c
./shell.c
./tea/generic/tclsqlite3.c
./tea/win/nmakehlp.c

ah, funny. there are only 4 C files total. that partially explains why 8 cores didn't help. the 2 files in tea/ are not even used, which leaves us with

$ ls -la *.c
-rw-r--r-- 1 root root 91925 Jan 16 2012 shell.c
-rw-r--r-- 1 root root 4711082 Jan 16 2012 sqlite3.c

so in the top level builddir, there are just 2 C files, one being 90 KB, and the other roughly 5MB. the 90KB version is built in less than 1 second, so after that the entire time spent is waiting for the single cpu core building the huge sqlite3.c.

so why on earth would somebody stuff all source code into a single translation unit and thereby defeat makefile parallellism ?

after all, the IT industry's mantra of the last 10 years was "parallellism, parallellism, and even more parallellism".

here's the explanation: https://www.sqlite.org/amalgamation.html

it's a "feature", which they call amalgamation.

i call it anal gamation.

In addition to making SQLite easier to incorporate into other projects, the amalgamation also makes it run faster. Many compilers are able to do additional optimizations on code when it is contained with in a single translation unit such as it is in the amalgamation.

so they have 2 reasons for wasting our time:

reason 1: easier to incorporate
reason 2: generated code is better as the compiler sees all code at once.

let's look at reason 1: what they mean with incorporation is embedding the sqlite source code into another projects source tree.

it is usually considered bad practice to embed third-party source code into your own source tree, for multiple reasons:

every program that uses its own embedded copy of library X does not benefit from security updates when the default library install is updated.
we have multiple different versions on the harddrive and loaded in RAM, wasting system resources
having multiple incompatible versions can lead to a lot of breakage when it's used from another lib: for example application X uses lib Y and lib Z, and lib Z uses a "incorporated" version of lib Y. so we have a nice clash of 2 different lib Y versions. if lib Y has global state, it will get even worse.
if the library in question is using some unportable constructs, wrong ifdefs etc., it needs to be patched to build. having to apply and maintain different sets of patches against multiple different versions "incorporated" into other packages, represents a big burden for the packager.

instead, the installed version of libraries should be used.

pkg-config can be used to query existence, as well as CFLAGS and LDFLAGS needed to build against the installed version of the library. if the required library is not installed or too old, just throw an error at configure time and tell the user to install it via apt-get or whatever.

conclusion: "incorporation" of source code is a bad idea to begin with.

now let's look at reason 2 (better optimized code): it possibly sometimes made sense to help the compiler do its job in the 70ies, when everything started. however, it's 2013 now. compilers do a great job optimizing, and they get better at it every day.

since GCC 4.5 was released in 2010, it ships with a feature called LTO it builds object files together with metadata that allows it to strip off unneeded functions and variables, inline functions that are only called once or twice, etc at link time - pretty much anything the sqlite devs want to achieve, and probably even more than that.

conclusion: pseudo-optimizing C code by stuffing everything into a big file is obsolete since LTO is widely available.

LTO does a better job anyway - not that it matters much, as sqlite spends most time waiting for I/O. every user who wants to make sqlite run faster, can simply add -flto to his CFLAGS. there's no need to dictate him which optimization he wants to apply. following this logic, they could as well just ship generated assembly code…

but hey - we have the choice ! here's actually a tarball containing the ORIGINAL, UN-ANAL-GAMATED SOURCE CODE...

... just that it's not a tarball.

it's a fscking ZIP file. yes, you heard right. they distribute their source as ZIP files, treating UNIX users as second-class citizens.

additionally they say that you should not use it:

sqlite-src-3080002.zip (5.12 MiB) A ZIP archive of the complete source tree for SQLite version 3.8.0.2 as extracted from the version control system. The Makefile and configure script in this tarball are not supported. Their use is not recommended. The SQLite developers do not use them. You should not use them either. If you want a configure script and an automated build, use either the amalgamation tarball or TEA tarball instead of this one. To build from this tarball, hand-edit one of the template Makefiles in the root directory of the tarball and build using your own customized Makefile.

Note how the text talks about "this tarball" despite it being a ZIP file.

Fun. there's only a single TARball on the entire site, so that's what you naturally pick for your build system. and that one contains the ANAL version. Note that my distro's build system does not even support zip files, as i don't have a single package in my repo that's not building from a tarball. should i change it and write special case code for one single package which doesn't play by the rules ? i really don't think so.

funny fact: they even distribute LINUX BINARY downloads as .zip. i wonder in which world they live in.

why do i care so much about build time ? it's just a minute after all. because the distribution gets built over and over again. and it's not just me building it, but a lot of other people as well - so the cumulated time spent waiting for sqlite to finish building its 5 MB file gets bigger and bigger each day. in the past i built sqlite more than 200 times, so my personal cumulated wasted time on it already exceeds the amount of time i needed to write this blog post.

so what i hope to see is sqlite

using tarballs for all their sources (and eventually distribute an additional .zip for windows lusers)
using tarballs for their precompiled linux downloads
either getting rid of the anal version entirely, now that they learned about LTO, or offer the anal version as an additional download and do not discourage users from using the sane version.

Update:

I just upgraded sqlite from 3071000 to 3080002

2013.09.27 02:23:24 building sqlite (/src/build/build_sqlite.sh) -> /src/logs/build_sqlite.log
2013.09.27 02:25:31 done.

it now takes more than 2 minutes.

sqlite's anal gamation

conclusion: "incorporation" of source code is a bad idea to begin with.

conclusion: pseudo-optimizing C code by stuffing everything into a big file is obsolete since LTO is widely available.

Update:

this blog post originally appeared on my currently defunct wordpress blog