Devs on Acid

Restoring accidentally deleted files on Linux

02 May 2019 22:27 UTC

Doh. Through a tiny bug in a Makefile auto-generated by my build system rcb2, I accidentally deleted the C source file I've been working on for almost an hour, and which wasn't checked into git yet.

Fortunately, I know the basic steps to restore a file* in a filesystem-agnostic way.

These are:

1. Stop any processes that might write to the disk, so the freed blocks don't get overwritten.
2. Search the raw blockdevice for a string that occurs only in the deleted file.
3. Find the file's start and end offsets in the neighbourhood of that match.
4. Carve out that byte range with dd.

First of all though, I sent a SIGSTOP signal to firefox, the most volatile process on my desktop, to prevent it from writing any files to my harddisk while the restoration was in progress, potentially overwriting the blocks occupied by the deleted file. I did this via an extension I wrote for my window manager openbox, which adds a "Pause and iconify" item to the popup menu on the titlebar of all windows. I usually use it to prevent Firefox from consuming CPU and draining my laptop's battery while I'm traveling. Other than that, there's almost nothing running on a typical sabotage linux box that could interfere with constant disk writes, unlike GNU/Linux systems with systemd and a gazillion background daemons installed.

Then I opened /dev/mapper/crypt_home, the blockdevice containing my /home filesystem, in my favorite hexeditor, switched to the ASCII pane on the right side, and started a search for a string I knew occurred only in that new C file: <openDOW/vec2f.h>, since I had pulled that file in, in a somewhat hackish way, via an include statement.

After hitting ENTER in hexedit's search dialog, CPU usage went to 100%, and it slowly crunched its way through the encrypted harddisk's blockdevice mapper. I left my computer to brew a coffee and came back after about 5 minutes. From the offset displayed, I figured the search was only about 40 GB into the blockdevice. Many more GBs to go, since the file could be at the very end of the SSD. After another break of about 10 minutes, I got lucky: the string was found at offset 0x13c6ffa0ab, about 84 GB into the blockdevice.

Using pageup/pagedown in hexedit, the beginning and end offsets of the source file were quickly found. They were 0x13C6FF1FFC and 0x13C6FFB472, respectively.

dd if=/dev/mapper/crypt_home of=/dev/shm/dump bs=1 skip=$((0x13C6FF1FFC)) count=$((0x13C6FFB472 - 0x13C6FF1FFC))

did the rest to restore the file onto /dev/shm, the ramdrive.

Since my SSD is usually a lot faster than this, I decided to write a program to speed up future searches. The plan is simple: read from the filesystem in large chunks, so the time spent in syscalls is negligible, then search over the memory chunks using an optimized algorithm that compares word-at-a-time, just like musl's memmem() function does. Plus some more logic to find the search term even across chunk boundaries. The result can be found here in a small C program.

And indeed, it is a lot faster than hexedit.

# time ./fastfind /dev/mapper/crypt_home '<openDOW/vec2f.h>'
curr: 0x13498f0000
bingo: 0x13c6ffa0ab
^CCommand terminated by signal 2
real    1m 4.26s
user    0m 20.35s
sys     0m 19.38s

At 64 seconds total, it crunched through the blockdevice at a rate of roughly 1.3 GB/sec (84 GB in 64 seconds), at least 10x faster than hexedit.

So for future undelete tasks, my fastfind utility will be the first stop to find an offset; then my good old friend hexedit locates the beginning and end positions in the neighbourhood of that offset, and dd finishes the job.

*: This approach works well for smaller files, whereas bigger ones are usually spread over several non-adjacent blocks.