I just committed a little binary analysis tool to moxiedev. You can use it to perform simple static analysis on moxie binaries. The kinds of things I’m looking for are compiler bugs (because I know there’s still one there that is triggered by -frerun-cse-after-loop), and instruction statistics. For instance, which registers are used as load offsets, and how often? The tool uses a primitive plugin architecture that should make it easy to add new analysis tools in the future. It’s called the moxielyzer, and here is the initial commit. Run it with no arguments to get a list of plugins. Run it with just a plugin name, and it will describe the plugin. Run it with a plugin name as well as an ELF moxie executable filename, and the analysis will be performed.
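As a rough illustration of what a primitive plugin architecture like this can look like, here is a minimal sketch in C. Nothing in it is taken from the moxielyzer sources: the struct plugin layout, the pre-decoded struct insn records, the "load-offsets" plugin name and the 16-register count are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define NUM_REGS 16  /* assumption: number of general registers */

/* Hypothetical pre-decoded instruction record. A real tool would
   produce these by decoding the ELF text section. */
enum insn_kind { INSN_OTHER, INSN_LOAD_OFFSET };

struct insn {
    enum insn_kind kind;
    unsigned base_reg;   /* register used as the load offset base */
};

/* Hypothetical plugin interface: a name for the command line, a
   description to print, and an analysis callback. */
struct plugin {
    const char *name;
    const char *description;
    void (*run)(const struct insn *insns, size_t n, unsigned hist[NUM_REGS]);
};

/* Count how often each register is used as a load offset base. */
static void load_offset_histogram(const struct insn *insns, size_t n,
                                  unsigned hist[NUM_REGS])
{
    for (size_t i = 0; i < n; i++)
        if (insns[i].kind == INSN_LOAD_OFFSET && insns[i].base_reg < NUM_REGS)
            hist[insns[i].base_reg]++;
}

static const struct plugin plugins[] = {
    { "load-offsets",
      "Histogram of registers used as load offset bases",
      load_offset_histogram },
};

/* Look a plugin up by name; returns NULL if there is no such plugin. */
const struct plugin *find_plugin(const char *name)
{
    for (size_t i = 0; i < sizeof plugins / sizeof plugins[0]; i++)
        if (strcmp(plugins[i].name, name) == 0)
            return &plugins[i];
    return NULL;
}
```

With a table like this, adding a new analysis means writing one callback and adding one row, and the "describe the plugin" mode only has to print the description field.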
I had written a similar tool for ggx back in the bad old days. Another option was to hack this stuff into gas, but I prefer to keep gas “clean” (translation: I want the freedom to maintain hacky analysis code).
BTW – I’m also rolling out a new libffi in a few weeks. You can keep track of the release candidate test results on the wiki here.
This gets us to booting the kernel, loading BusyBox, running some shell code and… crashing on the first fork. No problemo. Nothing a small matter of programming can’t fix. However, there are some other distractions…
Verilog is lots of fun! It looks like regular programming, but it feels more like building a kinetic sculpture.
There’s also the small matter of not having an interrupt controller! So there’s some work here to design an interrupt controller, implement it in Verilog, simulate it in qemu (and possibly the gdb sim), and port the kernel over to using it. This should be interesting…
Tonight I got a hello world app to use uClibc’s puts() routine! This is a big deal because it’s the first time I’ve had system calls coming in from userland. I haven’t checked the changes in yet, because they’re a mess, but here’s a basic run-down of what I had to do…
First, uClibc had to be taught how to make system calls to the moxie uClinux kernel. This was straightforward, except I came across one surprise which I’ll describe below.
Next, I needed to add more files to my initfs. Specifically, I needed a /dev/console. Fortunately, the kernel build process makes this easy. I decided to use the “text file” approach to populating the initramfs as described in this document.
Finally, I had to create a tty device for my default console that spoke through the gdb simulator via software interrupts. Fortunately the ia64 port had a similar tty device for talking through one of HP’s simulators that I was able to mostly copy.
Once all this was done, I was able to build a standard Hello World app with moxie-uclinux-gcc, and it just worked!
What about the system call surprise? I had read somewhere that Linux system calls take a maximum of 5 parameters, but that’s not quite true. Some take 6 (are there any with 7? more?). This thwarted my attempt to get busybox running tonight, because it uses mmap, and mmap is one of those 6-argument system calls. There are a few ways to fix this. I think I’ll just hack the compiler to use 6 register arguments and see what that does to code size/performance.
If there are any GDB hackers reading this… I have one question for you… The kernel is loading and relocating my “init” program, then execve’ing it. When I run the kernel in gdb, it would be nice for gdb to load the debug info for init so I could see what it’s doing when I step into userland. Is there some way to do this manually?
I’ve been taking advantage of the nice summer weather recently, so it’s taken me a while to get around to this… but here’s the first moxie userland app!
#include <string.h>

#define MSG "Hello, World!\n"

void __attribute__((noinline)) gloss_write (int fd, char *ptr, int len)
{
  asm("swi 5"); // "write" via the gdb simulator
}

int main()
{
  while (1)
    gloss_write (0, MSG, strlen(MSG));
  return 0;
}
If you build this with moxie-uclinux-gcc, name it init and point the linux kernel build machinery at it, you’ll get a kernel that boots, loads the init BFLT binary from a ramfs, and performs an execve system call on it! The program loops forever, printing “Hello, World!” via the gdb simulator IO interrupt because I haven’t fixed up uClibc to perform system calls yet. Baby steps, my friends! Baby steps! We will get there!
The main bit of work needed to get this going was to fix up the software interrupt handler for system calls. I’m saving registers in a pt_regs struct just prior to calling the execve system call. execve then manipulates these saved registers so we end up running the newly exec’d program when we “return” from the system call. This was all done in linux-2.6/arch/moxie/kernel/exception_handler.S, which you can see here.
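The trick of rewriting saved registers can be sketched in portable C. This is only a model of the idea: the struct saved_regs layout, the register count and both function names are assumptions, and the real handler is assembly in exception_handler.S. Linux ports usually put the register rewrite in a helper called start_thread.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 16  /* assumption: number of general registers */

/* Hypothetical register save area filled in by the exception handler
   on entry. The real moxie pt_regs layout may differ. */
struct saved_regs {
    uint32_t gpr[NUM_GPRS];
    uint32_t sp;
    uint32_t pc;   /* address to resume at when the handler returns */
};

/* What execve does to the saved frame: throw away the old program's
   register state and point pc and sp at the new image. */
void start_new_program(struct saved_regs *regs, uint32_t entry,
                       uint32_t stack_top)
{
    memset(regs, 0, sizeof *regs);
    regs->pc = entry;
    regs->sp = stack_top;
}

/* Stand-in for the handler's return path: it restores the saved frame
   and resumes at whatever pc is stored there. */
uint32_t return_from_syscall(const struct saved_regs *regs)
{
    return regs->pc;
}
```

Because the return path blindly restores whatever is in the frame, the "return" from execve lands at the entry point of the new program instead of the instruction after the swi.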
Next, I’ll get uClibc to make system calls into the kernel so we can try the same program with libc.a’s puts().
Sooo….. it turns out there’s lots to take care of before userland apps like BusyBox can run.
The root filesystem. This one is easy. I just built a short Hello World application in C with moxie-uclinux-gcc. This produces an executable in BFLT format which I call ‘init’. The kernel build machinery takes this and produces a compressed root filesystem image linked to the vmlinux binary. The good news is that the kernel is able to boot, detect this initramfs, decompress it and load the init executable (which involves fixing up all of init’s relocations). My Hello World doesn’t actually use the C library or any system calls. It just writes Hello through direct communication with the simulator via our software interrupt (swi) instruction. I thought this would let me avoid dealing with system calls for now. I was wrong…
System calls. This one is harder. Obviously (in retrospect!) the kernel creates the init process via the execve system call. Implementing system call support involves lots of platform-dependent stuff. For instance, how do we invoke system calls? How are parameters passed? How do we switch back and forth between userland and the kernel? The first question is easy: I’ll use our trusty software interrupt (swi) instruction to invoke system calls. This means creating an exception handler and installing it as described in this old post. As an aside, the swi instruction takes a 32-bit immediate operand. We currently use this to identify calls to the simulator via libgloss. This works well for escaping to the simulator, but isn’t the best way to identify system calls to the kernel. The Linux kernel is going to ignore this operand, and we’ll pass the system call ID in a register instead. This saves us from doing complex instruction decoding in the exception handler that processes the interrupt (which would also trash any future data cache). Libgloss and the sim only need a small number of IDs, so I’m going to chop the swi instruction down from 48 bits to 16 bits in a future build of the tools. Passing arguments to the system calls was also interesting to sort out…
System call argument passing. The moxie ABI currently only has two registers being used to hold function arguments. The remaining arguments must live on the stack. This decision goes back to when we only had 8 registers to play with. It turns out that Linux kernel system calls can have a maximum of 5 arguments. In order to avoid tricky argument marshaling, I’ve decided to try changing the general ABI accordingly, so that up to 5 registers may be used to hold function arguments. This involves changes to the compiler, debugger and a smattering of assembly language in libgloss. The great thing about having integrated benchmarks into the moxiedev environment is that you can easily compare before and after performance for ABI changes like this. Running “ant benchmark” runs through the MiBench benchmark suite and saves a nice report for easy comparison. It turns out that switching from 2 to 5 register arguments is almost universally a win in terms of both code size and instruction trace length (an approximation of run time). The consumer jpeg benchmarks were slightly larger and slower, but only by less than 1%. Every other benchmark result was slightly better. The one outlier was the “network_dijkstra” benchmark which ended up 44% “faster” (44% fewer instructions being executed).
The first real moxie compiler bug. Sometimes things just don’t work! This is especially true when you’re tracking the bleeding edge from upstream. I won’t go into the details, but I discovered a rare bug in the compiler where it would assume that compare results could live across function calls. Fortunately I was able to track down the guilty compilation pass and disable it with -fno-rerun-cse-after-loop. I know that some people have brought up kernels without the benefit of a nice debugger, but I just don’t see how that is possible. The simulator, and a solid gdb port with reverse debugging capabilities have proven to be invaluable!
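Going back to the system call design described above (ID in a register, up to 5 arguments in registers), the dispatch step can be sketched in plain C. The frame layout, the two demo system calls and their numbers are invented for the example and do not correspond to the real moxie kernel.

```c
#include <stddef.h>
#include <errno.h>

/* Every handler takes five register-sized arguments. */
typedef long (*syscall_fn)(long, long, long, long, long);

/* Hypothetical system call numbers for the demo. */
enum { SYS_DEMO_ADD = 0, SYS_DEMO_NEG = 1, NR_DEMO_SYSCALLS };

static long demo_add(long a, long b, long c, long d, long e)
{
    return a + b + c + d + e;
}

static long demo_neg(long a, long b, long c, long d, long e)
{
    (void)b; (void)c; (void)d; (void)e;  /* unused arguments */
    return -a;
}

static const syscall_fn syscall_table[NR_DEMO_SYSCALLS] = {
    [SYS_DEMO_ADD] = demo_add,
    [SYS_DEMO_NEG] = demo_neg,
};

/* Hypothetical view of the registers saved by the swi handler:
   one register carries the ID, five carry the arguments. */
struct syscall_frame {
    long id;
    long arg[5];
};

/* Index the table with the ID register. No instruction decoding is
   needed because the swi immediate is never inspected. */
long dispatch_syscall(const struct syscall_frame *f)
{
    if (f->id < 0 || f->id >= NR_DEMO_SYSCALLS)
        return -ENOSYS;
    return syscall_table[f->id](f->arg[0], f->arg[1], f->arg[2],
                                f->arg[3], f->arg[4]);
}
```

Since the arguments already sit in registers under the 5-register ABI, the handler can pass them straight through without any marshaling from the user stack.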
There’s still lots to figure out and implement in the system call space, but it’s clear that we’re getting very close to running our first Linux program!
Before we can start building BusyBox, we need a few more bits of technology…
uClibc: this is a popular embedded C library, like newlib, but used more often in Linux environments. I ported uClibc to the moxie core just like every other bit of software in this project: quickly! My strategy has always been to make things link as quickly as possible, and then sort out the details later. This seems to be a workable strategy in the presence of good testsuites and the like.
elf2flt: this utility turns moxie ELF binaries into the “Binary Flat” (BFLT) format currently required by my Linux port. The BFLT format is required because: (a) we don’t have an MMU yet, so there’s a single address space for the kernel and all applications, and (b) my moxie tools port doesn’t yet support something like the FR-V’s FDPIC ABI that would allow for proper shared library support in the absence of an MMU. elf2flt ends up wrapping the installed linker, so builds actually produce BFLT binaries without any extra step.
a moxie-uclinux toolchain: I build this from the same sources as the moxie-elf toolchain, but with a sysroot containing the kernel and uClibc header files.
This is all built and committed to moxiedev, which means that you can check it out and build it yourself with a single “ant build”. I haven’t tried using it yet, and I know it will fail in its current state. The next step is to build BusyBox with the moxie-uclinux toolchain and create an initramfs that we can link directly to the kernel binary. That’s when the debugging fun begins…
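For a feel of what loading a flat binary into a single address space involves, here is a simplified sketch of relocation fixup. It is not the real BFLT on-disk layout: it assumes each relocation entry is a byte offset to a host-endian 32-bit word holding an image-relative address, and the function name apply_relocs is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add `load_base` to every 32-bit word named by the relocation list.
   Returns 0 on success, or -1 if an entry points outside the image
   (entries before the bad one will already have been applied). */
int apply_relocs(unsigned char *image, size_t image_len,
                 const uint32_t *relocs, size_t nrelocs,
                 uint32_t load_base)
{
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t off = relocs[i];
        uint32_t word;

        /* The whole 4-byte word must lie inside the image. */
        if (off > image_len || image_len - off < sizeof word)
            return -1;

        /* memcpy avoids unaligned access through a cast pointer. */
        memcpy(&word, image + off, sizeof word);
        word += load_base;
        memcpy(image + off, &word, sizeof word);
    }
    return 0;
}
```

After this pass every pointer stored in the image refers to where the program actually landed in memory, which is what lets several programs share one address space without an MMU.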
I’ve spent a lot of time in airports/planes/hotels recently, which is good news for the moxie linux port. It runs about 6.5M instructions, booting up to the point where a couple of kernel threads are created. However, a few context switches later it all comes tumbling down. I didn’t have any of my kernel books with me, so I stopped hacking at that point rather than try to guess/decode how some of the internals are supposed to work.
My port is using a device tree to describe the system architecture. This makes it easier to build a single kernel image that can boot on multiple moxie implementations. There’s a good paper on this relatively new infrastructure here: http://ols.fedoraproject.org/OLS/Reprints-2008/likely2-reprint.pdf. If you’ve been following this project, you may recall that console I/O is implemented differently on the gdb and qemu simulators. For the gdb simulator we use a software interrupt instruction (swi) to escape to the simulator, but the qemu port uses a real simulated serial device. This means they need different console devices in the kernel to print boot messages. The device tree is a nice way to describe differences like this and have a single kernel image to boot in both environments.
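The way one kernel image can pick the right console from a device tree can be modeled in a few lines of C. This is a toy: the compatible strings and driver names below are invented, and the real kernel uses its own device tree matching code rather than anything like pick_console.

```c
#include <stddef.h>
#include <string.h>

/* A driver claims the devices whose "compatible" string it knows. */
struct console_driver {
    const char *compatible;   /* hypothetical compatible string */
    const char *name;
};

static const struct console_driver console_drivers[] = {
    { "example,swi-console", "gdb sim console (swi)" },
    { "example,uart",        "qemu serial console"   },
};

/* A device tree node lists its compatible strings from most to least
   specific. Return the driver for the first string any driver claims,
   or NULL if nothing matches. The list must end with a NULL entry. */
const struct console_driver *pick_console(const char *const *node_compatible)
{
    size_t ndrivers = sizeof console_drivers / sizeof console_drivers[0];

    for (size_t i = 0; node_compatible[i] != NULL; i++)
        for (size_t j = 0; j < ndrivers; j++)
            if (strcmp(node_compatible[i],
                       console_drivers[j].compatible) == 0)
                return &console_drivers[j];
    return NULL;
}
```

Both drivers are compiled into the same image. Which one ends up printing boot messages depends only on the tree handed to the kernel at boot.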
Also, as predicted, I actually used moxie gdb’s reverse debugging feature to help debug my kernel bring-up. It was really useful a couple of times and has probably already saved me as much effort as it took to implement in the first place!
The next week is going to be very busy for me, so I don’t expect to get much done. We’ll see…
I’ve just checked the start of the kernel port into moxiedev. Running “ant build” will produce tools, simulators, u-boot and now a vmlinux you can run with moxie-elf-run or in gdb. It crashes on startup right now, but that’s to be expected. I just got it to the point where it links. More to come…
A few weeks ago I happened to be in Palo Alto and met up with my friend and long-time GDB hacker Michael Snyder. He told me about a new feature in GDB called “process recording”. The basic idea is that when you tell GDB to enter into “record mode”, it records undo information for every instruction executed during the debug process. This lets you switch direction and start stepping through your code backwards in time! It’s a pretty amazing feature.
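The idea of recording undo information can be shown with a toy machine. This is only a model of the concept, not how GDB implements it: the struct names, the single "add immediate" instruction, the 2-byte instruction length and the fixed-size log are all assumptions for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS     16   /* assumption: register count of the toy CPU */
#define LOG_CAPACITY 64   /* arbitrary limit for the sketch */

struct cpu {
    uint32_t reg[NUM_REGS];
    uint32_t pc;
};

/* Everything one instruction is about to overwrite. */
struct undo_entry {
    uint32_t old_pc;
    unsigned reg;
    uint32_t old_value;
};

struct recorder {
    struct undo_entry log[LOG_CAPACITY];
    size_t depth;
};

/* Execute the toy instruction "reg[r] += imm" in record mode: save the
   old state first, then modify it. Returns -1 (without executing) if
   the register is invalid or the log is full. */
int step_forward(struct cpu *c, struct recorder *rec, unsigned r,
                 uint32_t imm)
{
    if (r >= NUM_REGS || rec->depth == LOG_CAPACITY)
        return -1;

    struct undo_entry *e = &rec->log[rec->depth++];
    e->old_pc = c->pc;
    e->reg = r;
    e->old_value = c->reg[r];

    c->reg[r] += imm;
    c->pc += 2;
    return 0;
}

/* Undo the most recently recorded instruction. Returns -1 when there
   is nothing left to undo. */
int step_backward(struct cpu *c, struct recorder *rec)
{
    if (rec->depth == 0)
        return -1;

    const struct undo_entry *e = &rec->log[--rec->depth];
    c->reg[e->reg] = e->old_value;
    c->pc = e->old_pc;
    return 0;
}
```

A port for a real ISA has to describe, for each instruction, which registers and memory locations it will overwrite, which is why a small instruction set keeps the work manageable.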
I was anxious to implement it for moxie, but only got around to it this weekend. The moxie ISA is relatively small, so it wasn’t much work. The patch looks something like this. And, as promised, you can now step forwards and backwards through moxie code. Reverse “continue” and “finish” also work. It’s going to be really handy when I get back to working on the Linux kernel port.
Some GDB front-ends already have the controls in place for reverse debugging. Here’s a webinar showing reverse debugging on Eclipse. I mostly use Emacs as my moxie-elf-gdb frontend, but I’m not sure if it supports the reverse instructions nicely yet (of course you can “set exec-direction reverse” and use the normal step/next/continue commands).
Thanks to Michael for pointing me at this new feature, and to Tea Water for implementing process recording in the first place.
UPDATE: Emacs support for reverse debugging should be arriving in 23.2. I’m not sure what the schedule for that is, but 23.1 is supposed to come out next week (July 22).