The Moxielyzer

October 14th, 2009

I just committed a little binary analysis tool to moxiedev. You can use it to perform simple static analysis on moxie binaries. The kinds of things I’m looking for are compiler bugs (because I know there’s still one there that is triggered by -frerun-cse-after-loop), and instruction statistics. For instance, which registers are used as load offsets, and how often? The tool uses a primitive plugin architecture that should make it easy to add new analysis tools in the future. It’s called the moxielyzer, and here is the initial commit. Run it with no arguments to get a list of plugins. Run it with just a plugin name, and it will describe the plugin. Run it with a plugin name as well as an ELF moxie executable filename, and the analysis will be performed.
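
As a rough illustration of the three invocation forms described above, here is a minimal sketch of a table-driven plugin dispatcher in C. It is not taken from the moxielyzer sources: the `plugin` struct, the `load-offsets` plugin name, and the `run_tool` function are all hypothetical, and the analysis callback is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin record; not taken from the moxielyzer sources. */
struct plugin {
  const char *name;
  const char *description;
  int (*analyze) (const char *elf_filename);
};

/* Placeholder analysis. A real plugin would open and scan the ELF file. */
int load_offsets_analyze (const char *elf_filename)
{
  printf ("would analyze %s\n", elf_filename);
  return 0;
}

/* Adding a new analysis means adding one entry to this table. */
const struct plugin plugins[] = {
  { "load-offsets", "Count which registers are used as load offsets",
    load_offsets_analyze },
};

#define NUM_PLUGINS (sizeof plugins / sizeof plugins[0])

/* Look a plugin up by name; returns NULL if there is no such plugin. */
const struct plugin *find_plugin (const char *name)
{
  for (size_t i = 0; i < NUM_PLUGINS; i++)
    if (strcmp (plugins[i].name, name) == 0)
      return &plugins[i];
  return NULL;
}

/* No arguments: list plugins.  Plugin name only: describe it.
   Plugin name plus a filename: run the analysis.
   Returns 0 on success and 1 for an unknown plugin. */
int run_tool (int argc, const char **argv)
{
  assert (argc >= 1 && argv != NULL);
  if (argc == 1)
    {
      for (size_t i = 0; i < NUM_PLUGINS; i++)
        puts (plugins[i].name);
      return 0;
    }
  const struct plugin *p = find_plugin (argv[1]);
  if (p == NULL)
    {
      fprintf (stderr, "unknown plugin: %s\n", argv[1]);
      return 1;
    }
  if (argc == 2)
    {
      puts (p->description);
      return 0;
    }
  return p->analyze (argv[2]);
}
```

A `main` that simply forwards its `argc` and `argv` to `run_tool` would give the command-line behaviour described in the post.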

I had written a similar tool for ggx back in the bad old days. Another option was to hack this stuff into gas, but I prefer to keep gas “clean” (translation: I want the freedom to maintain hacky analysis code).

BTW – I’m also rolling out a new libffi in a few weeks. You can keep track of the release candidate test results on the wiki here.

Summer is over, so put away the white pants and start submitting patches!

September 10th, 2009

It’s been a while since my last update. What can I say… summer was nice.

But now, back to business! I’ve just committed some long overdue patches to the upstream GNU tools:

This gets us to booting the kernel, loading BusyBox, running some shell code and… crashing on the first fork. No problemo. Nothing a small matter of programming can’t fix. However, there are some other distractions…

Verilog is lots of fun! It looks like regular programming, but it feels more like building a kinetic sculpture.

There’s also the small matter of not having an interrupt controller! So there’s some work here to design an interrupt controller, implement it in Verilog, simulate it in qemu (and possibly the gdb sim), and port the kernel over to using it. This should be interesting…

More hello world progress with uClibc/uClinux, and a GDB question.

August 18th, 2009

Tonight I got a hello world app to use uClibc’s puts() routine! This is a big deal because it’s the first time I’ve had system calls coming in from userland. I haven’t checked the changes in yet, because they’re a mess, but here’s a basic run-down of what I had to do…

  • First, uClibc had to be taught how to make system calls to the moxie uClinux kernel. This was straightforward, except I came across one surprise which I’ll describe below.
  • Next, I needed to add more files to my initfs. Specifically, I needed a /dev/console. Fortunately, the kernel build process makes this easy. I decided to use the “text file” approach to populating the initramfs as described in this document.
  • Finally, I had to create a tty device for my default console that spoke through the gdb simulator via software interrupts. Fortunately the ia64 port had a similar tty device for talking through one of HP’s simulators that I was able to mostly copy.

Once all this was done, I was able to build a standard Hello World app with moxie-uclinux-gcc, and it just worked!

What about the system call surprise? I had read somewhere that Linux system calls take a maximum of 5 parameters, but that’s not quite true. Some take 6 (are there any with 7? more?). This thwarted my attempt to get busybox running tonight, because it uses mmap, and mmap is one of those 6-argument system calls. There are a few ways to fix this. I think I’ll just hack the compiler to use 6 register arguments and see what that does to code size/performance.
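
For reference, here is what the six arguments look like from the C side. This is a minimal sketch using the standard POSIX `mmap` interface on an ordinary hosted system, not moxie-specific code; the `map_scratch` helper is a made-up name for illustration.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map one anonymous region, touch it, and unmap it again.
   Returns 0 on success and -1 on failure. */
int map_scratch (size_t len)
{
  void *p = mmap (NULL,                        /* 1: address hint */
                  len,                         /* 2: length */
                  PROT_READ | PROT_WRITE,      /* 3: protection */
                  MAP_PRIVATE | MAP_ANONYMOUS, /* 4: flags */
                  -1,                          /* 5: file descriptor */
                  0);                          /* 6: offset */
  if (p == MAP_FAILED)
    return -1;
  ((char *) p)[0] = 'x';   /* prove the mapping is writable */
  return munmap (p, len);
}
```

All six values have to reach the kernel somehow, so an ABI that passes only five arguments in registers needs either a sixth register or some marshaling for calls like this one.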

If there are any GDB hackers reading this… I have one question for you… The kernel is loading and relocating my “init” program, then execve’ing it. When I run the kernel in gdb, it would be nice for gdb to load the debug info for init so I could see what it’s doing when I step into userland. Is there some way to do this manually?

First moxie-linux userland app runs!

August 17th, 2009

I’ve been taking advantage of the nice summer weather recently, so it’s taken me a while to get around to this… but here’s the first moxie userland app!

#include <string.h>

#define MSG "Hello, World!\n"

void __attribute__((noinline)) gloss_write (int fd, char *ptr, int len)
{
  asm("swi 5"); // "write" via the gdb simulator
}

int main()
{
  while (1)
    gloss_write (0, MSG, strlen(MSG));
  return 0;
}

If you build this with moxie-uclinux-gcc, name it init and point the linux kernel build machinery at it, you’ll get a kernel that boots, loads the init BFLT binary from a ramfs, and performs an execve system call on it! The program loops forever, printing “Hello, World!” via the gdb simulator IO interrupt because I haven’t fixed up uClibc to perform system calls yet. Baby steps, my friends! Baby steps! We will get there!

The main bit of work needed to get this going was to fix up the software interrupt handler for system calls. I’m saving registers in a pt_regs struct just prior to calling the execve system call. execve then manipulates these saved registers so we end up running the newly exec’d program when we “return” from the system call. This was all done in linux-2.6/arch/moxie/kernel/exception_handler.S, which you can see here.
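
The idea of redirecting the "return" from execve can be sketched in C as follows. This is only an illustration under assumptions: the `saved_regs` layout, the register count, and the `start_new_program` name are hypothetical and do not reflect the real moxie pt_regs struct or the code in exception_handler.S.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_REGS 16   /* placeholder count, not the real layout */

/* Hypothetical frame saved by the exception handler on system call entry. */
struct saved_regs {
  uint32_t regs[SKETCH_NUM_REGS];
  uint32_t sp;
  uint32_t pc;
};

/* Rewrite the saved frame so that restoring it on the way out of the
   system call starts the new program instead of resuming the caller. */
void start_new_program (struct saved_regs *frame,
                        uint32_t entry, uint32_t stack_top)
{
  assert (frame != 0);
  for (int i = 0; i < SKETCH_NUM_REGS; i++)
    frame->regs[i] = 0;      /* the new program starts with clean registers */
  frame->pc = entry;         /* "return" address becomes the entry point */
  frame->sp = stack_top;     /* fresh stack for the new program */
}
```

The exception handler itself does nothing special for execve: it restores whatever is in the saved frame, and the frame now describes the new program.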

Next, I’ll get uClibc to make system calls into the kernel so we can try the same program with libc.a’s puts().

Speed bumps on the road to moxie userland

July 30th, 2009

Sooo….. it turns out there’s lots to take care of before userland apps like BusyBox can run.

  • The root filesystem. This one is easy. I just built a short Hello World application in C with moxie-uclinux-gcc. This produces an executable in BFLT format which I call ‘init’. The kernel build machinery takes this and produces a compressed root filesystem image linked to the vmlinux binary. The good news is that the kernel is able to boot, detect this initramfs, decompress it and load the init executable (which involves fixing up all of init’s relocations). My Hello World doesn’t actually use the C library or any system calls. It just writes Hello through direct communication with the simulator via our software interrupt (swi) instruction. I thought this would let me avoid dealing with system calls for now. I was wrong…
  • System calls. This one is harder. Obviously (in retrospect!) the kernel creates the init process via the execve system call. Implementing system call support involves lots of platform dependent stuff. For instance, how do we invoke system calls? How are parameters passed? How do we switch back and forth between userland and the kernel? The first question is easy: I’ll use our trusty software interrupt (swi) instruction to invoke system calls. This means creating an exception handler and installing it as described in this old post.
    As an aside, the swi instruction takes a 32-bit immediate operand. We currently use this to identify calls to the simulator via libgloss. This works well for escaping to the simulator, but isn’t the best way to identify system calls to the kernel. The Linux kernel is going to ignore this operand, and we’ll pass the system call ID in a register instead. This saves us from doing complex instruction decoding in the exception handler processing the interrupt (which would also trash any future data cache). Libgloss and the sim only need a small number of IDs, so I’m going to chop the swi instruction down from 48 bits to 16 bits in a future build of the tools.
    Passing arguments to the system calls was also interesting to sort out…
  • System call argument passing. The moxie ABI currently only has two registers being used to hold function arguments. The remaining arguments must live on the stack. This decision goes back to when we only had 8 registers to play with. It turns out that Linux kernel system calls can have a maximum of 5 arguments. In order to avoid tricky argument marshaling, I’ve decided to try changing the general ABI accordingly, so that up to 5 registers may be used to hold function arguments. This involves changes to the compiler, debugger and a smattering of assembly language in libgloss.
    The great thing about having integrated benchmarks into the moxiedev environment is that you can easily compare before and after performance for ABI changes like this. Running “ant benchmark” runs through the MiBench benchmark suite and saves a nice report for easy comparison. It turns out that switching from 2 to 5 register arguments is almost universally a win in terms of both code size and instruction trace length (an approximation of run time). The consumer jpeg benchmarks were slightly larger and slower, but by less than 1%. Every other benchmark result was slightly better. The one outlier was the “network_dijkstra” benchmark which ended up 44% “faster” (44% fewer instructions being executed).
  • The first real moxie compiler bug. Sometimes things just don’t work! This is especially true when you’re tracking the bleeding edge from upstream. I won’t go into the details, but I discovered a rare bug in the compiler where it would assume that compare results could live across function calls. Fortunately I was able to track down the guilty compilation pass and disable it with -fno-rerun-cse-after-loop. I know that some people have brought up kernels without the benefit of a nice debugger, but I just don’t see how that is possible. The simulator, and a solid gdb port with reverse debugging capabilities have proven to be invaluable!
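
The system call scheme described in the list above (ID in a register, up to 5 arguments in registers, swi operand ignored) can be sketched as a simple table dispatch. This is a hedged illustration only: the `syscall_frame` struct, the `demo_sum` entry, and the table contents are hypothetical and are not the real moxie kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical view of the registers captured by the swi exception
   handler: one register carries the system call ID, five carry arguments. */
struct syscall_frame {
  long id;
  long arg[5];
};

typedef long (*syscall_fn) (long, long, long, long, long);

/* Stand-in for a real system call; it just adds its arguments. */
long demo_sum (long a, long b, long c, long d, long e)
{
  return a + b + c + d + e;
}

/* Slot 0 is deliberately left empty to show the unimplemented case. */
const syscall_fn syscall_table[] = { NULL, demo_sum };

#define NUM_SYSCALLS (sizeof syscall_table / sizeof syscall_table[0])

/* Dispatch on the ID register alone; no instruction decoding is needed. */
long dispatch_syscall (const struct syscall_frame *f)
{
  assert (f != NULL);
  if (f->id < 0 || (size_t) f->id >= NUM_SYSCALLS
      || syscall_table[f->id] == NULL)
    return -ENOSYS;
  return syscall_table[f->id] (f->arg[0], f->arg[1], f->arg[2],
                               f->arg[3], f->arg[4]);
}
```

Because the handler never looks at the swi operand, it does not have to fetch and decode the instruction that trapped.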

There’s still lots to figure out and implement in the system call space, but it’s clear that we’re getting very close to running our first Linux program!

The start of a uClinux userland

July 28th, 2009

Before we can start building BusyBox, we need a few more bits of technology…

  • uClibc: this is a popular embedded C library, like newlib, but used more often in Linux environments. I ported uClibc to the moxie core just like every other bit of software in this project: quickly! My strategy has always been to make things link as quickly as possible, and then sort out the details later. This seems to be a workable strategy in the presence of good testsuites and the like.
  • elf2flt: this utility turns moxie ELF binaries into the “Binary Flat” (BFLT) format currently required by my Linux port. The BFLT format is required because: (a) we don’t have an MMU yet, so there’s a single address space for the kernel and all applications, and (b) my moxie tools port doesn’t yet support something like the FR-V’s FDPIC ABI that would allow for proper shared library support in the absence of an MMU. elf2flt ends up wrapping the installed linker, so builds actually produce BFLT binaries without any extra step.
  • a moxie-uclinux toolchain: I build this from the same sources as the moxie-elf toolchain, but with a sysroot containing the kernel and uClibc header files.
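
To see why a single shared address space forces a relocatable format, here is a simplified sketch of load-time relocation in C. It assumes a made-up layout in which the relocation table is a list of image offsets, and each referenced 32-bit word holds an address computed as if the image were loaded at 0. Real BFLT files also have a header and a defined byte order, which this sketch ignores; `apply_relocs` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add the actual load address to every word named in the relocation
   table, so that pointers inside the image become valid wherever the
   loader happened to place it. */
void apply_relocs (uint8_t *image, size_t image_size, uint32_t load_base,
                   const uint32_t *relocs, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      uint32_t off = relocs[i];
      uint32_t word;

      /* Refuse relocations that point outside the image. */
      assert (image_size >= sizeof word && off <= image_size - sizeof word);

      memcpy (&word, image + off, sizeof word);   /* alignment-safe read */
      word += load_base;
      memcpy (image + off, &word, sizeof word);
    }
}
```

With an MMU, every process could be linked for the same virtual address and none of this would be needed; without one, each program lands wherever free memory is available.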

This is all built and committed to moxiedev, which means that you can check it out and build it yourself with a single “ant build”. I haven’t tried using it yet, and I know it will fail in its current state. The next step is to build BusyBox with the moxie-uclinux toolchain and create an initramfs that we can link directly to the kernel binary. That’s when the debugging fun begins…

The first moxie linux boot output…

July 26th, 2009

Userland, here I come! Check out moxiedev, run “ant build”, then do the following…

$ ./root/usr/bin/moxie-elf-run linux-2.6/vmlinux
Linux version 2.6.31-rc3-gb006656-dirty (green@dev.moxielogic.com) (gcc version 4.5.0 20090715 (experimental) [trunk revision 149693] (GCC) ) #6 Sun Jul 26 12:03:14 EDT 2009
console [earlyser0] enabled
setup_cpuinfo: initialising
setup_memory: Main mem: 0x0-0x1000000, size 0x01000000
setup_memory: kernel addr=0x00001000-0x002cc000 size=0x002cb000
setup_memory: max_mapnr: 0x1000
setup_memory: min_low_pfn: 0x0
setup_memory: max_low_pfn: 0x1000
On node 0 totalpages: 4096
free_area_init_node: node 0, pgdat 002621b0, node_mem_map 002cd000
  Normal zone: 32 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 4064 pages, LIFO batch:0
Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 4064
Kernel command line: lpj=1000
PID hash table entries: 64 (order: 6, 256 bytes)
Dentry cache hash table entries: 2048 (order: 1, 8192 bytes)
Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
Memory: 13376k/16384k available
start_kernel(): bug: interrupts were enabled *very* early, fixing it
NR_IRQS:32
 #0 at 0x00000000, num_irq=0, edge=0x0
 #0 at 0x00000000, irq=0
start_kernel(): bug: interrupts were enabled early
ODEBUG: 3 of 3 active objects replaced
ODEBUG: selftest passed
Calibrating delay loop (skipped) preset value.. 0.20 BogoMIPS (lpj=1000)
Mount-cache hash table entries: 512
NET: Registered protocol family 16
bio: create slab  at 0
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 512 (order: 0, 4096 bytes)
TCP bind hash table entries: 512 (order: -1, 2048 bytes)
TCP: Hash tables configured (established 512 bind 512)
TCP reno registered
NET: Registered protocol family 1
ROMFS MTD (C) 2007 Red Hat, Inc.
msgmni has been set to 26
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
brd: module loaded
nbd: registered device at major 43
uclinux[mtd]: RAM probe address=0x2cba18 size=0x0
Creating 1 MTD partitions on "RAM":
0x000000000000-0x000000000000 : "ROMfs"
mtd: partition "ROMfs" is out of reach -- disabled
TCP cubic registered
NET: Registered protocol family 17
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
VFS: Cannot open root device "" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Rebooting in 120 seconds..
Machine restart...

Stack:
  00823e50 00823e68 000044e6 fffffff3 002288e0 fa3c0600 00823e7c 0001e196
  19981fc0 00000000 0001d4bf 00823ec0 0003ef30 00000000 00229ba8 00000078
  00000000 19981fc0 00000000 0003ef84 000fe422 0007091a 00008001 00000018
Call Trace: 

[<000044e6>] machine_restart+0x14/0x1a
[<0001e196>] emergency_restart+0xe/0x10
[<0001d4bf>] sys_rt_sigtimedwait+0x15/0x1de
[<0003ef30>] panic+0x11e/0x172
[<0003ef84>] printk+0x0/0x1a
[<000fe422>] strchr+0x0/0x4a
[<0007091a>] sys_mount+0x0/0xf4
[<00008001>] update_curr.clone.4+0x115/0x178
[<0003ef84>] printk+0x0/0x1a
[<00266bca>] mount_block_root+0x2d0/0x2f4
[<00008001>] update_curr.clone.4+0x115/0x178
[<00266d94>] mount_root+0x7c/0x86
[<00266f38>] prepare_namespace+0x19a/0x1e6
[<00001314>] do_one_initcall+0x0/0x280
[<00001314>] do_one_initcall+0x0/0x280
[<002667ba>] kernel_init+0xea/0x104
[<000025c8>] kernel_thread_helper+0x8/0x14
[<002666d0>] kernel_init+0x0/0x104
[<000025c0>] kernel_thread_helper+0x0/0x14

program stopped with signal 2.


There are lots of short cuts that need to be cleaned up, but it seems that I’m basically at the point where I need to worry about userland.

Busybox, I’m looking at you!

Kernel update: device trees and kernel threads

July 25th, 2009

I’ve spent a lot of time in airports/planes/hotels recently, which is good news for the moxie linux port. It runs about 6.5M instructions, booting up to the point where a couple of kernel threads are created. However, a few context switches later it all comes tumbling down. I didn’t have any of my kernel books with me, so I stopped hacking at that point rather than try to guess/decode how some of the internals are supposed to work.

My port is using a device tree to describe the system architecture. This makes it easier to build a single kernel image that can boot on multiple moxie implementations. There’s a good paper on this relatively new infrastructure here: http://ols.fedoraproject.org/OLS/Reprints-2008/likely2-reprint.pdf. If you’ve been following this project, you may recall that console I/O is implemented differently on the gdb and qemu simulators. For the gdb simulator we use a software interrupt instruction (swi) to escape to the simulator, but the qemu port uses a real simulated serial device. This means they need different console devices in the kernel to print boot messages. The device tree is a nice way to describe differences like this and have a single kernel image to boot in both environments.
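
The selection mechanism can be sketched in C as a match on a device tree "compatible" string. This is an illustration only: the two compatible strings and driver names below are invented for the example and are not the ones used by the moxie port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per console driver compiled into the kernel image. */
struct console_driver {
  const char *compatible;   /* string the device tree node must carry */
  const char *name;
};

/* Hypothetical identifiers, invented for this sketch. */
const struct console_driver console_drivers[] = {
  { "example,gdbsim-swi-console", "swi console for the gdb simulator" },
  { "example,qemu-serial",        "serial console for qemu" },
};

#define NUM_CONSOLE_DRIVERS \
  (sizeof console_drivers / sizeof console_drivers[0])

/* Pick the driver whose compatible string matches the console node found
   in the device tree; returns NULL when nothing matches. */
const struct console_driver *match_console (const char *compatible)
{
  assert (compatible != NULL);
  for (size_t i = 0; i < NUM_CONSOLE_DRIVERS; i++)
    if (strcmp (console_drivers[i].compatible, compatible) == 0)
      return &console_drivers[i];
  return NULL;
}
```

Both drivers live in the same kernel image; only the device tree handed to the kernel at boot decides which one is used.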

Also, as predicted, I actually used moxie gdb’s reverse debugging feature to help debug my kernel bring-up. It was really useful a couple of times and has probably already paid back the effort required to implement it in the first place!

The next week is going to be very busy for me, so I don’t expect to get much done. We’ll see…

The Moxie Linux port

July 23rd, 2009

I’ve just checked the start of the kernel port into moxiedev. Running “ant build” will produce tools, simulators, u-boot and now a vmlinux you can run with moxie-elf-run or in gdb. It crashes on startup right now, but that’s to be expected. I just got it to the point where it links. More to come…

Reverse debugging!

July 12th, 2009

A few weeks ago I happened to be in Palo Alto and met up with my friend and long-time GDB hacker Michael Snyder. He told me about a new feature in GDB called “process recording”. The basic idea is that when you tell GDB to enter into “record mode”, it records undo information for every instruction executed during the debug process. This lets you switch direction and start stepping through your code backwards in time! It’s a pretty amazing feature.

I was anxious to implement it for moxie, but only got around to it this weekend. The moxie ISA is relatively small, so it wasn’t much work. The patch looks something like this. And, as promised, you can now step forwards and backwards through moxie code. Reverse “continue” and “finish” also work. It’s going to be really handy when I get back to working the Linux kernel port.
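
The core idea behind process recording can be shown with a toy undo log in C. This is a conceptual sketch, not GDB's implementation or the moxie patch: the `machine` and `recorder` structs, the single "add immediate" instruction, and the 2-byte instruction length are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 16
#define LOG_SIZE 1024

struct machine {
  uint32_t regs[NUM_REGS];
  uint32_t pc;
};

/* Everything one instruction is about to overwrite. */
struct undo_entry {
  uint32_t old_pc;
  int reg;
  uint32_t old_value;
};

struct recorder {
  struct undo_entry log[LOG_SIZE];
  size_t depth;
};

/* Execute a toy "add immediate" instruction, saving undo data first. */
void step_forward (struct machine *m, struct recorder *r,
                   int reg, uint32_t imm)
{
  assert (r->depth < LOG_SIZE && reg >= 0 && reg < NUM_REGS);
  r->log[r->depth++] = (struct undo_entry) { m->pc, reg, m->regs[reg] };
  m->regs[reg] += imm;
  m->pc += 2;   /* assumed instruction length for this toy machine */
}

/* Undo the most recent instruction.  Returns -1 when the log is empty. */
int step_backward (struct machine *m, struct recorder *r)
{
  if (r->depth == 0)
    return -1;
  const struct undo_entry *e = &r->log[--r->depth];
  m->regs[e->reg] = e->old_value;
  m->pc = e->old_pc;
  return 0;
}
```

A port to a new architecture mostly has to answer one question per instruction: which registers and memory locations will this instruction change? A small ISA keeps that list short.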

Some GDB front-ends already have the controls in place for reverse debugging. Here’s a webinar showing reverse debugging on Eclipse. I mostly use Emacs as my moxie-elf-gdb frontend, but I’m not sure if it supports the reverse instructions nicely yet (of course you can “set exec-direction reverse” and use the normal step/next/continue commands).

Thanks to Michael for pointing me at this new feature, and to Tea Water for implementing process recording in the first place.

UPDATE: Emacs support for reverse debugging should be arriving in 23.2. I’m not sure what the schedule for that is, but 23.1 is supposed to come out next week (July 22).