Archive for the ‘moxie’ Category

Fetching instructions

Tuesday, September 7th, 2010

Moxie requires some interesting instruction fetch logic.

For my initial implementation I’m assuming a 32-bit path to instruction memory. But moxie has both 16- and 48-bit instructions, so it’s not like simple RISC cores that can feed the pipeline on every cycle. My solution is to feed 32-bit instruction memory words into an Instruction FIFO. 16- and 48-bit instructions pop out of the other end of the FIFO on every cycle (or a NOP bubble when we’re waiting for the last 16 bits of a 48-bit instruction). My initial Instruction FIFO is 64 bits long. From my simple testing it looks like this does a reasonable job of keeping the instruction memory path busy, and issuing instructions as often as possible (I’m just eyeballing the gtkwave output, reproduced below). I can experiment with a longer Instruction FIFO later.

This image shows a few signals from the Instruction FIFO. valid_o tells us that we’re popping off a valid instruction from the FIFO, whereas full_o tells us not to write any data to the FIFO because it’s full. So far, so good – decoupling the fetching of instruction memory from the rest of the pipeline is obviously the right thing to do.
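As a rough software picture of the FIFO described above, here is a minimal behavioural sketch in C. It is not the Verilog from the repository. The names (ififo, ififo_push, ififo_pop, ififo_full), the high-halfword-first ordering, and especially the is_long() rule for spotting a 48-bit instruction are assumptions made up for illustration; the real length decode depends on the moxie opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioural model of a 64-bit instruction FIFO: four 16-bit slots,
   filled 32 bits at a time, drained one instruction per call. */
#define FIFO_SLOTS 4

typedef struct {
  uint16_t slot[FIFO_SLOTS]; /* slot[0] is the oldest halfword */
  int count;                 /* number of valid halfwords */
} ififo;

/* ASSUMPTION: illustrative length rule only, not moxie's real encoding.
   Here a halfword with its top bit set starts a 48-bit instruction. */
static int is_long(uint16_t first) { return (first & 0x8000) != 0; }

/* Models full_o: no room left for another 32-bit word. */
static int ififo_full(const ififo *f) { return f->count > FIFO_SLOTS - 2; }

/* Write one 32-bit memory word (ASSUMPTION: high halfword first). */
static void ififo_push(ififo *f, uint32_t word) {
  assert(!ififo_full(f));
  f->slot[f->count++] = (uint16_t)(word >> 16);
  f->slot[f->count++] = (uint16_t)(word & 0xffff);
}

/* Pop one instruction. The return value models valid_o: 1 when *op and
   *imm hold an instruction, 0 for a bubble (FIFO empty, or a 48-bit
   instruction whose last halfwords have not arrived yet). */
static int ififo_pop(ififo *f, uint16_t *op, uint32_t *imm) {
  int need, i;
  if (f->count == 0) return 0;
  need = is_long(f->slot[0]) ? 3 : 1;
  if (f->count < need) return 0;
  *op = f->slot[0];
  *imm = (need == 3) ? (((uint32_t)f->slot[1] << 16) | f->slot[2]) : 0;
  for (i = need; i < f->count; i++) f->slot[i - need] = f->slot[i];
  f->count -= need;
  return 1;
}
```

Pushing one word that holds a short instruction followed by the first halfword of a long one yields the short instruction, then a bubble, and the long instruction only after the next 32-bit word has been pushed.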

One more complication that I’m going to punt on for now is PC tracking. Eventually we’ll want to pass the PC down the pipeline so we get accurate exception addresses. Tracking the PC through the Instruction FIFO is just one more little complication that I’ll tackle after I make more progress on the rest of the microarchitecture.

I’ve only done some behavioral simulation so far, but I believe the code is synthesizable. The code is on GitHub here: http://bit.ly/9yVQ7U. Running make should build everything, then just run “a.out”.

Note that I’m using magic instruction memory: an array populated with a hello world app built like so…
$ moxie-elf-gcc -o hello.x -O2 hello.c -Tsim.ld
$ moxie-elf-objcopy -O verilog hello.x hello.vh

And the verilog simulator reads hello.vh directly. Pretty cool!

(I just realized I wrote about fetching instructions almost 18 months ago – that took too long!)

Still hacking…

Tuesday, May 11th, 2010

…Just in case you were wondering!

But it’s been slow going. The good news is that moxie is front and centre for me again, so let’s see what we can do over the next few weeks. And over these next few weeks I’m promising myself not to touch a C compiler until the moxie HDL code starts executing a few instructions.

Half Dome at Yosemite

In other news, I visited one of my favourite places this past weekend. Yosemite is a great place to reset. Go there!

Summer is over, so put away the white pants and start submitting patches!

Thursday, September 10th, 2009

It’s been a while since my last update. What can I say… summer was nice.

But now, back to business! I’ve just committed some long overdue patches to the upstream GNU tools:

This gets us to booting the kernel, loading BusyBox, running some shell code and… crashing on the first fork. No problemo. Nothing a small matter of programming can’t fix. However, there are some other distractions…

Verilog is lots of fun! It looks like regular programming, but it feels more like building a kinetic sculpture.

There’s also the small matter of not having an interrupt controller! So there’s some work here to design an interrupt controller, implement it in verilog, simulate it in qemu (and possibly the gdb sim), and port the kernel over to using it. This should be interesting…

More hello world progress with uClibc/uClinux, and a GDB question.

Tuesday, August 18th, 2009

Tonight I got a hello world app to use uClibc’s puts() routine! This is a big deal because it’s the first time I’ve had system calls coming in from userland. I haven’t checked the changes in yet, because they’re a mess, but here’s a basic run-down of what I had to do…

  • First, uClibc had to be taught how to make system calls to the moxie uClinux kernel. This was straightforward, except I came across one surprise which I’ll describe below.
  • Next, I needed to add more files to my initfs. Specifically, I needed a /dev/console. Fortunately, the kernel build process makes this easy. I decided to use the “text file” approach to populating the initramfs as described in this document.
  • Finally, I had to create a tty device for my default console that spoke through the gdb simulator via software interrupts. Fortunately the ia64 port had a similar tty device for talking through one of HP’s simulators that I was able to mostly copy.

Once all this was done, I was able to build a standard Hello World app with moxie-uclinux-gcc, and it just worked!

What about the system call surprise? I read somewhere that Linux system calls have a maximum of 5 parameters, but that’s not quite true. Some take 6 (are there any with 7? more?). This thwarted my attempt to get busybox running tonight, because it uses mmap, and mmap is one of those 6-argument system calls. There are a few ways to fix this. I think I’ll just hack the compiler to use 6 register arguments and see what that does to code size/performance.
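To make the 5-versus-6 problem concrete, here is a small sketch of how a call’s arguments split between registers and the stack for a given register-argument count. The call_frame type and marshal() helper are hypothetical names invented for this example; they do not come from the moxie compiler or kernel.

```c
#include <assert.h>

#define MAX_REGS 6   /* most register arguments this sketch models */
#define MAX_STACK 8  /* most stack arguments this sketch models */

/* Hypothetical picture of where each argument of one call ends up. */
typedef struct {
  long reg[MAX_REGS];    /* values passed in argument registers */
  long stack[MAX_STACK]; /* values spilled to the stack */
  int nreg, nstack;
} call_frame;

/* The first reg_args arguments go in registers, the rest on the stack. */
static void marshal(call_frame *cf, const long *args, int nargs, int reg_args) {
  int i;
  assert(reg_args >= 0 && reg_args <= MAX_REGS);
  assert(nargs >= 0 && nargs <= reg_args + MAX_STACK);
  cf->nreg = cf->nstack = 0;
  for (i = 0; i < nargs; i++) {
    if (i < reg_args)
      cf->reg[cf->nreg++] = args[i];
    else
      cf->stack[cf->nstack++] = args[i];
  }
}
```

With the six mmap arguments (addr, length, prot, flags, fd, offset) and reg_args set to 5, the offset is the one value left on the stack, which is exactly the argument a register-only system call path would miss. With reg_args set to 6, nothing spills.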

If there are any GDB hackers reading this… I have one question for you… The kernel is loading and relocating my “init” program, then execve’ing it. When I run the kernel in gdb, it would be nice for gdb to load the debug info for init so I could see what it’s doing when I step into userland. Is there some way to do this manually?

First moxie-linux userland app runs!

Monday, August 17th, 2009

I’ve been taking advantage of the nice summer weather recently, so it’s taken me a while to get around to this… but here’s the first moxie userland app!

#include <string.h>

#define MSG "Hello, World!\n"

void __attribute__((noinline)) gloss_write (int fd, char *ptr, int len)
{
  asm("swi 5"); // "write" via the gdb simulator
}

int main()
{
  while (1)
    gloss_write (0, MSG, strlen(MSG));
  return 0;
}

If you build this with moxie-uclinux-gcc, name it init and point the linux kernel build machinery at it, you’ll get a kernel that boots, loads the init BFLT binary from a ramfs, and performs an execve system call on it! The program loops forever, printing “Hello, World!” via the gdb simulator IO interrupt because I haven’t fixed up uClibc to perform system calls yet. Baby steps, my friends! Baby steps! We will get there!

The main bit of work needed to get this going was to fix up the software interrupt handler for system calls. I’m saving registers in a pt_regs struct just prior to calling the execve system call. execve then manipulates these saved registers so we end up running the newly exec’d program when we “return” from the system call. This was all done in linux-2.6/arch/moxie/kernel/exception_handler.S, which you can see here.
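The save/modify/restore trick is easier to see in a toy model. The sketch below is plain C rather than the real exception_handler.S, and the pt_regs layout, cpu type, and function names are all simplified assumptions; the real moxie pt_regs is not shown in this post.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: a cut-down register set, for illustration only. */
typedef struct {
  uint32_t pc;
  uint32_t sp;
  uint32_t r[4];
} pt_regs;

typedef struct {
  pt_regs live; /* registers as the running program sees them */
} cpu;

/* Hypothetical execve: rewrites the *saved* registers so the return path
   resumes at the new program's entry point with a fresh stack. */
static int do_execve(pt_regs *saved, uint32_t entry, uint32_t stack_top) {
  saved->pc = entry;
  saved->sp = stack_top;
  saved->r[0] = saved->r[1] = saved->r[2] = saved->r[3] = 0;
  return 0;
}

/* Shape of the handler: save registers, run the call, restore. */
static void swi_handler(cpu *c, uint32_t entry, uint32_t stack_top) {
  pt_regs saved;
  assert(c != 0);
  saved = c->live;                      /* save on entry */
  do_execve(&saved, entry, stack_top);  /* system call edits the saved copy */
  c->live = saved;                      /* "return" restores the edited copy */
}
```

Because the restore step blindly reloads whatever is in the saved copy, the caller never resumes; execution continues in the newly exec’d program.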

Next, I’ll get uClibc to make system calls into the kernel so we can try the same program with libc.a’s puts().

Speed bumps on the road to moxie userland

Thursday, July 30th, 2009

Sooo….. it turns out there’s lots to take care of before userland apps like BusyBox can run.

  • The root filesystem. This one is easy. I just built a short Hello World application in C with moxie-uclinux-gcc. This produces an executable in BFLT format which I call ‘init’. The kernel build machinery takes this and produces a compressed root filesystem image linked to the vmlinux binary. The good news is that the kernel is able to boot, detect this initramfs, decompress it and load the init executable (which involves fixing up all of init’s relocations). My Hello World doesn’t actually use the C library or any system calls. It just writes Hello through direct communication with the simulator via our software interrupt (swi) instruction. I thought this would let me avoid dealing with system calls for now. I was wrong…
  • System calls. This one is harder. Obviously (in retrospect!) the kernel creates the init process via the execve system call. Implementing system call support involves lots of platform-dependent stuff. For instance, how do we invoke system calls? How are parameters passed? How do we switch back and forth between userland and the kernel? The first question is easy: I’ll use our trusty software interrupt (swi) instruction to invoke system calls. This means creating an exception handler and installing it as described in this old post.
    As an aside, the swi instruction takes a 32-bit immediate operand. We currently use this to identify calls to the simulator via libgloss. This works well for escaping to the simulator, but isn’t the best way to identify system calls to the kernel. The Linux kernel is going to ignore this operand, and we’ll pass the system call ID in a register instead. This avoids having to do complex instruction decoding in the exception handler processing the interrupt (which would also trash any future data cache). Libgloss and the sim only need a small number of IDs, so I’m going to chop the swi instruction down from 48 bits to 16 bits in a future build of the tools.
    Passing arguments to the system calls was also interesting to sort out…
  • System call argument passing. The moxie ABI currently only has two registers being used to hold function arguments. The remaining arguments must live on the stack. This decision goes back to when we only had 8 registers to play with. It turns out that Linux kernel system calls can have a maximum of 5 arguments. In order to avoid tricky argument marshaling, I’ve decided to try changing the general ABI accordingly, so that up to 5 registers may be used to hold function arguments. This involves changes to the compiler, debugger and a smattering of assembly language in libgloss.
    The great thing about having integrated benchmarks into the moxiedev environment is that you can easily compare before and after performance for ABI changes like this. Running “ant benchmark” runs through the MiBench benchmark suite and saves a nice report for easy comparison. It turns out that switching from 2 to 5 register arguments is almost universally a win in terms of both code size and instruction trace length (an approximation of run time). The consumer jpeg benchmarks were slightly larger and slower, but by less than 1%. Every other benchmark result was slightly better. The one outlier was the “network_dijkstra” benchmark which ended up 44% “faster” (44% fewer instructions being executed).
  • The first real moxie compiler bug. Sometimes things just don’t work! This is especially true when you’re tracking the bleeding edge from upstream. I won’t go into the details, but I discovered a rare bug in the compiler where it would assume that compare results could live across function calls. Fortunately I was able to track down the guilty compilation pass and disable it with -fno-rerun-cse-after-loop. I know that some people have brought up kernels without the benefit of a nice debugger, but I just don’t see how that is possible. The simulator, and a solid gdb port with reverse debugging capabilities have proven to be invaluable!
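Going back to the system call bullets above: passing the call ID in a register means the handler can index a table directly instead of decoding the swi instruction. A minimal sketch of that idea follows. The syscall_regs layout, the two demo calls, and the table are hypothetical and are not taken from the moxie kernel port.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* ASSUMPTION: hypothetical saved-register view. id stands for the register
   carrying the system call number, arg[] for the five argument registers. */
typedef struct {
  long id;
  long arg[5];
  long ret;
} syscall_regs;

typedef long (*syscall_fn)(long, long, long, long, long);

/* Two made-up calls, just to populate the table. */
static long demo_nop(long a, long b, long c, long d, long e) {
  (void)a; (void)b; (void)c; (void)d; (void)e;
  return 0;
}

static long demo_sum(long a, long b, long c, long d, long e) {
  return a + b + c + d + e;
}

static const syscall_fn syscall_table[] = { demo_nop, demo_sum };

/* Look the call up by the ID found in a register; no instruction decode. */
static void dispatch(syscall_regs *r) {
  const size_t n = sizeof syscall_table / sizeof syscall_table[0];
  assert(r != NULL);
  if (r->id < 0 || (size_t)r->id >= n) {
    r->ret = -ENOSYS; /* unknown system call number */
    return;
  }
  r->ret = syscall_table[r->id](r->arg[0], r->arg[1], r->arg[2],
                                r->arg[3], r->arg[4]);
}
```

The handler never has to read the instruction stream to learn which call was requested, which is the point of moving the ID out of the swi immediate.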

There’s still lots to figure out and implement in the system call space, but it’s clear that we’re getting very close to running our first Linux program!

The start of a uClinux userland

Tuesday, July 28th, 2009

Before we can start building BusyBox, we need a few more bits of technology…

  • uClibc: this is a popular embedded C library, like newlib, but used more often in Linux environments. I ported uClibc to the moxie core just like every other bit of software in this project: quickly! My strategy has always been to make things link as quickly as possible, and then sort out the details later. This seems to be a workable strategy in the presence of good testsuites and the like.
  • elf2flt: this utility turns moxie ELF binaries into the “Binary Flat” (BFLT) format currently required by my Linux port. The BFLT format is required because: (a) we don’t have an MMU yet, so there’s a single address space for the kernel and all applications, and (b) my moxie tools port doesn’t yet support something like the FR-V’s FDPIC ABI that would allow for proper shared library support in the absence of an MMU. elf2flt ends up wrapping the installed linker, so builds actually produce BFLT binaries without any extra step.
  • a moxie-uclinux toolchain: I build this from the same sources as the moxie-elf toolchain, but with a sysroot containing the kernel and uClibc header files.
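With a single address space and no MMU, a flat binary cannot know its load address in advance, so the loader has to patch addresses when it loads the program. The sketch below shows the general idea only. It is not the actual BFLT header or relocation format, and the relocate() helper, its parameters, and the use of host byte order are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Apply load-time fixups to a flat image. Each entry in reloc_offsets is
   the byte offset of a 32-bit word that holds an image-relative address;
   adding load_base turns it into an absolute address.
   ASSUMPTION: words are stored in host byte order in this sketch. */
static void relocate(uint8_t *image, size_t image_len,
                     const uint32_t *reloc_offsets, size_t nrelocs,
                     uint32_t load_base) {
  size_t i;
  assert(image_len >= sizeof(uint32_t));
  for (i = 0; i < nrelocs; i++) {
    uint32_t off = reloc_offsets[i];
    uint32_t word;
    assert(off <= image_len - sizeof word); /* fixup must lie inside image */
    memcpy(&word, image + off, sizeof word);
    word += load_base;
    memcpy(image + off, &word, sizeof word);
  }
}
```

Only the words named in the relocation list change; everything else in the image is copied as-is.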

This is all built and committed to moxiedev, which means that you can check it out and build it yourself with a single “ant build”. I haven’t tried using it yet, and I know it will fail in its current state. The next step is to build BusyBox with the moxie-uclinux toolchain and create an initramfs that we can link directly to the kernel binary. That’s when the debugging fun begins…

The first moxie linux boot output…

Sunday, July 26th, 2009

Userland, here I come! Check out moxiedev, run “ant build”, then do the following…

$ ./root/usr/bin/moxie-elf-run linux-2.6/vmlinux
Linux version 2.6.31-rc3-gb006656-dirty (green@dev.moxielogic.com) (gcc version 4.5.0 20090715 (experimental) [trunk revision 149693] (GCC) ) #6 Sun Jul 26 12:03:14 EDT 2009
console [earlyser0] enabled
setup_cpuinfo: initialising
setup_memory: Main mem: 0x0-0x1000000, size 0x01000000
setup_memory: kernel addr=0x00001000-0x002cc000 size=0x002cb000
setup_memory: max_mapnr: 0x1000
setup_memory: min_low_pfn: 0x0
setup_memory: max_low_pfn: 0x1000
On node 0 totalpages: 4096
free_area_init_node: node 0, pgdat 002621b0, node_mem_map 002cd000
  Normal zone: 32 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 4064 pages, LIFO batch:0
Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 4064
Kernel command line: lpj=1000
PID hash table entries: 64 (order: 6, 256 bytes)
Dentry cache hash table entries: 2048 (order: 1, 8192 bytes)
Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
Memory: 13376k/16384k available
start_kernel(): bug: interrupts were enabled *very* early, fixing it
NR_IRQS:32
 #0 at 0x00000000, num_irq=0, edge=0x0
 #0 at 0x00000000, irq=0
start_kernel(): bug: interrupts were enabled early
ODEBUG: 3 of 3 active objects replaced
ODEBUG: selftest passed
Calibrating delay loop (skipped) preset value.. 0.20 BogoMIPS (lpj=1000)
Mount-cache hash table entries: 512
NET: Registered protocol family 16
bio: create slab  at 0
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 512 (order: 0, 4096 bytes)
TCP bind hash table entries: 512 (order: -1, 2048 bytes)
TCP: Hash tables configured (established 512 bind 512)
TCP reno registered
NET: Registered protocol family 1
ROMFS MTD (C) 2007 Red Hat, Inc.
msgmni has been set to 26
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
brd: module loaded
nbd: registered device at major 43
uclinux[mtd]: RAM probe address=0x2cba18 size=0x0
Creating 1 MTD partitions on "RAM":
0x000000000000-0x000000000000 : "ROMfs"
mtd: partition "ROMfs" is out of reach -- disabled
TCP cubic registered
NET: Registered protocol family 17
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
VFS: Cannot open root device "" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Rebooting in 120 seconds..
Machine restart...

Stack:
  00823e50 00823e68 000044e6 fffffff3 002288e0 fa3c0600 00823e7c 0001e196
  19981fc0 00000000 0001d4bf 00823ec0 0003ef30 00000000 00229ba8 00000078
  00000000 19981fc0 00000000 0003ef84 000fe422 0007091a 00008001 00000018
Call Trace: 

[<000044e6>] machine_restart+0x14/0x1a
[<0001e196>] emergency_restart+0xe/0x10
[<0001d4bf>] sys_rt_sigtimedwait+0x15/0x1de
[<0003ef30>] panic+0x11e/0x172
[<0003ef84>] printk+0x0/0x1a
[<000fe422>] strchr+0x0/0x4a
[<0007091a>] sys_mount+0x0/0xf4
[<00008001>] update_curr.clone.4+0x115/0x178
[<0003ef84>] printk+0x0/0x1a
[<00266bca>] mount_block_root+0x2d0/0x2f4
[<00008001>] update_curr.clone.4+0x115/0x178
[<00266d94>] mount_root+0x7c/0x86
[<00266f38>] prepare_namespace+0x19a/0x1e6
[<00001314>] do_one_initcall+0x0/0x280
[<00001314>] do_one_initcall+0x0/0x280
[<002667ba>] kernel_init+0xea/0x104
[<000025c8>] kernel_thread_helper+0x8/0x14
[<002666d0>] kernel_init+0x0/0x104
[<000025c0>] kernel_thread_helper+0x0/0x14

program stopped with signal 2.


There are lots of short cuts that need to be cleaned up, but it seems that I’m basically at the point where I need to worry about userland.

Busybox, I’m looking at you!

Kernel update: device trees and kernel threads

Saturday, July 25th, 2009

I’ve spent a lot of time in airports/planes/hotels recently, which is good news for the moxie linux port. It runs about 6.5M instructions, booting up to the point where a couple of kernel threads are created. However, a few context switches later it all comes tumbling down. I didn’t have any of my kernel books with me, so I stopped hacking at that point rather than try to guess/decode how some of the internals are supposed to work.

My port is using a device tree to describe the system architecture. This makes it easier to build a single kernel image that can boot on multiple moxie implementations. There’s a good paper on this relatively new infrastructure here: http://ols.fedoraproject.org/OLS/Reprints-2008/likely2-reprint.pdf. If you’ve been following this project, you may recall that console I/O is implemented differently on the gdb and qemu simulators. For the gdb simulator we use a software interrupt instruction (swi) to escape to the simulator, but the qemu port uses a real simulated serial device. This means they need different console devices in the kernel to print boot messages. The device tree is a nice way to describe differences like this and have a single kernel image to boot in both environments.

Also, as predicted, I actually used moxie gdb’s reverse debugging feature to help debug my kernel bring-up. It was really useful a couple of times, and it has probably already saved me as much effort as it took to implement in the first place!

The next week is going to be very busy for me, so I don’t expect to get much done. We’ll see…

The Moxie Linux port

Thursday, July 23rd, 2009

I’ve just checked the start of the kernel port into moxiedev. Running “ant build” will produce tools, simulators, u-boot and now a vmlinux you can run with moxie-elf-run or in gdb. It crashes on startup right now, but that’s to be expected. I just got it to the point where it links. More to come…