The Moxie Linux port

July 23rd, 2009

I’ve just checked the start of the kernel port into moxiedev. Running “ant build” will produce tools, simulators, u-boot and now a vmlinux you can run with moxie-elf-run or in gdb. It crashes on startup right now, but that’s to be expected. I just got it to the point where it links. More to come…

Reverse debugging!

July 12th, 2009

A few weeks ago I happened to be in Palo Alto and met up with my friend and long-time GDB hacker Michael Snyder. He told me about a new feature in GDB called “process recording”. The basic idea is that when you tell GDB to enter into “record mode”, it records undo information for every instruction executed during the debug process. This lets you switch direction and start stepping through your code backwards in time! It’s a pretty amazing feature.
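
To make the "undo information" idea concrete, here is a toy Python sketch. It is not how GDB implements process recording; the two-register machine, the `set`/`add` operations and the `RecordingMachine` name are all made up for illustration. Before each instruction runs, the machine saves whatever state the instruction is about to overwrite, so stepping backwards just pops and restores that saved state.

```python
class RecordingMachine:
    """Toy machine (hypothetical) that records undo info for every step."""

    def __init__(self):
        self.regs = {"r0": 0, "r1": 0}
        self.pc = 0
        self.undo_log = []  # one (old_pc, register, old_value) entry per step

    def step(self, program):
        op, reg, value = program[self.pc]
        if op not in ("set", "add"):
            raise ValueError(f"unknown op: {op}")
        # Record what is about to be clobbered before executing.
        self.undo_log.append((self.pc, reg, self.regs[reg]))
        if op == "set":
            self.regs[reg] = value
        else:
            self.regs[reg] += value
        self.pc += 1

    def reverse_step(self):
        if not self.undo_log:
            raise RuntimeError("no recorded history")
        # Restore the state saved by the most recent forward step.
        self.pc, reg, old_value = self.undo_log.pop()
        self.regs[reg] = old_value


program = [("set", "r0", 5), ("add", "r0", 3)]
machine = RecordingMachine()
machine.step(program)
machine.step(program)
machine.reverse_step()  # r0 goes back from 8 to 5, pc back to 1
```

A small ISA keeps this cheap because each instruction touches very little state, so each log entry stays short.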

I was anxious to implement it for moxie, but only got around to it this weekend. The moxie ISA is relatively small, so it wasn’t much work. The patch looks something like this. And, as promised, you can now step forwards and backwards through moxie code. Reverse “continue” and “finish” also work. It’s going to be really handy when I get back to working on the Linux kernel port.

Some GDB front-ends already have the controls in place for reverse debugging. Here’s a webinar showing reverse debugging on Eclipse. I mostly use Emacs as my moxie-elf-gdb frontend, but I’m not sure if it supports the reverse instructions nicely yet (of course you can “set exec-direction reverse” and use the normal step/next/continue commands).

Thanks to Michael for pointing me at this new feature, and to Tea Water for implementing process recording in the first place.

UPDATE: Emacs support for reverse debugging should be arriving in 23.2. I’m not sure what the schedule for that is, but 23.1 is supposed to come out next week (July 22).

A Disassembler in Verilog

June 22nd, 2009

I’ve been playing around a little more with verilog. Here’s a mostly complete moxie disassembler module written in verilog.

And here’s a little driver for it. The driver reads a hex dump file into an array representing memory. On every clock cycle it updates the instruction and data output registers and increments the program counter. The disassembler samples those values on every cycle, and tells the driver how far to increment the PC. Pretty basic stuff!
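
The driver/disassembler handshake can be modelled in a few lines of Python. This is only a sketch of the control flow, not a translation of the verilog. The big-endian byte order, the `LONG_OPCODES` table and its single entry are placeholder assumptions rather than real moxie encodings. The only fact taken from the ISA is that instructions are either 2 or 6 bytes long.

```python
# Placeholder assumption: opcodes (top byte of the first halfword) that are
# followed by a 32-bit immediate. Not the real moxie opcode table.
LONG_OPCODES = {0x01}


def instruction_length(first_halfword):
    """Play the disassembler's role: report how far the PC should advance."""
    opcode = first_halfword >> 8
    return 6 if opcode in LONG_OPCODES else 2


def walk(memory, start=0):
    """Play the driver's role: fetch at PC, ask for the length, advance."""
    pc = start
    visited = []
    while pc + 2 <= len(memory):
        halfword = int.from_bytes(memory[pc:pc + 2], "big")  # assumed byte order
        length = instruction_length(halfword)
        if pc + length > len(memory):
            break  # truncated instruction at the end of the image
        visited.append((pc, length))
        pc += length
    return visited


# One 6-byte instruction followed by one 2-byte instruction.
image = bytes([0x01, 0x10, 0x00, 0x40, 0x00, 0x00, 0x91, 0x0C])
steps = walk(image)
```

In the hardware version the same loop is spread across clock cycles, with the length signal fed back from the disassembler module to the driver.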

$ moxie-elf-gcc -o hello.x hello.c -Tsim.ld
$ moxie-elf-objcopy -O verilog hello.x hello.vh
$ iverilog test-iprinter.v ../../iprinter.v
$ ./a.out
        ldi.l   $sp ,   0x00400000
        ldi.l   $fp ,    0x00000000
        dec     $sp ,     12
        ldi.l   $r0 ,    0x000128b4

etc etc etc

Nothing too impressive really. I’ve stuck this test code in a directory hierarchy that would be useful for dejagnu, as I plan on using dejagnu for regression testing the various HDL modules.

ISA improvements

June 11th, 2009

I’ve committed the PC-relative branch instruction changes upstream. But this is just one of many ISA improvements that need to happen. Here are a handful of other ideas off the top of my head. None of these projects should be particularly difficult.

  • Shorten load/store offsets to 16-bits. They are currently 32-bits, but for all of the benchmarks I’ve looked at the upper 16-bits are always 0x0000 or 0xffff. If the compiler ever really wants to use an offset > 16-bits, it should revert to computing the target address in registers. I don’t expect that much code would require this.
  • Introduce shift instructions with immediate operands. There’s plenty of opcode space for us to add 16-bit shift instructions that include a 5-bit immediate shift value (so we can shift by up to 31 bits in either direction). Right now we load a 32-bit immediate shift value into a register, which burns that register and wastes 32-bits of code space per shift.
  • Get the compiler to generate 16-bit immediate loads. All immediates are 32-bits right now, but the vast majority of these constants fit in 16 bits.
  • Push/pop multiple registers to the stack with one instruction. Although we have 16-registers, the ABI doesn’t have us pushing all 16 to the stack on function entry. We should be able to have a single 16-bit instruction that pushes/pops all of the relevant registers in one go. The instruction would include a bitmap identifying the registers we need to push/pop. ARM has something like this. The only drawback I can think of is that it could increase interrupt latencies as we’d probably have to retire the entire instruction (~10 memory writes/reads) before servicing an interrupt.
  • Many register rich ISAs include one register that is hardwired to zero. We could try this to see if it makes a difference, but I doubt it would be a win. Another idea would be to create a cmpz instruction to compare a register to zero so we don’t have to burn a register for this common operation. Maybe cmp1 might even make sense. This is easy to measure.
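
For the first idea in the list, the test the compiler would need is whether a 32-bit offset survives a round trip through a sign-extended 16-bit field. Here is a hedged Python sketch of that check. The function names are mine, not anything in the moxie tools. Note that the strict test is slightly tighter than "upper half is 0x0000 or 0xffff": an offset like 0x00008000 has a zero upper half but would come back negative after sign extension.

```python
def upper_half(offset32):
    """Upper 16 bits of a 32-bit offset, as seen in a raw instruction dump."""
    return (offset32 >> 16) & 0xFFFF


def to_signed_32(offset32):
    """Interpret a 32-bit pattern as a two's complement integer."""
    offset32 &= 0xFFFFFFFF
    return offset32 - (1 << 32) if offset32 & 0x80000000 else offset32


def fits_in_signed_16(offset32):
    """True if the offset can be stored in a sign-extended 16-bit field."""
    return -0x8000 <= to_signed_32(offset32) <= 0x7FFF


small_positive = fits_in_signed_16(0x0000000C)   # typical frame offset
small_negative = fits_in_signed_16(0xFFFFFFF4)   # -12
too_large = fits_in_signed_16(0x00012345)
edge_case = fits_in_signed_16(0x00008000)        # upper half is zero, yet too big
```

Offsets that fail the check are the ones that would fall back to computing the target address in a register.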

Those are some of the obvious ones, and all I have time to write about now.

Moxie GCC port is upstream!

June 9th, 2009

The moxie port has been accepted by the GCC steering committee!

I just checked it in.

That is all.

Everything is relative (finally!)

June 7th, 2009

The Moxie ISA still needs quite a bit of tuning. Take branches, for instance. A beq instruction is currently encoded like so

00001111xxxxxxxx iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

…where the “x”s represent “don’t care” bits, and “i”s are a 32-bit absolute branch target. That’s right: branch targets are not PC relative! This is hugely wasteful.

I’ve finally got around to fixing this. Here’s how I did it…

  1. I recoded all branch instructions as “Form 3” instructions, and tweaked the as-of-yet unused Form 3 encodings so they look like this:
    
      FORM 3 instructions start with bits "11"...
    
        11oooovvvvvvvvvv
        0              F                                                            
    
       oooo         - form 3 opcode number
       vvvvvvvvvv   - 10-bit immediate value.
    

    This gives us 16 opcodes with a 10-bit immediate value. There are only 9 branch instructions, so we have a bit of room left in the Form 3 opcode space.

  2. I introduced a new 10-bit PC-relative Moxie relocation in BFD. This tells the linker and friends how to process PC-relative relocations.
  3. I hacked the assembler to generate these new relocations instead of simply emitting a 32-bit absolute address.
  4. I hacked the disassembler to print the new Form 3 instructions out nicely.
  5. Finally, I taught the compiler how to emit valid branch instructions. It’s not that they look any different now; it’s just that you need to worry about branch targets that exceed our 10-bit range. Actually, we have an 11-bit range because we know that all instructions are 16-bit aligned. This lets us drop the bottom bit from the encoding since we know it will always be 0.
    An 11-bit range lets us branch about 1k backwards to 1k forwards. If the compiler detects that a branch target is out of range, we want it to do something like the following transformation…

        beq    .FAR_TARGET

    …becomes…

        bne    . + 8
        jmpa   .FAR_TARGET

    The “bne .+8” line means branch forward 8 bytes from the current PC. This would skip the unconditional jump to .FAR_TARGET (a 6-byte instruction + 2-bytes for the branch = 8). Note that we have to reverse the logic from “beq” to “bne” for this to make sense.

    This is only possible if GCC can tell how far away the branch targets are. Fortunately, we’re able to annotate instructions in the machine description file (moxie.md) with their length; currently either 2 or 6 bytes long. GCC then processes these annotations to determine branch distances.

    Now that we know branch distances at compile time, the compiler can do smart instruction selection to deal with out-of-range branches. The changes were quite simple and limited to the .md file in the backend.
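
The Form 3 layout and the range check can be sketched in Python as follows. This is a model of the encoding described above, not code from binutils or GCC. The opcode number 0 used in the example is a placeholder (the real opcode assignments aren't listed here), and the displacement is measured from the branch instruction itself, which is an assumption based on the “. + 8” notation. All function names are hypothetical.

```python
FORM3_PREFIX = 0b11 << 14           # Form 3 instructions start with bits "11"
IMM_BITS = 10
IMM_MASK = (1 << IMM_BITS) - 1
MIN_DISP = -(1 << IMM_BITS)         # -1024 bytes
MAX_DISP = (1 << IMM_BITS) - 2      # +1022 bytes


def in_branch_range(disp):
    """True if an even byte displacement fits the 10-bit field (11-bit range)."""
    return disp % 2 == 0 and MIN_DISP <= disp <= MAX_DISP


def encode_form3(opcode, disp):
    """Pack a 4-bit opcode and a byte displacement into 11oooovvvvvvvvvv."""
    if not 0 <= opcode < 16:
        raise ValueError("opcode must fit in 4 bits")
    if not in_branch_range(disp):
        raise ValueError("displacement out of range or misaligned")
    # Drop the always-zero bottom bit, then keep 10 bits of two's complement.
    return FORM3_PREFIX | (opcode << IMM_BITS) | ((disp >> 1) & IMM_MASK)


def decode_form3(word):
    """Recover (opcode, byte displacement) from a 16-bit Form 3 word."""
    if word >> 14 != 0b11:
        raise ValueError("not a Form 3 instruction")
    opcode = (word >> IMM_BITS) & 0xF
    field = word & IMM_MASK
    if field & (1 << (IMM_BITS - 1)):   # sign-extend the 10-bit field
        field -= 1 << IMM_BITS
    return opcode, field << 1


def displacement(lengths, branch_index, target_index):
    """Byte distance between two instructions, given per-instruction lengths."""
    if target_index >= branch_index:
        return sum(lengths[branch_index:target_index])
    return -sum(lengths[target_index:branch_index])


def select_branch(lengths, branch_index, target_index):
    """'short' for a 2-byte branch, 'long' for inverted branch plus jmpa."""
    disp = displacement(lengths, branch_index, target_index)
    return "short" if in_branch_range(disp) else "long"


word = encode_form3(0, 8)           # placeholder opcode 0, branch to . + 8
```

The sketch ignores one wrinkle: switching a branch to the long form makes it 8 bytes instead of 2, which can push other branches out of range, so a real pass has to account for the changed lengths.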

The savings after this ISA change are substantial. For instance, the consumer_jpeg_c benchmark in MoxieDev is more than 15% smaller when we use PC-relative branches! The u-boot binary, on the other hand, is “only” 7% smaller.

I hope to commit these changes to SRC and GCC once the GCC port is merged upstream. Fingers crossed…

An even newer git repo

May 29th, 2009

So it turns out that hosting a git repo that is only accessible via slow http is no fun.

Fortunately, the great team at github.com were willing to take this on, so now the real repository is accessible via this command:

$ git clone git://github.com/atgreen/moxiedev.git

I’m retiring all other repos.

Also, check out the project page here: http://github.com/atgreen/moxiedev/tree/master. Note the early-90s era ASCII graphics in the new README file. Sweet.

New git repo

May 26th, 2009

I’ll bottom line this one quickly:

  • moxiedev is now maintained with git. Check it out like so..
    $ git clone http://moxielogic.org/moxiedev.git
  • moxiedev now contains a partial u-boot port. It’s “partial” because I fat fingered some commands and blew away four or five important files. They will have to be recreated before this thing builds.

Lessons learned:

  • hg is much more intuitive than git. Unfortunately hg and/or my hg hoster was having problems with the size of moxiedev, necessitating a change. Hosting a public git repo on my own system seemed like the easiest thing.
  • Make sure you back up everything. I am cursing myself for not having pushed out the u-boot port much sooner (but I had to move off of the hg system first).

BTW – still waiting on GCC steering committee decision on inclusion of moxie port.

The Race For A New Game Machine: great book!

May 15th, 2009

I just read The Race For A New Game Machine today on a cross-country flight, and wow.. fun read!

If you’ve ever read Tracy Kidder’s great book The Soul of a New Machine, you’ll know what to expect. But this book chronicles the SONY/Toshiba/IBM Cell partnership, and the creation of the processor core at the heart of both the PS3 and XBOX360 from the point of view of lead architect David Shippy. Not only is it full of interesting technical details**, but it exposes a dark story of manipulation, deception, betrayal and broken friendships. Some of the story is so strange it’s hard to believe.

Many years ago I was involved in some work with Toshiba and SONY around the Emotion Engine, the MIPS-based core used in the PlayStation 2. The team at Cygnus/Red Hat had done lots of work on PS2 development tools, and we all liked working with the Toshiba and SCEI people. It was disappointing to learn that we weren’t going to participate in the Cell project but, after reading this, maybe it was for the best!

** This book introduced me to clock gating, a trick used by ASIC developers to save power. Shippy’s core passed a “power token” through the processor pipeline, ensuring that at any one time the only pipeline logic being clocked was the logic being used. Neat trick, but apparently the savings aren’t that great for FPGAs.

RPMs

May 7th, 2009

There’s a little more detail on the wiki, but basically…

$ rpm -ivh http://moxielogic.org/download/moxielogic-repo-1-1.noarch.rpm
$ yum groupinstall moxiedev

You’re welcome.

Seriously though… release engineering is at times both tedious and fascinating. But mostly tedious. Still, I was interested in tackling a little releng work this week before responding to Ian’s first compiler review comments or continuing the verilog hacking. Now, to move on to more interesting things…