QEMU 1.5 was just released the other day, and in the “And much more…” category I’m happy to say that it includes Moxie support!
This release contains basic Moxie core support, along with support for the imaginary “moxiesim” board. I have some local changes that provide Marin SoC emulation and can run the on-chip bootloader I recently wrote about. In this example, we’re sending the u-boot bootloader program in S-record format to QEMU’s emulated serial port on stdin. It looks just like the real hardware does…
$ cat ~/u-boot.srec | qemu-system-moxie --machine marin --kernel bootrom.elf --nographic
MOXIE On-Chip Bootloader v1.0
Copyright (c) 2013 Anthony Green
Waiting for S-Record Download...
Jumping to code at 0x30000000.
Using default environment
U-BOOT for "marin"
We’re just a few baby steps away from being able to do really cool things!
Good news: we can access external memory! The logic for my pseudo-static RAM controller is working, and big programs can finally run on hardware.
You may recall that I had previously only been accessing fake memory that was configured directly out of limited FPGA resources. I could squeeze a tiny C program in there, but not use anything like newlib, the embedded C runtime library. This new memory controller lets the moxie-based Marin SoC access 16MB of external RAM on the Nexys3 board.
When we were limited to on-chip resources, the C binary was coupled with the synthesized logic and loaded directly into the FPGA. This meant that any change to the code required resynthesizing the logic to rebuild the FPGA bitstream (I think there are ways around this, but I never got there with my work-flow). Now that I have access to external RAM, I can separate my code from my logic.
The trick is to use an on-chip bootloader – code that is loaded with the FPGA bitstream as described above. It does some basic hardware initialization, and sends this message to the serial port:
MOXIE On-Chip Bootloader v1.0
Copyright (c) 2013 Anthony Green
Waiting for S-Record Download...
At that point, I can send any program I like over my laptop’s serial port as an S-Record ASCII file. This generally looks like…
$ moxie-elf-gcc -Os hello.c marin.S -T../../moxie-marin.ld -o hello.elf -lnosys
$ moxie-elf-objcopy -O srec --srec-forceS3 hello.elf hello.srec
$ cat hello.srec > /dev/ttyUSB1
And then, back on the Nexys3 serial port I see:
Jumping to code at 0x30000000.
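The bootloader source isn’t shown in this post, but its job can be sketched. Assuming the standard Motorola S-record layout that `objcopy -O srec --srec-forceS3` produces (S3 data records with 32-bit addresses, normally terminated by an S7 record carrying the entry point, which is presumably where the 0x30000000 above comes from), decoding one line might look like this. `srec_t` and `srec_parse()` are hypothetical names, not the actual bootloader code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded S-record line (hypothetical layout for this sketch). */
typedef struct {
    char     type;        /* '3' = data, '7' = start address */
    uint32_t address;
    uint8_t  data[255];
    size_t   data_len;
} srec_t;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode two hex characters; stops early at a bad or NUL first char. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(s[1]);
    return lo < 0 ? -1 : (hi << 4) | lo;
}

/* Parse an S3 or S7 record. Returns 0 on success, -1 on malformed input
   or a checksum mismatch. */
int srec_parse(const char *line, srec_t *rec)
{
    if (line[0] != 'S' || (line[1] != '3' && line[1] != '7'))
        return -1;
    rec->type = line[1];

    /* Byte count covers the 4 address bytes, the data and the checksum. */
    int count = hex_byte(line + 2);
    if (count < 5 || strlen(line + 4) < (size_t)count * 2)
        return -1;

    unsigned sum = (unsigned)count;
    uint32_t addr = 0;
    for (int i = 0; i < 4; i++) {            /* address is big-endian */
        int b = hex_byte(line + 4 + 2 * i);
        if (b < 0) return -1;
        addr = (addr << 8) | (uint32_t)b;
        sum += (unsigned)b;
    }
    rec->address = addr;

    rec->data_len = (size_t)count - 5;
    for (size_t i = 0; i < rec->data_len; i++) {
        int b = hex_byte(line + 12 + 2 * i);
        if (b < 0) return -1;
        rec->data[i] = (uint8_t)b;
        sum += (unsigned)b;
    }

    /* Checksum = ones' complement of the low byte of the sum. */
    int checksum = hex_byte(line + 12 + 2 * rec->data_len);
    if (checksum < 0) return -1;
    return ((~sum) & 0xFFu) == (unsigned)checksum ? 0 : -1;
}
```

A loader would copy each S3 record’s `data` to `address` in RAM, then jump to the S7 record’s `address`.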
A couple of things can happen now:
1. With a little bit of DejaGnu hacking, we can get the GCC testsuite to run directly on hardware. The simple thing here is to just have libgloss’ _exit() jump back to the on-chip bootloader at 0x1000.
2. We can test the “stage-2” bootloader, u-boot. U-Boot was one of the first programs I ever ported to moxie. I’ve run it on the simulator, but never on hardware.
The interrupt controller is working now, as is the timer and my exception handling firmware. So now I’m able to write a basic stop-watch application, where the 7-segment display simply increments the count every second. Yes, this sounds basic, but there’s a lot of complexity under the hood! This is all with the MoxieLite-based Marin SoC. Next up: one of the following…
1. Finish the hardware GDB remote protocol handler, or
2. Implement a Marin board emulator in QEMU, or
3. Add support for external RAM (I’m currently just using limited FPGA BRAM), or
4. Add interrupt support and timer ticks to RTEMS, or
5. Hook up the bus watchdog to the processor (should this go through the interrupt controller? Or directly to the core?)
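The stop-watch application above boils down to a timer interrupt handler that counts ticks and updates the display once per second. Here is a minimal C sketch, assuming a hypothetical tick rate of 100 Hz and a 16-bit display register (the post gives neither); `stopwatch_tick()` is an illustrative name.

```c
#include <stdint.h>

/* Assumed timer rate; not stated in the post. */
#define TICKS_PER_SECOND 100u

typedef struct {
    uint32_t ticks;     /* timer interrupts since the last full second */
    uint16_t seconds;   /* value shown on the 7-segment display */
} stopwatch_t;

/* Call from the timer interrupt handler. The display register is only
   written when a full second has elapsed. */
void stopwatch_tick(stopwatch_t *sw, volatile uint16_t *display)
{
    if (++sw->ticks < TICKS_PER_SECOND)
        return;
    sw->ticks = 0;
    sw->seconds++;
    *display = sw->seconds;
}
```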
It’s been a while since my last update, so let me bring you up to speed.
A couple of libffi releases got in the way of moxie hacking (although libffi 3.0.13 now includes full moxie support!), but things are picking up speed again.
On the software side of things, the moxie RTEMS and QEMU ports have both been accepted upstream. So now it’s possible to build, run and debug RTEMS applications on QEMU purely with upstream project sources. You may notice that I’m doing much less work in the moxiedev repository these days. This was mostly just a staging area for moxie software support (tools, OS), and there’s little use for it now that most everything is upstream. All of the moxie HDL work now happens in the moxie-cores git tree.
As for the hardware side of things, here are some of the recent changes:
Bad (illegal) instructions now cause an illegal instruction exception.
A simple interrupt controller has been added to the marin SoC. I have the Nexys3 momentary switches hooked up as interrupt sources, so I can trigger interrupts and handle them in software by pressing those buttons.
A trivial timer has been hooked up to the interrupt controller, so I can now generate ‘tick’ interrupts for RTEMS in support of preemptive multitasking (everything was cooperative up ’til now).
I’m actually just debugging the timer ticks right now, but it’s very close.
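With several sources (the Nexys3 buttons and the timer) sharing one interrupt controller, the software side needs a dispatch routine. The controller’s register layout isn’t described here, so this C sketch assumes a hypothetical pending mask with one bit per source; `irq_register`, `irq_dispatch`, `IRQ_TIMER` and `timer_handler` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_IRQS  32
#define IRQ_TIMER 2     /* hypothetical source number for the timer */

typedef void (*irq_handler_t)(unsigned irq);

static irq_handler_t irq_table[NUM_IRQS];

void irq_register(unsigned irq, irq_handler_t handler)
{
    if (irq < NUM_IRQS)
        irq_table[irq] = handler;
}

/* Service every pending source that has a handler, lowest number first.
   `pending` would be read from the controller's status register. Returns
   the bits that were handled, e.g. for writing back as an acknowledge. */
uint32_t irq_dispatch(uint32_t pending)
{
    uint32_t handled = 0;
    for (unsigned irq = 0; irq < NUM_IRQS; irq++) {
        uint32_t bit = (uint32_t)1 << irq;
        if ((pending & bit) && irq_table[irq] != NULL) {
            irq_table[irq](irq);
            handled |= bit;
        }
    }
    return handled;
}

/* Example handler: counts timer ticks (an RTEMS port would call into the
   kernel's clock-tick handling here instead). */
volatile uint32_t timer_ticks;

static void timer_handler(unsigned irq)
{
    (void)irq;
    timer_ticks++;
}
```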
A typical software debug solution for an embedded system might involve a JTAG connection to the board, and then some kind of protocol translation software that handles communication between GDB’s remote serial protocol and the target JTAG port (see OpenOCD, for instance). The FPGA systems I’m working with include JTAG ports, and the vendors also provide JTAG IP cores for interfacing them to your digital logic. On the other hand, these systems also have nice UARTs that are easy to talk to. We have the opportunity to dramatically simplify the debug toolchain by including support for GDB’s remote protocol directly on chip. This would be a hardware implementation of the protocol, with no software stubs required.
The GDB Target Engine IP core is essentially a state machine that reads GDB packets coming over the UART (a micro-USB connection to my laptop). It has direct access to the MoxieLite core through some additional wires for extracting register values. It also acts as a bus master to read/write directly to/from memory. Otherwise, the Marin SoC has only one bus master: the moxie core. The nice thing here is that we don’t have to add any new bus arbitration logic for the second master, because only one master will ever be active at a time. We’re either running in debug mode (active connection to GDB over the UART), in which case the GDB Target Engine is the bus master, or we’re running in regular mode, where the moxie core is in control.
The GDB remote protocol includes many commands these days, but only a small number are required to be supported by the target: read/write registers, read/write memory, step and continue.
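The Target Engine is hardware, so the following is only a software model in C of the framing it has to handle: packets travel as `$data#cs`, where `cs` is the modulo-256 sum of the data bytes written as two hex digits, and the receiver acknowledges with `+` or `-`. Function and type names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Modulo-256 sum of the packet data, as used by the GDB remote protocol. */
uint8_t gdb_checksum(const char *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (uint8_t)data[i];
    return sum;
}

/* Frame a reply as $<data>#<two hex digits>. Returns the length written,
   or -1 if `out` cannot hold '$', the data, '#', two digits and a NUL. */
int gdb_frame(const char *data, size_t len, char *out, size_t out_size)
{
    if (out_size < len + 5)
        return -1;
    return snprintf(out, out_size, "$%.*s#%02x", (int)len, data,
                    (unsigned)gdb_checksum(data, len));
}

typedef enum { WAIT_START, IN_DATA, CSUM_HI, CSUM_LO } gdb_state_t;

typedef struct {
    gdb_state_t state;
    char        buf[256];
    size_t      len;
    int         csum_hi;   /* first checksum digit, or -1 if invalid */
} gdb_rx_t;

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Feed one byte from the UART. Returns 1 when a packet with a good
   checksum is in rx->buf (reply '+'), -1 on a bad checksum (reply '-'),
   and 0 while a packet is still arriving. Bytes outside a packet, such
   as '+'/'-' acks or the 0x03 GDB sends for Ctrl-C, are ignored here. */
int gdb_rx_byte(gdb_rx_t *rx, char c)
{
    switch (rx->state) {
    case WAIT_START:
        if (c == '$') {
            rx->len = 0;
            rx->state = IN_DATA;
        }
        return 0;
    case IN_DATA:
        if (c == '#')
            rx->state = CSUM_HI;
        else if (rx->len < sizeof rx->buf - 1)  /* excess bytes dropped */
            rx->buf[rx->len++] = c;
        return 0;
    case CSUM_HI:
        rx->csum_hi = hexval(c);
        rx->state = CSUM_LO;
        return 0;
    case CSUM_LO: {
        int lo = hexval(c);
        rx->state = WAIT_START;
        rx->buf[rx->len] = '\0';
        if (rx->csum_hi < 0 || lo < 0)
            return -1;
        return ((rx->csum_hi << 4) | lo) == gdb_checksum(rx->buf, rx->len)
                   ? 1 : -1;
    }
    }
    return 0;
}
```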
Current status is that I can connect GDB directly to the SoC using “target remote /dev/ttyUSB0”, at which point GDB negotiates with the target to determine what features are supported. I can hit Ctrl-C in GDB to tell the SoC to enter debug mode. The Target Engine core then talks to MoxieLite to extract register values, converts them to ASCII text and sends them back to the debugger over the wire. This includes the PC, so GDB knows where to go. Given that this is working, I’m not too worried about the rest of it, but only time will tell…
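The “converts them to ASCII text” step is the reply to GDB’s `g` packet: each register is sent as hex digits in the target’s byte order, which for default big-endian moxie means most-significant byte first. A sketch in C (the function name is hypothetical, and a `-mel` little-endian build would need the bytes reversed):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `n` 32-bit registers as the hex payload of a 'g' reply,
   most-significant byte first (big-endian target order).
   `out` must have room for 8 * n + 1 characters. */
void gdb_encode_regs(const uint32_t *regs, size_t n, char *out)
{
    static const char hex[] = "0123456789abcdef";
    for (size_t i = 0; i < n; i++)
        for (int shift = 28; shift >= 0; shift -= 4)
            *out++ = hex[(regs[i] >> shift) & 0xF];
    *out = '\0';
}
```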
I’ve just committed the bits required to run a C program on the Marin SoC.
Rather than hook up the Nexys3 external RAM module, I’m using extra space on the FPGA itself for RAM. Most of the hard work was sorting out the linker script magic required to generate an appropriate image.
I’ve moved all memory mapped IO devices up to 0xF0000000. So, for instance, the 7-segment display LED is at 0xF0000000, and the UART transmit register is at 0xF0000004. I’ll just keep going from there.
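In C, those addresses are typically reached through `volatile` pointers. This sketch takes the two addresses from the post but treats both registers as 32-bit words, which is an assumption; the helper names are hypothetical. Passing the base pointer in, rather than hard-coding it, lets the same helpers be exercised against ordinary memory on a host.

```c
#include <stdint.h>

/* Addresses from the post; the 32-bit register width is assumed. */
#define MARIN_IO_BASE      0xF0000000u
#define MARIN_DISPLAY_REG  0u   /* word offset 0 -> 0xF0000000 */
#define MARIN_UART_TX_REG  1u   /* word offset 1 -> 0xF0000004 */

/* On the board: marin_display((volatile uint32_t *)MARIN_IO_BASE, 0x1234); */
static inline void marin_display(volatile uint32_t *io, uint32_t value)
{
    io[MARIN_DISPLAY_REG] = value;
}

/* Writes one character to the transmit register; no "ready" check is
   modelled because the UART's status register isn't described here. */
static inline void marin_uart_tx(volatile uint32_t *io, char c)
{
    io[MARIN_UART_TX_REG] = (uint8_t)c;
}
```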
Next comes libgloss hacking to map stdout/stdin to the UART (which I talk to with minicom on my Linux box). We’re very close to “Hello World” now!
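A libgloss port usually routes stdout through a low-level write routine. As a sketch only (the name, the 32-bit register width and the missing ready check are assumptions, not the actual port), the core of such a routine could expand `\n` to `\r\n` so each line starts at the left margin in a terminal program like minicom:

```c
#include <stdint.h>

/* Hypothetical helper a libgloss write() stub could call for stdout.
   `uart_tx` would point at the UART transmit register (0xF0000004). */
int marin_uart_write(volatile uint32_t *uart_tx, const char *buf, int len)
{
    for (int i = 0; i < len; i++) {
        if (buf[i] == '\n')
            *uart_tx = '\r';          /* carriage return before line feed */
        *uart_tx = (uint8_t)buf[i];
    }
    return len;                       /* bytes consumed from the caller */
}
```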
“Marin” is the name of my test SoC, consisting of a Wishbone-wrapped 75MHz big-endian MoxieLite bus master along with two slave devices: embedded ROM and the Nexys3’s 7-segment display. So, right now I can write some code into FPGA embedded ROM to manipulate the display. For example…
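# DISPLAY_ADDR is where the 7-segment display is mapped to memory
        ldi.l  $r1, 0x1234         # start the count at 0x1234
        ldi.l  $r3, 0x0            # constant zero for the delay test
loop:   sta.s  DISPLAY_ADDR, $r1   # show the current count
        dec    $r1, 1
        ldi.l  $r2, 5000000        # busy-wait counter
delay:  dec    $r2, 1
        cmp    $r2, $r3
        bne    delay               # spin until $r2 reaches zero
        jmpa   loop                # then show the next value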
This displays a countdown on the hex display starting at 1234.
Here’s what I think will be next:
I need to be able to access RAM, which means implementing a module to support the Nexys3’s CellularRAM device and wrapping that up as a Wishbone slave.
Once I can access RAM, I can test C compiler output, but only small code that I can embed into the FPGA’s ROM.
Next comes a UART Wishbone slave so I can talk to it over the micro-USB serial port from my Linux host. I’ll need to hack up libgloss to map I/O to my memory-mapped UART.
One of the annoying things about this Xilinx toolchain is that AFAICT Digilent doesn’t provide the tool you need for programming memory (Flash, RAM, or otherwise) from your Linux host. So I plan on writing some ROMable firmware to download code (srecords?) over the UART (xmodem?) to program memory. This is the point at which we should be able to run larger programs. I already have a u-boot port, so I think that will be first on my list.
It’s great to have Brad Robinson’s MoxieLite implementation for Marin. It’s very small but can still run at quite a clip. Once the surrounding infrastructure is working, however, I’m going to get back to Muskoka, which is my 4-stage pipelined moxie SoC to see if I can crank up the MHz.
As usual everything is in github. However, the HDL cores and SoC designs are no longer embedded in the moxiedev tree. They’re in a new top-level git repo called moxie-cores. Check it out here: http://github.com/atgreen/moxie-cores
Brad Robinson just sent me this awesome shot of MoxieLite in action. His Xilinx Spartan-6 FPGA based SoC features a moxie core handling VGA video, keyboard and FAT-on-flash filesystem duties using custom firmware written in C. This is all in support of a second z80-based core on the same FPGA used to emulate an ’80s era computer called the MicroBee. Those files in the listing above are actually audio cassette contents used to load the MicroBee software. The moxie core is essentially a peripheral emulator for his final product.
Keep up the great work, Brad!
The most recent compiler patch was the addition of -mno-crt0, which tells the compiler not to include the default C runtime startup object at link time. This is common practice for many embedded projects, where some system-specific housekeeping is often required before C programs can start running. For instance, you may need to copy the program’s .data section from ROM into RAM before jumping to main().
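The ROM-to-RAM copy mentioned above is the classic job of a custom startup routine. In a real crt0 the bounds would come from linker-script symbols whose names vary from script to script; this C sketch takes them as parameters instead (the function name is hypothetical) and also clears .bss, which startup code normally does at the same time.

```c
#include <stdint.h>

/* Copy initialized data from its load address (ROM) to its run address
   (RAM), then zero-fill .bss. A real startup file would pass linker
   symbols here and call main() afterwards. */
void startup_init_memory(const uint32_t *data_load, uint32_t *data_start,
                         uint32_t *data_end, uint32_t *bss_start,
                         uint32_t *bss_end)
{
    while (data_start < data_end)
        *data_start++ = *data_load++;
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```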
I’m going back to my pipelined moxie implementation. Last I looked I had to move memory reads further up the pipeline…
There’s a working hardware implementation of moxie in the wild!
Intrepid hacker Brad Robinson created this moxie-compatible core as a peripheral controller for his SoC. He had been using a simple 8-bit core, but needed to address more memory than was possible with the 8-bit part. Moxie is a nice alternative because it has a compact instruction encoding, a supported GNU toolchain and a full 32-bit address space. FPGA space was a real concern, so he started with a non-pipelined VHDL implementation, and by all accounts it is running code and flashing LEDs on a Nexys3 board!
The one major “ask” was that there be a little-endian moxie architecture and toolchain in addition to the default big-endian design. I had somewhat arbitrarily selected big-endian for moxie, noting that this is the natural byte order for TCP. In Brad’s design, however, the moxie core will be handling FAT filesystem duties, which is largely a little-endian task. At low clock speeds every cycle counts, so I agreed to produce a bi-endian toolchain and, for the most part, it’s all committed in the upstream FSF repositories (with the exception of gdb and the simulator). moxie-elf-gcc is big-endian by default, but compile with -mel and you’ll end up with little-endian binaries.
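To see why byte order matters for FAT: every multi-byte field on disk is little-endian, so portable C has to assemble each value byte by byte. On a little-endian core a compiler may be able to merge that into a single load, while a big-endian core pays for the shifts and ORs on every access. A sketch with illustrative function names (the field offset is the standard FAT boot-sector location of bytes-per-sector):

```c
#include <stdint.h>

/* Read little-endian on-disk fields regardless of the CPU's byte order. */
static inline uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static inline uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bytes per sector is stored at offset 11 of a FAT boot sector. */
uint16_t fat_bytes_per_sector(const uint8_t *boot_sector)
{
    return le16(boot_sector + 11);
}
```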
Brad also suggested several other useful tweaks to the architecture, including changing the encoding of PC-relative offsets for branches. They had originally been encoded relative to the start of the branch instruction. Brad noted, however, that changing them to be relative to the end of the branch instruction saved an adder in his design. I made this change throughout the toolchain and (*cough*) documentation.
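The arithmetic behind that change can be sketched. Assuming a 2-byte branch instruction (an assumption for illustration; the exact field width and offset scaling aren’t given here), the two conventions differ only by the instruction length, and the end-relative form lets the target be formed from the next-PC value that the fetch logic presumably already has, rather than from a separate branch-address sum. Function names are hypothetical.

```c
#include <stdint.h>

#define INSN_LEN 2   /* assumed length of a branch instruction, in bytes */

/* Old convention: offset relative to the start of the branch. */
int32_t offset_from_start(uint32_t branch_pc, uint32_t target)
{
    return (int32_t)(target - branch_pc);
}

/* New convention: offset relative to the end of the branch, i.e. to the
   address of the following instruction. */
int32_t offset_from_end(uint32_t branch_pc, uint32_t target)
{
    return (int32_t)(target - (branch_pc + INSN_LEN));
}

/* With end-relative offsets, the target is next_pc + offset. */
uint32_t branch_target(uint32_t next_pc, int32_t offset)
{
    return next_pc + (uint32_t)offset;
}
```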
I’ll write more about this as it develops… Have to run now.