Debugging your soft-core CPU within an FPGA
We’ve already looked at the requirements for debugging a CPU in general, as well as how to debug a CPU within a Verilator-based simulation. Let’s now return to this topic and take a look at how to modify your soft-core CPU so that you can debug it once it is placed within an FPGA.
When we discussed the general needs of a debugger, we used a figure similar to Fig 1 to describe a CPU’s debugging needs. We addressed the left column, debugging the CPU while in simulation, in a previous post. Today, the figure at the right has been modified to highlight today’s discussion and focus: how to add the necessary logic into a soft-core CPU to support debugging.
As shown in the diagram, the basic operations we’re going to need to support are resetting, starting, halting, and stepping a CPU, as well as examining and changing the CPU’s state and registers. You may wish to review how the ZipCPU handles pipeline control, since the logic we shall discuss today needs to fit nicely into that context.
That H/W Debugging Interface
If you’ve never done this before, please don’t start by trying to implement GDB’s remote serial protocol within Verilog. The protocol is very powerful, and we’ll discuss how to use it later to connect your CPU to GDB. The problem is that the protocol is complex, and it will take a lot of work to process it within hardware. Keep reading: there’s an easier way.
As a first step, think for a moment about what debugging your CPU will require. In particular, you’ll want to be able to read and write both registers and memory.
- Reading from memory requires the address you wish to read from, as well as a strobe signal to indicate your desire to read.
  If your address space is big enough, this same sort of command and interface can work for reading CPU registers just like it does for memory.
- Writing to memory requires an address, a value, and a strobe to tell you when to write the value to the given address.
  As with reading, if you can allocate an address for each register within your CPU, the same interface you used for reading registers could also work for writing registers within your CPU.
The approach can even be expanded to include not only register values, but also internal (debugging) state variables from within your CPU.
As a third example, a control register could also be used to tell the CPU when to execute an instruction, and when to hold in reset.
All of these interactions, therefore, are easily understood as things that could take place across a “bus” with both memory and memory mapped peripherals on it. Therefore, one might consider giving the CPU a bus slave interface, and hooking it up to the debugging bus we’ve been working with (or something similar) as shown in Fig 2.
This approach has a couple of advantages. First, the debugging bus can be used to debug both the peripherals and the memory the CPU will need to work with later. Second, if the CPU and debugging bus are each given the same view of the peripheral set, then no separate address map and decoder need to be created. Third, this approach creates a means, independent of the CPU, of reading and writing to memory. This could be very important later when building a program loader for the CPU: you could then load the CPU’s program into memory and test it without relying on any internal ROM within the CPU, which would force the design to be resynthesized any time something changes.
The downside of this approach is that, depending upon your implementation, the bus arbiter may slow the CPU’s access to memory by a clock (or two).
This was the approach taken by the ZipCPU, as shown in Fig 3, so we’ll use the ZipCPU as our example of this approach in our discussion below. The ZipCPU was given two address locations on the debugging bus: a control and data location. (These are both discussed and defined in the ZipCPU specification document.) A small wrapper around the ZipCPU proper, called ZipBones, connects to the control register of the debug slave port and controls the reset and halt lines into the CPU. These are used to implement reset, halt, start, and step operations as we’ll see shortly.
The ZipCPU also has a second wrapper with more functionality to it, called the ZipSystem, but since the logic within the ZipBones is simpler, we’ll focus on it.
Our discussion will focus on the reads and writes of these two locations, the control and data ports, although you may wish to give your own CPU more registers than just these two.
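To make the two-location idea concrete, here is a minimal sketch of how a debug slave port might be decoded into control and data accesses. The module and signal names (`dbg_port_decode`, `o_ctrl_write`, and so on) are hypothetical, chosen for illustration only, and the single address bit is an assumption based on there being just two locations.

```verilog
// Hypothetical sketch: decode a two-word debug slave port.
// Address 0 selects the control register, address 1 the data register.
module dbg_port_decode (
	input	wire	i_dbg_stb,	// A bus request is present this clock
	input	wire	i_dbg_we,	// The request is a write
	input	wire	i_dbg_addr,	// 0 = control word, 1 = data word
	output	wire	o_ctrl_write,	// Write to the control register
	output	wire	o_data_write,	// Write to the selected CPU register
	output	wire	o_data_read	// Read from the selected CPU register
);
	assign	o_ctrl_write = i_dbg_stb &&  i_dbg_we && !i_dbg_addr;
	assign	o_data_write = i_dbg_stb &&  i_dbg_we &&  i_dbg_addr;
	assign	o_data_read  = i_dbg_stb && !i_dbg_we &&  i_dbg_addr;
endmodule
```

Everything else in the debug interface can then hang off of these three strobes.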
Resetting, halting, and stepping the CPU
Let’s look at the control register for the ZipCPU for a moment. Writes to this control register have the side-effect of controlling the i_halt and i_rst (reset) lines within the CPU. These side effects will cause the ZipCPU to run, halt, step, or even reset as requested.
The first side effect to be discussed is the reset. Like many digital logic cores, the ZipCPU has a reset line going into it. Controlling this reset is also quite possibly the simplest interaction with the bus. Specifically, any time the control register is written with the reset bit set, the CPU is reset. Further, this reset line into the CPU is initialized high, to make sure that the CPU always starts from a reset state.
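A reset register of this sort might look something like the sketch below. The bit position (`RESET_BIT`) and the port names are assumptions made for illustration; check the ZipCPU specification for the real control register layout.

```verilog
// Hypothetical sketch of a debugger controlled reset line.
module dbg_reset #(
	parameter	RESET_BIT = 6	// Assumed position of the reset bit
) (
	input	wire		i_clk,
	input	wire		i_ctrl_write,	// Write strobe, control register
	input	wire	[31:0]	i_data,		// Value being written
	output	reg		o_cpu_reset	// Reset line into the CPU
);
	// Initialized high, so the CPU always starts from a reset state
	initial	o_cpu_reset = 1'b1;
	// Any write with the reset bit set resets the CPU for one clock
	always @(posedge i_clk)
		o_cpu_reset <= i_ctrl_write && i_data[RESET_BIT];
endmodule
```

In this sketch the reset lasts for exactly one clock per request. A CPU needing a longer reset would need to stretch this pulse.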
Inside the ZipCPU, this reset line causes the ZipCPU to reboot. While it only (re-)initializes a minimum of variables, it is enough to get the CPU to start from (nearly) known conditions. In particular, all error conditions, cache valid indications, and pipeline valid flags are cleared on reset. Further, the CPU is sent to a pre-programmed address. What the reset doesn’t do is re-initialize the CPU’s registers (other than the program counter and flags registers, which are reset). This allows some amount of fault recovery in software, if desired, prior to setting all of the registers to known conditions.
The second control line going into the CPU is a master halt line, i_halt. This line, if set, will cause the CPU to halt in such a way that no instructions will go into the ALU, memory or divide units, but instructions that have already entered these units will be allowed to finish. It does this by setting the stall logic associated with these units, as we discussed during our CPU pipeline signaling post.
The neat thing about the master halt line concept is that the CPU is designed to halt at a stopping point between instructions when using it. Instructions that have entered the ALU, memory or divide stages are allowed to complete, but further instructions are not allowed to enter these stages. As a result, the CPU can be started, stepped, or halted by adjusting this master enable (i.e. i_halt) line.
This i_halt line into the CPU is calculated from a couple of pieces of logic in the ZipBones wrapper. The first is the cmd_halt register, which is controlled by writes to the control register. On a reset, the CPU will start in a halted mode (if the boolean parameter START_HALTED is set to true). Ever afterwards, any write to the halt bit in the ZipBones control register will set or clear this bit, with two exceptions: cmd_step and cpu_break.
The first exception is the cmd_step logic. If the halt bit is set at the same time the CPU is instructed to step forward by one clock, then the halt request is ignored until cmd_step has been true for one clock. We’ll come back to this exception in a moment.
The second exception is the cpu_break signal. This is shown in Fig. 3 as the hardware break signal. This is the signal the CPU creates when it has encountered an unrecoverable fault, such as trying to execute an unimplemented instruction while in the supervisor (i.e. interrupt) state. Other events within the supervisor state will cause the CPU to fault as well, such as the break instruction, a divide by zero, or a wishbone bus error. The cmd_halt register captures that fault, and then holds the CPU in a halted state for the debugger to come by and examine it. (Alternatively, the CPU could be programmed to just reboot.)
On that note, let’s return to looking at the step bit. If the step bit is ever set, the ZipBones wrapper will release the halt line for one clock and then set it immediately again. This will cause one instruction to enter the ALU/memory pipeline stage. It works in conjunction with the cmd_halt bit above, so that if the step register is ever true, the cmd_halt register will get set on the next instruction.
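The halt and step rules above can be sketched in just a few lines of Verilog. This is a simplified illustration and not a copy of the ZipBones source: the bit positions (`HALT_BIT`, `STEP_BIT`) and the port names are assumptions, while `cmd_halt`, `cmd_step`, `cpu_break`, and `START_HALTED` follow the names used in the text.

```verilog
// Hypothetical sketch of the halt and step control logic.
module dbg_halt_step #(
	parameter [0:0]	START_HALTED = 1'b1,
	parameter	HALT_BIT = 10,	// Assumed bit positions
	parameter	STEP_BIT = 8
) (
	input	wire		i_clk,
	input	wire		i_reset,
	input	wire		i_ctrl_write,	// Write strobe, control register
	input	wire	[31:0]	i_data,		// Value being written
	input	wire		i_cpu_break,	// Unrecoverable fault from the CPU
	output	wire		o_cpu_halt	// Master halt line into the CPU
);
	reg	cmd_halt, cmd_step;

	// cmd_step is true for one clock following a step request
	initial	cmd_step = 1'b0;
	always @(posedge i_clk)
		cmd_step <= i_ctrl_write && i_data[STEP_BIT];

	initial	cmd_halt = START_HALTED;
	always @(posedge i_clk)
	if (i_reset)
		cmd_halt <= START_HALTED;
	else if (i_ctrl_write)
		// A halt request is ignored while a step is also requested
		cmd_halt <= i_data[HALT_BIT] && !i_data[STEP_BIT];
	else if (cmd_step || i_cpu_break)
		// Halt again after a step, or hold the CPU after a fault
		cmd_halt <= 1'b1;

	assign	o_cpu_halt = cmd_halt;
endmodule
```

Following a write with the step bit set, `cmd_halt` drops while `cmd_step` is high, and then rises again on the next clock. The halt line is therefore released for exactly one clock.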
While this description may sound simple, the devil is in the details. For example, what happens when the CPU is in the middle of an atomic operation? What if an interrupt comes in while the debugger has the CPU halted? (It gets ignored.) What if the CPU is in the middle of executing a pair of instructions from a compressed instruction set word? (The ZipCPU has no ability to restart a compressed instruction word mid-way through …) What if the CPU is loading a cache line, and the memory is slow to respond? (i.e. broken)
All of these details can make this halt line difficult to implement.
Clearing the Cache
Before we move on to gaining access to CPU registers, the control register offers one more big capability–that of clearing the cache.
This is one of those annoying details that you may not think of initially.
If the CPU is halted, the debugger is free to change memory, right? Hence, the debugger might wish to swap a normal instruction for a BREAK instruction or vice versa. The problem lies in whether the CPU has already read that instruction into its cache. If the instruction the debugger wishes to change is already in the cache, then the CPU might not notice the fact that the debugger has changed that memory. (The ZipCPU cache has no bus snooping capability … yet.)
This command also clears the CPU’s pipeline for essentially the same reason–lest the instruction the debugger wished to change was also within the pipeline already and just waiting to execute. We discussed how this was done earlier, when we discussed how the ZipCPU implemented its pipeline logic.
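As a rough illustration of what "clearing" means on the receiving end, consider the sketch below. It is not the ZipCPU’s cache. It is a hypothetical set of per-line valid flags (`icache_valid`, with made-up port names) that a clear request from the debugger wipes out, forcing every line to be fetched from memory again before it can be used.

```verilog
// Hypothetical sketch: cache line valid flags with a debugger clear.
module icache_valid #(
	parameter	LGLINES = 3	// Log (base two) of the number of lines
) (
	input	wire			i_clk,
	input	wire			i_reset,
	input	wire			i_clear_cache,	// From the debug control register
	input	wire			i_line_loaded,	// A line has finished loading
	input	wire	[LGLINES-1:0]	i_line,		// Which line was loaded
	output	reg	[(1<<LGLINES)-1:0]	o_valid
);
	initial	o_valid = 0;
	always @(posedge i_clk)
	if (i_reset || i_clear_cache)
		// Forget everything: memory may have changed underneath us
		o_valid <= 0;
	else if (i_line_loaded)
		o_valid[i_line] <= 1'b1;
endmodule
```

The pipeline valid flags can be treated the same way, so that nothing fetched before the debugger’s change is allowed to execute.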
Reading and Setting Registers
While a proper bus protocol makes sense for reading from CPU registers, as we discussed above, the ZipCPU’s debug implementation isn’t quite a full bus implementation. Perhaps this interaction is ready for redesign. For now, I’ll just explain it as it is.
The ZipCPU control register contains a set of six address bits. Writes to the control register can be used to set these six address bits as well as other flags such as those we discussed above. These then become the wishbone address of a register within the CPU. Ever after, reads from (or writes to) the ZipCPU data register will access the CPU register addressed by these six address bits.
Remember how we discussed earlier that a register read from a bus is just a big case statement? The same is true of the ZipCPU. The only difference within the ZipCPU is that 28 of the 32 ZipCPU registers are stored in an on-chip RAM, while the other four are collected from a set of control and status bits and the two program counters. Reading from the bus, therefore, is almost the same as the big case statement we discussed earlier.
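A debug read port along these lines might be sketched as follows. This is an illustration only, not the ZipCPU’s source: the port names are hypothetical, and I’ve assumed a five-bit register address whose top bit selects between the supervisor and user register sets, with register 14 holding the flags and register 15 the program counter.

```verilog
// Hypothetical sketch of a debug register read port.
module dbg_reg_read #(
	parameter [3:0]	CC_REG = 4'he,	// Assumed register numbering
	parameter [3:0]	PC_REG = 4'hf
) (
	input	wire		i_clk,
	// The CPU's normal writeback port, so the RAM has contents
	input	wire		i_wr,
	input	wire	[4:0]	i_wr_reg,
	input	wire	[31:0]	i_wr_data,
	// Flags and program counters, which live outside of the RAM
	input	wire	[31:0]	i_sup_flags,
	input	wire	[31:0]	i_user_flags,
	input	wire	[31:0]	i_sup_pc,
	input	wire	[31:0]	i_user_pc,
	// The debug read port itself
	input	wire	[4:0]	i_dbg_reg,	// {user set, register number}
	output	reg	[31:0]	o_dbg_data
);
	reg	[31:0]	regset	[0:31];

	always @(posedge i_clk)
	if (i_wr)
		regset[i_wr_reg] <= i_wr_data;

	always @(posedge i_clk)
	begin
		// Most registers come straight from the on-chip RAM
		o_dbg_data <= regset[i_dbg_reg];
		// The last assignment wins, overriding the special registers
		if (i_dbg_reg[3:0] == PC_REG)
			o_dbg_data <= i_dbg_reg[4] ? i_user_pc : i_sup_pc;
		else if (i_dbg_reg[3:0] == CC_REG)
			o_dbg_data <= i_dbg_reg[4] ? i_user_flags : i_sup_flags;
	end
endmodule
```

Keeping the RAM read unconditional, and then overriding it for the few special registers, tends to make it easier for the synthesizer to infer a block RAM.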
Writes are a touch more difficult, since the debugger needs to insert any register writes into the processing chain of the CPU.
The ZipCPU handles such writes by creating a module parallel with the ALU and memory. This module (really only a register and about 4 lines of code) is only active if the CPU is halted.
Likewise, the register (and value) the ALU would’ve written upon completion is modified during a halt as well.
This then sets the write values on the clock before writeback. (The adf_ce_unconditional flag is a piece of the ZipCPU’s pipeline logic that we may come back and address in more detail later in a post on pipelining.)
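Here is a minimal sketch of how such a debug write path could be built. It is written under several assumptions and is not the ZipCPU’s actual code: `i_alu_ce` stands in for the role the text gives to adf_ce_unconditional, `i_halted` is a hypothetical signal indicating that the pipeline has drained, and the remaining names are made up for the example.

```verilog
// Hypothetical sketch: a debug "unit" feeding the writeback stage.
module dbg_reg_write (
	input	wire		i_clk,
	input	wire		i_reset,
	input	wire		i_halt,		// Master halt request
	input	wire		i_halted,	// The pipeline has fully drained
	// Debugger's write request
	input	wire		i_dbg_we,
	input	wire	[4:0]	i_dbg_reg,
	input	wire	[31:0]	i_dbg_data,
	// The ALU's side of the pipeline
	input	wire		i_alu_ce,	// An instruction enters the ALU
	input	wire	[4:0]	i_alu_dest,	// That instruction's destination
	input	wire		i_alu_valid,	// The ALU has a result to write
	input	wire	[31:0]	i_alu_result,
	// To the register file's write port
	output	wire		o_wr_ce,
	output	reg	[4:0]	o_wr_reg,
	output	wire	[31:0]	o_wr_data
);
	reg		dbgv;
	reg	[31:0]	dbg_val;

	// The debug unit: valid one clock after a write to a halted CPU
	initial	dbgv = 1'b0;
	always @(posedge i_clk)
		dbgv <= !i_reset && i_halt && i_halted && i_dbg_we;

	always @(posedge i_clk)
		dbg_val <= i_dbg_data;

	// Destination register: the ALU's normally, the debugger's on a halt
	always @(posedge i_clk)
	if (i_alu_ce)
		o_wr_reg <= i_alu_dest;
	else if (i_halt && i_dbg_we)
		o_wr_reg <= i_dbg_reg;

	assign	o_wr_ce   = dbgv || i_alu_valid;
	assign	o_wr_data = dbgv ? dbg_val : i_alu_result;
endmodule
```

Since the debugger’s value follows the same path an ALU result would, the register file itself needs no extra write port.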
Finally, so that the debugger can know that this write has occurred, the ZipCPU holds the stall register high any time the CPU hasn’t completely halted.
You may notice, if you look at the code, that there’s no acknowledgement line. Indeed, the acknowledgement line is generated at the bottom of the ZipBones file based upon the fact that any request made to the CPU, as long as the CPU isn’t stalled, is successful.
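That acknowledgement logic can be sketched as follows. Again, this is an illustration under assumptions and not the ZipBones source: the names are hypothetical, and I’ve assumed that only the data location (address one) can ever stall, since control register accesses never need to wait on the CPU.

```verilog
// Hypothetical sketch: acknowledge any debug request that isn't stalled.
module dbg_ack (
	input	wire	i_clk,
	input	wire	i_dbg_stb,	// A debug bus request is present
	input	wire	i_dbg_addr,	// 0 = control word, 1 = data word
	input	wire	i_cpu_dbg_stall, // The CPU hasn't fully halted yet
	output	wire	o_dbg_stall,
	output	reg	o_dbg_ack
);
	// Only accesses to the CPU's registers must wait on the CPU
	assign	o_dbg_stall = i_dbg_stb && i_dbg_addr && i_cpu_dbg_stall;

	// Every accepted request succeeds, and is acknowledged a clock later
	initial	o_dbg_ack = 1'b0;
	always @(posedge i_clk)
		o_dbg_ack <= i_dbg_stb && !o_dbg_stall;
endmodule
```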
Conclusion
You may notice that the logic above only depends upon a couple of wires, and that these wires have very little logic associated with them. This is how digital design should be. The trick to every problem is knowing how to make the problem simple.
In our case, the problem is simplified first by creating some form of debugging bus to gain an access point to our hardware and peripherals, together with an understanding of pipeline strategies, and second by understanding how a simple CPU can use such a strategy.
This still leaves us with many more ZipCPU topics to discuss, such as how to add or remove peripherals by simply adding or removing parameters from an AutoFPGA command line. However, we are going to postpone that discussion until after I have the opportunity to discuss AutoFPGA at ORCONF this year.
In the meantime, then, I’d like to turn this blog’s attention to the DSP topics of both sine wave generation and digital filtering. We’ll come back to the ZipCPU later–if for no other reason than I’ve been asked to discuss how to modify GCC to support a new CPU backend.
No man can serve two masters: for either he will hate the one, and love the other; or else he will hold to the one, and despise the other. Ye cannot serve God and mammon. (Matt 6:24)