I’ve been working for some time, off and on, on a port of the ZipCPU to the ICO board. The port isn’t complete (yet), and it’s not paid for, so the work isn’t going very fast. However, I came across something very interesting in this process that I thought would be worth sharing.
To understand the distinction, consider the following piece of code from within the ZipCPU.
The register file itself is declared as a memory of thirty two 32-bit words. Internally, these are split into two register sets–a user register set and a supervisor register set, but for now all that matters is that this is declared as a RAM.
The ZipCPU reads from this RAM after
decoding the registers
that will be used by the
instruction. Once the
dcd_B are available from the
from its register set.
Notice how this isn’t a clocked read. This will come back into play in a moment.
This is the simple part. The next step are the many paths from here to the input to the ALU.
The ZipCPU has
16 registers in its working
set. Of these sixteen
one is the program counter
(register 15). This
program counter, however,
isn’t maintained in the
regset. Instead, it is maintained in one of two separate registers:
ipc for the supervisor or interrupt level program
upc for the user program
flag (global interrupt enable) controls which of the two
is currently in use.
When this information comes from the
dcd_pc contains a pointer to the next
in the current
stream. If we want to read our own program
counter for this
that is if the
gie flag coming from the decoder or
dcd_gie matches the
in question, then we’ll want the value to be
dcd_pc. Otherwise, it can
only be the supervisor mode reading the the user mode program
We’ll pick from between these two program counter registers here.
We’ll need this value in another moment, so hang on to this thought. For
now, let’s return to the read from the
above, where we left the result in
The next thing you need to remember from the
instruction set is that
(just about) every
can have an immediate attached. The two inputs, therefore, to any
are a register I call operand
A, and a second value
B which includes an
B, plus an immediate. If no
register is read, the
only references the immediate.
To handle this selection, the
produces a flag,
dcd_rB. This will be true any time the
B operand comes
from a register.
There’s also another special register–the condition
codes. This register
contains a variety of
mode selection bits together with the traditional flags, zero (
C), negative (
N), and overflow (
V). As with the
program counter above,
there are two versions of this register–one for
supervisor mode and one for user mode.
In our next step, we are going to calculate the value of the B input in all respects except the immediate.
First, if there’s no register read, then the input (not including the immediate) is zero.
Second, in the case of a concurrent register write, the register value is the result of the register write. This allows the CPU to bypass the register file, so that ALU outputs can be immediately available for the next instruction–making the CPU run that much faster.
Third, if this is a reference to the condition codes register, then we’ll set that based upon some special flags.
Once we get past all of that logic, we can finally deal with the expected
B operand comes from the result of the register
we just read above.
As a final step, we’ll add the immediate value associated with the
dcd_I, to our value and register the result for the next stage.
Well, not quite. That’s the last step on that clock. Before we send
this value to the ALU,
we’ll need to double check that the value isn’t being written to the register
on the same clock (again). In this case,
wr_reg_ce is a flag indicating that a value is being written to the
wr_reg_id is the address of that register, and
is the value of that register.
If you got lost in the explanation above, relax. My whole point here is that this logic is complicated.
Because of this complexity, this was one of those pieces of logic that I was
most interested in formally verifying within the
ZipCPU. Along the way I came
across several pipeline hazards I wasn’t
example, what happens if you set
the result of a register plus an immediate, only to have the register
updated on the next clock? Pipeline
primarily captured by the
op_ce flag, needs to prevent that from happening.
Here’s the problem: the iCE40 doesn’t support memory reads unless the result is immediately registered, as in:
While yosys will try to work around this problem using flip-flops and LUTs, it does so at the cost of about 3k or more LUTs on a 7.6k LUT device. That’s bigger than the ZipCPU itself! This so badly broke the bank for the ZipCPU that unless I could change things there was no way it would work on the iCE40.
The basic solution to this problem is to move the register read logic one clock earlier–to shuffle logic from one clock to the next. In other words, I needed to read from the block RAM in the decode stage, before I got to the selection logic above. My fear was, how could I patch this change in and remain certain that the CPU would still work?
Hint: this is one of the reasons why I like formal methods.
So, rather than clocking the register address into
outputs, I left it as combinatorial logic alone–making two additional outputs
for that purpose. I could now read the register one clock earlier, and
clock the value in with the same control signal that was used to clock the
See the difference at the top of the diagram? Let’s walk through this updated logic in steps below. Not only do we read from the register file on a clock, but we also keep track of whether the register we need is being written to the register file on this clock as well.
Now, on the next clock we can select which of the two values we should be using.
My fear in all of this, though, was that I would somehow get this logic wrong. I mean, there’s so much that can go wrong with this change, how can I rest assured that the result works?
Enter formal methods.
Within the ZipCPU
(on the dev branch currently),
there’s a set of properties rebuilding the logic we worked through above
describing the values going into the
divide, etc). Getting
the CPU and
these properties right took the majority of the work using formal methods on
to date. What makes these properties special is that they aren’t clocked.
They don’t need to be.
don’t need to meet timing requirements like synthesizable logic does. As a
result, I can assert that both
op_Bv have the right value using simpler logic than the logic we expressed
above. Even better, I can make that assertion independent of any
pipeline stalls or
hazards. The result is that the property tends to find hazards I wasn’t
The end of this chain of logic is a short set of assertions.
The important part is in the middle. First, I assert that the value of the
A register going into the
ALU is as it should
be, then the
Yes, there are a lot of caveats to this assertion–the values only need to be correct if the illegal instruction flag isn’t set, if the pipeline isn’t being cleared, if the debugging port isn’t going to force the pipeline to be cleared on the next clock, and so on.
The good news? Once I made the change outlined above, I could go back and
time with the
selected. Once the proof passed, I knew the change “worked”.
One of the topics I discuss in the two-day formal verification course I teach is the idea of moving logic between clocks. In the course, we adjust a flag indicating that a counter is non-zero to the clock before the counter’s value in question. To me, this is one of the very important uses of formal verification–being able to tell when you make a change, no matter how complex, that the properties you had asserted before remain true. In this case, I was able to prove that the ZipCPU maintained the correct register inputs, in spite of moving the logic earlier by one clock.
Does this mean that the ZipCPU works on the iCE40? Well, let’s just say that now it builds for the iCE40 whereas it wouldn’t build before. The project remains ongoing. I would like to implement the ZipCPU both on the ICO board as well as on Luke Valenty’s new TinyFPGA. Before I get there, though, I’ve still got a lot of peripheral simulation work to get through.
God is our refuge and strength, a very present help in trouble. Ps 46:1