It seems that the more I work with the ZipCPU, the more options it accumulates.
First it was the prefetch: I built a basic prefetch, then one that could issue two bus requests at once, then one with a fully integrated I-cache. In each case I wanted more performance, but I didn’t want to give up the ability to build the ZipCPU in an extremely low logic configuration.
Then it was the multiplies: not every board I worked with had the embedded DSP elements required for hardware-accelerated multiplies. Indeed, some that had embedded DSP elements couldn’t handle a 32x32 bit multiply within a single clock (the Spartan 6, for example). This forced me into two separate multiplication implementations. Then the fact that the iCE40 has no DSPs at all led to an additional multiplication implementation just to handle that environment.
The initial memory subsystem, while working, was cripplingly slow. A faster memory subsystem was written to replace it, and then a proper data cache was written to replace that. On top of that, the ZipCPU can optionally support a lock instruction for atomic access, a compressed instruction set, a set of CPU-local peripherals, and more.
If those weren’t enough changes, the instruction decoder needs to support all of the various instruction set extensions. Lock instructions need to be honored by the memory module, unless no lock instruction is implemented. Multiplication instructions need to be honored if enabled, while creating illegal instruction exceptions if not. Some of the options need special decoding, such as the simulation instructions or the early branching instructions.
While the ZipCPU is highly configurable, it doesn’t make sense to test every permutation of the various configurations. Therefore, let’s look over some of the more common configuration settings used by the ZipCPU. More than that, we’ll give each of these configurations names. These names will help us remember each configuration set, as well as help make our SymbiYosys configuration simpler in a moment.
This new option describes a configuration that uses the brand new (formally verified, Xilinx Series-7 proven) data cache. If you want performance from the ZipCPU, this is a no-holds-barred attempt at high speed and full features.
Prior to the data cache, this configuration was the no-holds-barred, high-speed, full-featured ZipCPU version. It describes a fully pipelined implementation containing an instruction cache, multiplies, divides, the compressed instruction set, and early branching capability. (The early branching capability allows you to branch from the decode stage, without waiting for the later stages. It applies only to unconditional branches.)
This is the same thing, only we shut down everything we don’t need: a non-pipelined CPU (forcing a minimum of three clocks per instruction), no multiplies, and no divide instructions, though it still supports the compressed instruction set.
Much to my surprise, the non-pipelined option wasn’t as small as I needed for some implementations, so this version also drops the compressed instruction set support.
Didn’t we get rid of enough stuff yet?
The iCE40 doesn’t support distributed RAM, so all reads from the register file need to go directly into a register first; only on the next clock can we do anything with them.
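As a sketch of what that constraint looks like in Verilog (the module and signal names here are illustrative, not the ZipCPU’s own):

```verilog
// Hypothetical sketch: a register-file read on an iCE40, where
// distributed RAM isn't available.  The read must land in a
// register first, adding one clock cycle of read latency.
module regread(
	input	wire		i_clk,
	input	wire	[4:0]	i_addr,
	output	reg	[31:0]	o_data);

	reg	[31:0]	regfile	[0:31];

	// The read value only becomes available on the *next* clock
	always @(posedge i_clk)
		o_data <= regfile[i_addr];
endmodule
```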
Using SymbiYosys, we can declare different “tasks” to verify each of these configurations. The following code declares five such “tasks”, one on each line.
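As a sketch, such a section might look like the following. The task names come from the configurations above, but the exact aliases on each line are my assumptions:

```
[tasks]
full_proof
nopipe
lowlogic	nopipe
nobkram
ice40		nopipe lowlogic nobkram
```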
Notice the format of this section. It starts with a [tasks] line. Every line thereafter begins with the name of a task. In our case, these represent the various configurations we just outlined above. The second half of each line is more interesting: it consists of a series of labels which will also be accepted as task names later. That way, we can specify ice40 and get all of the nopipe (non-pipelined) options, together with the lowlogic option and the more critical nobkram option.
How might we use this? There are two basic approaches. First, we can begin any line in our SymbiYosys script with a task name followed by a colon. Once done, everything following on that line is only executed if that given task name is the active task. This applies to the aliased names, such as nopipe, as well.
So far, this is kind of interesting but not all that useful.
Let’s consider some other things yosys offers.
Using yosys, you can set a macro. You might recognize macros from their Verilog usage:
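In Verilog source, that usage looks something like the sketch below, using the macro name that appears later in this article:

```verilog
`define	NO_DISTRIBUTED_RAM

`ifdef	NO_DISTRIBUTED_RAM
	// Use the registered (block-RAM friendly) read path
`else
	// Use the distributed-RAM read path
`endif
```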
One problem with this sort of declaration is that it isn’t clear whether the macro defined in one file will remain active in another.
This defines the macro NO_DISTRIBUTED_RAM across all input files, but only when nobkram is the active task. Likewise, ZIPCPU is defined for all tasks. This latter definition is how I handle telling submodules whether they are being verified as separate modules or as submodules of a larger design.
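A sketch of how this might appear in a SymbiYosys [script] section, using yosys’s read command and its -define option (the file name is an assumption):

```
[script]
read -define ZIPCPU
nobkram: read -define NO_DISTRIBUTED_RAM
read -formal zipcpu.v
```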
Another very useful yosys command is the chparam command. You can use this to change the value of any parameter within your logic. As examples, the ZipCPU has several high-level parameters. Perhaps you may have noticed some of them in Fig 1. above. IMPLEMENT_FPU is a single-bit parameter that controls whether the (still nonexistent) floating point unit (FPU) is included. As a more relevant example, OPT_LGDCACHE controls both the size of the data cache and whether it is included at all. If this value is set to zero, no data cache will be included in the build, whereas if it is non-zero it sets the size of the cache.
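A sketch of chparam in a SymbiYosys script might read as follows; the top-level module name here is an assumption:

```
[script]
read -formal zipcpu.v
lowlogic: chparam -set OPT_LGDCACHE 0 zipcpu
prep -top zipcpu
```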
For this, we’ll use the second approach for specifying task-configurations, as shown in Fig. 4 on the right. In this case, we can start a set of task-specific commands using the task name and a colon on a line by itself.
This process continues until either another task name appears, or until a line containing two dashes (--) by itself. Hence the definition of the full_proof ZipCPU configuration shown below, as well as the minimal nopipe (non-pipelined) option and the even more minimal lowlogic option. When done, the -- line specifies that all the tasks join together again for the lines following.
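Sketching this second approach, with parameter names drawn from above and values that are illustrative only:

```
[script]
read -formal zipcpu.v
full_proof:
chparam -set OPT_PIPELINED 1 zipcpu
nopipe:
chparam -set OPT_PIPELINED 0 zipcpu
lowlogic:
chparam -set OPT_PIPELINED 0 zipcpu
chparam -set OPT_LGDCACHE 0 zipcpu
--
prep -top zipcpu
```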
Now, just one sby command will attempt to verify the ZipCPU in all of these various configurations. For each configuration, sby will create a directory, such as zipcpu_lowlogic. Within this directory, you’ll find the logfile.txt containing the standard output from the run. You can use this to find out whether your design passed that proof or, if not, what assertion or cover statement failed. You’ll also find an engine_0 directory with any trace files within it. (Why engine_0? Because I tend to only ever use one engine. Otherwise you might have other engines as well.)
Using Make to drive SymbiYosys
There’s one other thing sby provides that is very valuable from a scripting standpoint. Upon completion, sby will create an empty file in the newly created results directory indicating the results of the run. Example files include PASS, FAIL, and UNKNOWN. This file makes it easy to create a Makefile to support several sby runs, and we only need use the PASS file to do it.
Let’s pick an example component to verify, such as the ZipTimer that we discussed earlier. The timer itself has no real configuration options, so to verify it we could just place the following two lines into our Makefile.
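Those two lines might look something like this sketch, with the .sby file name assumed:

```make
ziptimer:
	sby -f ziptimer.sby
```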
sby, however, will make a directory called ziptimer upon every run. Once this directory exists, make will no longer run our proof. However, if we tell make that ziptimer is just the name of something to do and not a file (a .PHONY target, in make’s terms), we can create the functionality we want.
Here is where make starts to shine. Since ziptimer/PASS is a file, created upon successful completion of the sby run, we can tell make that this file is created from the files the proof depends upon. Now, anytime one of those files changes, make will attempt to build it again. Further, should the proof fail, the PASS file will not get created, and so the next time we call make it will attempt to create this file again, and so on until our proof passes.
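Putting those pieces together for the timer, the rule might look like this sketch; the .sby and Verilog file names are assumptions:

```make
.PHONY: ziptimer
ziptimer: ziptimer/PASS

# Re-run the proof whenever any of its sources change
ziptimer/PASS: ziptimer.sby ziptimer.v
	sby -f ziptimer.sby
```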
What if we wanted to verify a lot of things? Rather than running make many times, once per target, we might instead start our Makefile with a list of proofs, containing both the components as well as larger proofs.
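Such a list might be sketched as follows, with illustrative proof names:

```make
PROOFS := ziptimer zipcounter prefetch zipcpu

.PHONY: all
all: $(PROOFS)
```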
Let’s clean this up a little more with some definitions. Since I like to keep my formal verification scripts separate from my Verilog files, it might help to use a name for that path to simplify references to it. Once defined, this value will get substituted anytime we reference it. Likewise, we’ll shorten the name of our target to $(TMR). We can also define the names of our formal wishbone property sets.
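As a sketch, with hypothetical variable names (RTL for the Verilog path, and fwb_slave.v standing in for a wishbone property file):

```make
RTL := ../../rtl
TMR := ziptimer
WB  := fwb_slave.v
```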
Putting all this together, the script now says that:
- $(TMR) is a “phony” target that doesn’t build a file.
- To build $(TMR) we need to first build $(TMR)/PASS. Why the extra step? Just so that we can later run make ziptimer, rather than make ziptimer/PASS, or worse, make ziptimer_timerconfiguration/PASS for every timer configuration we might have.
- Finally, we can call sby to formally verify the timer. This proof is dependent upon not only the code for the timer but also a list of formal wishbone property files. These are called dependency files, or sometimes just dependencies for short, because the validity of our proof depends upon these files. By listing these dependencies to the right of the :, make will only formally verify our timer if either the $(TMR)/PASS file is missing, or if one of the dependency files is newer than it.
This is exactly what we want from a Makefile! Running make will run all of our formal verification proofs, but once all the logic is verified, running make will just return a message telling us it has nothing to do.
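The whole Makefile described in the bullets above might be sketched as follows, again with hypothetical names:

```make
RTL := ../../rtl
TMR := ziptimer
WB  := fwb_slave.v

.PHONY: $(TMR)
$(TMR): $(TMR)/PASS

$(TMR)/PASS: $(TMR).sby $(RTL)/$(TMR).v $(WB)
	sby -f $(TMR).sby
```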
But how shall we handle the multiple configurations we discussed earlier? Why not set up one proof per configuration? Once accomplished, one simple make command will run them all. Once completed, or even before, you can run make -n to see what proofs have yet to be completed, or equivalently which proofs have failed, assuming in the latter case that make has completed.
Of course, the problem with using make -n to determine which proofs have failed is what happens after only one proof fails: after the first failure, make will give you no information about whether or not the remaining proofs might pass. Alternatively, if you don’t want make to stop on the first failed proof, you can instead run make -k.
Want to use all of your host processor’s multiple cores? You could specify make -k -j &lt;ncpus&gt; to use all &lt;ncpus&gt; of them. However, this might leave some of your cores overloaded during induction. For this reason, I’ve considered running make -k -j &lt;ncpus/2&gt; instead.
What’s all this good for? Well, for me this means I can verify my brand new data cache, several arbiters, … and the ZipCPU itself, all with a single make command. If the make command fails, I can go back and examine the respective logfile.txts to see why. Likewise, if I want to know what proofs need to be re-accomplished, I can just type make -n to see what make would try to build if I ran it again.
With a little more work, I could split the list of Verilog files into a finer-grained list, so that all of the CPU configurations don’t need to be re-verified every time one file, used by only some of those configurations, changes.
When I first started using this approach of testing multiple configurations, I immediately found several errors within code that I had assumed was working. The first bug was in the memory unit, where it wasn’t properly handling a misaligned address exception (an option that wasn’t checked by default). The second bug was within a module that I am hoping to blog about soon. Since I had only verified the safety properties within it, that is, the assert()s given the assumptions, I hadn’t noticed that certain assumptions rendered the proof vacuous. Now, using these multiple configurations, I can both prove the assertions and run cover() to be even more certain that the various ZipCPU components work.
So even though I’ve only started using this approach in the most recent release of the ZipCPU, because of the bugs I’ve already found with it, I now like it so much that I’m likely to slowly modify all of the proofs within my various projects to use this approach as I have opportunity. Even if the core in question isn’t configurable, I’m going to make certain I do this in order to guarantee that the cover() checks run automatically.
Then I saw, and considered it well: I looked upon it, and received instruction. (Prov 24:32)