ZipCPU Lesson: If it's not tested, it doesn't work.
The ZipCPU has had a problem.
It’s kind of fundamental to digital design, so let’s chat about it.
It begins with the goal of the ZipCPU. The ZipCPU is designed to be low logic, and that’s where the problem begins. The problem is simply that low logic means different things to different people. Low logic means one thing on an iCE40 with only 8k 4-LUTs. Low logic means something else on a Spartan 6, and something else entirely on a 20k 6-LUT Artix-7. It means one thing when driving a Wishbone bus, and another thing when driving an AXI bus.
The natural consequence of trying to support multiple design requirements and different targets is that the CPU is highly parameterized. In general, this is a good thing.
We’ve discussed how to handle formally verifying parameterized designs before. That’s not all that hard to do, although the article needs a bit of updating. Specifically, chparam in Yosys should only be used as an argument to a hierarchy command. That aside, handling multiple parameters is still quite easy to accomplish formally.
My problem is that my formal proofs don’t quite capture everything. Yes, what they do capture, they capture exhaustively, but I still keep finding a bug every now and again at integration time when two things don’t work together like they should.
Frankly, I need a simulation solution that can test the ZipCPU in each and every one of its many potential configurations. That’s what I’d like to discuss today.
First approach: The ZBasic System
My previous approach at testing the ZipCPU was the separate ZBasic repository. This is simply a demonstration system connecting the ZipCPU to a variety of other components. Most notable among these other components are the block RAM memory, the serial port, and the debugging bus. Other less notable components include the SPI based SD card protocol controller and the GPIO controller.
That makes the ZBasic system a pretty complete demonstration system, at least for simulating a single configuration.
It offers as much block RAM as your simulation environment will allow.
The serial port has both transmit and receive functionalities, so you can interact with the CPU.
When using Verilator, the SD card can be treated as a full SD card of whatever size is necessary.
This is all great, but it’s hardly ideal for testing the CPU–even though that’s what I’ve used it for.
To that end, I built several CPU tests that I have kept in the ZBasic repository to help me know if the CPU is working.
There’s the standard CPU test, which is designed to check the performance of (almost) every instruction in isolation.
There’s a Hello World test, which I’ve included in order to flesh out any obvious problems with the C-Library.
There’s another Hello World test that works by stepping through the classic Hello World program one instruction at a time: a supervisor mode task sets up the program in user mode, and then steps through it.
Finally, there’s a LOCK instruction checking program. This one generates three (or four) concurrent tasks, all attempting to get access to the same MUTEX and then verifying that they have said MUTEX.
These programs are what actually test the CPU, which was the original purpose of the ZBasic repository.
The most obvious problem I’ve alluded to so far is that this repository only tests a single configuration. That configuration tends to be a full pipeline, cache enabled, Wishbone design. The next big problem is that the test is dependent on many other (non-CPU) components for success.
The result of all of this is that I’ve often published changes to the ZipCPU repository that … broke one part or another of the CPU and then never realized it.
Frankly, I needed a better testing environment.
Envisioning a better simulation test
When thinking over what I needed, I decided upon three goals for a new simulation environment. Obviously, it needed to test multiple configurations. That was my first goal. But that also led to my second goal, which was that I wanted my simulation environment to test both the Wishbone and AXI front ends. Finally, I also wanted this new simulation environment to be all Verilog.
The first step was to identify a series of “supported” configurations. I chose to define 22 such configurations, of which I’m (currently) only testing the 14 I’ve ever used in practice. The 22 configurations fall into six basic groups:
ASM: The Assembly only configuration is the lowest logic configuration of the ZipCPU. It doesn’t have full instruction support, neither does it support user mode. Missing instructions include shifts by more than one bit, multiplies, and divides. Worse, without user mode, there’s no way to trap on one of these instructions being illegal. As a result, you can’t really use this configuration with GCC. GCC will often attempt to implement one (or more) of these instructions, and without the ability to trap on them there’s no real way to rescue a program so generated for this configuration.
While this configuration is defined, there are no tests assigned to it. Yet. (Technically, that makes this configuration unusable. Remember, if it’s not tested then it doesn’t really work.)
What I really like about this configuration is that this represents the lowest logic configuration of the ZipCPU: requiring only 584 Xilinx 7-Series 6-LUTs.
TRAP: This configuration now supports shifts, user mode, and the lock instruction. Since it supports user mode, it also supports traps, and so the CPU can now support the divide and multiply instructions from user mode–assuming I ever build the trap software to handle such instructions properly.
Without the software to support this TRAP configuration, or for that matter without a GCC flag implemented to replace divide and multiply instructions with soft equivalents, there are no tests of this configuration yet.
MIN: This is the minimum CPU configuration supported by a generic ZipCPU backend for GCC. It includes support for multiplies, divides, shifts, lock instructions, user mode, and the compressed instruction set. This configuration uses the most basic instruction fetch and memory controllers. This is also the only configuration where the ZipCPU is not running its full pipeline. Finally, this configuration is the first of several that allows the CPU to be externally configured: the CPU may be reset, halted, or stepped externally, and registers within the CPU may now be read and written externally.
This is the minimum configuration that I can currently test automatically.
PIPE: This is the minimum pipelined configuration. Yes, the ZipCPU is pipelined by design, but that can require too much logic for some hardware to handle. Therefore, the ZipCPU supports both pipelined and non-pipelined configurations. This configuration is the first of the pipelined configurations. It also uses (naturally) the piped fetch and memory controllers–allowing multiple bus requests to be outstanding at any given time. As an extra bonus with pipelining, this is the first configuration supporting early branching.
In this case, early branching is defined by any branch recognized by the instruction decoder, which can be forwarded to the instruction fetch prior to the associated instruction making its way through the rest of the ZipCPU’s pipeline.
CACHE: Here’s where we get to a more traditional CPU configuration. In this configuration, both instructions and data are kept in a (nominally 4kB) cache. Because this configuration is cached, this is also the first one that has a chance of keeping the CPU’s pipeline fully loaded.
Low Power: The final configuration has been optimized for low power. This configuration is the same as the CACHE configuration above, save for two changes. First, unused signals have been zero’d out to prevent any unnecessary toggling. Since this costs extra logic to do, it’s not the primary or default configuration by any means. Second, this configuration enables the clock gating feature that we’ve discussed before. As a result, whenever the ZipCPU is sleeping (i.e. waiting for an interrupt), the CPU’s clock will be turned off when using this configuration.
No, it’s not likely I’ll be able to use this in any FPGA projects, but I do use it from time to time on simulation projects and so it’s nice to know it can be done.
That’s six basic configurations, of which I have tests defined for only four at present.
Then, for each of these four configurations, I want to test the ZipCPU in one of four environments: using a basic Wishbone wrapper I call the ZipBones, a second Wishbone wrapper with an attached peripheral set (timers, interrupt controllers, some performance counters, a DMA, etc.) called the ZipSystem, an AXI-Lite wrapper called ZipAXIL, or finally a Full AXI wrapper I call ZipAXI. You can see these configurations enumerated in Fig. 3. It’s not quite 24 total configurations, simply because it hasn’t (yet) made any sense to build an AXI-Lite cache. Therefore, the cache and low–power configurations only test the AXI wrapper to the ZipCPU, not the AXI-Lite wrapper.
This sort of comes with a rather derived requirement: I’ll need a simulation environment that can be uniform enough to support all (or most) of these configurations. This is just to minimize the amount of rework necessary to go from a test of one configuration to another. However, since the AXI environment is so different from the Wishbone one, I eventually settled on two top level simulation drivers: one for AXI, and one for Wishbone.
My last requirement was that this simulation environment be all Verilog. This is sort of a new requirement to me, since I normally use Verilator for my simulations. Five reasons drive this requirement:
Sometime back, I needed to build a simulation with a CPU for a bus driver and the ZipCPU fit that role nicely
Some of my recent ASIC projects have required driving a bus from Verilog. While an all Verilog model of an ARM might have worked here, the ZipCPU fits this role nicely in its absence.
I’ve now been burned, more than once, by a model that works just fine in Verilator that then failed to work in a simulator that supports ‘X propagation. This has bitten me in two ways:
My ODDR design failed miserably here. Where I struggled was with a design that just needed to create a register that simply toggles within it. That register didn’t need to be reset, it just needed to toggle with every clock. However, if you include this design in an ASIC environment, where initial statements aren’t allowed, then you either need to add a reset or the ‘X propagation will kill you–even if the design would’ve worked.
My first “solution” to this problem was to replace things like === and so forth. That worked great until the post place and route simulation. So … I bit the bullet and added a reset to that design. (Who cares, right? It’s an ASIC! Logic is cheap in ASICs.)
I’ve often found myself using always @(*) blocks to set something to a constant. This works great when using either Verilator or synthesis tools, because you get the value you want. Unfortunately, this is not Verilog language compliant. With a true Verilog compliant simulator, any registers set within such a block will be set to ‘X (undefined) since nothing ever triggers such an always block.
Frankly, if I want to deliver “working” IP to any customers, then that IP really needs to work on their simulator as well.
External device models require assignment delays, and often encrypted IP.
I’ve also recently needed to run simulations against external device models that include assignment delays within them, and I’ve wanted to drive these simulations with the ZipCPU.
Sometimes these models are of my own creation. In this case, Icarus Verilog has handled the problem quite nicely. At other times, these simulations are proprietary, encrypted, models provided by various device vendors. This necessitates being able to use proprietary simulation tools.
So far I’ve tried three proprietary tools:
- NC Verilog (which doesn’t like my use of
- Xcelium, which can currently (for me) handle all but Xilinx’s proprietary IP. (I must be missing something, since Xilinx says it is supported …)
- Xilinx’s Vivado, which SegFaulted on the first project I tried it on. Since then, I’ve gotten it to the point where I can run batch simulations on it, just like with Xcelium, so both simulators are quite usable for me.
MCY, or Mutation Coverage with Yosys, works nicely with an all Verilog simulation model to start from.
In case you are not familiar with MCY, MCY is a means of testing the test bench. Want to know what bugs your test bench will catch? Mutate the design (i.e. break it), and see if the test bench can find the mutation. A good test bench should be able to find any mutation, or at least a high percentage of them.
Sadly, although MCY can be used with formal methods, it doesn’t integrate well with them. (i.e., you need to be careful that you don’t mutate any formal properties.) This has really slowed my adoption of MCY.
Post place-and-route timing simulations
Simulating internal timing requires an all RTL model–ideally one that achieves a high level of coverage. So, any test script that passes an MCY check should (ideally) be able to exercise all of the paths within a design even after place and route.
Finally, several of the customers I’ve worked with have asked for all Verilog test benches. Verilator C++ models were simply unacceptable to them.
Bottom line is, if I want to work with others in the IP community, then I’ll need a Verilog–only test bench.
Building the simulation environment
The first step was to build a simulation environment that would meet these needs.
My first problem was the ZipCPU’s configuration. Prior to defining a common configuration set, the ZipCPU was completely configured via an external Verilog “header” file that defined a set of macros used to configure the CPU. These macros controlled whether or not the CPU was pipelined, which fetch or memory controller was used, which multiply implementation was used, which portions of the instruction set were implemented and more.
My problem with this external configuration file was that it was hard to automatically override the definitions within it. The easy way to override things is with parameters (Generics, when using VHDL). So my first step was to rewrite the ZipCPU to get rid of any and all “ifdef”s and to replace them with configuration parameters.
I then needed a top level simulation environment. A minimum CPU needs memory and a console. My favorite CPU test program also requires a timer, and my clock gating test software requires interrupts from an automatically reloading timer. I also threw a CPU logic analyzer in there for good measure although I don’t yet have a test that uses it.
This leads to a test environment looking like Fig. 6.
It’s not just a single test environment either. I built two near–identical test environments: one for AXI, and another for Wishbone. Further, since I wanted to test the same executable logic in each environment, I made sure that the address space controlled by each test environment was the same between both AXI and Wishbone test benches.
As a side note, I never would’ve considered a test setup this complex early on in my own personal development. A crossbar just to test a CPU? That’s a project in and of itself! Or how about the bus resizing elements, which are required to test the CPU on a non–32bit bus? All of these extra parts and pieces were never things that I had considered to be necessary components of a CPU repository, yet the C library won’t run without the console, and so testing the design necessitates having a crossbar on hand. Similarly, either I need to build a bus width agile console port, or alternatively I just need to suck up the reality of crossing bus widths.
I then ran into a problem when trying to figure out how to support the ZipBones wrapper, the ZipSystem wrapper, and the ZipAXI wrapper from a common addressing space. For background, the core ZipCPU is just that: a CPU. It doesn’t come with many of the peripherals necessary for most CPU environments. For this reason, the ZipCPU initially came with two wrappers. (There are now four.) You could either use the ZipBones wrapper or the ZipSystem. The difference between these two is that the ZipSystem contained a locally mapped set of peripherals: timers, counters, one or two interrupt controllers, and a DMA. The ZipBones wrapper had none of these, so it might be lighter in logic area. Then, when I later built the AXI-Lite wrapper and later the AXI (full) wrapper, I left these near-peripherals out.
How, then, should I guarantee that the ZipCPU’s software can interact with these external peripherals regardless of the wrapper used?
The obvious answer is to guarantee within the test bench that each of these wrappers can see the same set of necessary peripherals, regardless of whether or not they come pre-packaged within the CPU wrapper. Hence, I included an AXI-Lite CPU peripheral set into the AXI testbench top, and several ZipSystem peripherals directly into the Wishbone top for the ZipBones system to interact with. All that remained was to make sure these peripherals all mapped to the same addresses.
This wasn’t (yet) enough.
One of the recent drivers of this work has been my desire to operate the ZipCPU in environments with non–32bit wide buses. One project I’m working on requires a 64-bit bus. A second project, based on Enclustra’s Mercury+ KX7 board, will require a 512-bit bus if I only want to be able to keep up with the memory bandwidth that board is capable of. This meant that my simulation test bench environments needed to be bus–width agnostic as well. This then turned into a requirement that my test bench include a bus downsizer, in addition to requiring one more parameter to define the simulation environment. Thankfully, the bus downsizer can be included even if the bus doesn’t need downsizing–in that case, it just becomes a simple pass-through.
Then, after using these simulation environments for a while, I ended up retrofitting each of them with a Verilog watchdog timer. This timer counts the number of clock cycles since the CPU last attempted to access the bus. It’s sort of a proxy for whether or not the CPU has locked up. (Hint: This means the CPU was locking up. In this case, the lock-up was caused by the clock gating logic found in the Wishbone drivers.) The way it works is, if the CPU’s bus inputs ever become idle for some parameterized number of clock cycles, then the simulation halts with an ERROR. This helped to keep failing simulations from hanging the entire simulation setup. (We’ll get to the setup in the next section.)
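The watchdog’s behaviour is simple enough to model in a few lines. What follows is a Python behavioural sketch, not the Verilog used in the test bench; the class name, the `timeout` parameter and the `bus_active` input are all hypothetical names chosen for illustration.

```python
class BusWatchdog:
    """Behavioural model of an idle-bus watchdog (hypothetical names)."""

    def __init__(self, timeout):
        self.timeout = timeout  # idle cycles tolerated before giving up
        self.idle = 0           # cycles since the last bus access

    def clock(self, bus_active):
        """Advance one clock cycle; return True once the watchdog trips."""
        if bus_active:
            self.idle = 0       # any bus access restarts the count
        else:
            self.idle += 1
        return self.idle >= self.timeout


def first_timeout(bus_trace, timeout):
    """Return the cycle index at which the watchdog trips, or None."""
    dog = BusWatchdog(timeout)
    for cycle, active in enumerate(bus_trace):
        if dog.clock(active):
            return cycle
    return None
```

In the real test bench, the equivalent of a tripped watchdog is an ERROR message followed by the end of the simulation, so one hung test cannot stall the rest of the run.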
The final critical component of the simulation environment was the ZipCPU software executable itself. By parameterizing the simulation software load using the name of the ZipCPU executable, I now had complete control over what simulations would run and how.
Configuring the test cases
The last critical piece in this setup, prior to the simulation script, was the test definition file. The Perl script reads the various test configurations from this test definition file, and then commands a run of that test. All output from the test then gets logged to a file for later viewing.
To show how this is done, let’s back up a moment and start with the simulation configuration file. I chose to define a given simulation run via five space delimited fields. These are:
The name of the test. This also becomes the name of the executable Icarus Verilog builds, as well as being transformed into the name of the output log file. For these reasons, the test name needs to be unique.
The CPU configuration. Given that there were 15 separate parameters I wanted to control via the configuration, it helped to have named configuration rather than writing all of these parameters out on each configuration line. The simulation drive script could then easily look up a configuration by name, and set everything. For example, here’s what the generic pipeline configuration looks like from Perl:
This is just the first step in the configuration, though. This would then need to be coupled with a top level entity, a simulation file set, and one more parameter indicating which wrapper was being used: either one of the ZipBones or ZipSystem wrappers (for Wishbone), or the ZipAXIL or ZipAXI wrappers. I’ll get to these details in the next section, though.
Now that the design has a named configuration to use, the next step was to select a test. For this, the third element in each line was the name of a ZipCPU ELF executable turned hex file. This file would then be included into the simulation via $readmemh() for the ZipCPU to execute.
To know, later on, if the test passed successfully, I also kept track of the output of the CPU’s console. This is the fourth component of a test configuration line: the name of a file to write this console output into.
The reason for this configuration parameter was to help guarantee that all intermediate and output files had unique names. This would allow me to run the script multiple times, for different tests, on multiple different processors concurrently.
In hindsight, I could’ve just created a file name from the name of the test. Perhaps I might’ve called it $tstname-out.txt or some such. Still, this works, so I have no need to change it at present.
The final part of the configuration line was really key to the success of this format as a whole. The final piece is an (optional) white-space separated list of parameter overrides. Frankly, if I ever do this again, this will be a guaranteed part of any future approach.
Why? Because it keeps me from modifying the files under test just to test a new configuration.
Why? Because in one customer project, I created an Icarus Verilog script file for each test, and I then kept needing to change one (or more) of those files to turn on (or off) VCD generation. (Yes, VCD generation in the Verilog test benches is completely parameterized). It then became a hassle to recognize whether or not the changes to the file needed to be committed to the repository or not, since git only ever flagged that the file was changed. (It didn’t help that the change was a single character on a very long Icarus command line.) This way, I can control the current test configuration separate from the other files under version control, and I can also see at a glance whether configuration changes were substantial or not.
How have I used this? I’ve now used it for more than just turning on and off VCD (or other trace file) generation. I’ve also used it to adjust the default bus width, or to turn on clock gating for configurations that don’t have it enabled by default. Want to create an ad-hoc test to check a 512bit bus? Not a problem! In another project, one with fewer configuration parameters, I use this parameter list field to set all of the key parameters (and macros!)–such as whether an analog PHY is present, or whether or not Xilinx SERDES I/O elements should be used and tested.
Even better, when required, I can use this optional field to implement Verilog macros as well. So there are a lot of opportunities here.
That’s the configuration file. As you can see, it really captures all of the potential ways the simulation can be reconfigured to support one test or another.
The simulation driver
Now let’s turn our attention to some of the key components of this Perl simulation driver script.
The script starts off by defining a massive number of configuration default values. I’ll skip this section for brevity, but you are more than welcome to look through it. These define both the basic ZipCPU configurations, such as we listed above, as well as which wrapper is to be used for each configuration.
From there, the script starts looking over the command line arguments. In this case, we insist on at least one argument or die! (Sorry, couldn’t help it. “or die” is common Perlese for exiting the script with an error.) Otherwise, if the first command line argument is the keyword that requests every test, then we’ll ignore any other arguments and run every test case found in the test definition file. Finally, if neither case applies, then the argument list is interpreted as a set of test names that we’ll then look up in our test definitions.
We’re going to want to place our results in a directory that isn’t under version control. Let’s call this directory test/, and make sure it exists. If not, we’ll create it next.
The next step is found at the bottom of the file. Here we either look up a test configuration by name, via the gettest() function (also implemented within this Perl script), or read every line from the configuration file. Every line is then passed to the simline() function for both parsing and to run the actual simulation. Incidentally, it’s this simline() function where all the work takes place, so let’s break this function up into pieces and walk through it, since this is the function that actually runs the simulator for a given test.
The key to this function is Perl’s pattern matching ability.
We’ll start by removing any end of line comments.
We’ll then apply a pattern match to the line to separate out the various components of the line.
The pattern above depends upon the existence of four (or more) white space delineated fields, as we described them above, where the last field containing the (optional) parameter list may be left blank. If this pattern doesn’t match, then … this isn’t a properly configured test line, and we’ll generate an error and then skip it.
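As a rough illustration of these two steps, here is a Python sketch rather than the Perl the script actually uses. The `#` comment character, the regular expression, the dictionary keys and the sample names in the usage note are all assumptions made for the example.

```python
import re

# Assumed layout: name, configuration, hex file, console file, then an
# optional list of A=B overrides.  This pattern is an illustration, not the
# one found in the original Perl script.
LINE_RE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*(.*)$")


def parse_test_line(line):
    """Split one test definition line into its fields, or return None."""
    line = re.sub(r"#.*$", "", line).strip()   # drop end-of-line comments
    if not line:
        return None                            # blank or comment-only line
    match = LINE_RE.match(line)
    if match is None:
        return None                            # fewer than four fields
    name, config, memfile, confile, params = match.groups()
    return {"name": name, "config": config, "memfile": memfile,
            "confile": confile, "params": params.strip()}
```

A (hypothetical) line such as `lockcheck PIPE lockcheck.hex lockcheck.txt BUS_WIDTH=64 # wide bus` would come back with `params` set to `BUS_WIDTH=64`, while a line with only three fields would be rejected.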
For now, let’s assume the pattern matches and we’ll continue.
It’s important for me to know when things happen. This helps me know how long a test takes, as well as how deep into a test I am at any given time. When I’m not producing any console output, this is also the first indication I have of any (potential) errors. So, I’ll take this time to grab a time stamp to describe the beginning of the simulation call.
Incidentally, if you look through this simulation driver Perl script, you’ll find the script works for Verilator as well as Icarus–it’s just that the Verilator support isn’t (currently) configured either by default or by the command line.
Now that we have our configuration test parameters, the next step is to put the iverilog command line together. We’ll start with -g2012. I personally use the -g2012 option for all my work. I need it to support my liberal use of localparams, but I’m sure there are other goodies that come with this as well.
Let’s look up the parameters associated with our named configuration next. To do this, however, we’ll need to know the top level module name, whether that is the Wishbone or the AXI test bench. We’re going to need that, so let’s grab that off of the configuration file.
We can now look up our configuration string. This is the string with all of our parameters defined in it. In my case, I prefixed each parameter with -P. This, however, isn’t sufficient. Parameters need to be prefixed with -P and the name of the top level, so we’ll do a simple substitution here to get that right. While we’re at it, we’ll add a shell escape for our quotation marks, so the shell won’t play with them unduly.
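The substitution can be sketched as follows, in Python instead of the script’s Perl. The `CONFIGS` table, its parameter names and values, and the `wb_tb` top level name used in the usage note are placeholders for illustration, not the settings from the real script.

```python
import re

# Hypothetical named configuration: names and values are placeholders only.
CONFIGS = {
    "PIPE": '-POPT_PIPELINED=1 -POPT_NAME="pipe"',
}


def config_args(cfgname, toplevel):
    """Return the parameter string for `cfgname`, scoped to `toplevel`."""
    cfg = CONFIGS[cfgname]
    # -PNAME=value  ->  -P<toplevel>.NAME=value
    cfg = re.sub(r"-P(\w+)=", lambda m: f"-P{toplevel}.{m.group(1)}=", cfg)
    # Escape double quotes so the shell hands them to the simulator intact
    return cfg.replace('"', '\\"')
```

With these placeholder values, `config_args("PIPE", "wb_tb")` yields `-Pwb_tb.OPT_PIPELINED=1 -Pwb_tb.OPT_NAME=\"pipe\"`, which is the `-Ptop.name=value` form Icarus Verilog expects for overriding a parameter of a root module.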
Two more parameters come from the test configuration line itself. These are the name of the memory file, containing the ZipCPU test program memory image, and the name of the console file output.
With that, it’s now time to look at our parameter list. This list comes in the form of a set of A=B pairs, where A is the parameter name and B is its value. The first step is to try to match the remaining portion of the line to an A=B pair and an everything else in the line. I’ve chosen to name this “everything else” the CDR, after the use of this term in LISP. Once the line has been separated into these three pieces, I can then use the first two, A=B, to generate a command line parameter setting. Since this will follow all other parameter settings on the command line, this one should override any previous parameters set by the same name. Don’t forget to escape any string quotations!
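Here is one way that loop might look, again as a Python sketch and not the original Perl. The function name, the regular expression, and the parameter names in the usage note are assumptions; the sketch also assumes values contain no white space.

```python
import re

# One A=B pair, followed by "everything else" (the CDR)
PAIR_RE = re.compile(r"^\s*(\w+)=(\S+)\s*(.*)$")


def override_args(params, toplevel):
    """Turn 'A=B C=D ...' into a list of -P<toplevel>.A=B arguments."""
    args = []
    rest = params
    while rest.strip():
        match = PAIR_RE.match(rest)
        if match is None:
            raise ValueError(f"cannot parse parameter list at: {rest!r}")
        name, value, rest = match.groups()   # rest plays the role of the CDR
        value = value.replace('"', '\\"')    # escape string quotations
        args.append(f"-P{toplevel}.{name}={value}")
    return args
```

Calling `override_args("BUS_WIDTH=512 DUMP=1", "wb_tb")` on this hypothetical list produces two arguments, in order, ready to be appended after the named configuration’s own settings.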
Our last step will be to append the name of our file list to the command line, and then specify that Icarus should produce an output file in our test directory having the same name as our test’s configuration name.
This should be unique enough to work with.
For those not familiar with Icarus Verilog, this is Icarus’s way of doing business. Simulation takes place in two parts. The first part is to build a simulation executable, and the second part is to run that executable. (Vivado isn’t all that different.) Now that we have a command line built up to build the executable, we go ahead and run Icarus to build our simulation executable.
If all goes well, we should now have a simulation executable–assuming Icarus didn’t find some error while building our design.
So … let’s run our simulation!
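The two-step flow can be sketched in Python as below; the original script does this in Perl. The function names, the log file name, and the choice of `-s` to name the top level and `-c` to pass the file list are assumptions about how the command might be assembled, though `-g2012`, `-P`, `-s`, `-c` and `-o` are all genuine iverilog options, and `vvp` is the Icarus run-time that executes the result.

```python
import subprocess


def build_command(testname, toplevel, param_args, filelist, testdir="test"):
    """Command list that compiles the design into <testdir>/<testname>."""
    return (["iverilog", "-g2012"] + list(param_args)
            + ["-s", toplevel, "-c", filelist,
               "-o", f"{testdir}/{testname}"])


def run_command(testname, testdir="test"):
    """Command list that runs the compiled simulation with vvp."""
    return ["vvp", f"{testdir}/{testname}"]


def simulate(testname, toplevel, param_args, filelist, testdir="test"):
    """Build, then run; return the name of the log file that was written."""
    subprocess.run(build_command(testname, toplevel, param_args,
                                 filelist, testdir), check=True)
    logname = f"{testdir}/{testname}.txt"    # hypothetical log file name
    with open(logname, "w") as log:
        subprocess.run(run_command(testname, testdir),
                       stdout=log, stderr=subprocess.STDOUT, check=False)
    return logname
```

When the command is handed to `subprocess.run` as a list, no shell is involved, so the quote escaping discussed earlier only matters if the command is assembled as a single string for the shell.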
Once we get to this point, the simulation has now completed. This may take many hours, depending upon the configuration and the test. For example, the LOCK check on the MIN configuration takes about 12hrs on my computer. On the other hand, when using Verilator, all of the tests for all of the configurations can complete in less than one hour, but … that’s another story. (It’s also one of the reasons why I love Verilator.)
Now that everything has completed, let’s go dig through the log file to see if we’ve been successful.
If we find ERROR in the log file, or any reference to an assertion failure, then we have not been successful.
We can now take these results and write them into a report file, to track all of our simulation results. We’ll assume here that if we haven’t found any errors, then the test has been successful.
This isn’t really an ideal test, but it’s worked for me so far.
The ideal would be for the test to end with some form of SUCCESS message. My test setup still needs some work, though, before I will have a dependable SUCCESS message to work from.
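The pass/fail rule described here, where no error found means success, might be sketched in Python like this. The exact strings searched for and the report format are assumptions; the original Perl script may match different text.

```python
import re

# Assumed failure markers: the word ERROR, or any mention of an assertion
FAILURE_RE = re.compile(r"ERROR|[Aa]ssert")


def log_passed(log_text):
    """True when no line of the log mentions ERROR or an assertion."""
    return not any(FAILURE_RE.search(line) for line in log_text.splitlines())


def report_line(testname, log_text):
    """One line for the report file (hypothetical format)."""
    status = "Pass" if log_passed(log_text) else "FAIL"
    return f"{status} {testname}"
```

Note the weakness this inherits from the rule itself: an empty log, for example from a simulation that died before printing anything, also counts as a pass. A dependable SUCCESS message would close that hole.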
Put together, this script allows me to test roughly 67 test cases. Why so many? Simply because each test checks something different. Sadly, I’ve learned from experience that it’s possible to have 66 test cases pass and one fail.
Ideally, the initial CPU test should catch any and all bugs. Sadly, it doesn’t. Or rather, it hasn’t. For example, the original CPU test never caught the bugs associated with stepping the CPU, one instruction at a time. Specifically, stepping through a divide instruction would void the divide instruction, and leave you forever stepping through the same instruction. While I’ve now fixed the CPU test so it checks for that bug, I’m reasonably confident that nothing other than my LOCK checking program will truly check for whether or not the LOCK instruction works. Further, the bus width tests have found bugs the other tests haven’t as well.
While the test set is reasonably complete, I am also painfully aware of some significant holes remaining in it, thanks to both coverage checking and MCY. For example, while I exercise the exclusive access capabilities of both AXI and Wishbone buses, I only do so from a single CPU. Worse, the CPU will not allow a LOCK instruction sequence to be interrupted or stepped through: it’s either all one instruction or none by design. In many ways, that’s a good thing … except that it means there will never be any true bus contention to test whether or not the memory modules handle locking properly. Another glaring fault in this test setup is that nothing is (currently) testing the CPU’s debug port. Hence, I may choose to fix both of these by adding additional CPUs to my simulation, in such a way that one master CPU controls and starts all others.
For now, let me note that I’ve enjoyed this approach so much that I’ve started using something similar on my commercial projects, and I’ve even ported a similar script to Vivado.
Bottom line: the approach works nicely, and I’m likely to use it again.
Judge me, O LORD; for I have walked in mine integrity: I have trusted also in the LORD; therefore I shall not slide. Examine me, O LORD, and prove me; try my reins and my heart. (Ps 26:1-2)