This week, I’ve been testing a CI/CD pipeline. This has been my opportunity to shake the screws and kick the tires on what should become a new verification product shortly.

I thought that a good design to check might be my SDIO project. It has roughly all the pieces in place, and so makes sense for an automated testing pipeline.

This weekend, the CI project engineer shared with me:

It’s literally the first time I get to know a good hardware project needs such many verifications and testings! There’s even a real SD card simulation model and RW test…

After reflecting on this for a bit, I thought it might be worth taking a moment to explain how I got here.

Verification: The Goal

Perhaps the best way to explain the “goal” of verification is by way of an old “war story”–as we used to call them.

At one time, I was involved with a DOD unit whose whole goal and purpose was to build quick reaction hardware capabilities for the warfighter. We bragged about our ability to respond to a call on a Friday night with a new product shipped out on a C-130 before the weekend was over.

Anyone who has done engineering for a while will easily recognize that this sort of concept violates all the good principles of engineering. There’s no time for a requirements review. There’s no time for prototyping–or perhaps there is, to the extent that it’s always the prototype that heads out the door to the warfighter as if it were a product. There’s no time to build a complete test suite, to verify the new capability against all things that could go wrong. However, we’d often get only one chance to do this right.

Now, how do you accomplish quality engineering in that kind of environment?

The key to making this sort of shop work lay in the “warehouse”, and in what sort of capabilities we had “lying on the shelf”, as we called it. Hence, we’d spend our time polishing prior capabilities and anticipating new requirements. We’d build, verify, and test these capabilities against phantom requirements, in the hope that they’d be close to whatever we’d need to build should a real requirement arise. We’d then place these concept designs in the “warehouse”, and show them off to anyone who came to visit wondering what our team was able to accomplish. When a new requirement finally arose, we’d go into this “warehouse”, find whatever capability was closest to what the customer required, and modify it to fit the mission.

That was how we achieved success.

The same applies in digital logic design. You want a good set of tried, trusted, and true components in your “library”, so that whenever a new customer comes along you can leverage those components quickly to meet their needs. This is why I’ve often said that well written, well tested, well verified design components are gold in this business. Such components allow you to go from zero to product in short order. Indeed, the more well-tested components you have available for reuse, the faster you’ll get to market with any new need, and the less it will cost you to get there.

That’s therefore the ultimate goal: a library of reusable components that can be quickly composed into new products for customers.

As I’ve tried to achieve this objective over the years, my approach to component verification has changed, or rather grown, many times over.

Hardware Verification

When I first started learning FPGA design, I understood nothing about simulation. Rather than learning how to do simulation properly, I instead learned quickly how to test my designs in hardware. Most of these designs were DSP based. (My background was DSP, so this made sense …) Hence, the following approach tended to work for me:

  • I created access points in the hardware that allowed me to read and write registers at key locations within the design.

  • One of these “registers” I could write to controlled the inputs to my DSP pipeline.

  • Another register, when written to, would cause the design to “step” the entire DSP pipeline as if a new sample had just arrived from the A/D.

  • A set of registers within the design then allowed me to read the state of the entire pipeline, so I could do debugging. (A rough sketch of this “step” register idea follows this list.)
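To make this concrete, here’s a minimal sketch of the idea. This isn’t code from any particular design of mine, and all of the names in it are made up; the point is simply that one bus write loads a test sample, while a write to a second address advances the pipeline by exactly one clock enable.

    module stepdemo (
            input   wire            i_clk,
            input   wire            i_wb_stb, i_wb_we,
            input   wire            i_wb_addr,
            input   wire    [15:0]  i_wb_data,
            output  reg     [15:0]  o_stage_one, o_stage_two
        );

        reg             r_step;     // One-shot clock enable for the pipeline
        reg     [15:0]  r_sample;   // Stands in for the A/D sample input

        initial { r_step, r_sample } = 0;
        always @(posedge i_clk)
        begin
            r_step <= 1'b0;         // Idle unless written this cycle
            if (i_wb_stb && i_wb_we)
            begin
                if (i_wb_addr == 1'b0)
                    r_sample <= i_wb_data;  // Load a new test input
                else
                    r_step   <= 1'b1;       // Advance the pipeline one sample
            end
        end

        // The "pipeline" (just two registers here) only moves when r_step
        // is high, so every stage can be read back between steps.
        initial { o_stage_one, o_stage_two } = 0;
        always @(posedge i_clk)
        if (r_step)
            { o_stage_two, o_stage_one } <= { o_stage_one, r_sample };
    endmodule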

This worked great for “stepping” through designs. When I moved on to processing real-time information, such as the A/D results from the antenna connected to the design, I built an internal logic analyzer to capture key signals along the way.

I called this “Hardware in the loop testing”.

Management thought I was a genius.

This approach worked … for a while. Then I started realizing how painful it was. I think the transition came when I was trying to debug my FFT by writing test vectors to an Arty A7 circuit board via UART, and reading the results back to display them on my screen. Even with the hardware in the loop, hitting all the test vectors was painfully slow.

Eventually, I had to search for a new and better solution. This was just too slow. Later on, I would start to realize that this solution didn’t catch enough bugs–but I’ll get to that in a bit.

Happy Path Simulation Testing

“Happy path” testing refers to testing only the expected, working paths through a project’s environment. To use an aviation analogy, a “happy path” test might make sure the ground avoidance radar never alerts when you aren’t close to the ground. It doesn’t make certain that the radar does the right thing when you actually are close to the ground.

So, let’s talk about my next project: the ZipCPU.

Verification of the CPU began with an assembly program the ZipCPU would run. The program was designed to test all the instructions of the CPU with sufficient fidelity to know when/if the CPU worked.

The test had one of two outcomes. If the program halted, the test was considered a success. If the program detected an error, the CPU would execute a BUSY instruction (i.e., a jump to the current address) and loop forever. My test harness could then detect this condition and end with a failing exit code.
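For illustration, here’s a rough, Verilog-flavored sketch of such a check. The real harness works differently, and the signal names here (i_cpu_halted, i_cpu_pc) are hypothetical, but the idea is simply to declare success when the CPU halts, and failure if the program counter ever stops moving without halting.

    module haltcheck #(
            parameter   TIMEOUT = 1000
        ) (
            input   wire            i_clk,
            input   wire            i_cpu_halted,   // Hypothetical CPU status
            input   wire    [31:0]  i_cpu_pc        // Hypothetical (debug) PC
        );

        reg     [31:0]  last_pc;
        integer         stuck_count;

        initial stuck_count = 0;
        always @(posedge i_clk)
        begin
            last_pc <= i_cpu_pc;
            if (i_cpu_halted)
            begin
                // The program ran to completion: declare success
                $display("PASS: CPU halted");
                $finish;
            end else if (i_cpu_pc == last_pc)
            begin
                // The PC hasn't moved.  A long enough stretch of this looks
                // like the error handler's perpetual BUSY loop.
                stuck_count = stuck_count + 1;
                if (stuck_count >= TIMEOUT)
                begin
                    $display("FAIL: CPU is spinning at 0x%08x", i_cpu_pc);
                    $stop;  // A real harness would turn this into an error exit
                end
            end else
                stuck_count = 0;
        end
    endmodule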

When the ZipCPU acquired a software tool chain (GCC+Binutils) and C-library support, this assembly program was abandoned and replaced with a similar program in C. While I still use this program, it’s no longer the core of the ZipCPU’s verification suite. Instead, I tend to use it to shake out any bugs in any new environment the ZipCPU might be placed into.

This approach failed horribly, however, when I tried integrating an instruction cache into the ZipCPU. I built the instruction cache. I tested the instruction cache in isolation. I tested the cache as part of the CPU. I convinced myself that it worked. Then I placed my “working” design onto hardware and all hell broke loose.

This was certainly not “the way.”

Formal Verification

I was then asked to review a new, open source, formal verification tool called SymbiYosys. The tool handed my cocky attitude back to me, and took my pride down a couple of notches. In particular, I found a bunch of bugs in a FIFO I had used for years. The bugs had never shown up in hardware testing (that I had noticed, at least), and certainly hadn’t shown up in any of my “happy path” testing. This left me wondering: how many other bugs did I have in my designs that I didn’t know about?

I then started working through my previous projects, formally verifying all of my prior work. In every case, I found more bugs. By the time I got to the ZipCPU, I had found a myriad of bugs in what I thought was a “working” CPU.

I’d like to say that the quality of my IP went up at this point. I was certainly finding a lot of bugs I’d never found before by using formal methods. I now knew, for example, how to guarantee I’d never have any more of those cache bugs I’d had before.

So, while it is likely that my IP quality was going up, the unfortunate reality was that I was still finding bugs in my “formally verified” IP–although not nearly as many.

A couple of improvements helped me move forward here.

  1. Bidirectional formal property sets

    The biggest danger in formal verification is that you might assume() something that isn’t true. The first way to limit this is to make sure you never assume() a property within the design, but rather you only assume() properties of inputs–never outputs, and never local registers.

    But how do you know when you’ve assumed too much? This can be a challenge.

    One of the best ways I’ve found to do this is to create a bidirectional property set. A bus master, for example, would make assumptions about how the slave would respond. A similar property set for the bus slave would make assumptions about what the master would do. Further, the slave would turn the master’s assumptions into verifiable assertions–guaranteeing that the master’s assumptions were valid. If you can use the same property set in this manner for both master and slave, save that you swap assumptions and assertions, then you can verify each in isolation while only assuming things that are asserted (and hence verified) elsewhere.

    Creating such property sets for both AXI-Lite and AXI led me to find many bugs in Xilinx IP. This alone suggested that I was on the “right path”.

  2. Cover checking

    I also learned to use formal coverage checking, in addition to straight assertion based verification. Cover checks weren’t the be-all and end-all, but they could be useful in some key situations. For example, a quick cover check might help you discover that you had gotten the reset polarity wrong, and so all of your formal assertions were passing because your design was assumed to be held in reset. (This has happened to me more than once. Most recently, the cost was a couple of months of delay on what should’ve otherwise been a straightforward hardware bringup–but that wasn’t really a formal verification issue.)

    For a while, I also used cover checking to quickly discover (with minimal work) how a design component might work within a larger environment. I’ve since switched to simulation checking (with assertions enabled) for my most recent examples of this type of work, but I do still find it valuable.

  3. Induction

    Induction isn’t really a “new” thing I learned along the way, but it is worth mentioning specially. I learned to use induction right from the start, and so I’ve tended to use it in every proof I’ve ever done. It has simply been my normal practice from day one.

    Induction, however, takes a lot of work. Sometimes it takes so much work that I wonder if there’s really any value in it. Then I tend to find some key bug or other–perhaps a buffer overflow or something–some bug I’d never have found without induction. That alone keeps me running induction every time I can. Even better, once the induction proof is complete, you can often trim the entire formal proof from 15-20 minutes down to less than a minute.

  4. Contract checking

    My initial formal proofs were haphazard. I’d throw assertions at the wall and see what I could find. Yes, I found bugs. However, I never really had the confidence that I was “proving” a design worked. That is, not until I learned of the idea of a “formal contract”. The “formal contract” simply describes the essence of how a component works.

    For example, in a memory system, the formal contract might have the solver track a single value of memory. When written to, the value should change. When read, the value should be returned. If this contract holds for all such memory addresses, then the memory acts (as you would expect) … like a memory. (A brief sketch of such a contract follows this list.)

  5. Parameter checks

    For a while, I was maintaining “ZBasic”–a basic ZipCPU distribution. This was where I did all my simulation based testing of the ZipCPU. The problem was, this approach didn’t work. Sure, I’d test the CPU in one configuration, get it to work, and then put it down believing the “CPU” worked. Some time later, I’d try the CPU in a different configuration–such as pipelined vs non-pipelined, and … it would fail in whatever mode it had not been tested in. The problem with the ZBasic approach is that it tended to only check one mode–leaving all of the others unchecked.

    This led me to adjust the proofs of the ZipCPU so that the CPU would at least be formally verified in as many parameter configurations as I could manage, to make sure it would work in all environments.
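As promised above, here’s a minimal sketch of what a formal contract might look like, applied to a toy synchronous memory. This isn’t code from any of my designs; it just illustrates the pattern: pick an arbitrary but constant address, shadow what that address should contain, and then assert that any read of it returns the shadowed value.

    module memcontract #(
            parameter   AW = 4, DW = 8
        ) (
            input   wire                i_clk,
            input   wire                i_we, i_rd,
            input   wire    [AW-1:0]    i_addr,
            input   wire    [DW-1:0]    i_wdata,
            output  reg     [DW-1:0]    o_rdata
        );

        reg [DW-1:0]    mem [0:(1<<AW)-1];

        always @(posedge i_clk)
        if (i_we)
            mem[i_addr] <= i_wdata;

        always @(posedge i_clk)
        if (i_rd)
            o_rdata <= mem[i_addr];

    `ifdef  FORMAL
        // The contract: pick one arbitrary (but constant) address, shadow
        // what it should contain, and insist any read returns that value.
        (* anyconst *)  wire    [AW-1:0]    f_addr;
        reg     [DW-1:0]    f_data, f_expected;
        reg                 f_read;

        // Start out with the shadow copy matching the memory
        initial assume(mem[f_addr] == f_data);

        // Track every write to the chosen address
        always @(posedge i_clk)
        if (i_we && i_addr == f_addr)
            f_data <= i_wdata;

        // The induction invariant: the shadow copy always matches
        always @(*)
            assert(mem[f_addr] == f_data);

        // Finally, any read of that address must return the shadowed value
        initial f_read = 1'b0;
        always @(posedge i_clk)
        begin
            f_read     <= i_rd && (i_addr == f_addr);
            f_expected <= f_data;
        end

        always @(posedge i_clk)
        if (f_read)
            assert(o_rdata == f_expected);
    `endif
    endmodule

The nice part about this style of contract is that it plays well with induction: the assertion that the shadow copy always matches the memory is exactly the invariant the induction engine needs.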

I wrote about these parts of a proof in more detail some time ago, and I still stand by them today.

Yes, formal verification is hard work. However, a well verified design is highly valuable–on the shelf, waiting for that new customer requirement to come in.

The problem with all this formal verification work lies in its (well known) Achilles heel. Because formal verification includes an exhaustive combinatorial search for bugs across all potential design inputs and states, it can be computationally expensive. Yeah, it can take a while. To reduce this expense, it’s important to limit the scope of what is verified. As a result, I tend to verify design components rather than entire designs. This leaves open the possibility of a failure in the logic used to connect all these smaller, verified components together.

AutoFPGA and Better Crossbars

Sure enough, the next class of bugs I had to deal with were integration bugs.

I had to deal with several. Common bugs included:

  1. Using unnamed (positional) port connections, and so connecting module ports to the wrong signals. (See the sketch following this list.)

    At one point, I decided the Wishbone “stall” port should come before the Wishbone acknowledgment port. Now, how many designs had to change to accommodate that?

  2. I had a bunch of problems with my initial interconnect design methodology. Initially, I used the slave’s Wishbone strobe signal as an address decoding signal. I then had a bug where the address would move off of the slave of interest, and the acknowledgment was never returned. The result of that bug was that the design hung any time I tried to read the entirety of flash memory.

    Think about how much simulation time and effort I had to go through to simulate reading an entire flash memory–just to find this bug at the end of it. Yes, it was painful.
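As an illustration of the first problem, consider a hypothetical (and much simplified) slave, connected both ways. Neither the module nor the names below come from a real design; they only show why I’ve been moving towards named port connections.

    // A toy Wishbone-ish slave.  Note that o_stall comes before o_ack in
    // the port list: the very reordering that once broke my designs.
    module toyslave (
            input   wire    i_clk, i_cyc, i_stb,
            output  wire    o_stall,
            output  reg     o_ack
        );

        assign  o_stall = 1'b0;     // This slave never stalls

        initial o_ack = 1'b0;
        always @(posedge i_clk)
            o_ack <= i_cyc && i_stb;
    endmodule

    module toplevel (
            input   wire    i_clk, i_cyc, i_stb,
            output  wire    o_ack
        );

        wire    ack, stall;

        // Positional connections: if toyslave's port order ever changes,
        // ack and stall quietly swap places, with no error from any tool.
        // toyslave badslave(i_clk, i_cyc, i_stb, ack, stall);

        // Named connections survive any reordering of the port list, and
        // a linter can still flag anything left unconnected.
        toyslave goodslave (
            .i_clk(i_clk), .i_cyc(i_cyc), .i_stb(i_stb),
            .o_stall(stall), .o_ack(ack)
        );

        assign  o_ack = ack;
    endmodule

The commented-out positional form is exactly the kind of connection that compiles cleanly, simulates cleanly, and still does the wrong thing once the slave’s ports get reordered.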

Basically, when connecting otherwise “verified” modules together by hand, the result didn’t work reliably.

The first and most obvious solution to something like this is to use a linting tool, such as verilator -Wall. Verilator can find things like unconnected pins. That helps, but I had already been doing that from early on.

My eventual solution was twofold. First, I redesigned my bus interconnect from top to bottom. You can find the new and redesigned interconnect components in my wb2axip repository. Once these components were verified, I then had a proper guarantee: all masters would receive acknowledgments (or errors) for every request they made of any slave. Errors would no longer be lost. Attempts to interact with a non-existent slave would (properly) return bus errors.

To deal with problems where signals were connected incorrectly, I built a tool I call AutoFPGA to connect components into designs. A special tag given to the tool would immediately connect all bus signals to a bus component–whether it be a slave or a master, and whether it be connected to a Wishbone, AXI-Lite, or AXI bus. This required that my slaves follow one of two conventions: either all the bus ports had to follow a basic port ordering convention, or they needed to follow a bus naming convention. Ideally, a slave should follow both. Further, after finding even more port connection bugs, I’m slowly moving towards the practice of naming all of my port connections.
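For reference, here’s the sort of (hypothetical) slave port list such a tool can connect automatically. The names follow my own Wishbone habit and aren’t important in themselves; what matters is that every bus signal follows a consistent prefix and naming convention the tool can key off of.

    module convslave (
            input   wire            i_clk, i_reset,
            // Wishbone slave port, named by convention
            input   wire            i_wb_cyc, i_wb_stb, i_wb_we,
            input   wire    [7:0]   i_wb_addr,
            input   wire    [31:0]  i_wb_data,
            output  wire            o_wb_stall,
            output  reg             o_wb_ack,
            output  reg     [31:0]  o_wb_data
        );

        // A single scratch register: the address is simply ignored here
        reg [31:0]  r_scratch;

        assign  o_wb_stall = 1'b0;

        initial o_wb_ack = 1'b0;
        always @(posedge i_clk)
            o_wb_ack <= !i_reset && i_wb_stb;

        initial r_scratch = 0;
        always @(posedge i_clk)
        if (i_wb_stb && i_wb_we)
            r_scratch <= i_wb_data;

        initial o_wb_data = 0;
        always @(posedge i_clk)
            o_wb_data <= r_scratch;
    endmodule

With every slave shaped like this, wiring a new peripheral into a design becomes a matter of tagging it, rather than hand-editing yet another port map.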

This works great for composing designs out of bus components. Almost all of my designs now use this approach; only a few (mostly test bench) designs remain where I connect bus components by hand.

MCY

At one time along the way, I was asked to review MCY: Mutation Coverage with Yosys. My review back to the team was … mixed.

MCY works by intentionally breaking your design. Such changes to the design are called “mutations”. The goal is to determine whether or not the mutated (broken) design will trigger a test failure. In this fashion, the test suite can be evaluated. A “good” test suite will be able to find any mutation. Hence, MCY allows you to measure how good your test suite is in the first place.
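Just to give a flavor of what a mutation is: MCY picks its own mutations, but conceptually they’re small, deliberate changes like the (hypothetical) ones commented out below. The question MCY then asks is whether your test suite would notice.

    module mutdemo (
            input   wire    i_clk, i_cyc, i_stb,
            output  reg     o_ack
        );

        initial o_ack = 1'b0;
        always @(posedge i_clk)
            o_ack <= i_cyc && i_stb;        // Original logic
            // o_ack <= i_cyc || i_stb;     // One possible mutation
            // o_ack <= i_cyc && !i_stb;    // ... and another
    endmodule

A test suite that can’t tell the difference between the original and the mutated versions isn’t really testing that logic at all.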

Upon request, I tried MCY with the ZipCPU. This turned into a bigger challenge than I had expected. Sure, MCY works with Icarus Verilog, Verilator, and even (perhaps) some other (not so open) simulators as well. However, when I ran a design under MCY, my simulations tended to find only (roughly) 70% of the mutations. The formal proofs, on the other hand, could find 95-98% of them.

That’s good, right?

Well, not quite. The problem is that I tend to place all of my formal logic in the same file as the component that would be mutated. In order to keep the mutation engine from mutating the formal properties, I would have had to move the formal properties out of the file being mutated and into a separate file. Further, I would then have had to access the values to be assumed or asserted from outside the file under test, using what is often known as “dot notation”. While (System)Verilog allows such descriptions natively, there weren’t any open source tools that allowed such external formal property descriptions. (Commercial tools allowed this, just not the open source SymbiYosys.) This left me stuck with several unpleasant choices:

  1. I could remove the ability of the ZipCPU (or whatever design) to be formally verified with Open Source tools,
  2. I could give up on using induction,
  3. I could use MCY with simulation only, or
  4. I could choose to not use MCY at all.

This is why I don’t use MCY. It may be a “good” tool, but it’s just not for me.

What I did learn, however, was that my ZipCPU test suite was checking the CPU’s functionality nicely–just not the debugging port. Indeed, none of my tests checked the debugging port to the CPU at all. As a result, none of the (simulation-based) mutations of the debugging port were ever caught.

Lesson learned? My test suite still wasn’t good enough. Sure, the CPU might “work” today, but how would I know some change in the future wouldn’t break it?

I needed a better way of knowing whether or not my test suite was good enough.

Coverage Checking

Sometime during this process I discovered coverage checking. Coverage checking is a process of automatically watching over all of your simulation based tests to see which lines get executed and which do not. Depending on the tool, coverage checks can also tell whether particular signals ever toggle during simulation. A good coverage check, therefore, can provide some indication of whether or not all control paths within a design have been exercised, and whether or not all signals have been toggled.

Coverage metrics are actually kind of nice in this regard.

Sadly, coverage checking isn’t as good as mutation coverage, but … it’s better than nothing.

Consider a classic coverage failure: many of my simulations check for AXI backpressure. Such backpressure is generated when either BVALID && !BREADY, or RVALID && !RREADY. If your design is to follow the AXI specification, it should be able to handle backpressure properly. That is, if you hold !BREADY long enough, it should be possible to force !AWREADY and !WREADY. Likewise, it should be possible to hold RREADY low long enough that ARREADY gets held low. A well verified, bug-free design should be able to deal with these conditions.

However, a “good” design should never create any significant backpressure. Hence, if you build a simulation environment from “good” working components, you aren’t likely to see much backpressure. How then should a component’s backpressure capability be tested?

My current solution here is to test backpressure via formal methods, with the unfortunate consequence that some conditions will never get tested under simulation. The result is that I’ll never get to 100% coverage with this approach.
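To give a feel for it, here’s a sketch of the sort of formal backpressure check I lean on instead. The names are the standard AXI-Lite ones; in my own work these checks live inside a reusable property set or the slave’s own formal section, rather than a stand-alone module like this one.

    module axilbp_sketch (
            input   wire        S_AXI_ACLK, S_AXI_ARESETN,
            input   wire        S_AXI_AWREADY, S_AXI_WREADY,
            input   wire        S_AXI_BVALID, S_AXI_BREADY,
            input   wire [1:0]  S_AXI_BRESP
        );

        reg f_past_valid;
        initial f_past_valid = 1'b0;
        always @(posedge S_AXI_ACLK)
            f_past_valid <= 1'b1;

        // Nothing here ever assumes BREADY is high, so the solver is free
        // to apply as much backpressure as it likes.  The slave must then
        // hold any response steady until it is accepted ...
        always @(posedge S_AXI_ACLK)
        if (f_past_valid && S_AXI_ARESETN && $past(S_AXI_ARESETN)
                && $past(S_AXI_BVALID && !S_AXI_BREADY))
        begin
            assert(S_AXI_BVALID);
            assert($stable(S_AXI_BRESP));
        end

        // ... and a cover() check insists it's at least possible for that
        // backpressure to propagate back to the request channels.
        always @(posedge S_AXI_ACLK)
        if (S_AXI_ARESETN)
            cover(S_AXI_BVALID && !S_AXI_BREADY
                && !S_AXI_AWREADY && !S_AXI_WREADY);
    endmodule

Because nothing assumes BREADY high, the solver will try arbitrarily long stalls on its own. The cover() statement is there to prove the stall can actually reach the request channels–something my simulations rarely demonstrate.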

A second problem with coverage concerns unused signals. For example, AXI-Lite has two signals, AWPROT and ARPROT, that are rarely used by any of my designs. However, they are official AXI-Lite (and AXI) signals. As a result, AutoFPGA will always try to connect them to an AXI-Lite (or AXI) port, even though none of my designs use them. This leads to another set of exceptions that needs to be made when measuring coverage.

So, coverage metrics aren’t perfect. Still, they can help me find what parts of the design are (and are not) being tested well. This can then help feed into better (and more complete) test design.

That’s the good news. Now let’s talk about some of the not so good parts.

When learning formal verification, I spent some time formally verifying Xilinx IP. After finding several bugs, I spoke to a Xilinx executive regarding how they verified their IP. Did they use formal methods? No. Did they use their own AXI Verification IP? No. Yet, they were very proud of how well they had verified their IP. Specifically, their executive bragged about how good their coverage metrics were, and the number of test points checked for each IP.

Hmm.

So, let me get this straight: Xilinx IP gets good coverage metrics, and hits a large number of test points, yet still has bugs within it that I can find via formal methods?

Okay, so … how severe are these bugs? In one case, the bugs would totally break the AXI bus and bring the system containing the IP to a screeching halt–if the bug were ever tripped. For example, if the system requested both a read burst and a write burst at the same time, one particular slave might complete the read burst using the length of the write burst–or vice versa. (It’s been a while, so I’d have to look up the details to be exact about them.) In another case, dealing with a network controller, it was possible to receive a network packet, capture that packet correctly, and then return a corrupted packet simply because the AXI bus handshakes weren’t properly implemented. To this day these bugs have not been fixed, and it’s now nearly five years later.

Put simply, if it is possible for an IP to lock up your system completely, then that IP shouldn’t be trusted until the bug is fixed.

How then did Xilinx manage to convince themselves that their IP was high quality? By “good” coverage metrics.

Lesson learned? Coverage checking is a good thing, and it can reveal holes in any simulation-based verification suite. It’s just not good enough on its own to find all of what you are missing.

My conclusion? Formal verification, followed by a simulation test suite that evaluates coverage statistics, is something to pay attention to, but it’s not the be-all and end-all. One tool isn’t enough. Many tools are required.

Self-Checking Testbenches

I then got involved with ASIC design.

ASIC design differs from FPGA design in a couple of ways. Chief among them is the fact that the ASIC design must work the first time. There’s little to no room for error.

Fig 1. A typical verification environment

When working with my first ASIC design, I was introduced to a more formalized simulation flow. Let me explain it this way, looking at Fig. 1. Designs tend to have two interfaces: a bus interface, together with a device I/O interface. A test script can then be used to drive some form of bus functional model, which will in turn control the design under test via its bus interface. A device model then mimics the device the design is intended to talk to. When done well, the test script evaluates the values returned by the design after it interacts with the device, and declares “success” or “failure”.

Here’s the key to this setup: I can run many different tests from this starting point by simply changing the test script and nothing else.

For example, let’s imagine an external memory controller. A “good” memory controller should be able to accept any bus request, convert it into I/O wires to interact with the external memory, and then return a response from the memory. Hence, it should be possible to first write to the external memory and then (later) read from the same external memory. Whatever is then read should match what was written previously. This is the minimum test case–measuring the “contract” with the memory.

Other test cases might evaluate this contract across all of the modes the memory supports. Still other cases might attempt to trigger all of the faults the design is supposed to be able to handle. The only difference between these many test cases would then be their test scripts. Again, you can measure whether or not the test cases are sufficient using coverage measures.
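Sketched below is what the “test script” portion of such a self-checking test might look like in Verilog. Everything here is hypothetical: the bus functional model instance (u_bfm, with its writeio and readio tasks), the base address, and the surrounding test bench (clock, design under test, and device model) are all assumed to exist elsewhere.

    // Only the "test script" portion is shown; the DUT, the device model,
    // and the bus functional model instance u_bfm are assumed elsewhere.
    localparam [31:0]   ADDR_RAM = 32'h0100_0000;   // Hypothetical base address

    reg [31:0]  read_value;
    integer     errors;

    initial begin
        errors = 0;

        // The minimum "contract" test: write, read back, and compare
        u_bfm.writeio(ADDR_RAM + 0, 32'h12345678);
        u_bfm.writeio(ADDR_RAM + 4, 32'hdeadbeef);

        u_bfm.readio(ADDR_RAM + 0, read_value);
        if (read_value !== 32'h12345678)
        begin
            errors = errors + 1;
            $display("ERR: Read mismatch at offset 0");
        end

        u_bfm.readio(ADDR_RAM + 4, read_value);
        if (read_value !== 32'hdeadbeef)
        begin
            errors = errors + 1;
            $display("ERR: Read mismatch at offset 4");
        end

        // Every test case must end with an unambiguous pass or fail
        if (errors == 0)
            $display("TEST PASS");
        else
            $display("TEST FAIL: %0d errors", errors);
            // (a wrapper script can turn this into a failing exit code)
        $finish;
    end

Swapping in a different script, while leaving the bus functional model, the device model, and the design under test untouched, is then all it takes to create the next test case.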

The key here is that all of the test cases must produce either a “pass” or a “fail” result. That is, they must be self-checking. Using self-checking test cases, I can verify (via simulation) something like the ZipCPU across all of its instructions, in SMP and single-CPU environments, using the DMA (or not), and so forth. Indeed, the ZipCPU’s test environment takes this approach one step further, by changing not just the test script (in this case a ZipCPU software program) but also the configuration of the test environment. This allows me to make sure the ZipCPU will continue to work in 32b, 64b, or even wider bus environments, all within a single test suite.

Yes, this was a problem I was having before I adopted this methodology: I’d test the ZipCPU with a 32b bus, and then deploy the ZipCPU to a board whose memory was 64b wide or wider. The Kimos project, for example, has a 512b bus. Now that I run test cases on multiple bus widths, I have the confidence that I can easily adjust the ZipCPU from one bus width to another.

This is as far as I’ve now come in my verification journey. I now use formal tests, simulation tests, coverage checking, and a self-checking test suite on new design components. Is this perfect? No, but at least it’s more rigorous and repeatable than where I started.

Next Steps: Software/Hardware interaction

The testing regimen discussed above continues to have a very large and significant hole: I can’t test software drivers very well.

Consider as an example my SD card controller. The repository actually contains three controllers: one for interacting with SD cards via their SPI interface, one via the SDIO interface, and a third for use with eMMC cards (using the SDIO interface). The repository contains formal proofs for all leaf modules, and two types of SD card models–a C++ model for SPI, and all-Verilog models for SDIO and eMMC.

This controller IP also contains a set of software drivers for use when working with SD cards. Ideally, these drivers should be tested together with the SD card controller(s), so that the two could be verified as a unit.

Recently, for example, I added a DMA capability to the Wishbone version of the SDIO (and eMMC) controller(s). This (new) DMA capability then necessitated quite a few changes to the control software, so that the software could take advantage of it. With no tests, how well do you think this software worked when I first tested it in hardware?

It didn’t.

So, for now, the software directory simply holds the software I will copy to other designs and test in actual hardware.

Fig 2. Software driven test bench

The problem is, testing the software directory requires many design components beyond just the SD card controllers that would be under test. It requires memory, a console port, a CPU, and the CPU’s tool chain–all in addition to the design under test. These extra components aren’t a part of the SD controller repository, nor perhaps should they be. How then should these software drivers be tested?

Necessity breeds invention, so I’m sure I’ll eventually solve this problem. This is simply as far as I’ve gotten to date.

Automated testing

At any rate, I submitted this repository to an automated continuous integration facility that the team I was working with was testing. The utility leans heavily on the existence of a variety of make test capabilities within a repository, and so the SD card repository was a good fit for testing. Along the way, I needed some help from the test facility engineer to get SymbiYosys, Icarus Verilog, and Verilator support installed. His response?

It’s literally the first time I get to know a good hardware project needs such many verifications and testings! There’s even a real SD card simulation model and RW test…

Yeah. Actually, there are three SD card models–as discussed above. It’s been a long road to get to this point, and I’ve certainly learned a lot along the way.