Will formal methods ever find a bug in a working CPU?

Today, I’m starting what will likely be the slow process of formally verifying the ZipCPU. I’m going to use yosys for this process initially, but given SymbiYosys’s ability to run multiple formal configurations as separate tasks, I will probably need to switch after I get some distance along.

At this point in the game, I’ve already formally proven all of the components of the ZipCPU. What remains is to prove the CPU itself.

So here’s my question: How many bugs would you expect to find in a “working” soft-core CPU? One? Ten? Twenty? One hundred? Shall we count? I propose keeping a running log of the bugs I find in the ZipCPU while using formally methods. Perhaps this log will help to convince you the value of formally methods, perhaps not. Either way, I’ll keep it accurate to the information I discover.

The memory manager refused to release the bus if the response came back on the same cycle.
Certain errors should cause the CPU to halt–such as encountering a bus error when reading from instruction memory. Due to a (now locally fixed) bug in the ZipCPU, these errors would not cause the CPU to halt if they were one of the first couple instructions.
The ZipCPU is designed so that memory reads or writes never need to be rolled back. This is to facilitate using the memory bus for reading and writing peripherals, such as the serial port. Reads from such peripherals have side effects that cannot be rolled back. The formal methods however, discovered an example where reading into the program counter, such as during an indirect jump, my cause a following memory instruction to need to be rolled back.
The ZipCPU’s current approach to the compressed instruction set (CIS) is to prevent the CPU from interrupting in the middle of a CIS instruction. Formal methods, however, found an internal accounting bug which prevented this logic from working properly.
Several bugs have been found in the debug infrastructure, allowing an external debugger to change registers internal to the ZipCPU.

In one example, this was violating the rules the ZipCPU uses when issuing pipelined memory instructions.

In another example, during a divide that was to write its results into R0, the external debug infrastructure was allowed to write to a register, say R1. This would cause the divide to write its results into R1 upon completion, rather than R0.

The ZipCPU has therefore been adjusted so that the debug infrastructure can only modify the value of a register if the CPU has been fully halted.
The flag indicating that the CPU is fully halted was not properly implemented. Specifically, the CPU might declare it was halted in the middle of a CIS instruction when the second half of the instruction still needed to be flushed through the pipeline.
On a multi-cycle ALU operation (i.e. one of the multiplies), the flags were being set before the operation was complete
The clock-less version of the instruction decoder, used by the formal properties alone to know what to expect from an instruction, didn’t match the actual instruction decoder when it came to whether or not SIM instructions should cause an illegal interrupt exception
The divide wouldn’t wait for a multiply to complete before starting. Such a multiply might have provided the divide with its operand, but due to this bug the divide input would’ve been the prior value of the register.
Under certain circumstances in the pipeline, the break instruction would get bypassed.

I’m actually looking forward to simplifying this break logic by merging it with the illegal instruction logic, since the two do the same thing in different ways. I’m holding off on this for now, however.
Writes to the user PC register, such as by the debug port or even the ALU, while in supervisor (not user) mode, would corrupt any logic depending upon the user PC register that was already in the pipeline. Hence, the pipeline needs to be cleared following any such write.
After making several fixes, the memory instruction following an illegal instruction would be issued. This error isn’t so much a bug in the working CPU, but rather a bug in my pipeline logic rewrite.
Perhaps this would never happen. However, the memory controller wouldn’t stall when it ran out of internal memory to keep track of where the write back results would be written to.
On an early branch or an illegal instruction, the valid memory instruction flag was still being set, eventually violating the contract with the memory device.
The switch to interrupt flag wasn’t waiting for any ongoing memory operations to complete first
On a divide error, the memory unit would still issue an instruction. This creates a problem because a memory operation to a peripheral device cannot be withdrawn, as the CPU tries to switch to its exception handling code.
Under certain circumstances, the CPU would continue issuing instructions following a bus error or division by zero error.

At this point, I’m going to stop counting. It’s not because there are so many errors remaining, but rather because its no longer clear which of the remaining errors were originally in the ZipCPU and which ones were added while trying to fix the earlier bugs.

Bugs in the Formal Properties

The formal properties within the ZipCPU core, as well as the abstracted components, are still quite immature. Multiple bugs are still being found within them.
The Wishbone arbiter had a careless assumption within it. This prevented the arbiter from ever being fully tested and verified.
The ZipCPU recognizes two memory regions: a local region for CPU-specific peripherals from 0xff000000 to 0xffffffff (used by the ZipSystem, and a more global memory region (everything else). Crossing regions within the same memory operation is a fault that needs to be prevented in software, and the formal properties for describing this fault were assert() functions rather than restrict() functions.

Current status

I’m currently fully verifying the ZipCPU’s capability using an abstract prefetch, abstract divide, and an abstract multiply. Thanks to SymbiYosys and “abc pdr”, the pipeline properties I’ve now verified constitutes a full formal proof (BMC + induction) of the first set of formal properties for the ZipCPU. A second stage of properties have since been added, dependent upon the synthesizer define PHASE_ONE. The proof of this second set of properties remains ongoing.

First, the formal properties:

I’ve proven that the assumptions within all of the ZipCPU’s sub-modules hold.
I’ve manage to verify that the ZipCPU’s pipeline logic won’t accidentally “delete” an instruction in the pipeline.
I’ve also added the properties that the operands given to the ALU, memory, or divide unit, need to be valid.

This portion of the proof is ongoing.
Following any ALU or divide instruction, only the correct register will ever be written if at all–I haven’t dealt with predicates yet.
BREAK instructions never pass the OP stage as desired. This makes certain these instructions will not increment the PC when encountered. This is very similar to an illegal operand instruction, and the logic for these may yet be merged. For now, the BREAK instruction bypasses the ALU stage of the CPU as desired.

I haven’t gotten to the cover properties yet, those are yet to come. As a result, I may have carelessly assumed away a portion of the proof and can’t tell yet.

Here’s how far I’ve gotten:

Without pipelining the memory bus, the ZipCPU made it for a full 40 clocks in a bounded model check. This was the extent of the test.
With the pipelined memory bus, the ZipCPU has formally passed all of the formal properties now written for it
The updated formal code is currently in a git repository branch that hasn’t (yet) been pushed. Upon request I have pushed components of this proof into the dev branc, although I’d still be a bit embarrassed to present the (not quite working) CPU yet. This includes the abstract multiply mentioned above, as well as the multiply module that it replaces.

My apologies to all that have indicated an interest in seeing this code. Thank you for your interest. I guess it surprised me a bit. I’m just not quite ready yet to post the code–mostly because things aren’t fully working. I do anticipate fully sharing this code in due time. If you are really interested in seeing this before it is complete, then please send me an e-mail and I will consider placing it on gitlab where sponsors can have access to the current work in progress.

You can find my work in the dev branch of the github repository for the ZipCPU. Specifically, you can find the properties that I’m working with at the bottom of the core zipcpu.v Verilog file. I’m using a “make” file within the bench/formal directory to control the process.