Formally Verifying Memory and Cache Components
While I was in Italy for ORCONF 2016 I started scribbling out a data cache for the ZipCPU. In many ways I had to. The ZipCPU compared nicely to many other soft-core CPU architectures in all but two important respects–the ZipCPU had neither a data cache nor an MMU.
That was almost two years ago.
Since that time, I rewrote that data cache draft many times over, but never managed to bring my effort to the finish line. The design was just never complete. Indeed, I never got so far as to write the bench test for it.
Two months ago, I tried again. This time, I had a trick up my sleeve that I never had before: I now understood how to formally describe a memory controller. Using Yosys, I found the design much easier to build. I could move quickly from one fault to another, fixing one problem after another until I had a strong confidence that the design would work. No, I haven't tested it on hardware yet, but I know that when I do, it will be close to all it needs to be.
How much of the design did I evaluate? Using formal methods, I only ever needed to evaluate about 20 clocks' worth of the design. I evaluated both the first 20 clocks of the design, as well as an arbitrary set of 20 clock periods somewhere in the middle of time. That was it. Put together, I now know this data cache will maintain its formal properties for all time.
This is in strong contrast to using a Bus Functional Model (BFM) to test whether a memory interface design works. Consider, for example, this Wishbone BFM by Andrew Mulcock and posted on OpenCores. It is built around a series of canned memory operations that can be applied to any Wishbone bus. It then uses random numbers and thousands (millions?) of randomly generated tests to verify that a design works.
Wouldn’t you rather just check two sets of twenty steps, and then know that your design will work for all time?
This is the benefit of formal methods.
When dealing with memories, though, there is a simple trick you need to know to make this happen. That trick will be the focus of this article. First, I'll explain the trick in the next section; we'll then discuss how the trick can be applied to caches; finally, we'll walk through the design of a block RAM controller connected to the Wishbone bus as an example of how this trick can be applied.
Formally Verifying Memory-like Components
Okay, so here’s the trick to formally verifying memory-like components: assume an arbitrary address, construct the data at that address, and then verify that transactions to/from this one arbitrary address all match the data at this address.
Yes, it is that simple–formally verifying one arbitrary address is sufficient to formally verify the entire memory space.
Let's walk through this concept a bit more. We'll start with an arbitrary address. Since we want the solver to pick any constant address, we'll use the anyconst attribute to capture this property.
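As a minimal sketch, assuming AW is the address width in bits, the declaration might look like:

```verilog
// Let the solver pick one address, held constant for all time
(* anyconst *) wire [AW-1:0] f_addr;
```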
We’re also going to need to start with the piece of data at that address.
In this case, we’ll just declare
f_data as a register and initialize it
to the initial value of the memory.
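A sketch of that declaration, assuming DW is the data width and mem is the memory array being verified:

```verilog
// f_data shadows whatever value the memory initially holds at f_addr
reg [DW-1:0] f_data;
initial f_data = mem[f_addr];
```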
Given these two values, we can now make our only assumption about them and
our memory. After setting
f_data to its initial value, we’ll then assert
that these two values match for the rest of time.
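Assuming mem is the name of the memory array being verified, that assertion might be as simple as:

```verilog
// The shadow value must always match the memory contents at f_addr
always @(*)
    assert(mem[f_addr] == f_data);
```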
We'll also want to assert that any memory read from this address must also return this same f_data value. That's all that's required for read-only memories.
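For a controller that registers its read output, that assertion might look like the sketch below. Here o_wb_data is an assumed output name, and f_past_valid the usual helper register that is zero only on the first clock:

```verilog
// Any read from f_addr must return f_data one clock later
always @(posedge i_clk)
if ((f_past_valid)&&($past(i_wb_stb && !i_wb_we
        && (i_wb_addr == f_addr))))
    assert(o_wb_data == f_data);
```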
For memories with a write capability, we'll also need to change the f_data value on any memory write.
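A sketch of that update, assuming a simple Wishbone-style interface with signal names i_wb_stb, i_wb_we, i_wb_addr, and i_wb_data, and no byte-select lanes yet:

```verilog
// Whenever the design commits a write to f_addr, update f_data to match
always @(posedge i_clk)
if ((i_wb_stb)&&(i_wb_we)&&(i_wb_addr == f_addr))
    f_data <= i_wb_data;
```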
That’s all it takes to formally describe and verify a memory. Pretty simple, no?
Properties of a read-only cache
But what about a cache? I began this post by talking about implementing a cache for the ZipCPU. What does it take to formally verify a cache?
In the case of a read-only cache, such as an instruction cache, the same basic principle applies. You will need to support three basic properties–roughly mirroring those same basic properties that we just discussed above.
1. On any read resulting from a cache miss for address f_addr, assume that f_data is the result returned from the bus.
2. If f_addr is currently a valid address within the cache, then you'll need to assert that f_data is the cache value at that location.
3. On any return from the cache where f_addr is the address of the value returned, assert that the value returned is f_data.
Remember the rule: assume inputs, assert local state and outputs.
Let’s look at an example of each of these properties.
The first property is to assume that the response from any bus transaction reading from f_addr must return f_data.
Having done this a couple of times, I often find it valuable to create a wire just to capture the logic of whether the current return is the return of interest. Not only does this logic simplify the assumption, but it also has the additional property of creating a value in the trace file, making it easier to troubleshoot what may end up being very complex logic. In this case, let's use f_this_return to represent whether the value at f_addr is the one currently being returned from the bus.
What value should this wire have? Well, if you are keeping a request-address counter that increments with every address requested, and if f_outstanding captures how many requests are outstanding, then the logic describing whether or not the current return from the bus is the f_addr value might look like:
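As one hedged sketch, suppose f_next_addr is a hypothetical counter holding the address the next request will use, so the oldest still-outstanding request targets f_next_addr - f_outstanding:

```verilog
// Hypothetical names: f_next_addr increments with each request issued,
// f_outstanding counts requests not yet acknowledged.  The value being
// acknowledged right now belongs to the oldest outstanding request.
assign f_this_return = (i_wb_ack)
        &&(f_next_addr - f_outstanding == f_addr);
```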
Of course, your logic might vary with your needs in your own cache design.
Given this f_this_return value, we then want to assume that any response from the bus when f_this_return is true must equal f_data.
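Assuming i_wb_data is the data input returned from the bus, that assumption might be written:

```verilog
// Whenever the bus returns the word at f_addr, it must return f_data
always @(*)
if (f_this_return)
    assume(i_wb_data == f_data);
```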
That’s step one–assume that the input matches.
For the second property, we'll need to assert that if our arbitrary address is found within the cache, then the value at that location must be f_data. You'll need this assertion in order to get your design to pass induction.
This idea is a little bit more difficult to communicate–especially to someone who hasn’t dealt with caches before. (If the Lord is willing, we’ll come back and discuss how to build a cache like this later.)
In the two cache implementations that I've built, both direct mapped, I've split the address into three basic sections, all shown below in Fig 1. The lower CS bits are used to describe a location within the cache memory, and the lower LS of those bits are used to describe a location within a cache line. I like to think of these values as the "cache size" and the "line size" respectively, even though they are really the log base two of those sizes.

The third component of the address is the "tag". This is the top portion of the address, from bit CS on upwards.
You can see an address broken down into these separate components in Fig 1 below.
In this picture, you can see how the lowest two bits aren’t really part of the word address. These are the octet address bits, describing an octet’s position within a 32-bit data word. We’ll ignore these sub-word address bits at this stage, though, focusing only on word addresses.
Second, you can see how the lowest LS bits form the line address. These bits are not part of the tag, since all of the words within a given line will share the same cache tag bits.
The cache itself is maintained as three arrays. The first, simply called cache below, uses the lower CS bits of the address to index word-sized memory values. This is the block RAM data structure containing the values within the cache.

The second array is a bit vector with one bit per cache line, 2^(CS-LS) bits in all. This tells you which cache lines have valid values within them.

The third array is also indexed by the upper CS-LS bits of the cache address. This is an array of "tags", recording which region of memory each cache line currently holds.

Hence, any time 1) a cache line is valid, and 2) its tag matches the tag portion of f_addr, then 3) the value within the cache at f_addr's location must match f_data as well.
Here’s how we’d express that.
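The sketch below uses hypothetical array names, cache, c_valid, and c_tags, for the three arrays just described; your own names and index widths may differ:

```verilog
// Which cache line would hold f_addr, were it in the cache
wire [CS-LS-1:0] f_line = f_addr[CS-1:LS];

// If the line is valid and its tag matches f_addr's tag, then the
// cached word at f_addr's location must equal f_data
always @(*)
if ((c_valid[f_line])
        &&(c_tags[f_line] == f_addr[AW-1:CS]))
    assert(cache[f_addr[CS-1:0]] == f_data);
```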
We'll have to come back to this later and provide a proper description of how to build a cache like this. For now, these are the basics of how you would test, first, whether f_addr is in the cache, and second, whether f_data is found at that location within the cache.
We now move on to the third property: asserting that, upon any return from the cache that comes from the f_addr address, the value returned must be f_data.
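A sketch of this final assertion, assuming o_valid, o_addr, and o_data are the names of the cache's return interface to the CPU:

```verilog
// Any value the cache returns for f_addr must be f_data
always @(*)
if ((o_valid)&&(o_addr == f_addr))
    assert(o_data == f_data);
```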
These three properties, the assumption regarding the input, the assertion regarding the internal state of the cache, and the assertion about the output, are all that is needed to formally describe the required functionality of any read-only cache. That said, you might need to adjust how you express these properties for your design, as not all caches have the same structure or the same register and memory names.
If you recall, we discussed these properties earlier–in our pipelined prefetch article. That article discussed a simple two-element rolling instruction cache design.
Properties of a read-write cache
What about a read-write cache, such as the ZipCPU's (brand new) data cache that I referenced above? In this case, all of the properties of a read-only cache, listed above, still need to hold. In addition, we also need to require that f_data change upon any write as well.
If your data value is octet addressable, and your word size is 32-bits, you’ll need to do a bit more work–but we’ll get to that below.
You may also need to be careful with the timing of this write–it will need to match the timing of any actual write to the cache memory or the second property will fail.
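As a sketch of how that octet-addressable update might look for a 32-bit word, assuming Wishbone signal names i_wb_stb, i_wb_we, i_wb_addr, i_wb_sel, and i_wb_data:

```verilog
// Update only the octets selected by i_wb_sel, matching the timing
// of the design's own write into the cache memory
always @(posedge i_clk)
if ((i_wb_stb)&&(i_wb_we)&&(i_wb_addr == f_addr))
begin
    if (i_wb_sel[0]) f_data[ 7: 0] <= i_wb_data[ 7: 0];
    if (i_wb_sel[1]) f_data[15: 8] <= i_wb_data[15: 8];
    if (i_wb_sel[2]) f_data[23:16] <= i_wb_data[23:16];
    if (i_wb_sel[3]) f_data[31:24] <= i_wb_data[31:24];
end
```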
That’s basically it. These simple and basic properties are all that is needed to formally verify any type of memory–whether it be block RAM, an external SDRAM or flash device, a read-only cache or even a read-write cache. The approach is very versatile.
Just to drive this lesson home, let’s work through the example of a block RAM memory controller, using this same approach, and let’s verify this block RAM responds as it should.
If you've been building FPGA designs for long, you'll know that creating a basic block RAM component in Verilog consists of only a few lines of code.
First, we'll need to declare our memory. In this declaration, DW is the width of the data words contained in the memory, and AW is the number of bits required to address it.
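A minimal sketch of that declaration, with example parameter values:

```verilog
parameter DW = 32;  // data width, in bits
parameter AW = 10;  // address width, in bits
reg [DW-1:0] mem [0:(1<<AW)-1];
```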
Reading from memory is as simple as,
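For instance, with a registered read port (o_wb_data and i_wb_addr being assumed signal names):

```verilog
// Registered read: the data appears one clock after the address
always @(posedge i_clk)
    o_wb_data <= mem[i_wb_addr];
```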
and writing to memory is only a touch more difficult.
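A sketch of the write logic, assuming 32-bit words and Wishbone-style signal names with per-octet select lines i_wb_sel:

```verilog
// Write each octet only if its select line is raised
always @(posedge i_clk)
if ((i_wb_stb)&&(i_wb_we))
begin
    if (i_wb_sel[0]) mem[i_wb_addr][ 7: 0] <= i_wb_data[ 7: 0];
    if (i_wb_sel[1]) mem[i_wb_addr][15: 8] <= i_wb_data[15: 8];
    if (i_wb_sel[2]) mem[i_wb_addr][23:16] <= i_wb_data[23:16];
    if (i_wb_sel[3]) mem[i_wb_addr][31:24] <= i_wb_data[31:24];
end
```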
Notice how we needed to check whether or not each individual octet was being written, and then only set that octet to the new value if so.
We’ve discussed both of these operations before.
I maintain a block RAM controller based upon these principles in the ZBasic repository. We'll use this design for today's discussion.
There are a couple of differences between this block RAM controller and what we've just discussed above, although most of them are fairly superficial. These come from the many years that I've worked with this controller. For example, on one high speed design I discovered I couldn't get the request from a corner of the chip to the dedicated block RAM on-chip hardware in the center of the chip in time. I managed to solve this by adding an extra clock of delay on the way to memory, but made that extra clock optional–controlled by the parameter, EXTRACLOCK. Recently I added a preload option controlled by the parameter, HEXFILE, and even a ROM option, OPT_ROM, should you ever wish to create a read-only memory instead of the more general purpose block RAM.
The basic controller that we just outlined above still remains within that design. Indeed, even with the little feature bloat discussed above, the design remains quite simple and very readable.
So what does it take to verify this controller? We’ll skip through some of the bus properties and jump straight to the arbitrary address declaration.
These two values are assigned as we discussed above.
We then need to make certain that our RAM, at address f_addr, starts with the initial data found in f_data.
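One way to express this constraint, assuming the declarations sketched earlier:

```verilog
// Constrain the power-up state: the RAM starts out matching f_data
initial assume(mem[f_addr] == f_data);
```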
This guarantees that, should the memory be pre-loaded like a ROM, the two values start out identical.
Next, just as we adjusted our memory value on any write, we also need to adjust our data value, f_data, upon any write to this address as well. The code below is just a touch more complicated than what we presented above, simply because the code below only updates f_data if the ROM option, OPT_ROM, is clear.
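A sketch of that update, assuming OPT_ROM is a parameter and the same Wishbone signal names as before:

```verilog
generate if (!OPT_ROM)
begin
    // Track writes to f_addr, octet by octet, when not a ROM
    always @(posedge i_clk)
    if ((i_wb_stb)&&(i_wb_we)&&(i_wb_addr == f_addr))
    begin
        if (i_wb_sel[0]) f_data[ 7: 0] <= i_wb_data[ 7: 0];
        if (i_wb_sel[1]) f_data[15: 8] <= i_wb_data[15: 8];
        if (i_wb_sel[2]) f_data[23:16] <= i_wb_data[23:16];
        if (i_wb_sel[3]) f_data[31:24] <= i_wb_data[31:24];
    end
end endgenerate
```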
In the case of a ROM, things are just a touch different.
First, we actually need to assign f_data a value on every clock, even though that value never changes. This piece caught me by surprise. It appears as though a value that isn't assigned is given an unknown value, x, that can change on every clock tick. Hence, we make certain we assign it here.
Likewise, we assert that this value should never change–this is the ROM option after all.
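A sketch of both pieces, assuming OPT_ROM is a parameter and f_past_valid is the usual helper register that is zero only on the first clock:

```verilog
generate if (OPT_ROM)
begin
    // Keep f_data assigned, so the solver cannot treat it as a
    // free variable taking a new value on every clock tick
    always @(posedge i_clk)
        f_data <= f_data;

    // This is a ROM: the stored value must never change
    always @(posedge i_clk)
    if (f_past_valid)
        assert(mem[f_addr] == $past(mem[f_addr]));
end endgenerate
```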
We conclude our proof by asserting that the memory at our arbitrary address, mem[f_addr], must always contain the value f_data.
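That concluding assertion is the same one we sketched for the generic memory earlier:

```verilog
// The memory at the arbitrary address must always hold f_data
always @(*)
    assert(mem[f_addr] == f_data);
```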
Really, that’s all it takes–just a small number of assumptions and assertions and you can verify that any memory controller will return the right data given any address. Further, notice how these same properties can be applied to both instruction and data caches, to SDRAM and flash memory controllers, as well as our examples above. Indeed, it was the ease of describing how a data cache should work that made it so much easier to test when I finally finished building it two months ago.
Would you believe this approach works for MMU verification as well? We may have to come back and discuss that later as well. The approach really is very flexible!
Now that you know the basics, we can come back to this topic later and discuss how to build a basic instruction or data cache, such as the ZipCPU has. That article is still coming up.
Then will I remember my covenant with Jacob, and also my covenant with Isaac, and also my covenant with Abraham will I remember; and I will remember the land. (Lev 26:42)