Formally Verifying Memory and Cache Components
While I was in Italy for ORCONF 2016 I started scribbling out a data cache for the ZipCPU. In many ways I had to. The ZipCPU compared nicely to many other soft-core CPU architectures in all but two important respects: the ZipCPU had neither a data cache nor an MMU.
That was almost two years ago.
Since that time, I rewrote that data cache draft many times over, but never managed to bring my effort to the finish line. The design was just never complete. Indeed, I never got so far as to write the bench test for it.
Two months ago, I tried again. This time, I had a trick up my sleeve that I never had before: I now understood how to formally describe a memory controller. Using Yosys to check that formal description, the design became much easier to build. I could move quickly from one fault to another, fixing one problem after another, until I had strong confidence that the design would work. No, I haven’t tested it on hardware yet, but I know that when I do, it will be close to all it needs to be.
How much of the design did I evaluate? Using formal methods, I only ever needed to evaluate about 20 clocks’ worth of the design. I evaluated both the first 20 clocks of the design, as well as an arbitrary set of 20 clock periods somewhere in the middle of time. That was it. Put together, I now know this data cache will maintain its formal properties for all time.
This is in strong contrast to using a Bus Functional Model (BFM) to test whether a memory interface design works. Consider, for example, this Wishbone BFM, written by Andrew Mulcock and posted on OpenCores. It is built around a series of canned memory operations that can be applied to any Wishbone bus. It then uses random numbers and thousands (millions?) of randomly generated tests to verify that a design works.
Wouldn’t you rather just check two sets of twenty steps, and then know that your design will work for all time?
This is the benefit of formal methods.
When dealing with memories, though, there is a simple trick you need to know to make this happen. That trick will be the focus of this article. First, I’ll explain the trick in the next section. We’ll then discuss how the trick can be applied to caches. Finally, we’ll walk through the design of a block RAM controller connected to the Wishbone bus as an example of how this trick can be applied.
Formally Verifying Memory-like Components
Okay, so here’s the trick to formally verifying memory-like components: assume an arbitrary address, construct the data at that address, and then verify that transactions to/from this one arbitrary address all match the data at this address.
Yes, it is that simple–formally verifying one arbitrary address is sufficient to formally verify the entire memory space.
Let’s walk through this concept a bit more. We’ll start with an arbitrary address.
Since we want the solver to pick any constant address, we’ll use Yosys’s anyconst attribute to capture this property.
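In Verilog, that declaration might look something like this:

```verilog
// Let the solver pick any address it likes, but hold that choice constant
// for the entire proof.  AW, the address width, is a placeholder here.
(* anyconst *) wire [AW-1:0] f_addr;
```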
We’re also going to need to start with the piece of data at that address. In this case, we’ll just declare f_data as a register and initialize it to the initial value of the memory.
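Something along these lines will do, assuming for illustration that the memory powers up to zero:

```verilog
// A shadow copy of the data stored at f_addr.  DW is the data width, and
// the initial value here should match your memory's actual initial contents.
reg [DW-1:0] f_data;
initial f_data = 0;
```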
Given these two values, we can now make our only assumption about them and our memory. After setting f_data to its initial value, we’ll then assert that these two values match for the rest of time.
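In sketch form, with mem as the memory array:

```verilog
// From here on out, the memory at f_addr and our shadow copy must agree.
always @(*)
    assert(mem[f_addr] == f_data);
```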
We’ll also want to assert that any memory read from this address must also return f_data.
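For a simple combinational read port, with hypothetical names i_addr and o_data, that assertion might be:

```verilog
// Any read of f_addr must return the shadow value.
always @(*)
if (i_addr == f_addr)
    assert(o_data == f_data);
```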
That’s all that’s required for read-only memories. For memories with a write capability, we’ll also need to change our f_data value on any memory write.
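A sketch of that update, again using placeholder port names and ignoring byte enables for now:

```verilog
// On any write to f_addr, update the shadow copy in step with the memory.
always @(posedge i_clk)
if ((i_we)&&(i_addr == f_addr))
    f_data <= i_data;
```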
That’s all it takes to formally describe and verify a memory. Pretty simple, no?
Properties of a read-only cache
But what about a cache? I began this post by talking about implementing a cache for the ZipCPU. What does it take to formally verify a cache?
In the case of a read-only cache, such as an instruction cache, the same basic principle applies. You will need to support three basic properties–roughly mirroring the same basic properties that we just discussed above.
- On any read resulting from a cache miss for address f_addr from memory, f_data is the assumed result.
- If f_addr is currently a valid address within the cache, then you’ll need to assert that f_data is the cache value at that location.
- On any return from the cache where f_addr is the address of the value returned, assert that f_data is the result.
Remember the rule: assume inputs, assert local state and outputs.
Let’s look at an example of each of these properties.
The first property is to assume that the response from a bus transaction, reading from f_addr, must return f_data.
Having done this a couple of times, I often find it valuable to create a wire just to capture the logic of whether the current return is the return of interest. Not only does this logic simplify the assumption, but it also has the additional benefit of creating a value in the trace file, making it easier to troubleshoot what may end up being very complex logic. In this case, let’s use f_this_return to represent whether the f_addr value is the current one being returned from the bus.
What value should this wire have? Well, if you are incrementing o_wb_addr with every request, and if f_outstanding captures how many requests are outstanding, then the logic describing whether or not the current return from the bus is the f_addr value might look like:
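```verilog
// A sketch: o_wb_addr has already stepped past every request in flight, so
// the request now being acknowledged was made for o_wb_addr - f_outstanding.
wire f_this_return;
assign f_this_return = (f_outstanding > 0)
        &&(o_wb_addr - f_outstanding == f_addr);
```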
Of course, your logic might vary with your needs in your own cache design.
Using this f_this_return value, we then want to assume that any response from the bus when f_this_return is true must equal f_data.
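Expressed as an assumption, this might look like:

```verilog
// Whenever the bus acknowledges the request for f_addr, take the returned
// data to be f_data.
always @(posedge i_clk)
if ((i_wb_ack)&&(f_this_return))
    assume(i_wb_data == f_data);
```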
That’s step one–assume that the input matches.
For the second property, we’ll need to assert that if our arbitrary address is found within the cache, then the cache value at that location must be f_data. You’ll need this assertion in order to get your design to pass induction.
This idea is a little bit more difficult to communicate–especially to someone who hasn’t dealt with caches before. (If the Lord is willing, we’ll come back and discuss how to build a cache like this later.)
In the two implementations that I’ve built, both direct-mapped caches, I’ve split the address into three basic sections, all shown below in Fig 1. The bottom CS bits are used to describe a location within the cache memory, and the bottom LS of those bits are used to describe a location within a given cache line. I like to think of these values as the “cache size” and the “line size” respectively, even though they are really the base-two logarithms of those sizes.

The third component of the address is the “tag”. This is the top portion of the address, from LS on upwards.
You can see an address broken down into these separate components in Fig 1 below.
In this picture, you can see how the lowest two bits aren’t really part of the word address. These are the octet address bits, describing an octet’s position within a 32-bit data word. We’ll ignore these sub-word address bits at this stage, though, focusing only on word addresses.
Second, you can see how the lowest LS bits are the cache line address. These bits are not part of the tag, since all of the words within a given cache line will share the same cache tag bits.
The cache itself is maintained as three arrays. The first, simply called cache below, uses the lower CS bits to address word-sized memory values. This is the block RAM data structure containing the values within the cache itself.
The second array is a bit vector, called cache_line_valid, with one bit for each of the 2^(CS-LS) cache lines. This tells you which cache lines have valid values within them.
The third array is also indexed by the upper CS-LS bits of the CS-bit cache address. This is an array of “tags”, called cache_tag below.
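As a rough sketch, those three arrays might be declared something like the following, although the exact widths will depend upon your own cache:

```verilog
// 2^CS words of cache data, one valid bit per cache line, and one tag
// (the address bits from LS on upwards) for each cache line.
reg [DW-1:0]           cache            [0:((1<<CS)-1)];
reg [(1<<(CS-LS))-1:0] cache_line_valid;
reg [AW-LS-1:0]        cache_tag        [0:((1<<(CS-LS))-1)];
```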
Hence, any time 1) a cache line is valid, and 2) its tag matches f_addr, then 3) the value within the cache must match f_data as well. Here’s roughly how we might express that.
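```verilog
// A sketch of the induction assertion: if the line that would hold f_addr
// is valid, and its tag matches f_addr, then the cached word at f_addr's
// location must be f_data.
always @(*)
if ((cache_line_valid[f_addr[CS-1:LS]])
        &&(cache_tag[f_addr[CS-1:LS]] == f_addr[AW-1:LS]))
    assert(cache[f_addr[CS-1:0]] == f_data);
```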
We’ll have to come back to this later and provide a proper description of how to build a cache. For now, these are the basics of how you would test first that f_addr is in the cache, and second that the value f_data is found at that location within the cache.
We now move on to the third formal property for read-only caches: that upon any return from the cache that comes from the f_addr address, the value returned should be f_data.
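As a property, that might look like the following sketch, where o_valid, o_addr, and o_data stand in for whatever your cache’s return interface actually calls them:

```verilog
// Whenever the cache hands a value back from f_addr, it must be f_data.
always @(posedge i_clk)
if ((o_valid)&&(o_addr == f_addr))
    assert(o_data == f_data);
```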
These three properties, the assumption regarding the input, the assertion regarding the internal state of the cache, and the assertion about the output, are all that is needed to formally describe the required functionality of any read-only cache. That said, you might need to adjust how you express these properties for your design, as not all caches have the same structure or the same register and memory names.
If you recall, we discussed these properties earlier–in our pipelined prefetch article. That article discussed a simple two-element rolling instruction cache design.
Properties of a read-write cache
What about a read-write cache, such as the ZipCPU’s (brand new) data cache that I referenced above? In this case, all of the properties of a read-only cache, listed above, still need to hold. In addition, we also need to require that f_data change upon any write as well.
If your data value is octet addressable, and your word size is 32 bits, you’ll need to do a bit more work–but we’ll get to that below.
You may also need to be careful with the timing of this write–it will need to match the timing of any actual write to the cache memory or the second property will fail.
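For a 32-bit word with byte enables, the shadow update might look something like the sketch below. The names i_we, i_addr, i_sel, and i_data are placeholders, and the update needs to take place on the same clock that the cache memory itself is written.

```verilog
// Update the shadow value one octet at a time, matching the cache write.
always @(posedge i_clk)
if ((i_we)&&(i_addr == f_addr))
begin
    if (i_sel[0]) f_data[ 7: 0] <= i_data[ 7: 0];
    if (i_sel[1]) f_data[15: 8] <= i_data[15: 8];
    if (i_sel[2]) f_data[23:16] <= i_data[23:16];
    if (i_sel[3]) f_data[31:24] <= i_data[31:24];
end
```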
That’s basically it. These simple and basic properties are all that is needed to formally verify any type of memory–whether it be block RAM, an external SDRAM or flash device, a read-only cache or even a read-write cache. The approach is very versatile.
Block RAM
Just to drive this lesson home, let’s work through the example of a block RAM memory controller, using this same approach, and let’s verify this block RAM responds as it should.
If you’ve been building FPGA designs for long, you’ll know that creating a basic block RAM component in Verilog consists of only a few lines of code.
First, we’ll need to declare our memory. In this declaration DW is the width of the data words contained in the memory, and AW is the number of bits required to address it.
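For example:

```verilog
// The memory itself: 2^AW words, each DW bits wide.  The values chosen
// here are only examples.
parameter DW = 32, AW = 14;
reg [DW-1:0] mem [0:((1<<AW)-1)];
```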
Reading from memory is as simple as,
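```verilog
// Block RAM requires a registered (clocked) read.
always @(posedge i_clk)
    o_wb_data <= mem[i_wb_addr];
```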
and writing to memory is only a touch more difficult.
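```verilog
// Write one octet at a time, guarded by the individual select lines.
// This sketch assumes a 32-bit data width and Wishbone-style port names.
always @(posedge i_clk)
if ((i_wb_stb)&&(i_wb_we))
begin
    if (i_wb_sel[0]) mem[i_wb_addr][ 7: 0] <= i_wb_data[ 7: 0];
    if (i_wb_sel[1]) mem[i_wb_addr][15: 8] <= i_wb_data[15: 8];
    if (i_wb_sel[2]) mem[i_wb_addr][23:16] <= i_wb_data[23:16];
    if (i_wb_sel[3]) mem[i_wb_addr][31:24] <= i_wb_data[31:24];
end
```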
Notice how we needed to check whether or not each individual octet was being written, and then only set that octet to the new value if so.
We’ve discussed both of these operations before.
I maintain a block RAM controller based upon these principles in the ZBasic repository. We’ll use this design for today’s discussion.
There are a couple of differences in this block RAM controller from what we’ve just discussed above, although most of them are fairly superficial. These come from the many years that I’ve worked with this controller. For example, on one high speed design I discovered I couldn’t get the request from a corner of the chip to the dedicated block RAM on-chip hardware in the center of the chip. I managed to solve this by adding an extra clock to memory, but made that extra clock optional–controlled by the parameter EXTRACLOCK. Recently I added a preload option controlled by the parameter HEXFILE, and even a ROM option, OPT_ROM, should you ever wish to create a read-only memory instead of the more general purpose block RAM.
The basic controller that we just outlined above still remains within that design. Indeed, even with the little feature bloat discussed above, the design remains quite simple and very readable.
Formal Properties
So what does it take to verify this controller? We’ll skip through some of the bus properties and jump straight to the arbitrary address declaration.
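Something like the following sketch:

```verilog
// The arbitrary, solver-chosen address, fixed for all time, together with
// the shadow register that will track the memory contents at that address.
(* anyconst *) wire [AW-1:0] f_addr;
reg [DW-1:0] f_data;
```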
These two values are assigned as we discussed above.
We then need to make certain that our RAM, at address f_addr, starts with the initial data found in f_data.
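One way to do that, sketched below, is to pin the otherwise free initial value of f_data with an initial assumption:

```verilog
// At time zero, the shadow register matches whatever the memory holds at
// f_addr, whether that memory was pre-loaded from HEXFILE or not.
initial assume(f_data == mem[f_addr]);
```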
This guarantees that, should the memory be pre-loaded like a ROM, the two values start out identical.
Next, just as we adjusted our memory value, mem[f_addr], on any write, we also need to adjust our data value, f_data, upon any write to f_addr as well. The code below is just a touch more complicated than we presented above, simply because the code below only updates f_data if the ROM option, OPT_ROM, is clear.
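A sketch of that update, wrapped in a generate block so this logic only exists when OPT_ROM is clear:

```verilog
generate if (!OPT_ROM)
begin
    // The same octet-by-octet update as the memory write above, applied
    // to the shadow register whenever f_addr is written.
    always @(posedge i_clk)
    if ((i_wb_stb)&&(i_wb_we)&&(i_wb_addr == f_addr))
    begin
        if (i_wb_sel[0]) f_data[ 7: 0] <= i_wb_data[ 7: 0];
        if (i_wb_sel[1]) f_data[15: 8] <= i_wb_data[15: 8];
        if (i_wb_sel[2]) f_data[23:16] <= i_wb_data[23:16];
        if (i_wb_sel[3]) f_data[31:24] <= i_wb_data[31:24];
    end
end endgenerate
```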
In the case of a ROM, things are just a touch different.
First, we actually need to assign f_data to be mem[f_addr]. This piece caught me by surprise. It appears as though a value that isn’t assigned is given an unknown value, x, that can change on every clock tick. Hence, we make certain we assign it here.
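Again as a sketch, in the matching half of that generate:

```verilog
generate if (OPT_ROM)
begin
    // Nothing ever writes f_data in a ROM, so tie it directly to the
    // memory contents, keeping the solver from leaving it at x.
    always @(*)
        f_data = mem[f_addr];
end endgenerate
```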
Likewise, we assert that this value should never change–this is the ROM option after all.
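That assertion might look like the sketch below, using the usual f_past_valid helper so that $stable() is never referenced before time begins:

```verilog
// f_past_valid is true on every clock after the very first one.
reg f_past_valid;
initial f_past_valid = 1'b0;
always @(posedge i_clk)
    f_past_valid <= 1'b1;

// This is a ROM: the value at f_addr must never change.
always @(posedge i_clk)
if ((OPT_ROM)&&(f_past_valid))
    assert($stable(f_data));
```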
We conclude our proof by asserting that the memory at our address f_addr, mem[f_addr], must always contain the value f_data.
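Roughly:

```verilog
// The central assertion: the memory at f_addr must always match f_data.
always @(*)
    assert(mem[f_addr] == f_data);
```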
Conclusion
Really, that’s all it takes–just a small number of assumptions and assertions and you can verify that any memory controller will return the right data given any address. Further, notice how these same properties can be applied to both instruction and data caches, to SDRAM and flash memory controllers, as well as our examples above. Indeed, it was the ease of describing how a data cache should work that made it so much easier to test when I finally finished building it two months ago.
Would you believe this approach works for MMU verification as well? We may have to come back and discuss that later as well. The approach really is very flexible!
Now that you know the basics, we can come back to this topic later and discuss how to build a basic instruction or data cache, such as the ZipCPU has. That article is still coming up.
Then will I remember my covenant with Jacob, and also my covenant with Isaac, and also my covenant with Abraham will I remember; and I will remember the land. (Lev 26:42)