While I was in Italy for ORCONF 2016 I started scribbling out a data cache for the ZipCPU. In many ways I had to. The ZipCPU compared nicely to many other soft-core CPU architectures in all but two important respects: it had neither a data cache nor an MMU.

That was almost two years ago.

Since that time, I rewrote that data cache draft many times over, but never managed to bring my effort to the finish line. The design was just never complete. Indeed, I never got so far as to write the bench test for it.

Two months ago, I tried again. This time, I had a trick up my sleeve that I’d never had before: I now understood how to formally describe a memory controller. Using Yosys, I found the design much easier to build. I could move quickly from one fault to the next, fixing one problem after another until I had a strong confidence that the design would work. No, I haven’t tested it on hardware yet, but I know that when I do, it will be close to all it needs to be.

How much of the design did I evaluate? Using formal methods, I only ever needed to evaluate about 20 clocks’ worth of the design. I evaluated both the first 20 clocks of the design, as well as an arbitrary set of 20 clock periods somewhere in the middle of time. That was it. Put together, I now know this data cache will maintain its formal properties for all time.
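Those two evaluations are simply the two passes of an unbounded proof: a bounded check covering the first 20 clocks, followed by an induction step covering an arbitrary 20-clock window. Both passes can be run from a single SymbiYosys script. Here’s a minimal sketch, assuming (hypothetically) a design contained in dcache.v with a top level module named dcache:

[options]
mode prove
depth 20

[engines]
smtbmc

[script]
read_verilog -formal dcache.v
prep -top dcache

[files]
dcache.v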

This is in strong contrast to using a Bus Functional Model (BFM) to test whether a memory interface design works. Consider, for example, this Wishbone BFM, written by Andrew Mulcock and posted on OpenCores. It is built around a series of canned memory operations that can be applied to any Wishbone bus. It then uses random numbers and thousands (millions?) of randomly generated tests to verify that a design works.

Wouldn’t you rather just check two sets of twenty steps, and then know that your design will work for all time?

This is the benefit of formal methods.

When dealing with memories, though, there is a simple trick you need to know to make this happen. That trick will be the focus of this article. First, I’ll explain the trick in the next section. We’ll then discuss how the trick can be applied to caches. Finally, we’ll walk through the design of a block RAM controller connected to the Wishbone bus as an example of how this trick can be applied.

Formally Verifying Memory-like Components

Okay, so here’s the trick to formally verifying memory-like components: assume an arbitrary address, construct the data at that address, and then verify that transactions to/from this one arbitrary address all match the data at this address.

Yes, it is that simple–formally verifying one arbitrary address is sufficient to formally verify the entire memory space.

Let’s walk through this concept a bit more. We’ll start with an arbitrary address.

(* anyconst *)	wire	[AW-1:0]	f_addr;

Since we want the solver to pick any constant address, we’ll use Yosys’s anyconst attribute to capture this property.

We’re also going to need to start with the piece of data at that address. In this case, we’ll just declare f_data as a register and initialize it to the initial value of the memory.

reg	[DW-1:0]	f_data;

initial f_data = mem[f_addr];

Given these two values, we can now tie them to our memory. Having set f_data to the memory’s initial value, we’ll then assert that these two values match for the rest of time.

always @(*)
	assert(mem[f_addr] == f_data);

We’ll also want to assert that any memory read from this address must also return f_data.

always @(posedge i_clk)
if ((f_past_valid)
		// A wishbone transaction
		&&($past(i_wb_stb))
		// Reading from our memory
		&&(!$past(i_wb_we))
		// At this address
		&&($past(i_wb_addr == f_addr)))
	assert(o_wb_data == f_data);
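If you haven’t seen the f_past_valid register used above before, it’s nothing more than a guard to keep $past() from referencing values from before the beginning of time:

// f_past_valid is zero on the very first clock, and one ever after,
// so any $past() guarded by it only ever references values from
// within the trace
reg	f_past_valid;
initial	f_past_valid = 1'b0;
always @(posedge i_clk)
	f_past_valid <= 1'b1;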

That’s all that’s required for read-only memories.

For memories with a write capability, we’ll also need to change our f_data value on any memory write.

always @(posedge i_clk)
// A wishbone transaction
if ((i_wb_stb)
		// writing to our memory
		&&(i_wb_we)
		// At this address
		&&(i_wb_addr == f_addr))
	// Then overwrite f_data
	f_data <= i_wb_data;

That’s all it takes to formally describe and verify a memory. Pretty simple, no?

Properties of a read-only cache

But what about a cache? I began this post by talking about implementing a cache for the ZipCPU. What does it take to formally verify a cache?

In the case of a read-only cache, such as an instruction cache, the same basic principle applies. You will need to support three basic properties, roughly mirroring the same basic properties that we just discussed above.

  1. On any read from memory resulting from a cache miss, where the address read is f_addr, f_data is the assumed result.

  2. If f_addr is currently a valid address within the cache, then you’ll need to assert that f_data is the cache value at that location.

  3. On any return from the cache where f_addr is the address of the value returned, assert that f_data is the result.

Remember the rule: assume inputs, assert local state and outputs.

Let’s look at an example of each of these properties.

The first property is to assume that the response from a bus transaction, reading from f_addr, must return f_data.

Having done this a couple of times, I often find it valuable to create a wire just to capture the logic of whether the current return is the return of interest. Not only does this simplify the assumption, it also creates a value in the trace file, making it easier to troubleshoot what may end up being very complex logic. In this case, let’s use f_this_return to represent whether the f_addr value is the one currently being returned from the bus.

wire	f_this_return;

What value should this wire have? Well, if you are incrementing o_wb_addr with every request, and if f_outstanding captures how many requests are outstanding, then the logic describing whether or not the current return from the bus is the f_addr value might look like:

assign	f_this_return = (o_wb_cyc)&&(i_wb_ack)
		&&(o_wb_addr-f_outstanding == f_addr);

Of course, your logic might vary with your needs in your own cache design.
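If you haven’t seen f_outstanding before, it’s nothing more than a count of the requests that have been issued but not yet acknowledged, the same count maintained by the formal bus properties we’ve used before. As a rough sketch, assuming a hypothetical width parameter F_LGDEPTH, such a counter might look like:

reg	[F_LGDEPTH-1:0]	f_outstanding;

// Count requests that have been issued, but not yet acknowledged.
// Following any bus abort, when CYC is dropped, the count returns
// to zero.
initial	f_outstanding = 0;
always @(posedge i_clk)
if (!o_wb_cyc)
	f_outstanding <= 0;
else case({ (o_wb_stb)&&(!i_wb_stall), i_wb_ack })
	2'b10: f_outstanding <= f_outstanding + 1'b1;
	2'b01: f_outstanding <= f_outstanding - 1'b1;
	default: begin end
endcase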

Using this f_this_return value, we then want to assume that any response from the bus when f_this_return is true must equal f_data.

always @(*)
if (f_this_return)
	assume(i_wb_data == f_data);

That’s step one–assume that the input matches.

For the second property, we’ll need to assert that if our arbitrary address is found within the cache, then the cache value at that location must be f_data. You’ll need this assertion in order to get your design to pass induction.

This idea is a little bit more difficult to communicate–especially to someone who hasn’t dealt with caches before. (If the Lord is willing, we’ll come back and discuss how to build a cache like this later.)

In the two implementations that I’ve built, both direct-mapped caches, I’ve split the address into three basic sections. The bottom CS bits are used to describe a location within the cache memory, and the bottom LS of those bits are used to describe a location within a given cache line. I like to think of these values as the “cache size” and the “line size” respectively, even though they are really the log, base two, of those sizes.

The third component of the address is the “tag”. This is the top portion of the address, from LS on upwards.

You can see an address broken down into these separate components in Fig 1 below.

Fig 1. Components of a Cache Address

In this picture, you can see how the lowest two bits aren’t really part of the word address. These are the octet address bits, describing an octet’s position within a 32-bit data word. We’ll ignore these sub-word address bits at this stage, though, focusing only on word addresses.

Second, you can see how the lowest LS bits are the cache line address. These bits are not part of the tag, since all of the bits within a given cache line will share the same cache tag bits.

The cache itself is maintained as three arrays. The first, simply called cache below, uses the lower CS address bits to index word-sized memory values. This is the block RAM data structure containing the values within the cache itself. The second array, called cache_line_valid, contains one bit per cache line, (1<<(CS-LS)) bits in all. This tells you which cache lines have valid values within them. The third array, an array of “tags” called cache_tag below, is likewise indexed by the upper CS-LS bits of the CS-bit cache address. Hence, any time 1) a cache line is valid, and 2) its tag matches f_addr, then 3) the value within the cache must match f_data as well.
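To make these three arrays concrete, here’s a hypothetical set of declarations matching the description above, with DW and AW being the data width and address width as before:

// The cached data itself: (1<<CS) words of block RAM
reg	[DW-1:0]	cache	[0:(1<<CS)-1];
// One valid bit for each of the (1<<(CS-LS)) cache lines
reg	[(1<<(CS-LS))-1:0]	cache_line_valid;
// One tag per cache line, holding address bits AW-1 down to LS
reg	[AW-1:LS]	cache_tag	[0:(1<<(CS-LS))-1];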

Here’s how we’d express that.

always @(*)
	// If the cache line itself is valid
if ((cache_line_valid[f_addr[CS-1:LS]])
	//
	// and if this valid cache line's tag matches the top
	// "tag" bits of our address
	&&(cache_tag[f_addr[CS-1:LS]] == f_addr[AW-1:LS]))
	//
	// then assert that this is the value in the cache
	assert(cache[f_addr[CS-1:0]] == f_data);

We’ll have to come back to this later and provide a proper description of how to build a cache. For now, these are the basics of how you would check, first, that f_addr is in the cache, and second, that the value f_data is found at that location within the cache.

We now move on to the third formal property for read-only caches, that upon any return from the cache that comes from the f_addr address, the value returned should be f_data.

always @(*)
if ((output_valid)&&(output_address == f_addr))
	assert(output_word == f_data);

These three properties, the assumption regarding the input, the assertion regarding the internal state of the cache, and the assertion about the output, are all that is needed to formally describe the required functionality of any read-only cache. That said, you might need to adjust how you express these properties for your design, as not all caches have the same structure or the same register and memory names.

If you recall, we discussed these properties earlier–in our pipelined prefetch article. That article discussed a simple two-element rolling instruction cache design.

Properties of a read-write cache

What about a read-write cache, such as the ZipCPU’s (brand new) data cache that I referenced above? In this case, all of the properties of a read-only cache, listed above, still need to hold. In addition, we also need to require that f_data change upon any write as well.

always @(posedge i_clk)
if ((write_request)&&(write_address == f_addr))
	f_data <= write_value;

If your data value is octet addressable, and your word size is 32 bits, you’ll need to do a bit more work, but we’ll get to that below.

You may also need to be careful with the timing of this write: it will need to match the timing of any actual write to the cache memory, or the second property will fail.

That’s basically it. These simple and basic properties are all that is needed to formally verify any type of memory–whether it be block RAM, an external SDRAM or flash device, a read-only cache or even a read-write cache. The approach is very versatile.

Block RAM

Just to drive this lesson home, let’s work through the example of a block RAM memory controller using this same approach, and let’s verify that this block RAM responds as it should.

If you’ve been building FPGA designs for long, you’ll know that creating a basic block RAM component in Verilog takes only a few lines of code.

First, we’ll need to declare our memory. In this declaration DW is the width of the data words contained in the memory, and AW is the number of bits required to address it.

	reg	[(DW-1):0]	mem	[0:((1<<AW)-1)];

Reading from memory is as simple as,

always @(posedge i_clk)
	o_wb_data <= mem[i_wb_addr];

and writing to memory is only a touch more difficult.

always @(posedge i_clk)
begin
	if ((i_wb_stb)&&(i_wb_we)&&(i_wb_sel[3]))
		mem[i_wb_addr][31:24] <= i_wb_data[31:24];

	if ((i_wb_stb)&&(i_wb_we)&&(i_wb_sel[2]))
		mem[i_wb_addr][23:16] <= i_wb_data[23:16];

	if ((i_wb_stb)&&(i_wb_we)&&(i_wb_sel[1]))
		mem[i_wb_addr][15: 8] <= i_wb_data[15: 8];

	if ((i_wb_stb)&&(i_wb_we)&&(i_wb_sel[0]))
		mem[i_wb_addr][ 7: 0] <= i_wb_data[ 7: 0];
end

Notice how we needed to check whether or not each individual octet was being written, and then only set that octet to the new value if so.

We’ve discussed both of these operations before.
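To turn those few lines into a complete Wishbone slave, the only other piece required is the handshake itself. As a minimal sketch: a block RAM like this one is always ready, so it never needs to stall, and it can acknowledge every request on the clock following the strobe.

// Acknowledge every request on the clock following the strobe
initial	o_wb_ack = 1'b0;
always @(posedge i_clk)
	o_wb_ack <= (i_wb_stb);

// A block RAM is always ready: never stall
assign	o_wb_stall = 1'b0;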

I maintain a block RAM controller based upon these principles in the ZBasic repository. We’ll use this design for today’s discussion.

There are a couple of differences between this block RAM controller and what we’ve just discussed above, although most of them are fairly superficial. These come from the many years that I’ve worked with this controller. For example, on one high speed design I discovered I couldn’t get the request from a corner of the chip to the dedicated block RAM hardware in the center of the chip. I managed to solve this by adding an extra clock of latency to the memory, but made that extra clock optional, controlled by the parameter EXTRACLOCK. Recently I added a preload option, controlled by the parameter HEXFILE, and even a ROM option, OPT_ROM, should you ever wish to create a read-only memory instead of the more general purpose block RAM.
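As an example of how simple such an option can be, the preload need be little more than a $readmemh() call within a generate block. A rough sketch, assuming HEXFILE defaults to zero when no preload is desired, might look like:

	generate if (HEXFILE != 0)
	begin : PRELOAD_MEMORY
		// Load the memory's initial contents from a hex file
		initial	$readmemh(HEXFILE, mem);
	end endgenerate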

The basic controller that we just outlined above still remains within that design. Indeed, even with the little feature bloat discussed above, the design remains quite simple and very readable.

Formal Properties

So what does it take to verify this controller? We’ll skip through some of the bus properties and jump straight to the arbitrary address declaration.

	(* anyconst *)	wire	[(AW-1):0]	f_addr;
	reg	[31:0]		f_data;

These two values are assigned as we discussed above.

We then need to make certain that our RAM, at address f_addr, starts with the initial data found in f_data.

	initial	assume(mem[f_addr] == f_data);

This guarantees that, should the memory be pre-loaded like a ROM, the two values start out identical.

Next, just as we adjusted our memory value, mem[f_addr], on any write, we also need to adjust our data value, f_data, upon any write to f_addr as well. The code below is just a touch more complicated than what we presented above, simply because the code below only updates f_data if the ROM option, OPT_ROM, is clear.

	generate if (!OPT_ROM)
	begin : F_MATCH_WRITES

		always @(posedge i_clk)
		if ((w_wstb)&&(f_addr == w_addr))
		begin
			if (w_sel[3])
				f_data[31:24] <= w_data[31:24];
			if (w_sel[2])
				f_data[23:16] <= w_data[23:16];
			if (w_sel[1])
				f_data[15: 8] <= w_data[15: 8];
			if (w_sel[0])
				f_data[ 7: 0] <= w_data[ 7: 0];
		end
	end else begin

In the case of a ROM, things are just a touch different. First, we actually need to assign f_data to be mem[f_addr]. This piece caught me by surprise. It appears as though a value that isn’t assigned is given an unknown value, x, that can change on every clock tick. Hence, we make certain we assign it here.

Likewise, we assert that this value should never change–this is the ROM option after all.

		initial	f_data = mem[f_addr];

		always @(posedge i_clk)
		if (f_past_valid)
			assert(f_data == $past(f_data));

	end endgenerate

We conclude our proof by asserting that the memory at our address f_addr, mem[f_addr], must always contain the value f_data.

	always @(*)
		assert(mem[f_addr] == f_data);
`endif
endmodule

Conclusion

Really, that’s all it takes–just a small number of assumptions and assertions and you can verify that any memory controller will return the right data given any address. Further, notice how these same properties can be applied to both instruction and data caches, to SDRAM and flash memory controllers, as well as our examples above. Indeed, it was the ease of describing how a data cache should work that made it so much easier to test when I finally finished building it two months ago.

Would you believe this approach works for MMU verification as well? We may have to come back and discuss that later as well. The approach really is very flexible!

Now that you know the basics, we can come back to this topic later and discuss how to build a basic instruction or data cache, such as the ZipCPU has. That article is still coming up.