Creating a Simple AXI-Lite Master for the Hexbus

This post continues our series over the last three years looking into AXI and AXI-lite interface design. Normally I’d take a moment to recount all of the various articles in a series as background to any new article, but if you check the topics page, you’ll see I’ve now written over 25 AXI articles. These include a discussion on how to build an AXI-lite slave, a high performance AXI (full) slave, how to debug an AXI stream based design, and even how to build both an AXI-lite master as well as how to modify a general purpose AXI-lite master for AXI (full) performance–to include exclusive access but not burst performance.

Today, let’s look into extending our debugging bus design with an AXI-lite back end.

If you’ve followed my blog from the beginning, you might remember that I’ve spent quite a bit of time discussing what I call a debugging bus early on. As I use the term, a debugging bus is a way of accessing the bus within a logic design from a remote host. Typically, I do this over a serial port, by sending special commands to the design, although I have transport systems that will work well for both SPI and JTAG as well. The design then decodes the various characters sent across the link into bus read or bus write requests, issues those requests to the internal bus, and then returns the results.

Fig 1. How a debugging bus fits into a larger system

Why would you ever want to do something like this? Wouldn’t it make more sense to just issue the commands from a soft-core CPU within the design? Well, there are actually a lot of reasons why you might want to use a debugging bus. For example …

You might be building a CPU. Until that CPU works, a debugging bus can give you a strong confidence that the rest of the design works. You can even use the debugging bus to pre-load the flash or RAM for your CPU before releasing the design from reset.
This applies just as much to all of those Vendor CPU’s as it does to your own homebrew CPU. Once you place that CPU into the FPGA, you lose almost all insight into what’s going on within the FPGA. For example, what if your CPU wasn’t getting the interrupt you were expecting? Well, why not go and just query the interrupt controller on the bus to see what’s going on?
When working with an external piece of hardware, and until you have that hardware “under control”–where your design interacts with it properly and the way you expect it to–sometimes you have to work with things to figure out what’s going right (or wrong) with the interface. A good example might be my work in a Quad SPI flash controller. Being able to explore “what-if” scenarios from a command line can be quite powerful. (What if I have the timing delay messed up, and it needs to be three clock cycles instead of four? Let’s try that …) By using a debugging bus to find and fix problems, you won’t need to take the time to rebuild your FPGA design until you know what was going wrong with the current design.

Indeed, and as an example, someone recently tried out my Quad SPI flash controller. So far, he tells me that the controller works as long as he only uses the memory mapped I/O port. However, without being able to shut the CPU down and run ad-hoc queries, he’s been struggling to figure out why the flash won’t handle his arbitrary access commands. A proper debugging bus interface will help this individual.
You’ve seen me discuss how a debugging bus could be used to debug a signal processing chain by inspecting histograms or even taking spectral estimates of what’s going on within that chain. All this can be done from an external computer via commands sent over a debugging bus.
Of course there’s also my own favorite use for the bus: getting access to a bus-based internal logic analyzer, such as my Wishbone Scope. (Don’t get hung up on the term “Wishbone”. Yes, there is now an AXI-lite version of it, and even a virtual AXI (full) version which can use SDRAM as a back end.)

Such a bus-based scope capability requires you to have access to your design from an external location. If you can get access to the design externally, you can then command the scope, adjust the window location with respect to the trigger, and then read back the results to tell you what’s going on within the design–even potentially after the CPU has locked up.

Once you do start using that soft-core CPU of yours within the design, you can then also script the logic analyzer from within the soft-core CPU’s software to capture according to whatever your software is doing by just writing to the bus. Indeed, I’ve been known to do that with my CPU test script, to provide me with a trace should any individual CPU test fail–but let’s not get ahead of ourselves today.

In the military, we might say that such a “debugging bus” gives you the ability to “command and control” your design. You can also use it to get “telemetry”-like data back from a running design. Okay, the analogy doesn’t quite work–telemetry is a “push”-based system, always broadcasting information to listeners, whereas a debugging bus requires a bus master to “pull” any desired information–but it’s still a matter of getting debugging information from within the system under test. Perhaps a better analogy might be “micromanaging” an interaction, but we won’t push a bad metaphor quite so far.

The Hexbus Design

Fig 2. An overview of the parts and pieces in my standard debugging bus

When we started talking about a debugging bus, I offered an overview of the debugging bus I’d been using in my own designs–one I’ve called my “wbubus” since it offers a “Wishbone to UART” conversion. Data would come in, get decoded–possibly even decompressed, head into a FIFO, and then from there commands would issue to the bus. Results would then get formed from the bus executor and sent into a FIFO, from whence they would be mixed with an interrupt or idle signal, compressed, and then recoded back into bytes that could be sent back across the serial port. We then built, together, a second debugging bus design that I called the “hexbus” design since it is designed around a simpler hexadecimal encoding. You can see the block diagram for this “hexbus” in Fig. 3 below.

Fig 3. The hexbus design

That “hexbus” was meant to be a demonstration only design–showing you how it might be done. It was built around a very simple hexadecimal encoding that could just about be read and debugged manually. Together, then, we walked through all the pieces of it from converting the incoming characters into 34-bit command words, issuing those commands across the bus, and then recoding the 34-bit command results back for transmission across the serial port. My intention, however, was always to throw away the “hexbus” implementation when I was done. It was only meant to be a demonstration design after all.

Fig 4. Could the hexbus protocol drive an AXI bus?

That was until I tried working with an iCE40. No matter how hard I tried, I couldn’t seem to fit my full featured wbubus debugging bus onto an iCE40 HX8K together with the ZipCPU. The two just wouldn’t fit in the same design at the same time. The “hexbus” on the other hand was simple enough to fit. Using the hexbus for debugging, the entire design, CPU + hexbus, currently fits in 4,659 LUTs–small enough that I could probably go back and retrofit it with the wbubus now. It’s not the smallest iCE40 design, but debugging it isn’t all that hard. In other words, this throw-away design has now been well loved and well used.

For today, however, the key detail is that the “hexbus” design has always been fundamentally a Wishbone design.

What if we wanted to give it an AXI-lite capability instead? This will be the topic of today’s article.

The Hexbus Code

Fig 5. Hexbus line protocol

Just to review, there are a couple basic commands to the hexbus encoding, as illustrated in Fig. 5 on the right. The address can be set for following transactions by sending an “A” followed by up to 8 lower case hexadecimal characters. A read request consists of a simple solitary “R”, whereas a write request starts with the letter “W” and is then followed by the hexadecimal value to be written. Further, I chose to use white space characters as command separators or synchronization characters if and when needed. Hence, both address and write commands can end with a white space character. They can also end with any other non-hex character, such as the beginning of the next command.
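As a rough illustration of this encoding, here is a minimal sketch of how a single incoming character might be classified as a lower case hexadecimal digit. The module name hexchar_sketch and its ports are made up for this example. A full decoder would also have to recognize the command letters and separators described above.

```verilog
// Sketch only: classify one incoming byte as a lower case hex digit.
// Module and port names are hypothetical, not part of the hexbus sources.
module hexchar_sketch (
	input	wire	[7:0]	i_byte,
	output	reg		o_is_hex,
	output	reg	[3:0]	o_nibble
);
	always @(*)
	begin
		// Default: not a hex digit.  Such a byte would end any
		// address or write value currently being collected.
		o_is_hex = 1'b0;
		o_nibble = 4'h0;

		if (i_byte >= "0" && i_byte <= "9")
		begin
			// ASCII 8'h30 through 8'h39: the low nibble is the value
			o_is_hex = 1'b1;
			o_nibble = i_byte[3:0];
		end else if (i_byte >= "a" && i_byte <= "f")
		begin
			// ASCII 8'h61 through 8'h66: low nibble is 1 to 6,
			// so adding nine yields ten through fifteen
			o_is_hex = 1'b1;
			o_nibble = i_byte[3:0] + 4'd9;
		end
	end
endmodule
```

Upper case characters are deliberately left out, since the command letters (“A”, “R”, and “W”) are upper case while the digits are lower case.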

Fig 6. Internal command protocol

By the time these commands arrive at our new AXI-lite bus master, they are bundled into 34-bit words as shown in Fig. 6 on the left. Commands are determined by the first two bits of those 34-bit words. 2'b00 prefixes a read request, 2'b01 a write request, 2'b10 a set address request, and 2'b11 is either a reset request (handled earlier) or a don’t care.
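To make the word layout concrete, the following is a small simulation-only sketch that builds one command word of each kind. The module name cmd_word_examples is hypothetical, and the zero payload on the read command is an assumption: the master discussed here only looks at the top two bits of a read request.

```verilog
// Sketch only: build example 34-bit hexbus command words in simulation.
// The module name and the example values are illustrative assumptions.
module cmd_word_examples;
	localparam [1:0]	CMD_SUB_RD   = 2'b00,
				CMD_SUB_WR   = 2'b01,
				CMD_SUB_ADDR = 2'b10;

	reg	[33:0]	word;

	initial begin
		// Set address: the two prefix bits, then a 32-bit payload.
		// This address is word aligned, so its low two bits are zero.
		word = { CMD_SUB_ADDR, 32'h0000_2000 };
		$display("ADDR  prefix=%b payload=%h", word[33:32], word[31:0]);

		// Write request: the payload is the value to be written
		word = { CMD_SUB_WR, 32'hdead_beef };
		$display("WRITE prefix=%b payload=%h", word[33:32], word[31:0]);

		// Read request: payload assumed unused, so left at zero
		word = { CMD_SUB_RD, 32'h0 };
		$display("READ  prefix=%b payload=%h", word[33:32], word[31:0]);

		$finish;
	end
endmodule
```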

The commands themselves arrive via a basic stream protocol, as shown in Fig. 7 below. Once the bus command is complete, a response is then generated and sent via a similar stream protocol to the next block in the processing chain–the difference being that there’s no back pressure on the outgoing responses.

Fig 7. AXI-lite Controller Block Diagram

One of the challenges, and indeed vulnerabilities, associated with the hexbus design is that there are no FIFO’s anywhere in the hexbus protocol. Remember, this protocol is designed to be simple, and to fit on really small hardware. This means that the stream protocol and handshakes shown above in Fig. 7 are a misnomer: hexbus can’t handle overflow anywhere in its processing. The downstream processor must be ready to accept any response value provided. The upstream source can only delay values by one or two clock cycles at the most. Further, it is the responsibility of the host software, not the RTL, to guarantee that there are no overflows in actual operation.

This block diagram in Fig. 7, together with the command protocol shown in Fig. 6 above, is where we’ll start today’s design discussion from.

Building the AXI-Lite Bus Master

The key feature of this AXI-lite master that we’ll be discussing today is not so much that it’s implemented internally as a state machine, but rather that we’ll encode our current state in the AXI-lite signals themselves: On any write request, we’ll set AWVALID, WVALID, and BREADY and then hold BREADY high until the BVALID acknowledgment. Likewise, on a read request, we’ll set ARVALID and RREADY and then hold RREADY high until we receive our RVALID response. The “Idle” state will therefore be encoded as !BREADY && !RREADY.

We’ll expect one of two paths from idle back to idle, as shown in Fig. 8 below.

Fig 8. AXI-lite controller state diagram

Let’s start by decoding our incoming command. We have three possible values that can come into our core that we need to worry about, as shown in Fig. 6 above. (Reset is handled elsewhere in the stack.) Either we want to process an address, a read command, or a write command. From this we can create one of three flags, with two caveats. First, if the incoming strobe (valid) bit is low, then there’s no command ready at the input, and second, if we are still busy with the last command, then we can also ignore any incoming requests.

	localparam [1:0]	CMD_SUB_RD      = 2'b00,
				CMD_SUB_WR      = 2'b01,
				CMD_SUB_ADDR    = 2'b10,
				CMD_SUB_SPECIAL = 2'b11;

	always @(*)
	begin
		i_cmd_addr = (i_cmd_word[33:32] == CMD_SUB_ADDR);
		i_cmd_rd   = (i_cmd_word[33:32] == CMD_SUB_RD);
		i_cmd_wr   = (i_cmd_word[33:32] == CMD_SUB_WR);

		if (!i_cmd_stb || o_cmd_busy)
			{ i_cmd_addr, i_cmd_rd, i_cmd_wr } = 3'h0;
	end

This should be familiar as the basic VALID/READY handshake that we’ve discussed often enough before, with o_cmd_busy playing the part of !READY. The difference here is that this custom protocol doesn’t require that the ready (busy) logic be registered, so there’s no protocol requirement for any skidbuffers.
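Seen from the other side of this handshake, an upstream source has to hold both its strobe and its word steady for as long as the master is busy. The following is a minimal sketch of such a source, not the hexbus decoder itself. The module name, the i_new_stb and i_new_word inputs, and the o_dropped flag are all assumptions made for the illustration.

```verilog
// Sketch only: a single-register command source obeying the STB/BUSY
// handshake.  All names here are hypothetical.
module cmd_source_sketch (
	input	wire		i_clk, i_reset,
	// A new word arriving from further upstream
	input	wire		i_new_stb,
	input	wire	[33:0]	i_new_word,
	// The handshake with the bus master
	output	reg		o_cmd_stb,
	output	reg	[33:0]	o_cmd_word,
	input	wire		i_cmd_busy,
	// High for one clock whenever an incoming word had to be dropped
	output	reg		o_dropped
);
	initial	o_cmd_stb = 1'b0;
	initial	o_dropped = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		o_cmd_stb <= 1'b0;
		o_dropped <= 1'b0;
	end else if (o_cmd_stb && i_cmd_busy)
	begin
		// Stalled: hold the strobe and the word unchanged.
		// With no FIFO, anything new arriving now is lost.
		o_dropped <= i_new_stb;
	end else begin
		// Either idle, or the current word was just accepted
		o_cmd_stb  <= i_new_stb;
		o_cmd_word <= i_new_word;
		o_dropped  <= 1'b0;
	end
endmodule
```

The o_dropped output makes the vulnerability mentioned earlier visible: with no FIFO behind this register, a word that shows up during a stall has nowhere to go.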

Now we can start figuring out how to process these commands.

Address Processing

The first, and perhaps easiest, command to handle is the address command. If ever we receive an address word, we’ll want to set the bus address. Then, later, when we receive an actual read or write command we’ll acknowledge the address back across the channel. That means we’re going to need to keep track of the current bus address, as well as whether or not we want to acknowledge a new address.

	initial	M_AXI_AWADDR = 0;
	initial	newaddr = 1;
	always @(posedge i_clk)
	begin

So let’s break down, now, how we’ll handle a new address command. In general, we’ll just set our outgoing address word.

		if (i_cmd_addr)
			M_AXI_AWADDR <= { i_cmd_word[AW+1:2], 2'b00 };

Well, not quite. As it turns out, that’s a nice first pass, but we can do better with just a touch of compression. Let’s use the two lower (unused) address bits as a compression scheme, as illustrated in Fig. 6 above: one bit will indicate an address difference, whereas the second bit will indicate whether or not we increment addresses between commands.

First, bit 1. If bit 1 is set we’ll allow that this command word encodes a difference and we’ll adjust our address by this difference. Otherwise we’ll set it as above.

		if (i_cmd_addr)
		begin
			if (!i_cmd_word[1])
				// New address
				M_AXI_AWADDR <= { i_cmd_word[AW+1:2], 2'b00 };
			else
				// Difference address
				M_AXI_AWADDR <= { i_cmd_word[AW+1:2], 2'b00 }
						+ M_AXI_AWADDR;

Synchronizing the initial address will be a task of the software address encoder: the first address given to the hexbus will never be a difference address, whereas difference addresses may be used for subsequent address requests if they reduce the number of bytes that need to be transmitted for any new address.
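A short worked example may help with the arithmetic. The sketch below wraps the update rule in a function and runs it through an absolute address followed by two difference addresses, one positive and one negative. It assumes AW = 30, so that the address fills the whole payload. The module and function names are invented for this example.

```verilog
// Sketch only: exercise the absolute/difference address rule in simulation.
// Assumes AW = 30.  Module and function names are hypothetical.
module addr_update_example;
	localparam		AW = 30;
	localparam [1:0]	CMD_SUB_ADDR = 2'b10;

	reg	[AW+1:0]	addr;

	// The update rule from the text, written as a function
	function [AW+1:0] next_addr(input [AW+1:0] cur, input [33:0] cmd);
	begin
		if (!cmd[1])
			// Bit 1 clear: the payload is the new address
			next_addr = { cmd[AW+1:2], 2'b00 };
		else
			// Bit 1 set: the payload is added to the old address
			next_addr = { cmd[AW+1:2], 2'b00 } + cur;
	end endfunction

	initial begin
		// Absolute address 32'h1000
		addr = next_addr(0, { CMD_SUB_ADDR, 32'h0000_1000 });
		if (addr !== 32'h0000_1000) $display("FAIL: absolute");

		// Difference of +16 bytes: 32'h10 with bit 1 set
		addr = next_addr(addr, { CMD_SUB_ADDR, 32'h0000_0012 });
		if (addr !== 32'h0000_1010) $display("FAIL: positive difference");

		// Difference of -16 bytes in two's complement, bit 1 set
		addr = next_addr(addr, { CMD_SUB_ADDR, 32'hffff_fff2 });
		if (addr !== 32'h0000_1000) $display("FAIL: negative difference");

		$display("Final address: %h", addr);
		$finish;
	end
endmodule
```

Because the sum wraps at the width of the address register, a negative difference needs no special handling.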

Bit 0 on the other hand will be an increment indicator. If we leave it zero, then we’ll naturally increment our address from one request to the next. Otherwise, if one, we won’t adjust it from one request to the next at all. Either way, that means we’ll need to store this value away for later.

			inc <= !i_cmd_word[0];

This also means that you can set an address by hand and have the core mostly just “do the right thing.”

We can also use a flag, newaddr, to indicate that the next results from the bus will be the result of reading or writing to this new address.

			newaddr <= 1;

Now, any time an address word gets accepted by the bus, we’ll increment the address if this increment bit is set, or otherwise just leave it the same.

		end else begin
			if ((M_AXI_AWVALID && M_AXI_AWREADY)
					||(M_AXI_ARVALID && M_AXI_ARREADY))
				M_AXI_AWADDR[AW+1:2]<= M_AXI_AWADDR[AW+1:2]+(inc ? 1:0);

Likewise, whenever we get a new read or write command that will use this new address, we’ll send a copy of the address over the link at the same time we issue the bus command. That means we can clear our new address flag at that time as well.

			if (i_cmd_rd || i_cmd_wr)
				newaddr <= 0;
		end
	end

We can also use the same logic for the read address, and so just copy the read address value from the write address register.

	assign	M_AXI_ARADDR = M_AXI_AWADDR;

Sometime later, we’re going to need to come back to this and make certain that, upon a read or write command, the address response gets sent back across the bus. We can make a mental note of that to ourselves now by simply adding a formal property to our design:

	always @(posedge i_clk)
	if ((f_past_valid)&&(!$past(i_reset))&&($past(i_cmd_rd || i_cmd_wr)))
	begin
		// A new address should generate a response headed downstream
		`ASSERT(o_rsp_stb  == $past(newaddr));

		// That response should include ... the new address
		if ($past(newaddr))
			`ASSERT(o_rsp_word == { RSP_SUB_ADDR,
				$past(M_AXI_AWADDR[AW+1:1]), !$past(inc) });
	end

This simply states that the first step of processing any read or write command, that is on the first clock following i_cmd_rd || i_cmd_wr, we must acknowledge any new/updated address–but only if the address had been changed since the last read or write command.

Write Processing

The next step is write processing. If you’ve never built an AXI master before, this will be easier than you think. Indeed, the way we’ll build this below it’ll be really easy. We’ll control the valid signals, the write data, and then return an acknowledgment on success or failure. Oh, one more thing–we’ll set BREADY to note that we are no longer idle, and now expecting a BVALID response.

First, we clear everything on reset. This is a necessity. AXI requires a reset, so let’s make certain we implement it here.

	initial	M_AXI_AWVALID = 0;
	initial	M_AXI_WVALID = 0;
	initial	M_AXI_BREADY = 0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		M_AXI_AWVALID <= 0;
		M_AXI_WVALID  <= 0;
		M_AXI_BREADY <= 0;

The next step is going to look a bit backwards. Chronologically we’d set AWVALID && WVALID && BREADY on any write command. I’m instead going to start with the last half of the operation, and say that if we are waiting on a write response then …

We should stop waiting if/when we get that response.

	end else if (M_AXI_BREADY)
	begin
		if (M_AXI_BVALID)
			M_AXI_BREADY <= 0;

AWVALID and WVALID should also each be cleared independently when their respective xREADY signals go high.

		if (M_AXI_AWREADY)
			M_AXI_AWVALID <= 0;
		if (M_AXI_WREADY)
			M_AXI_WVALID <= 0;

This is really the biggest gotcha of building an AXI4-lite interface: the write address and write data channels aren’t synchronized at all. Sure, we’ll synchronize them both to the start of this transaction, but either one of these two channels may get accepted before the other. This is captured by the fact that both of these signals are handled in the same logic block, although in separate if statements.

That’s the end of processing the write transaction. Seriously? Yeah, it really is that easy. No, we haven’t gotten to the write data yet–but that’s even easier.

For now, let’s step back and look at how we would generate a write request in the first place.

On any write request from our interface, we set all three signals high, AWVALID, WVALID, and BREADY. Remember, these signals are also encoding our state machine. We won’t return to idle again until BREADY is cleared.

	end else if (i_cmd_wr)
	begin
		M_AXI_AWVALID <= 1;
		M_AXI_WVALID  <= 1;
		M_AXI_BREADY  <= 1;
	end

That leaves only two signals for the write half, WDATA and WSTRB. In the case of WSTRB, it’s easy: the hexbus only supports full 32-bit word accesses–this is no different from the wbubus or any of my other debugging buses. As a result, there’s no way to access an 8-bit byte within any 32-bit word using the protocol we defined above in Fig. 6. For this reason, we can just leave WSTRB as all ones: any write will write to all four bytes at the same time.

	assign	M_AXI_WSTRB = -1;

The second piece is almost just as unremarkable: if we aren’t busy, then we can set the write data based upon any incoming command.

	always @(posedge i_clk)
	if (!M_AXI_BREADY)
		M_AXI_WDATA <= i_cmd_word[31:0];

There’s just one problem with this: how much downstream logic will get driven every time i_cmd_word changes? There’s a cost in terms of power to every wire that has to change. Thus, although this is the low-logic solution, it isn’t the low power one.

Perhaps the ultimate low power solution would be to only update WDATA on a new write request.

	always @(posedge i_clk)
	if (i_cmd_wr)
		M_AXI_WDATA <= i_cmd_word[31:0];

I’ve also been experimenting with forcing values to zero when not in use, for much the same reason. In that case, we might try:

	always @(posedge i_clk)
	if (i_reset)
		M_AXI_WDATA <= 0;
	else if (i_cmd_wr)
		M_AXI_WDATA <= i_cmd_word[31:0];
	else if (M_AXI_WREADY)
		M_AXI_WDATA <= 0;
Either way, the point is that following a write request, we want to make certain that we are then driving the bus based upon that request. A simple assertion at this point in the design can help us describe this.

	assert property (@(posedge i_clk)
		!i_reset && i_cmd_wr
		|=> M_AXI_BREADY && M_AXI_AWVALID && M_AXI_WVALID
			&& M_AXI_WDATA == $past(i_cmd_word[31:0]));
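Pieced together, the write half described in this section might look something like the sketch below. It is an assembly of the fragments above into a standalone module, not the actual hexbus source. The module name, the i_cmd_data port, and the choice of the update-only-on-request version of WDATA are assumptions, and the address channel payload is left out since it was handled earlier.

```verilog
// Sketch only: the write handshake logic gathered into one module.
// Module and port names are hypothetical.
module axil_write_sketch (
	input	wire		i_clk, i_reset,
	// A write request, assumed valid only while the master is idle
	input	wire		i_cmd_wr,
	input	wire	[31:0]	i_cmd_data,
	// Write address channel (AWADDR is generated elsewhere)
	output	reg		M_AXI_AWVALID,
	input	wire		M_AXI_AWREADY,
	// Write data channel
	output	reg		M_AXI_WVALID,
	input	wire		M_AXI_WREADY,
	output	reg	[31:0]	M_AXI_WDATA,
	output	wire	[3:0]	M_AXI_WSTRB,
	// Write response channel
	input	wire		M_AXI_BVALID,
	output	reg		M_AXI_BREADY
);
	// Full word writes only
	assign	M_AXI_WSTRB = 4'hf;

	initial	M_AXI_AWVALID = 1'b0;
	initial	M_AXI_WVALID  = 1'b0;
	initial	M_AXI_BREADY  = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		M_AXI_AWVALID <= 1'b0;
		M_AXI_WVALID  <= 1'b0;
		M_AXI_BREADY  <= 1'b0;
	end else if (M_AXI_BREADY)
	begin
		// Mid-transaction: each channel is released independently
		if (M_AXI_BVALID)
			M_AXI_BREADY <= 1'b0;
		if (M_AXI_AWREADY)
			M_AXI_AWVALID <= 1'b0;
		if (M_AXI_WREADY)
			M_AXI_WVALID <= 1'b0;
	end else if (i_cmd_wr)
	begin
		// Idle, with a new request: raise all three together
		M_AXI_AWVALID <= 1'b1;
		M_AXI_WVALID  <= 1'b1;
		M_AXI_BREADY  <= 1'b1;
	end

	// Capture the write data only when a request is accepted, so
	// that WDATA stays put for as long as WVALID might be high
	always @(posedge i_clk)
	if (i_cmd_wr && !M_AXI_BREADY)
		M_AXI_WDATA <= i_cmd_data;
endmodule
```

The extra !M_AXI_BREADY qualifier on the data register is a precaution for this standalone version only. In the design as described, i_cmd_wr is already forced low whenever the master is busy.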

There’s just one thing we’ve skipped, and that’s creating the write return response. We’ll come back to that in a moment, though, following the read state machine.

Read Processing

As it turns out, reads are even easier than writes. On a reset, we clear ARVALID.

	initial	M_AXI_ARVALID = 1'b0;
	initial	M_AXI_RREADY  = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		M_AXI_ARVALID <= 1'b0;
		M_AXI_RREADY  <= 1'b0;
	end

While waiting for a response, we’ll clear ARVALID on any ARREADY.

	else if (M_AXI_RREADY)
	begin
		if (M_AXI_ARREADY)
			M_AXI_ARVALID <= 1'b0;

Once we get our read response, we’ll clear RREADY–sending us back to our idle state.

		if (M_AXI_RVALID)
			M_AXI_RREADY <= 1'b0;

But how shall we begin any reads? Simple! If we are in our idle state, then start a read on any request.

	end else if (i_cmd_rd)
	begin
		M_AXI_ARVALID <= 1'b1;
		M_AXI_RREADY  <= 1'b1;
	end

We can even capture this thought in a simple assertion.

	assert property (@(posedge i_clk)
		!i_reset && i_cmd_rd
		|=> M_AXI_ARVALID && M_AXI_RREADY);

While this sort of ad-hoc assertion isn’t sufficient to pass induction, it’s certainly good enough to get us started when we get there. Actually, when we get there below, I’m going to continue using immediate assertions–they’re a bit more verbose, but they can have the same effect without many of the serious drawbacks associated with formally verifying concurrent assertions.
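For comparison, here is roughly what the same check might look like as an immediate assertion, wrapped around a standalone copy of the read logic so that it can be tried on its own. This is a sketch under assumptions: the module name is invented, f_past_valid is built locally, and the assume() stands in for the gating that the surrounding design applies to i_cmd_rd while it is busy.

```verilog
// Sketch only: the read handshake with an immediate-assertion check.
// Module name is hypothetical; the FORMAL section needs a formal tool.
module axil_read_sketch (
	input	wire	i_clk, i_reset,
	input	wire	i_cmd_rd,
	output	reg	M_AXI_ARVALID,
	input	wire	M_AXI_ARREADY,
	input	wire	M_AXI_RVALID,
	output	reg	M_AXI_RREADY
);
	initial	M_AXI_ARVALID = 1'b0;
	initial	M_AXI_RREADY  = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		M_AXI_ARVALID <= 1'b0;
		M_AXI_RREADY  <= 1'b0;
	end else if (M_AXI_RREADY)
	begin
		// Waiting on the bus: drop each signal once it is answered
		if (M_AXI_ARREADY)
			M_AXI_ARVALID <= 1'b0;
		if (M_AXI_RVALID)
			M_AXI_RREADY <= 1'b0;
	end else if (i_cmd_rd)
	begin
		// Idle, with a new request
		M_AXI_ARVALID <= 1'b1;
		M_AXI_RREADY  <= 1'b1;
	end

`ifdef	FORMAL
	reg	f_past_valid;
	initial	f_past_valid = 1'b0;
	always @(posedge i_clk)
		f_past_valid <= 1'b1;

	// Stand-in for the busy gating done in the full design:
	// no new read request may arrive while a read is in progress
	always @(*)
	if (M_AXI_RREADY)
		assume(!i_cmd_rd);

	// The immediate-assertion form of the property above
	always @(posedge i_clk)
	if (f_past_valid && !$past(i_reset) && $past(i_cmd_rd))
		assert(M_AXI_ARVALID && M_AXI_RREADY);
`endif
endmodule
```

The f_past_valid register plays the part that the implication operator plays in the concurrent version: it keeps $past() from being evaluated on the very first clock, before there is any past to look at.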

All that remains is to grab and return the response to then be sent to the rest of the debugging bus design.

Return Processing

Now that we’ve run the bus and accomplished our transaction, it’s important that we return a proper response downstream. In this case, we’ll want to send one of several words down the debugging bus processing chain depending on both our state, and the response we just received from the bus:

Following a system reset, we’ll immediately send a reset confirmation downstream
On a write response, we’ll send a write acknowledgment
On a read response, we’ll need to send the RDATA value that the bus returned
On any bus error, we’ll want to send a bus error response
Finally, on any new address, we’ll send that new address downstream the first time it is used

Now, how shall all these values be encoded? I’ll admit, I spent far more time thinking about this than perhaps I should have.

If you’ll remember, some time ago I discussed minimizing logic usage when the question was how to select from a number of potential sources–each with a valid flag. The answer I came up with at the time was to pre-calculate an index, and then to use a case statement based upon that index to determine a return value.

An alternative approach that came up in a twitter thread with Clifford was to use a for loop, but in such a fashion that it would simply collapse into a sum of products. For example, if you know that only one ACK value will ever be true at a time, you might write:

	always @(*)
	begin
		return_value = 0;

		for(k=0; k<AXI_LIST_SIZE; k=k+1)
		if (ack[k])
			return_value = return_value | data[k];
	end

Notice how the result doesn’t depend upon any multiplexers: it’s just a giant OR statement–a “sum” (i.e. OR) of “products” (ANDs). As long as you, the designer, can ensure that the ack vector will only ever be one hot or zero, then this approach can work well.
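As a self-contained illustration of the same idea, the loop can be wrapped into a small parameterized module. The module name, its parameters, and the flattened i_data port are assumptions made for this sketch. The assumption inside the FORMAL section simply restates the designer's obligation that no more than one ack bit is ever set.

```verilog
// Sketch only: select among N data words using an OR of ANDs.
// Module, parameter, and port names are hypothetical.
module onehot_or_select #(
	parameter	N  = 4,		// Number of sources
	parameter	DW = 32		// Width of each data word
) (
	input	wire	[N-1:0]		i_ack,
	input	wire	[N*DW-1:0]	i_data,	// Source k is at [k*DW +: DW]
	output	reg	[DW-1:0]	o_value
);
	integer	k;

	always @(*)
	begin
		o_value = 0;

		// Each pass ORs in one source, gated by its own ack bit.
		// No source has priority over any other.
		for(k=0; k<N; k=k+1)
		if (i_ack[k])
			o_value = o_value | i_data[k*DW +: DW];
	end

`ifdef	FORMAL
	// The result is only meaningful if at most one bit of i_ack
	// is set: clearing the lowest set bit must leave nothing behind
	always @(*)
		assume((i_ack & (i_ack - 1)) == 0);
`endif
endmodule
```

If two ack bits were ever set together, the output would be the bitwise OR of both words, which is why the one hot (or zero) condition has to be guaranteed elsewhere.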

Indeed, this is the approach I chose to use for the response word, returning data to our debugging bus processing chain. I started by initializing this response word to zero. Then, on any write return, I set the response word.

	always @(*)
	begin
		rsp_word = 0;

		if (M_AXI_BVALID)
		begin
			if (M_AXI_BRESP[1])
				rsp_word[33:28] = RSP_BUS_ERROR[33:28];
			else
				rsp_word[33:32] = RSP_WRITE_ACKNOWLEDGMENT[33:32];
		end

Note that there are two possible returns here: either there’s been a bus error, and a bus error return needs to be generated, or we are simply acknowledging that a write has been completed.

Given that this is the first potential value of the response word, there were no “OR” values here–at least, not yet. For the first word, we can just set things independent of any prior value in the chain.

We can then move on to any read response. Here things change subtly. When BVALID was true above, I could force rsp_word to a known value regardless of anything before it. In the case of RVALID, however, I might need to set rsp_word to a completely different response word, and the synthesizer would never know that RVALID would only ever be true if BVALID were not. So, I used the “OR” approach outlined above to capture the idea of merging these two return responses.

		if (M_AXI_RVALID)
		begin
			if (M_AXI_RRESP[1])
				rsp_word[33:28] = rsp_word[33:28]
						| RSP_BUS_ERROR[33:28];
			else
				rsp_word = rsp_word
					| { RSP_SUB_DATA, M_AXI_RDATA };
		end

As a final potential return value, the response word needs to contain any new address the first time we use it. As before, we’ll simply OR this together with the prior values.

		if (newaddr)
			rsp_word = rsp_word | { RSP_SUB_ADDR,
					{(32-AW-2){1'b0}},
					M_AXI_AWADDR[AW+1:2], 1'b0, !inc };

The neat thing about these “OR” functions is that they don’t create long multiplexer chains. The practical consequence of building rsp_word this way, however, is that it must be built with blocking assignments in an always @(*) block. Within the always @(*) block, each statement is allowed to reference the value of rsp_word set by the statement before it–something that would not work with the non-blocking assignments of an always @(posedge CLK) block. That also means that, now that we’ve built our response word, rsp_word, we need to register it in a second step.

For those who know me and the logic I write, you’ll know that I don’t normally use two process blocks. The complexity of rsp_word above, however, is enough to force us into a two process implementation.

Hence, here’s the second process. It starts with the reset.

	initial	o_rsp_stb = 1'b1;
	initial	o_rsp_word = RSP_RESET;
	always @(posedge i_clk)
	if (i_reset)
	begin
		o_rsp_stb <= 1'b1;
		o_rsp_word <= RSP_RESET;

On any system reset, our first response down the processing chain will be to acknowledge that reset.

Otherwise, we’ll send a response downstream either on any response from the bus, or any time we get a read or write request after a new address has been set.

	end else begin
		o_rsp_stb  <= 0;
		o_rsp_word <= 0;

		if (M_AXI_BVALID || M_AXI_RVALID)
			o_rsp_stb <= 1;

		if (newaddr && (i_cmd_rd || i_cmd_wr))
			o_rsp_stb <= 1;

The final step is to set the response word that will be valid if ever o_rsp_stb is also valid. This is the data word, set above, that will be qualified by o_rsp_stb and ignored any time o_rsp_stb is zero.

		o_rsp_word <= rsp_word;
	end

This ends the basic AXI-lite bus master implementation. A couple things to note:

We kept this simple, by limiting ourselves to no more than one request at a time. AXI-lite can handle many more, but our goal here was simplicity.
We encoded our “state machine”’s state in the various hand shaking signals used by AXI-lite. While this may not feel like a conventional state machine, it is technically a state machine. Even better, the approach is both simple and effective.

Although this design was intended for use with a debugging bus implementation, the unexpected reality is that we could use this approach to script any AXI-lite interaction we wanted to create. In other words, this simple approach is quite a bit more powerful than I had originally intended.
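To give a feel for what such scripting might look like, the sketch below steps through a tiny fixed list of command words and feeds them to the master one at a time. Everything about it is an assumption made for illustration: the module name, the four commands, the addresses, and the o_done flag. It relies only on the command encoding and the STB/BUSY handshake described earlier.

```verilog
// Sketch only: play a fixed script of command words into the master.
// Module name, script contents, and o_done are all hypothetical.
module axil_script_sketch (
	input	wire		i_clk, i_reset,
	output	reg		o_cmd_stb,
	output	reg	[33:0]	o_cmd_word,
	input	wire		i_cmd_busy,
	// Set once the last command has been handed to the master.
	// The final bus response may still be outstanding at that time.
	output	reg		o_done
);
	localparam [1:0]	CMD_SUB_RD   = 2'b00,
				CMD_SUB_WR   = 2'b01,
				CMD_SUB_ADDR = 2'b10;
	localparam [2:0]	NCMDS = 3'd4;

	reg	[2:0]	idx;
	reg	[33:0]	script_word;

	// The script itself, indexed by the next command to be sent
	always @(*)
	case(idx)
	// Absolute address 32'h100, bit 0 clear: increment after use
	3'd0: script_word = { CMD_SUB_ADDR, 32'h0000_0100 };
	3'd1: script_word = { CMD_SUB_WR,   32'hdead_beef };
	// Absolute address 32'h100 again, bit 0 set: don't increment
	3'd2: script_word = { CMD_SUB_ADDR, 32'h0000_0101 };
	3'd3: script_word = { CMD_SUB_RD,   32'h0 };
	default: script_word = 34'h0;
	endcase

	initial	idx = 0;
	initial	o_cmd_stb = 1'b0;
	initial	o_done = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		idx <= 0;
		o_cmd_stb <= 1'b0;
		o_done <= 1'b0;
	end else if (!o_cmd_stb || !i_cmd_busy)
	begin
		// Nothing is pending, or the pending word was just accepted
		if (idx < NCMDS)
		begin
			o_cmd_stb  <= 1'b1;
			o_cmd_word <= script_word;
			idx <= idx + 1;
		end else begin
			o_cmd_stb <= 1'b0;
			o_done <= 1'b1;
		end
	end
endmodule
```

The second address command is there because the first one left auto-increment enabled, so the write will have moved the address on by one word before the read is issued.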

Verification

Let’s do verification the easy way. Any time you need to verify that an AXI-lite implementation “works”, the easy way to verify it is to grab a copy of the formal AXI-lite model and then to simply instantiate it within your design.

There’s a couple of configuration notes to setting this up. First, we only need two bits to be able to count up to the maximum number of transactions on the bus. Hence, we’ll set the F_LGDEPTH to 2 and define a couple of values to connect to our model having this width.

	localparam	F_LGDEPTH=2;
	wire	[F_LGDEPTH-1:0]
			faxil_awr_outstanding, faxil_wr_outstanding,
			faxil_rd_outstanding;

We also need to set the address width (C_AXI_ADDR_WIDTH) and data width (C_AXI_DATA_WIDTH) of the property set. We’ll allow the design to assume the existence of a reset (F_OPT_ASSUME_RESET), while also not requiring that reset to be a full 16 clock cycles (F_OPT_NO_RESET). (Xilinx’s AXI implementation notes require a long reset, even though most of their IP does not.)

	faxil_master #(
		.C_AXI_ADDR_WIDTH(AW+2),.C_AXI_DATA_WIDTH(32),
		.F_OPT_ASSUME_RESET(1'b1),
		.F_OPT_NO_RESET(1'b1),
		.F_LGDEPTH(F_LGDEPTH)

From here, the rest of instantiating the AXI-lite properties is very straightforward.

	) faxil(
		.i_clk(i_clk), .i_axi_reset_n(!i_reset),
		.i_axi_awvalid(M_AXI_AWVALID), .i_axi_awready(M_AXI_AWREADY),
			.i_axi_awaddr(M_AXI_AWADDR),
			.i_axi_awprot(M_AXI_AWPROT),
		.i_axi_wvalid(M_AXI_WVALID), .i_axi_wready(M_AXI_WREADY),
			.i_axi_wdata(M_AXI_WDATA),
			.i_axi_wstrb(M_AXI_WSTRB),
		.i_axi_bvalid(M_AXI_BVALID), .i_axi_bready(M_AXI_BREADY),
			.i_axi_bresp(M_AXI_BRESP),
		.i_axi_arvalid(M_AXI_ARVALID), .i_axi_arready(M_AXI_ARREADY),
			.i_axi_araddr(M_AXI_ARADDR),
			.i_axi_arprot(M_AXI_ARPROT),
		.i_axi_rvalid(M_AXI_RVALID), .i_axi_rready(M_AXI_RREADY),
			.i_axi_rdata(M_AXI_RDATA),
			.i_axi_rresp(M_AXI_RRESP),
		.f_axi_rd_outstanding(faxil_rd_outstanding),
			.f_axi_wr_outstanding(faxil_wr_outstanding),
			.f_axi_awr_outstanding(faxil_awr_outstanding)
	);

At this point, we should be able to start running and passing proofs. Induction will take some more work, but we’ll get to that in a moment. Even better, this design is so simple that 20-40 clock steps should be sufficient for any non-induction proof.

This is also the point where I tend to start throwing assertions at the wall, just to make certain that things I’ve assumed during my design really are true. For example, we chose above to capture our “state” in BREADY and RREADY. Our goal was that if we were ever working on a write, then BREADY should be true, and if we were ever working on a read then RREADY should be true. If neither is true, then we should be idle. This also means that both should never be true together.

	always @(*)
		assert(!M_AXI_BREADY || !M_AXI_RREADY);

Let’s break this down a bit more, though. If BREADY is false, then we are not in the middle of any write transactions. The number of AWVALIDs that have taken place without seeing any corresponding BVALID are zero, and the same can be said for WVALIDs. Not only that, but if BREADY is false, then both AWVALID and WVALID should be zero–since we’re not in the middle of any write transaction either.

	always @(*)
	if (!M_AXI_BREADY)
	begin
		assert(faxil_awr_outstanding == 0);
		assert(faxil_wr_outstanding  == 0);
		assert(M_AXI_AWVALID == 0);
		assert(M_AXI_WVALID  == 0);
	end else begin

Where things get a bit more interesting is when BREADY is true. In this case, we’ll have a write address request outstanding if AWVALID has been accepted and dropped. The same will be true of a write data request should WVALID have been accepted and then dropped.

		assert(faxil_awr_outstanding == (M_AXI_AWVALID ? 0:1));
		assert(faxil_wr_outstanding  == (M_AXI_WVALID  ? 0:1));
	end

Indeed, this is often all I have to do to verify the write half of an AXI-lite interface. It’s pretty easy, and nearly boilerplate.

The read half isn’t all that different either.

If RREADY is low, then we aren’t trying to read and so ARVALID and the number of read requests outstanding should both be zero.

	always @(*)
	if (!M_AXI_RREADY)
	begin
		assert(faxil_rd_outstanding == 0);
		assert(M_AXI_ARVALID == 0);

If, on the other hand, we are reading and so RREADY is high, then either ARVALID is one or we have exactly one read request outstanding.

	end else
		assert(faxil_rd_outstanding == (M_AXI_ARVALID ? 0:1));

How about reset? Following a reset, we should be able to assume that nothing is incoming. Likewise, following a reset, we should be in our idle “state” with both BREADY and RREADY low.

	always @(posedge i_clk)
	if ((!f_past_valid)||($past(i_reset)))
	begin
		`ASSUME(!i_cmd_stb);
		assert(!M_AXI_BREADY);
		assert(!M_AXI_RREADY);
	end

Did you notice how we only checked BVALID above if BREADY was also true? Or likewise we only checked RVALID if RREADY was also true? Let’s add a quick property to help guarantee that neither BVALID nor RVALID will ever be true unless we are expecting them. (This should also be captured by the properties above, but an extra assertion or two won’t hurt anything.)

	always @(*)
	if (M_AXI_BVALID)
		assert(M_AXI_BREADY);

	always @(*)
	if (M_AXI_RVALID)
		assert(M_AXI_RREADY);

So far, we’ve focused primarily on the AXI-lite interface. Indeed, the above is really all that’s required to verify an AXI-lite interface. There’s literally nothing more to it.

In the meantime, though, I’d like to assume the stream properties of our incoming interface. This interface is essentially an AXI stream interface, although the labels are a bit different. For example, we used a busy instead of a ready–but the principle remains almost identical. Hence, following any reset, we can assume that the STB (VALID) goes low. Second, following any STB && BUSY (i.e. VALID && !READY), pending requests need to remain just that: pending and without change.

	always @(posedge i_clk)
	if (!f_past_valid || $past(i_reset))
	begin
		assume(!i_cmd_stb);
	end else if ($past(i_cmd_stb && o_cmd_busy))
	begin
		assume(i_cmd_stb);
		assume($stable(i_cmd_word));
	end

That leaves us with one last property: that our BUSY signal will be true any time either BREADY or RREADY is true.

	always @(*)
		assert(o_cmd_busy == (M_AXI_BREADY || M_AXI_RREADY));

This, however, is one of those “Do I really need this?” assertions. Why? Because we defined o_cmd_busy as BREADY || RREADY. Why then have an assertion to verify this?

Do we need such an assertion? Probably not. I’ve placed it in here, though, to remind myself that o_cmd_busy has a specific definition. There will be consequences should I ever try to change it in the future. This is just a reminder of that–something to force me to think a touch harder before ever adjusting this value.

Contract checking

Now that we know our AXI-lite interface works, let’s turn our attention to the specific functionality of this design. Specifically, we want to know not just that the design will follow the AXI-lite rules of the road, but also that it will do what we want it to. So, let’s check some contract rules.

For example, we want to assert the newaddr flag following any requested address, but also to guarantee that it returns low after we issue any bus requests.

	always @(posedge i_clk)
	if (!f_past_valid || $past(i_reset))
		assert(newaddr);
	else if ($past(i_cmd_addr))
		assert(newaddr);
	else if ($past(i_cmd_rd || i_cmd_wr))
		assert(!newaddr);
	else
		assert($stable(newaddr));

Following this further, on any request to read or write following a new address request, we should also be producing a downstream response acknowledging the new address.

	always @(posedge i_clk)
	if ((f_past_valid)&&(!$past(i_reset))&&($past(i_cmd_rd || i_cmd_wr)))
	begin
		`ASSERT(o_rsp_stb  == $past(newaddr));
		if ($past(newaddr))
			`ASSERT(o_rsp_word == { RSP_SUB_ADDR, {(30-AW){1'b0}},
				$past(M_AXI_AWADDR[AW+1:1]), !$past(inc) });
	end

Finally, the new address flag should be low while any request is pending.

	always @(*)
	if (!i_reset && (M_AXI_BREADY || M_AXI_RREADY))
		assert(!newaddr);

How about resets? Following any reset, we said we wanted to produce a reset response output. Here, we’ll just double check that this happens.

	always @(posedge i_clk)
	if (f_past_valid && $past(i_reset))
		`ASSERT((o_rsp_stb)&&(o_rsp_word == RSP_RESET));

We can then check for write acknowledgments following BVALID and BRESP=OKAY.

	always @(posedge i_clk)
	if (f_past_valid && !$past(i_reset)
			&& $past(M_AXI_BVALID && !M_AXI_BRESP[1]))
	begin
		assert(o_rsp_stb);
		assert(o_rsp_word == RSP_WRITE_ACKNOWLEDGEMENT);
	end

Read acknowledgments are also (nearly) identical.

	always @(posedge i_clk)
	if (f_past_valid && !$past(i_reset)
			&& $past(M_AXI_RVALID && !M_AXI_RRESP[1]))
	begin
		assert(o_rsp_stb);
		assert(o_rsp_word == { RSP_SUB_DATA, $past(M_AXI_RDATA) });
	end

The last response we might return is a bus error. In this case, if xRESP is ever anything other than OKAY, then it’s an error. (AXI-lite doesn’t allow xRESP to ever equal EXOKAY=2'b01.) We don’t care, here, if it’s a slave error, 2'b10, or an interconnect error, 2'b11–a bus error return is a bus error return as far as this protocol goes.

	always @(posedge i_clk)
	if (f_past_valid && (!$past(i_reset))
		&&(($past(M_AXI_BVALID && M_AXI_BRESP[1]))
			||($past(M_AXI_RVALID && M_AXI_RRESP[1]))))
	begin
		assert(o_rsp_stb);
		assert(o_rsp_word == RSP_BUS_ERROR);
	end
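Why is checking only the top bit of xRESP good enough? Here's a small stand-alone sketch of the reasoning. The module name xresp_sketch and its ports are hypothetical; only the four response encodings come from the AXI specification.

```verilog
// Hypothetical sketch: xRESP[1] alone separates errors from successes.
module	xresp_sketch (
		input	wire	[1:0]	i_resp,
		output	wire		o_err
	);

	localparam [1:0]	OKAY   = 2'b00,
				EXOKAY = 2'b01,	// Not allowed in AXI-lite
				SLVERR = 2'b10,
				DECERR = 2'b11;

	assign	o_err = i_resp[1];

`ifdef	FORMAL
	// Bit one is set for both error codes, and only for them
	always @(*)
		assert(o_err == ((i_resp == SLVERR)||(i_resp == DECERR)));
`endif
endmodule
```

Since EXOKAY can never show up on an AXI-lite bus, this leaves OKAY as the only response with the top bit clear. That's why the write and read acknowledgment checks above test for !xRESP[1], while the bus error check tests for xRESP[1].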

At this point, we should have good confidence that our design will always return the values downstream that it’s supposed to.

Cover Checks

This leaves us one last verification step. So far, we’ve proven that this design will follow the AXI-lite protocol. We’ve proven this via induction. We’ve also guaranteed that the design will properly return appropriate values downstream based upon what’s going on within.

What we haven’t done is to prove that responses are still possible.

I’ve just had one too many designs where I’ve convinced myself that the design works when, for one reason or another, I’d made one assumption too many and so kept the formal tool from ever truly exercising the design. For example, I once assumed reset was always true. It was amazing how quickly the design passed a formal check, and just as disheartening to see that it never worked in simulation or hardware.
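To see how easily this can happen, consider the following toy example. It's a sketch I've made up for illustration (the module vacuous_sketch and the counter within it have nothing to do with today's design), but it captures the reset mistake above.

```verilog
// Hypothetical sketch: an over-constrained proof that passes vacuously
module	vacuous_sketch (
		input	wire		i_clk, i_reset,
		output	reg	[3:0]	o_count
	);

	initial	o_count = 0;
	always @(posedge i_clk)
	if (i_reset)
		o_count <= 0;
	else
		o_count <= o_count + 1;

`ifdef	FORMAL
	// The mistake: this holds the design in reset forever
	always @(*)
		assume(i_reset);

	// This passes, but only because the counter can never count
	always @(*)
		assert(o_count == 0);

	// This cannot be reached while the assumption above remains
	always @(*)
		cover(o_count == 4'h3);
`endif
endmodule
```

With the assumption in place, both the bounded and the induction checks should pass on the assertion, even though it is obviously false for any working counter. The cover statement is what gives the game away: the tool should report that it cannot be reached. Remove the assumption, and the assertion fails while the cover succeeds.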

A good cover check will help guarantee we haven’t made such mistakes.

Therefore, let’s see if we can complete several writes and reads.

The first step is to count the number of writes that complete. In this case, let’s count how many writes in a row we can go through–while disallowing any reads.

	initial	cvr_writes = 0;
	always @(posedge i_clk)
	if (i_reset || M_AXI_RREADY)
		cvr_writes <= 0;
	else if (M_AXI_BVALID)
		cvr_writes <= cvr_writes + 1;

Our goal will be to accomplish four writes before returning to idle.

	always @(posedge i_clk)
		cover(cvr_writes == 4 && faxil_awr_outstanding == 0
			&& faxil_wr_outstanding == 0);

You can see how well we did in Fig. 9 below.

Fig 9. Cover, four writes in a row

In this case, there are four write requests, and six responses forwarded downstream. The first response acknowledges a reset, and the next acknowledges the new address. These two responses are then followed by a regular write acknowledgment, and then (bonus!) three bus error acknowledgments.

This is also the place where I usually measure throughput. In this case, the throughput is horrible: one word can be written every three cycles. It’s worse than that, though, since this doesn’t capture any interconnect latencies.

On the other hand, the purpose of this design was never throughput–it was low logic, and a basic demonstration of an AXI-lite master. We’ll come back to the logic estimate in a moment to see how well we did there.

For now, let’s repeat this test with reads. Can we cover a set of four reads in a row? The first step is to count them–much like we did before.

	initial	cvr_reads = 0;
	always @(posedge i_clk)
	if (i_reset || M_AXI_BREADY)
		cvr_reads <= 0;
	else if (M_AXI_RVALID)
		cvr_reads <= cvr_reads + 1;

Now let’s let the formal tool find us a sequence showing how four reads might look in a row, once we’ve returned to idle.

	always @(posedge i_clk)
		cover(cvr_reads == 4 && faxil_rd_outstanding == 0);
endmodule

You can see the result of this exercise in Fig. 10 below.

Fig 10. Cover, four reads in a row

As before, we’re getting about a 33% throughput. There’s a reset acknowledgment, a new address acknowledgment, a read response, and then three read bus errors. The 33% throughput isn’t great, and it’s certainly nothing to write home about. But, as before, our goal is low logic and this is certainly that.

Conclusion

I’ve now mentioned several times that our purpose is low logic. How low, therefore, did we get? A quick Yosys run shows that this simple and basic AXI-lite design requires no more than 148 4-LUTs. Not bad for an iCE40, no? Indeed, the entire AXI-lite version of the hexbus on an iCE40 (minus the serial port) requires no more than 349 4-LUTs.

Surely 349 4-LUTs can be easily hidden in a larger design, no? Surely it’s a small price to pay for ad-hoc, external access to the bus within a design? Other costs, however, will always add up. Don’t forget that, in addition to the missing serial port cost (about 135 4-LUTs), there’s also the cost of adding yet one more master to the internal crossbar–something that can run upwards of 1500 4-LUTs all by itself.

Still, this does make for a very low logic AXI-lite master. Remember our last AXI-lite master implementation? That was a bridge from Wishbone to AXI-lite. By comparison, it requires 118 4-LUTs to the 148 4-LUTs used by today’s controller. The big difference with this controller, though, is that this one is intended for scripting. Therefore, there are fewer wires used to control this master.

Better, because this controller can be easily scripted, its uses go well beyond the debug bus implementation it is designed and presented for.

Before leaving, I should also point out that neither the hexbus nor the wbubus is the end-all in debugging bus implementations. The first can transfer, at best, one 32-bit word every 10 bytes (100 baud intervals). The wbubus is better: it can transfer one 32-bit word in six bytes (60 baud intervals, or 40% less time–before compression). I’m currently working on a newer version of the bus which will be able to transfer one 32-bit word in five bytes (50 baud intervals)–while still reserving one bit so as to multiplex a console channel over the debugging bus. Were I to implement it without console support, then the new bus implementation would be able to transfer (worst case) one word in 45 baud intervals. That’s a full 55% less time per word than the hexbus, and yes, times do add up when you are transferring large amounts of information. Indeed, those last couple of percentage points can amount to minutes of valuable transfer time.
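If you'd like to check those percentages yourself, here's a quick simulation-only sketch of the arithmetic. The baud interval counts come straight from the paragraph above. The module name, the parameter names, and the pct_saved function are my own inventions for the purpose of illustration.

```verilog
// Hypothetical sketch: time saved per 32-bit word, relative to the hexbus
module	baud_cost_sketch;
	// Baud intervals per 32-bit word, as quoted in the text
	localparam	HEXBUS_BAUDS     = 100;	// 10 bytes
	localparam	WBUBUS_BAUDS     =  60;	//  6 bytes
	localparam	NEWBUS_BAUDS     =  50;	//  5 bytes, with console
	localparam	NEWBUS_RAW_BAUDS =  45;	// Worst case, no console

	// Percent of the hexbus's transfer time saved (integer math)
	function integer pct_saved(input integer bauds);
		pct_saved = (100 * (HEXBUS_BAUDS - bauds)) / HEXBUS_BAUDS;
	endfunction

	initial begin
		$display("wbubus          : %0d%% less time",
				pct_saved(WBUBUS_BAUDS));	// 40
		$display("new, w/ console : %0d%% less time",
				pct_saved(NEWBUS_BAUDS));	// 50
		$display("new, no console : %0d%% less time",
				pct_saved(NEWBUS_RAW_BAUDS));	// 55
		$finish;
	end
endmodule
```

Note that these figures measure time saved per word, not the increase in words per second. A bus needing only 45 baud intervals where another needs 100 moves a bit more than twice as many words in the same amount of time.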

As you can see, with a little bit of work, performance and throughput can and do improve over time–although getting that last little bit always tends to be somewhat of a challenge. Perhaps that’s just the reality of any engineering endeavor.