Building a Simple Wishbone Master

Explaining how to build a good wishbone controlled debug port may take a couple of posts to do right. Worse, it may take us a couple of rounds just to get the logic right, but let’s try anyway.

Fig 1: WB-UART Overview

For this post, we’re going to concentrate on the wishbone bus master found at the bottom of the simplified UART to wishbone converter, as outlined in Fig 1.

Hence, if the whole capability will eventually look like Fig 1, we’re only looking at the component at the very bottom within this post, and even then we’re only going to examine a simplified version of it. We’ll leave the implementation of multiple transactions at once for a later date.

My strategy for this blog post will be to come back and update it later with any updates or fixes, so (hopefully) any mistakes will get fixed over time.

We are going to try to take this opportunity to build a simple wishbone bus master. Before starting, you may wish to grab a copy of the wishbone bus specification and follow along in the B4 version.

You can find definitions for the various wishbone interface wires in chapter two. We’re going to continue following our practice of prefixing input wires with i_… and output wires with o_…, even though the specification uses suffixes (such as CYC_O and ACK_I) for the same purpose. We’re also going to add the _wb_ designator to all of the wires associated with the wishbone bus. Hence, i_wb_ack will reference the return acknowledgement from the slave.

Chapter three of the spec describes how the various wishbone wires compose a bus cycle. We’ll specifically be implementing the pipelined bus cycle (not the classic bus cycle), and we’re eventually going to build our implementation so that it can issue requests across the bus in a pipelined fashion as fast as the slave will allow it.

If you are a visual learner, check out Figures 3-6 and 3-8 from the specification. I find these to be the most useful when explaining how the bus works. We’ll reserve the capability shown in Figures 3-11 and 3-13 for a later post.

For today’s post, we’ll handle a single bus interaction per bus cycle, so the CYC line will be lowered between requests. We’ll revisit this decision in a later post so that we can issue multiple requests of the bus at one request per clock, but that will be a later discussion.

Further, you can see the code as we’re building it in the dbgbus project here.

The Control Interface

The bus master interface we are building will ultimately be commanded from an external interface. We’ve already discussed how one might wish to do this over an ICO board parallel port, a UART, SPI, or JTAG. Exactly which is used will be external to our bus master implementation.

We’ll have to build up the functionality in our control interface as we build up the whole interface. The two sort of go together.

For now, let’s handle all of our communication using 34-bit words. We’ll use the top 2 bits of these 34-bit words for signaling, and the bottom 32 bits for any values we wish to pass.

input		i_reset;
input		i_cmd_stb;
input	[33:0]	i_cmd_word;
output	reg	o_cmd_busy;

Our input data word will be valid any time i_cmd_stb is true, but it will only be accepted whenever o_cmd_busy is false. Hence, when you read the code, you may find (i_cmd_stb)&&(!o_cmd_busy). This will be the indication a request has been accepted.

For now, we’ll just set o_cmd_busy to be true any time the bus is active. Eventually, we’ll want to take a peek at the next bus request, and drop o_cmd_busy if the next request is one we are interested in. For example, if we are busy doing something and the user requests a bus reset, we’ll need to drop the busy line and accept that request.

For our command words, we’ll use the following definition to define how the 34-bit control words will be interpreted:

33	32	31 - 0
0	0	Read request, ignore the rest of the 32-bits, ACK on output
0	1	Write request, the 32-bit data contains the word to be written
1	0	Set an address. Bits 31:2 hold a 30-bit word address. If bit 1 is set, we’ll add this value to the current bus address rather than replacing it. Unless bit 0 is set, the address will be incremented upon each bus access
1	1	4’h0, 28’hxx, Bus Reset
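Whatever drives this port from the host side has to pack these fields into 34-bit words. Here’s a minimal sketch in C of how a host program might do so, holding each word in the low 34 bits of a uint64_t. The helper names are hypothetical (they are not part of the dbgbus software), and the set-address layout (address in bits 31:2, bit 1 for a relative address, bit 0 to suppress incrementing) follows what the address-line Verilog later in this post actually decodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side encoders.  Each returns a 34-bit command word
 * in the low bits of a uint64_t; bits 33:32 select the command. */

static uint64_t cmd_read(void)
{
	return UINT64_C(0);			/* 2'b00: payload ignored */
}

static uint64_t cmd_write(uint32_t value)
{
	return (UINT64_C(1) << 32) | value;	/* 2'b01: word to write */
}

/* 2'b10: word_addr is a 30-bit address or, if relative, a 30-bit
 * two's complement offset from the current address.  Setting
 * no_increment keeps the address fixed across bus accesses. */
static uint64_t cmd_set_addr(uint32_t word_addr, bool relative,
			     bool no_increment)
{
	assert(word_addr <= UINT32_C(0x3fffffff));
	return (UINT64_C(2) << 32) | ((uint64_t)word_addr << 2)
		| ((uint64_t)relative << 1) | (uint64_t)no_increment;
}

static uint64_t cmd_bus_reset(void)
{
	return UINT64_C(3) << 32;	/* 2'b11, bits 31:28 = 4'h0 */
}
```

For example, sending cmd_set_addr(0x100, false, false) followed by cmd_read() would read the word at word address 0x100 and leave the bus address pointing at 0x101.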

Our signaling scheme will allow us to issue a bus reset command, which will abruptly cause us to abandon any bus cycle we may be in the middle of. To make this work, the bus reset request will need to override the busy flag.

This is not the most efficient scheme. For example, why send 34 bits when you are only going to pay attention to two of them (as with a read)? Wouldn’t it make more sense to send a smaller number of bits for the read, together with the number of items you intend to read? Yes, it would. Optimizing this command word will be the subject of another post.

The output of our bus will use (almost) the exact same approach. We’ll create a o_rsp_stb signal that will be true any time the output references a valid codeword. Unlike the input, though, we’ll ignore any flow control on the output. We can add some amount of flow control back in later with a FIFO.

output	reg		o_rsp_stb;
output	reg	[33:0]	o_rsp_word;

We’ll also borrow from the input codeword encoding for the return trip encoding, as shown below:

33	32	31 - 0
0	0	Acknowledge a write. The 32-bit value contains number of writes to acknowledge
0	1	Read response, the 32 data bits are the word that was read
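1	0	Acknowledge an address that has been set: the 30-bit address in bits 31:2, a zero in bit 1, and bit 0 set if the address will not be incremented
1	1	Special responses, selected by the top bits of the payload:
1	1	3’h0, 29’hxx, Bus Reset acknowledgement
1	1	3’h1, 29’hxx, Bus Error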
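On the return side, the host has to sort each incoming 34-bit word back into one of these categories. Here’s a minimal C sketch of that decoding, again assuming each word arrives in the low 34 bits of a uint64_t. The enum and function names are hypothetical, and the address layout assumed is the one the Verilog response logic below produces, { `RSP_SUB_ADDR, o_wb_addr, 1'b0, !inc }.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the response categories in the table above */
enum rsp_kind {
	RK_WRITE_ACK,	/* 2'b00: payload = number of writes acknowledged */
	RK_READ_DATA,	/* 2'b01: payload = word read from the bus */
	RK_ADDR_ACK,	/* 2'b10: echo of the current bus address */
	RK_RESET_ACK,	/* 2'b11, bits 31:29 = 3'h0 */
	RK_BUS_ERROR,	/* 2'b11, bits 31:29 = 3'h1 */
	RK_UNKNOWN	/* 2'b11 with any other sub-code */
};

static enum rsp_kind rsp_classify(uint64_t word)
{
	unsigned top = (unsigned)((word >> 32) & 3u);
	unsigned sub = (unsigned)((word >> 29) & 7u);

	if (top == 0u) return RK_WRITE_ACK;
	if (top == 1u) return RK_READ_DATA;
	if (top == 2u) return RK_ADDR_ACK;
	if (sub == 0u) return RK_RESET_ACK;
	if (sub == 1u) return RK_BUS_ERROR;
	return RK_UNKNOWN;
}

/* The 32-bit payload of a write acknowledgement or a read response */
static uint32_t rsp_payload(uint64_t word)
{
	return (uint32_t)(word & 0xffffffffu);
}

/* The 30-bit word address carried in bits 31:2 of an address ack */
static uint32_t rsp_addr(uint64_t word)
{
	assert(rsp_classify(word) == RK_ADDR_ACK);
	return (uint32_t)(word >> 2) & 0x3fffffffu;
}
```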

Decoding the Control Interface

We have only four different types of command words in our code book. In the first section of our simple bus master, we’ll create flags to indicate which request is currently being made.

assign	i_cmd_rd   = (i_cmd_stb)&&(i_cmd_word[33:32] == 2'b00);
assign	i_cmd_wr   = (i_cmd_stb)&&(i_cmd_word[33:32] == 2'b01);
// We'll use i_cmd_bus to capture whether we have a read or write request
assign	i_cmd_bus  = (i_cmd_stb)&&(i_cmd_word[33]    == 1'b0);
//
assign	i_cmd_addr = (i_cmd_stb)&&(i_cmd_word[33:32] == 2'b10);
assign	i_cmd_special = (i_cmd_stb)&&(i_cmd_word[33:32] == 2'b11);

You may notice that I have violated my naming convention with these wires: I have named locally generated wires with an i_… prefix when they are not actual inputs to our bus master module, but rather the results of combinatorial logic applied to inputs. In this case, it tends to work out, but it’s not something I’m regularly going to do.

The Wishbone Master Interface

This bus controller will have three basic states, as shown in Fig 2.

Fig 2: Wishbone Master State Transition Diagram

Here’s a quick description of each of those states:

IDLE: When we are doing nothing, both CYC and STB must be low. In this state, we’ll need to be responsive to incoming requests from the command interface. Upon a request, we’ll need to set the request direction (o_wb_we), the data lines (o_wb_data), and then CYC and STB.

We’ll also set our address lines in this state, but without adjusting CYC or STB.

BUS REQUEST: When CYC and STB are both high, a bus request is taking place. This request phase lasts until i_wb_stall goes low, at which point our request has been accepted.

When we come back to this code later and transition it to handling multiple requests, we will transition from one request to the next any time o_wb_stb is true and i_wb_stall is false.

It is also possible that the acknowledgement might be received on the same clock the transaction was requested. We’ll need to make certain we deal with this case.

BUS RESPONSE: After making a request of the bus, we need to wait until the slave acknowledges it. Every acknowledgement will cause us to send a response back up our command stream. More importantly, every read response will also need to carry the value read from the i_wb_data data lines in its payload. We’ll need to make certain we return those back to the user.

Once the last acknowledgement is received, we can transition back to the idle state.

There are exceptions to this model: upon a bus reset or a bus error, we’ll simply abandon the current transaction. The inherent problem with this approach is that acknowledgements may come back later and get mixed up with another bus request. For now, we’ll accept that risk, since abandoning the cycle may be the only way to recover the bus if a peripheral is non-responsive.
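To make these three states concrete, here’s a minimal C model of them, stepped once per clock. This is a software sketch rather than part of the dbgbus code: struct wbm, wbm_clock() and wbm_busy() are hypothetical names, and the model tracks only CYC and STB, ignoring the address, data, and response paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the single-transaction master.  The entire
 * state is held in the CYC and STB registers. */
struct wbm {
	bool cyc, stb;
};

/* Advance the model by one rising clock edge, given the inputs that
 * were present just before that edge. */
static void wbm_clock(struct wbm *m, bool reset, bool cmd_bus,
		      bool stall, bool ack, bool err)
{
	assert(m->cyc || !m->stb);	/* STB is never high without CYC */

	if (reset || (err && m->cyc)) {
		m->cyc = m->stb = false;	/* abandon the cycle */
	} else if (m->stb) {			/* BUS REQUEST */
		if (!stall) {
			m->stb = false;		/* request accepted */
			if (ack)
				m->cyc = false;	/* same-clock ack */
		}
	} else if (m->cyc) {			/* BUS RESPONSE (wait) */
		if (ack)
			m->cyc = false;
	} else if (cmd_bus) {			/* IDLE: read or write */
		m->cyc = m->stb = true;
	}
}

/* In this simple master, the command port is busy whenever CYC is high */
static bool wbm_busy(const struct wbm *m)
{
	return m->cyc;
}
```

Each call plays the part of one clock edge, with the caller acting as both the command port (cmd_bus) and the slave (stall, ack, err).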

Simplified Overview

If you’re still somewhat new to digital design and coming from the software world, your first approach to building a Wishbone Bus Master might look something like the following:

initial	o_wb_cyc = 1'b0;
initial	o_wb_stb = 1'b0;
initial	o_cmd_busy = 1'b0;
initial	newaddr  = 1'b0;
initial	o_rsp_stb= 1'b0;
always @(posedge i_clk)
if ((i_reset)||(i_wb_err))
begin
	o_wb_cyc <= 1'b0;
	o_wb_stb <= 1'b0;
	o_cmd_busy   <= 1'b0;
	o_rsp_stb    <= 1'b1;
	newaddr <= 0;

	// Return over the command interface that we just had an error,
	// or a bus reset
	if (i_reset)
		o_rsp_word <= `RSP_RESET;
	else
		o_rsp_word <= `RSP_BUS_ERROR;
end else begin
	// Unless something below says otherwise, there's no response
	// on this clock, and no new address to report on the next
	o_rsp_stb <= 1'b0;
	newaddr   <= 1'b0;

	// Echo an address set on the last clock back up the command
	// stream.  On any clock where newaddr is high, the bus is still
	// idle (the command that set newaddr wasn't a bus request), so
	// no acknowledgement below can compete with this response.
	if (newaddr)
	begin
		o_rsp_stb <= 1'b1;
		o_rsp_word <= { `RSP_SUB_ADDR, o_wb_addr, 1'b0, !inc };
	end

	if ((i_cmd_stb)&&(!o_cmd_busy))
	begin
		//
		// In the idle state
		//
		if (i_cmd_addr)
		begin
			// Bits 31:2 hold the (30-bit) address, bit 1
			// requests an add to the current address
			if (!i_cmd_word[1])
				o_wb_addr <= i_cmd_word[31:2];
			else
				o_wb_addr <= i_cmd_word[31:2] + o_wb_addr;

			inc <= !i_cmd_word[0];

			// Acknowledge the new address -- on the next clock
			// (after the add has completed)
			newaddr <= 1'b1;
		end

		o_wb_we <= (i_cmd_wr);

		// On a read or write request, activate the bus and go to
		// the bus request state
		if (i_cmd_bus)
		begin
			o_wb_cyc <= 1'b1;
			o_wb_stb <= 1'b1;
			o_cmd_busy   <= 1'b1;
		end

		if (i_cmd_wr)
			o_wb_data <= i_cmd_word[31:0];
	end else if (o_wb_stb)
	begin
		//
		// BUS REQUEST state
		//
		// In the state where we are commanding the bus, and waiting
		// for the bus request to be accepted
		//
		// o_wb_cyc will also be true here, since we cannot allow
		// o_wb_stb to be true if o_wb_cyc is not true.  (Too many
		// peripherals depend upon this bus simplification ...)
		//
		if (!i_wb_stall)
		begin
			// The request has been accepted, don't request again.
			o_wb_stb  <= 1'b0;
			o_wb_addr <= o_wb_addr + inc;

			// If we get an ack on the same cycle as the request,
			// quietly transition back to idle.
			if (i_wb_ack)
			begin
				o_wb_cyc <= 1'b0;
				o_cmd_busy <= 1'b0;
				o_rsp_stb <= 1'b1;
				if (o_wb_we)
					o_rsp_word <= `RSP_WRITE_ACKNOWLEDGEMENT;
				else
					o_rsp_word <= { `RSP_SUB_DATA, i_wb_data };
			end
		end
	end else if (o_wb_cyc)
	begin
		//
		// BUS WAIT state: the request is out, wait for the ACK
		//
		if (i_wb_ack)
		begin
			o_wb_cyc <= 1'b0;
			o_cmd_busy <= 1'b0;
			o_rsp_stb <= 1'b1;
			if (o_wb_we)
				o_rsp_word <= `RSP_WRITE_ACKNOWLEDGEMENT;
			else
				o_rsp_word <= { `RSP_SUB_DATA, i_wb_data };
		end
	end
end

From this view, the three states of the controller should be readily apparent. The controller starts out idle, CYC=STB=0. Once the controller receives a command, it moves to a bus request state to issue the command, CYC=STB=1. Once the command has been issued, but before any response, it is in a bus wait state with CYC=1, STB=0. When the final ACK comes back, we’ll go back to idle, CYC=STB=0.

That’s how a wishbone master works.

When I first sketch out a design, it often looks very similar to this giant always block above. Perhaps it’s my software background. I like to build one big always block with all the parts and pieces within it. Indeed, my flash controller is still built in this fashion, with one giant always block.

Now that I’ve been doing this for a while, I’ve learned that breaking the big always block up into smaller blocks is easier on the FPGA: each register ends up depending upon fewer conditions, and so requires less logic. For example, in this case, why should the address lines only get set when the new address command shows up and the reset is clear?

For this reason, we’ll split up the always block into parts and pieces.

The CYC and STB Lines

The wishbone CYC and STB lines are so integrally connected, they tend to remain together no matter how the interface gets broken up. Indeed, these two wires alone define which state we are in within our state space. Further, in the big always block above, few lines actually depend upon the reset line. Hence, we’ll build their state diagram like this:

initial	o_wb_cyc = 1'b0;
initial	o_wb_stb = 1'b0;
always @(posedge i_clk)
	if ((i_reset)||((i_wb_err)&&(o_wb_cyc)))
	begin
		// On any error or reset, then clear the bus.
		o_wb_cyc <= 1'b0;
		o_wb_stb <= 1'b0;
	end else if (o_wb_stb)
	begin
		//
		// BUS REQUEST state
		//
		if (!i_wb_stall)
			// If we are only going to do one transaction,
			// then as soon as the stall line is lowered, we are
			// done.
			o_wb_stb <= 1'b0;

		// While not likely, it is possible that a slave might ACK
		// our request on the same clock it is received.  In that
		// case, drop the CYC line.
		//
		// We gate this with the stall line in case we receive an
		// ACK while our request has yet to go out.  This may make
		// more sense later, when we are sending multiple back to back
		// requests across the bus, but we'll leave this gate here
		// as a placeholder until then.
		if ((!i_wb_stall)&&(i_wb_ack))
			o_wb_cyc <= 1'b0;
	end else if (o_wb_cyc)
	begin
		//
		// BUS WAIT
		//
		if (i_wb_ack)
			// Once the slave acknowledges our request, we are done.
			o_wb_cyc <= 1'b0;
	end else begin
		//
		// IDLE state
		//
		if (i_cmd_bus)
		begin
			// We've been asked to start a bus cycle from our
			// command word, either RD or WR
			o_wb_cyc <= 1'b1;
			o_wb_stb <= 1'b1;
		end
	end

	// For now, we'll use the bus cycle line as an indication of whether
	// or not we are too busy to accept anything else from the command
	// port.  This will change if we want to accept multiple write
	// commands per bus cycle, but that will be a bus master that's
	// not nearly so simple.
	assign	o_cmd_busy = o_wb_cyc;

The write line

We can significantly simplify the output bus write enable line. Since we only accept commands when we are in the idle state, and we only transition to the bus request state on a read (or write) command, we can simply write:

	always @(posedge i_clk)
		if (!o_wb_cyc)
			o_wb_we <= (i_cmd_wr);

Notice how much we just simplified this.

The cost of this simplification (and the many others like it) is that our code will be harder to read. The benefit: because this line now uses fewer FPGA resources, there will be less logic between clocks, allowing you to (possibly) run your clock a touch faster, and fewer LUTs will be used to generate this line, leaving more room on your FPGA for what you care about.

The Address Lines

We remove the address lines from the big block simply because there’s no reason why the address line logic needs to depend upon the reset line. On reset, we can allow the address (and the increment) to both come up undefined. We’ll also generate our internal newaddr flag within this block.

	//
	// The bus ADDRESS lines
	//
	initial	newaddr = 1'b0;
	always @(posedge i_clk)
	begin
		if ((i_cmd_addr)&&(!o_cmd_busy))
		begin
			// If we are in the idle state, we accept address
			// setting commands.  Specifically, we'll allow the
			// user to either set the address, or add a difference
			// to our address.  The difference may not make sense
			// now, but if we ever wish to compress our command bus,
			// sending an address difference can drastically cut
			// down the number of bits required in a set address
			// request.
			if (!i_cmd_word[1])
				o_wb_addr <= i_cmd_word[31:2];
			else
				o_wb_addr <= i_cmd_word[31:2] + o_wb_addr;

			//
			// We'll allow that bus requests can either increment
			// the address, or leave it the same.  One bit in the
			// command word will tell us which, and we'll set this
			// bit on any set address command.
			inc <= !i_cmd_word[0];
		end else if ((o_wb_stb)&&(!i_wb_stall))
			// The address lines are used while the bus is active,
			// and referenced any time STB && !STALL are true.
			//
			// However, once STB and !STALL are both true, then the
			// bus is ready to move to the next request.  Hence,
			// we add our increment (one or zero) here.
			o_wb_addr <= o_wb_addr + {{(AW-1){1'b0}}, inc};


		// We'd like to respond to the bus with any address we just
		// set.  The goal here is that, upon any read from the bus,
		// we should be able to know what address the bus was set to.
		// For this reason, we want to return the bus address up the
		// command stream.
		//
		// The problem is that the add (above) when setting the address
		// takes a clock to do.  Hence, we'll use "newaddr" as a flag
		// that o_wb_addr has a new value in it that needs to be
		// returned via the command link.
		newaddr <= ((i_cmd_addr)&&(!o_cmd_busy));
	end
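The set, add, and increment rules in this block amount to modular arithmetic on a 30-bit register. Here’s a small C model of them, which might be useful for checking what address a host should expect to see echoed back. The struct and function names are hypothetical, and AW is fixed at 30 here to match the 30-bit word address carried in bits 31:2 of the command word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AW	30
#define AMASK	((UINT32_C(1) << AW) - 1u)

/* Hypothetical model of the o_wb_addr and inc registers */
struct addr_state {
	uint32_t addr;	/* 30-bit word address, as in o_wb_addr */
	bool     inc;	/* true if each accepted request advances addr */
};

/* Apply a set-address command word (34 bits held in a uint64_t) */
static void addr_set(struct addr_state *s, uint64_t cmd)
{
	uint32_t value = (uint32_t)(cmd >> 2) & AMASK;	/* bits 31:2 */

	assert(((cmd >> 32) & 3u) == 2u);	/* 2'b10: set address */
	if (cmd & 2u)
		/* Bit 1: relative.  The sum wraps at 30 bits, just like
		 * the register, so an offset of 0x3fffffff steps back one. */
		s->addr = (s->addr + value) & AMASK;
	else
		s->addr = value;
	s->inc = !(cmd & 1u);	/* bit 0 set: hold the address fixed */
}

/* Models the STB && !STALL case: a request has just been accepted */
static void addr_accept(struct addr_state *s)
{
	if (s->inc)
		s->addr = (s->addr + 1u) & AMASK;
}
```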

The output data lines

Those output data lines can be set just like the write enable line. Only, this time, we don’t care what the lines are set to when we are reading. Hence, we’ll set them upon any request.

	always @(posedge i_clk)
	begin
		// This may look a touch confusing ... what's important is that:
		//
		// 1. No one cares what the bus data lines are, unless we are
		//	in the middle of a write cycle.
		// 2. Even during a write cycle, these lines are don't cares
		//	if the STB line is low, indicating no more requests
		// 3. When a request is received to write, and so we transition
		//	to a bus write cycle, that request will come with data.
		// 4. Hence, we set the data words in the IDLE state on the
		//	same clock we go to BUS REQUEST.  While in BUS REQUEST,
		//	these lines cannot change until the slave has accepted
		//	our inputs.
		//
		// Thus we force these lines to be constant any time STB and
		// STALL are both true, but set them otherwise.
		//
		if ((!o_wb_stb)||(!i_wb_stall))
			o_wb_data <= i_cmd_word[31:0];
	end

Since we are setting 32 outputs, the logic savings here are much greater than those from simplifying just the one o_wb_we line.

The output result

We also need to return a result back up the command chain. This result will be dependent upon what has taken place. It could be:

An acknowledgement of a bus reset request
A notification of a bus error
An acknowledgement of a new address, or a value that has been written
Or (finally) the result of any data read from the bus.

Because of all of these possibilities, it takes a bit of logic to set this right. Remember, o_rsp_stb will be true any time o_rsp_word has valid information, and that the o_rsp_word wires are don’t cares any time o_rsp_stb is low.

always @(posedge i_clk)
	if (i_reset)
	begin
		o_rsp_stb <= 1'b1;
		o_rsp_word <= `RSP_RESET;
	end else if (i_wb_err)
	begin
		o_rsp_stb <= 1'b1;
		o_rsp_word <= `RSP_BUS_ERROR;
	end else if (o_wb_cyc) begin
		//
		// We're either in the BUS REQUEST or BUS WAIT states
		//
		// Either way, we want to return a response on our command
		// channel if anything gets ack'd
		o_rsp_stb <= (i_wb_ack);
		//
		//
		if (o_wb_we)
			o_rsp_word <= `RSP_WRITE_ACKNOWLEDGEMENT;
		else
			o_rsp_word <= { `RSP_SUB_DATA, i_wb_data };
	end else begin
		//
		// We are in the IDLE state.
		//
		// Echo any new addresses back up the command chain
		//
		o_rsp_stb  <= newaddr;
		o_rsp_word <= { `RSP_SUB_ADDR, o_wb_addr, 1'b0, !inc };
	end

A Not-So Simplified Wishbone Master

If you’ve followed along so far, you may notice we’ve left a lot of capabilities we want in our bus master on the floor:

There’s no means for sending multiple write commands without dropping CYC between them. This will break our flash controller, so we’ll have to come back and fix this.
There’s no means for reading from (or writing to) multiple consecutive addresses in one transaction. That’s really useful for getting us on and off the bus quickly. We’ll need to come back to this later.
There are a lot of times when the bits in our 34-bit codeword are going unused. For example, why transmit 34 bits to our device just to send a reset, when only six of those 34 bits are ever used to decode a reset? Why send a 30-bit address offset, when you are just adding it to the current address?

If we want a really good wishbone master interface that’s fully featured, we’ll need to come back and fix these things. For now, let’s move on to the next piece in our command wishbone bridge.

Examples

Now that you know the basic pieces of any wishbone bus master, here’s a list of some example wishbone bus masters that I’ve built that you might find worth referencing:

Instruction fetch: one word at a time, two at a time, and using a cache.
CPU memory stage: one word at a time, and pipelined.
A direct memory access (DMA) controller.
A similar UART-WB bridge.

Next Steps

You can find prior posts in this series on the site topics page. You can also see from that page where I’m hoping to go next.