No part of any system is as critical as the bus that connects all of the components together. One misbehaving peripheral, or one tyrannical master, and the bus can be locked up until the next power cycle or internal reset.
Making matters worse is the fact that it is very difficult to create a test bench for every possible bus interaction. Questions like, what happens if the bus request is abandoned, what happens if there’s a reset in the middle of the request, what happens if … are all things that are difficult to capture with a test bench. This is simply due to the fundamental limitation of test benches: they prove one path through your code, but not all paths through your code. From my own experience, test benches prove the “normal” path through the code, whereas the formal methods will check for validity even in the presence of abnormal things taking place.
As a result, bus interactions are ideal candidates for formal verification. Doing so, though, requires formal rules defining how the bus interaction must take place–rules which, if a peripheral or bus master fails to follow, will cause the bus to fail in some fashion.
Generating those rules for the Wishbone bus will be our task today. Specifically, we’ll be looking at the B4 version of the Wishbone specification, and the pipeline form of interaction within it. Our presentation will start with a discussion of how the Wishbone bus operates in general, and then a quick comparison between the B4 and B3 versions of the Wishbone specification. We’ll then mention two other buses and point out some of the differences between the Wishbone and these other buses. Finally, we’ll present a list of formal properties that can be used to verify the functionality of Wishbone bus master.
How Wishbone Works
If you are not familiar with how a Wishbone bus works, it is perhaps one of the simpler bus implementations out there. Here on this blog we’ve already discussed how to build a Wishbone slave. We’ve even walked through how to build a Wishbone bus master, and a basic Wishbone interconnect as part of our debugging bus implementation series.
From a high level standpoint, the Wishbone bus supports two channels of information: a request channel which can be used to request either a read or a write, and an acknowledgement channel which acknowledges the transaction. Further, while the request channel can be stalled at the slave’s request, the response channel cannot.
This name translation is mostly about conforming to Gisselquist
strict naming conventions: inputs begin with
i_, outputs with
io_. After that, I group all of the wires associated
with a particular interface together, and hence the
_wb_. For those
components I’ve worked on with two Wishbone interfaces, such as this delay by
one clock IP
I’ll give the other bus
another name such as
_dly_ for the delayed
bus. Later on in this
post, I may reference bus wires by the specification name bus drop the
_I suffix when it is clear what I am referencing.
By way of an introduction, let’s walk through a bus request, such as the one shown in Fig 3.
This example is drawn from the single-instruction ZipCPU prefetch. That particular prefetch module is the simplest of the four prefetch modules the ZipCPU supports, in that it only handles a single request at a time–perfect for an introductory discussion!
So, let’s walk through the steps in this request.
The bus starts out idle. This idle is defined by
o_wb_stbboth being low.
While the bus is idle, none of the other signals are relevant–save that the specification insists that the
ACKline be low.
When the bus master chooses to start a transaction, it raises the
o_wb_cycline. On the same clock, the master places a request on the bus. This means that the
o_wb_stbis raised and the address is placed on
o_wb_addr.Since this is a read request,
o_wb_weis held low.
Had this been a write request,
o_wb_wewould’ve been raised, the data to be written would be placed on
o_wb_selwould be filled out with one bit per byte in
o_wb_dataindicating which bytes are actually going to be written.
From here we move to the slave. The slave has the opportunity to tell the bus master that it’s not (yet) ready to receive the request. It does this by holding the
i_wb_stallline high. As soon as
o_wb_stbis true and
i_wb_stallis false, the request has been accepted.
There are many reasons why a slave might not be ready to receive a request, but most of the ones I’ve dealt with surround either the slave being busy with its initialization sequence or processing another interaction. For example, the SDRAM controller for the XuLA-LX25 SoC project requires over 20k clocks to start up. Once started, it can only transmit 16-bits to or from the SDRAM at a time. For this reason, after startup, the SDRAM controller needs to stall the bus during every other incoming transaction so as to allow the SDRAM controller time to send (or receive) both 16-bit halves of any 32-bit transaction.
Once the master has made its request of the slave, it drops
o_wb_stbon the first clock after
(o_wb_stb)&&(!i_wb_stall), since at this point it knows that the slave has received its request.
When the slave’s response is ready, the slave will raise the
i_wb_ackline and places the data that’s been read (if this were a response to a read request) onto the
When the master sees the
i_wb_ackhigh, it ends the request by dropping the
o_wb_cycline and the transfer is complete.
Simple, no? Let’s try another example.
Fig 4 shows an example, drawn from the ZipCPU’s dblfetch module. This is another one of the ZipCPU’s prefetch modules, with the difference being that this one makes two back to back requests of the slave.
This example starts out just like the last example. However, once into the example, there are some differences.
Instead of dropping
o_wb_stbafter the first time
(o_wb_stb)&&(!i_wb_stall), the master leaves
o_wb_stbhigh and initiates a second request. A new address is placed into
o_wb_addrfor this second request as well. Only after the second time
(o_wb_stb)&&(!i_wb_stall)is true does this master release the
o_wb_stbline, having now finished making both requests.
Since this example is being drawn from a prefetch module, the
o_wb_weline is kept low. Had this been a write request to the bus,
o_wb_wewould’ve been kept high for this second request, and
o_wb_selwould’ve been set as appropriate for a second write request.
The slave now response with two clocks with
i_wb_ackhigh. The first time
i_wb_datais set to the result of reading the first address request. On the second
i_wb_datacontains the results of reading from the second address.
It’s worth noting here that the two acknowledgements do not need to come back to back. On slower peripherals they may be separated by one or more clocks.
Also, had the request been a write request instead of a read request, everything would be the same at this step except that the
o_wb_datainformation would’ve been relevant when
i_wb_datareturned information would’ve been ignored.
That’s really all there is to it. The whole of this is really quite simple.
Or is it? Once I started trying to formally prove that a master and a slave were properly “behaving”, the formal solver started to find more and more cases of interest. For example,
What happens when the master drops the
What happens when the slave never responds?
What happens when the master wishes to change the request mid-cycle?
The Wishbone specification also defines another type of interaction, this one known as “Wishbone classic”. It is defined in both the Wishbone B4 version of the standard, as well as the original Wishbone B3 version.
When using the Wishbone classic version of the specification, the master is required to wait until the slave acknowledges the request before it can start a new request, as shown in Fig 5 below.
The unfortunate consequence of this bus implementation is that it takes a minimum of three clocks per transaction. To illustrate this difference, compare Fig 5 above to Fig 6 at the right showing a Wishbone pipeline interaction. The pipeline mode can issue and receive the return from one request per clock.
Hence, the best case classic performance is
N transactions in
whereas for the pipeline mode you can achieve
N transactions in
The problem is compounded when you add circuit timing into the mix, since the classic mode makes it very difficult to place synchronous/clocked components, such as routers, arbiters, or delays, between the master and slave without slowing down the overall system clock speed–not just the bus throughput.
The author(s) of the
Wishbone B3 specification
recognized this problem and so they offered a means of
extending the classic mode. They extended Wishbone classic with
tag lines, both input and output, that can be optionally included with the
Chief among these extra lines are the cycle type indicator or
and the burst length indicator. Using these extra lines allows components
to interact from one side of a design to another. For example, a slave can
ACK without waiting for new requests if it knows
that further requests will be coming and that they will have incrementing
Understanding all of these other wires and indicators can add additional complexity to a Wishbone master. For example, the master now needs to know which slaves support these extra request wires and transaction types and which do not. Likewise, the any router or arbiter will need to know when these wires are relevant and when they are not, and how shall they know this unless they decode all this extra logic?
All of these are reasons the ZipCPU uses the Wishbone B4 pipeline standard: it creates a single, efficient bus transaction standard. If all of the peripherals follow this same standard, then the ZipCPU (or any other bus master) doesn’t need to keep track of which components interact in one fashion of the interface and which interact in another, and slave’s don’t need to worry about whether or not the bus master will be sending additional requests or not–until they are ready to see if an additional request is incoming.
What’s missing: an Abort Capability
There are two situations missing from the Wishbone standard that will need definition before we can discuss formally proving that a master or a slave conforms to the standard. The first is how a bus error is handled, and the second is how a bus transaction may be aborted.
We’ll start with bus errors. There are several possible causes of a bus error. The most obvious one is an error generated by the interconnect when the given address doesn’t reference a known slave. Some slaves, notably those that contain buses within them but not exclusively, may also generate bus errors. Further, on those buses where security protections are in place, a bus error may be generated by a security fault. The question is, how shall a bus error be handled?
The Wishbone specification states only that the response from a slave shall either be an acknowledgement, a retry, or an error signal–never more than one of these three for any transaction. An example of this concept is shown in Fig 7. Fig 7 shows two transaction requests, the first ending in an error, and the second in an acknowledgement.
The problem with this approach is that an error indicates that a problem has or is taking place. It isn’t immediately clear how a successful transaction might follow one that is in error. As a result, the acknowledgement following an error doesn’t really make sense. Instead, the bus transaction really needs to be immediately terminated.
For this reason, I recommend that any time
i_wb_err goes high, it should
remain high until the end of the
o_wb_cyc drops, as illustrated in Fig 8.
Further, to minimize the uncertainty associated with which transactions are
complete and which ones ended in an
the stall line should be raised
i_wb_err and also held high until
o_wb_cyc is dropped.
Perhaps it may not seem all that important to implement a bus abort. This is not the case at all. Two particular cases come to mind where a bus abort might be very useful. The first is the case of a misbehaving peripheral. Should a peripheral misbehave and not return an acknowledgement, it would make sense to have a timeout following which the bus transaction would be aborted. The second case is that of a bus master that is given a warm reset signal, while the rest of the items on the bus are not reset. An example of this might be the ZipCPU needing to abort an ongoing DMA transaction. A third case, albeit a simple one, is that it would make sense to abort a transaction following a bus error as we discussed above.
In this figure, four bus transaction requests are issued, two acknowledgements are received, and then the master aborts the transaction. Let’s just dig into this idea a little deeper.
We’ll start with the simple: those transactions that have been acknowledged
prior to the abort (i.e. the dropping of
o_wb_cyc), will have been
completed. In Fig 9, transactions
A1 have completed.
All other outstanding requests have been left in an uncertain state: they
may or may not have been completed.
Further, since the
i_wb_ack line is registered,
A2’s acknowledgement still
comes back–just on the clock after
o_wb_cyc is dropped. This acknowledgement
is the result of clocked logic: it cannot be canceled until a clock after the
abort. However, the master knows nothing about the
A2 request being
completed–since the return took place after the
o_wb_cyc line was
dropped and the transaction aborted. Likewise, the
A3 transaction may
or may not have been completed, but the master received no feedback regarding
these transactions as a result.
Since the standard really doesn’t discuss how
should be handled, nor
does it describe how to deal with the case where
o_wb_cyc is dropped
mid-transaction, one might argue that these two capabilities are not changes
in the specification, but rather just
clarifications of it.
Other bus standards
Of these two, the
is the closest to the
in its definition. Indeed, some of the wires, such as the
waitrequest and the Wishbone
stall lines, are virtually identical. The chief differences from a
translation standpoint, however, are:
The Wishbone bus has a bus cycle line,
o_wb_cyc, whereas the Avalon bus does not. This cycle line delimits when a particular transaction begins and ends. It is especially useful for bus arbiters that need to know when they can switch a slave from being connected to one bus master to another.
The Avalon specification requires acknowledgements that distinguish between read and write transactions. Whereas the Wishbone’s
i_wb_ackline indicates a valid response from a slave for either read or write, the Avalon specification requires either a
writeresponsevalidfrom the slave for every transaction.
The other bus in common usage today is the AMBA AXI4 bus. This bus appears to have every feature a bus could have. First, the bus has not one, not two, but five separate channels associated with it, each of which can be individually stalled, as illustrated in Fig 11. These channels are: a read address channel, a read data channel, a write address channel, a write data channel, and a write acknowledgement channel. Second, the AXI4 bus can (optionally) return items out of order–and so bus requests need to be given identifiers so they can properly be reordered (or routed) upon return. The AXI4 bus also maintains the concept of a burst, so that the address channel can announce a burst of some length, and the logic implementing the associated data channel then needs to count items and their responses in order to support it. There’s more too: “last” transaction indicators, privilege violations, multiple types of error codes, cacheable transaction attributes, and more. Unlike the simple Wishbone bus outlined here, the AXI4 bus is much more complicated–really more so than it needs to be in my humble opinion.
One good thing about having both read and write channels, though, is that an AXI4 based DMA doesn’t need intermediate memory: it can read directly from one channel while writing to another—assuming it’s not reading from and writing to the same peripheral, such as memory.
Yes, there is a simplified version of the
AXI4 bus that’s often called
AXI-lite. While it can be much simpler to build a component slave
that responds to the AXI-lite protocol, certain features still make it more
difficult to work with than the other
discussed above. As an example, the
write request is issued across two channels, the write address and write
data channels, even though most slaves need these pieces of information on
the same clock cycle in order to work with them. (The official solution
to this problem is to have the slave to stall the address line until both
address and data are available.) A second difficult piece of complexity is
that, AXI-Lite requires a hand-shake on the return path, or
ACK path, as
well as the request path. Internally, this means is that an AXI peripheral
needs to be aware of stalls on this return path and either buffer any
responses within some form of FIFO, or stall the request path.
My point here is specifically this: if you are a hobbyist working from a limited budget, then it doesn’t really make sense to implement a bus that requires lots of logic per bus master or bus slave peripheral, rather than implementing a simple Wishbone bus interface across all masters and peripherals.
Further Simplifying the Wishbone
For anyone who has followed the ZipCPU’s development, you’ll know that I have been trying to keep the logic required for any bus interaction simple. Why? Simple because logic costs money. As such, I’ve simplified the Wishbone signals that the ZipCPU issues, and that its peripherals respond to.
There are no retry signals nor tag signals.
I haven’t found a need for a slave to produce a bus errors. ROM’s, for example, can quietly acknowledge write’s without performing any action. The consequence of this is merely that the bus master interacting with a particular peripheral is responsible for interacting with that peripheral appropriately.
CYC may not be held indefinitely.
Although the specification explicitly allows a master to hold the
CYCline high indefinitely, doing so would prevent a second master from ever accessing the same bus. Hence,
CYCneeds to be dropped as soon as the transaction is complete–assuming that the bus isn’t being held open as part of a condition requiring a lock.
The Formal Rules
To capture all of this Wishbone functionality, let’s create a formal Verilog module containing only assumptions and assertions together with any other logic necessary to express those assumptions or assertions. The goal of this module will be such that, if dropped into a Wishbone master IP component, this formal properties module can then be used to verify that the master’s interaction with the bus is done properly. (Other application specific properties may still be required.)
You can find a copy of this module here, should you wish to follow along in the discussion below.
The first thing to note about this module are the ports. This module contains all the ports necessary for a full bus interface, whether master or slave, together with the traditional clock and (synchronous) reset ports. In order to keep the state of the Wishbone transaction, as viewed by the bus master, in sync with the state as seen by any other formal properties within the bus master, three additional signals are in this port list as well. These are the count of the number of requests that have been made, the number of acknowledgements received, and the number of outstanding transactions.
F_LGDEPTH, controls the number of bits in these last three
Internal to the module, we’ll create a helper variable,
f_request, to contain
all the details of any
transaction request. We’ll come back to this and use it later.
The first property to assert will be that the
is initialized in a reset condition–no requests are being made, and the
reset line is high. Further, we’ll assume that the
are also low upon startup.
In general, we’ll assume that any inputs to the bus master obey the master’s formal properties below, while asserting that any of the bus master’s outputs follow their properties. A similar companion module, appropriate for a bus slave, will assume the inputs to the bus slave from the master and assert the outputs of a bus slave–but that’s not the perspective we’ll be using below.
We also assert, on the clock following any
i_reset request, that the
has returned to this same idle state.
Further, we’ll insist that all signals coming into or out of our formal property module can only change on the positive edge of the clock. Everything, whether input or output, must be synchronous with the clock. (A later upgrade might be to create an asynchronous reset signal.)
Now let’s work through the basics of creating a transaction request.
isn’t very clear about what can happen to the
CYC is low. Indeed, it is somewhat contradictory. However, if we
STB can only be high when
CYC is also high, then it
simplifies the slave’s logic–as I discussed at
Since this logic is usually easy to guarantee within the
master–requiring no extra logic cost, we’ll insist upon it as a
formal property here.
The specification is silent regarding
whether or not the bus request has any meaning while the
STALL line is high.
However, some slaves need to peek at the
to know how to handle the next transaction.
For these slaves, we insist that once a request is placed onto the
it cannot be changed until it is accepted. Hence, if a request has been made
but the stall line remains high, then that same request must not change
on the next clock cycle–at least not until either an abort or the request
The specification says nothing about
whether read and write requests can be mixed or not. However, I am not
familiar of any situation where that makes any sense. Therefore, we’ll
specifically prohibit the
WE (write enable) line from changing between
one request and the next.
To go one step further, we’ll insist that the
WE only change when there are
no outstanding requests–such as in a “read-modify-write” cycle.
SEL (byte select) line(s) are an indication of which bytes within a word
should be written to the device. To be meaningful, any write transaction
should assert one or more of these bits.
From here, let’s move on to the lines returned from the slave–primarily the
We’ll start by insisting that these two signals should never
be asserted unless the master is in the middle of a
CYC is high.
The one exception to this rule is in the case of an abort, where
low before the slave has a chance to respond. Thus, if
CYC was low
(i.e. no bus
cycle and no abort), then both
ERR should be low on the next clock.
makes it clear that both
ERR signals may
never be true on the same clock.
The next two properties get into counting. They have to deal with how long a slave can take to accept a request, and how long the slave can take to respond to the request once accepted. Both are predicated upon some parameterized number of counts. If the respective count parameter is non-zero, then the time limit property is applied.
For the first property, we’ll assume that the slave can only stall the
F_MAX_STALL counts–no more. The purpose of this is just to
help to limit the search space for the formal verifier. It is optional and
not strictly necessary, but you may find it to be useful.
The next optional property is the number of cycles to wait until the next
acknowledgement is received. If
F_MAX_ACK_DELAY is greater than zero,
these cycles will be counted and an assumption will limit this count so that
there are always this many or fewer cycles between acknowledgements.
When I initially started proving bus properties, these two parameters,
F_MAX_ACK_DELAY were very important for limiting how many
states the formal solver needed to examine. In hind sight, I think the
three output ports can remove the need for these two properties–but I’ve left
these checks in place for the time being.
We still need two more counters. The first,
f_nreqs, will count the number
of requests that have been made and accepted by the slave, while the second,
f_nacks, will count the number of acknowledgements returned. Both of these
counters will be returned to our parent module as outputs of this module.
The number of requests accepted starts at zero, and returns to zero
upon any reset or the end of any
cycle. While the
cycle is active, this number will accumulate any time the
STB is true
STALL signal is not–the indication that a
transaction request has been made.
f_nacks counter is almost identical to the
f_nreqs counter, save that
the number of acknowledgements increments any time
The difference between the number of requests and the number of acknowledgements
is the number of outstanding requests. Since
f_nacks are both
registered, we avoid a second delay here by not registering
The result, though, is that we may need to trim it suddenly to zero anytime
i_wb_cyc is dropped.
Some bus masters are known to only ever request a fixed number of values on any transaction. For example, the ZipCPU has a prefetch module that will only ever request one item from the bus, a dblfetch module that will only ever request two items, and a pfcache module that will always request exactly one cache line.
For these masters, the
F_MAX_REQUESTS parameter can be used to limit the
formal proof and force this maximum request limit.
F_MAX_REQUESTS variable has been set to a value greater than zero,
then we insist that the number of requests and the number of acknowledgements
are always both less than this value. This assertion is set into two parts.
STB is on then a new request is pending so the number
of requests must be less than
F_MAX_REQUESTS. Second, if
STB is false,
then the full number of requests may have been issued. Further, we’ll also
insist that the number of outstanding requests remains one less than
F_LGDEPTH is the number of bits in our counters.
This guarantees that
f_outstanding will never roll over. It also requires,
however, that the log, based two, of the maximum number of outstanding
transactions must be passed to our core in
F_MAX_REQUESTS is not specified, we’ll assume that the number of
outstanding requests is not allowed to roll over the maximum value and
back to zero.
The specification makes it very clear that there shall only be one acknowledgement (at most) per request. To capture this requirement, we simply insist within our formal proof that if there are no outstanding requests, then there shall be no incoming acknowledgements.
Here we switch from requirements for all bus masters to the first of several optional properties. These options are controlled by parameters to the formal master properties module. The options insist upon properties not required by the protocol, but which may be worth asserting for some masters. As such, they are convenience properties only, and therefore default to not being part of the test.
The first of these is the source option. If a
master is the source of a
request, as opposed to being an
or adapter mid stream, then the request should start with both
STB lines going high together. Since the
doesn’t require this, we list it here as optional.
F_OPT_SOURCE is a master only parameter option, however. By the time a
interaction gets to the slave, the
CYC line may go high or low without
actually affecting the
STB line of the slave.
The next option is the read-modify-write option,
this option is not set, the master must drop the
CYC line following the last
This doesn’t apply, though, to those buses that may wish to hold the bus open (locked) between two transactions–such as a CPU requesting an atomic increment operation. For these masters, the option should be left on.
In many ways, the master doesn’t care what happens on the
lines if the cycle line is low or if there’s no acknowledgement coming back,
so restricting these wires to known values makes a lot of sense. This is the
purpose of the
F_OPT_SHORT_CIRCUIT_PROOF option. If set, it helps to limit
the formal search space and thus it is intended to make the proof simpler.
On the other hand, if something within the bus master does depend upon these values (when it shouldn’t), then we might want to know about it. For this reason, the option default is not to short circuit the slave’s responses.
While not all masters are likely to make discontinuous requests, some might.
Therefore, we allow an option,
F_OPT_DISCONTINUOUS, that can be set to
allow discontinuous requests. If this option is not set, then once the
STB line is dropped, we’ll insist that it cannot be raised again without
also dropping the
Since any master that implements a “read-modify-write” cycle will need to
STB line after it has been initially dropped,
the generate above attempts to capture these conditions.
This post has been written following the lessons I’ve learned making formal proofs for the ZipCPU prefetch modules [1, 2, 3]. I’ve also now tested other interactions with these properties, such as those of a priority arbiter or those of a Wishbone bus delay component, and I’ve adjusted many of these components with the lessons I’ve learned in this process. As a result, we may yet come back to this topic of formally verifying a Wishbone bus master, but now that we’ve presented the basics any following posts will reflect how these basics are applied to particular application-specific situations.
Until then, here’s a fun thought: on a recent application I have been working on, I had the option of using an Avalon bus. Since all of my tools are (so far) written for the Wishbone bus, I needed to either rebuild my tools, or to build a bridge to cross from the one to the other. Every transaction needed to go through this bridge, so it is a very critical component of this design. Formal methods, such as the ones we’ve discussed today, have given me confidence, even before placing this bridge onto actual hardware, that it will work in practice when I do.
Every word of God is pure: he is a shield unto them that put their trust in him. (Prov 30:5)