Creating a Simple AXI-Lite Master for the Hexbus
This post continues our series over the last three years looking into AXI and AXI-lite interface design. Normally I’d take a moment to recount all of the various articles in a series as background to any new article, but if you check the topics page, you’ll see I’ve now written over 25 AXI articles. These include a discussion on how to build an AXI-lite slave, a high performance AXI (full) slave, how to debug an AXI stream based design, and even how to build an AXI-lite master and then modify a general purpose AXI-lite master for AXI (full) performance, to include exclusive access but not burst performance.
Today, let’s look into extending our debugging bus design with an AXI-lite back end.
If you’ve followed my blog from the beginning, you might remember that I spent quite a bit of time early on discussing what I call a debugging bus. As I use the term, a debugging bus is a way of accessing the bus within a logic design from a remote host. Typically, I do this over a serial port, by sending special commands to the design, although I have transport systems that will work well for both SPI and JTAG as well. The design then decodes the various characters sent across the link into bus read or bus write requests, issues the requests to the internal bus, and then returns the results.
Why would you ever want to do something like this? Wouldn’t it make more sense to just issue the commands from a soft-core CPU within the design? Well, there are actually a lot of reasons why you might want to use a debugging bus. For example …
- You might be building a CPU. Until that CPU works, a debugging bus can give you a strong confidence that the rest of the design works. You can even use the debugging bus to pre-load the flash or RAM for your CPU before releasing the design from reset.
- This applies just as much to all of those vendor CPUs as it does to your own homebrew CPU. Once you place that CPU into the FPGA, you lose almost all insight into what’s going on within the FPGA. For example, what if your CPU wasn’t getting the interrupt you were expecting? Well, why not go and just query the interrupt controller on the bus to see what’s going on?
- When working with an external piece of hardware, and until you have that hardware “under control”–where your design interacts with it properly and the way you expect it to–sometimes you have to work with things to figure out what’s going right (or wrong) with the interface. A good example might be my work on a Quad SPI flash controller. Being able to explore “what-if” scenarios from a command line can be quite powerful. (What if I have the timing delay messed up, and it needs to be three clock cycles instead of four? Let’s try that …) By using a debugging bus to find and fix problems, you won’t need to take the time to rebuild your FPGA design until you know what was going wrong with the current design.
  Indeed, and as an example, someone recently tried out my Quad SPI flash controller. So far, he tells me that the controller works as long as he only uses the memory mapped I/O port. However, without being able to shut the CPU down and run ad-hoc queries, he’s been struggling to figure out why the flash won’t handle his arbitrary access commands. A proper debugging bus interface will help this individual.
- You’ve seen me discuss how a debugging bus could be used to debug a signal processing chain by inspecting histograms or even taking spectral estimates of what’s going on within that chain. All this can be done from an external computer via commands sent over a debugging bus.
- Of course there’s also my own favorite use for the bus: getting access to a bus-based internal logic analyzer, such as my Wishbone Scope. (Don’t get hung up on the term “Wishbone”. Yes, there is now an AXI-lite version of it, and even a virtual AXI (full) version which can use SDRAM as a back end.)
  Such a bus-based scope capability requires you to have access to your design from an external location. If you can get access to the design externally, you can then command the scope, adjust the window location with respect to the trigger, and then read back the results to tell you what’s going on within the design–even potentially after the CPU has locked up.
  Once you do start using that soft-core CPU of yours within the design, you can then also script the logic analyzer from within the soft-core CPU’s software to capture according to whatever your software is doing by just writing to the bus. Indeed, I’ve been known to do that with my CPU test script, to provide me with a trace should any individual CPU test fail–but let’s not get ahead of ourselves today.
In the military, we might say that such a “debugging bus” gives you the ability to “command and control” your design. You can also use it to get “telemetry“-like data back from a running design. Okay, the analogy doesn’t quite work–Telemetry is a “push”-based system, always broadcasting information to listeners, whereas a debugging bus requires a bus master to “pull” any desired information–but it’s still a matter of getting debugging information from within the system under test. Perhaps a better analogy might be “micromanaging” an interaction, but we won’t push a bad metaphor quite so far.
The Hexbus Design
When we started talking about a debugging bus, I offered an overview of the debugging bus I’d been using in my own designs–one I’ve called my “wbubus” since it offers a “Wishbone to UART” conversion. Data would come in, get decoded–possibly even decompressed, head into a FIFO, and then from there commands would issue to the bus. Results would then get formed from the bus executor and sent into a FIFO, from whence they would be mixed with an interrupt or idle signal, compressed, and then recoded back into bytes that could be sent back across the serial port. We then built, together, a second debugging bus design that I called the “hexbus” design since it is designed around a simpler hexadecimal encoding. You can see the block diagram for this “hexbus” in Fig. 3 below.
That “hexbus” was meant to be a demonstration only design–showing you how it might be done. It was built around a very simple hexadecimal encoding that could just about be read and debugged manually. Together, then, we walked through all the pieces of it from converting the incoming characters into 34-bit command words, issuing those commands across the bus, and then recoding the 34-bit command results back for transmission across the serial port. My intention, however, was always to throw away the “hexbus” implementation when I was done. It was only meant to be a demonstration design after all.
That was until I tried working with an iCE40. No matter how hard I tried, I couldn’t seem to fit my full featured wbubus debugging bus onto an iCE40 HX8K together with the ZipCPU. The two just wouldn’t fit in the same design at the same time. The “hexbus” on the other hand was simple enough to fit. Using the hexbus for debugging, the entire design, CPU + hexbus, currently fits in 4,659 LUTs–small enough that I could probably go back and retrofit it with the wbubus now. It’s not the smallest iCE40 design, but debugging it isn’t all that hard. In other words, this throw-away design has now been well loved and well used.
For today, however, the key detail is that the “hexbus” design has always been fundamentally a Wishbone design.
What if we wanted to give it an AXI-lite capability instead? This will be the topic of today’s article.
The Hexbus Code
Just to review, there are a couple basic commands to the hexbus encoding, as illustrated in Fig. 5 on the right. The address can be set for following transactions by sending an “A” followed by up to 8 lower case hexadecimal characters. A read request consists of a simple solitary “R”, whereas a write request starts with the letter “W” and is then followed by the hexadecimal value to be written. Further, I chose to use white space characters as command separators or synchronization characters if and when needed. Hence, both address and write commands can end with a white space character. They can also end with any other non-hex character, such as the beginning of the next command.
By the time these commands arrive at our new AXI-lite bus master, they are bundled into 34-bit words as shown in Fig. 6 on the left. Commands are determined by the first two bits of those 34-bit words. 2'b00 prefixes a read request, 2'b01 a write request, 2'b10 a set address request, and 2'b11 is either a reset request (handled earlier) or a don’t care.
The commands themselves arrive via a basic stream protocol, as shown in Fig. 7 below. Once the bus command is complete, a response is then generated and sent via a similar stream protocol to the next block in the processing chain–the difference being that there’s no back pressure on the outgoing responses.
One of the challenges, and indeed vulnerabilities, associated with the hexbus design is that there are no FIFOs anywhere in the hexbus protocol. Remember, this protocol is designed to be simple, and to fit on really small hardware. This means that the stream protocol and handshakes shown above in Fig. 7 are something of a misnomer: hexbus can’t handle overflow anywhere in its processing. The downstream processor must be ready to accept any response value provided. The upstream source can only delay values by one or two clock cycles at the most. Further, it is the responsibility of the host software, not the RTL, to guarantee that there are no overflows in actual operation.
This block diagram in Fig. 7, together with the command protocol shown in Fig. 6 above, is where we’ll start today’s design discussion from.
Building the AXI-Lite Bus Master
The key feature of this AXI-lite master that we’ll be discussing today is not so much that it’s implemented internally as a state machine, but rather that we’ll encode our current state in the AXI-lite signals themselves: On any write request, we’ll set AWVALID, WVALID, and BREADY and then hold BREADY high until the BVALID acknowledgment. Likewise, on a read request, we’ll set ARVALID and RREADY and then hold RREADY high until we receive our RVALID response. The “Idle” state will therefore be encoded as !BREADY && !RREADY.
We’ll expect one of two paths from idle back to idle, as shown in Fig. 8 below.
Let’s start by decoding our incoming command. We have three possible values that can come into our core that we need to worry about, as shown in Fig. 6 above. (Reset is handled elsewhere in the stack.) Either we want to process an address, a read command, or a write command. From this we can create one of three flags, with two caveats. First, if the incoming strobe (valid) bit is low, then there’s no command ready at the input, and second, if we are still busy with the last command, then we can also ignore any incoming requests.
This should be familiar as your basic VALID/!READY handshake that we’ve discussed often enough before. The difference here is that this custom protocol doesn’t require that the ready logic be registered, so there’s no protocol requirement for any skidbuffers.
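A minimal sketch of this decoding step might look like the following. The module name, the port names other than i_cmd_stb-style signals implied by the text, and the CMD_SUB_* constant names are assumptions made for illustration; only the two-bit encodings come from the description above. The text refers to the three resulting flags as i_cmd_rd, i_cmd_wr, and an address flag.

```verilog
// Hedged sketch: command decoding for the hexbus AXI-lite master.
// Names are illustrative assumptions, not the original source.
module hb_cmd_decode (
        input  wire        i_cmd_stb,   // Incoming strobe (valid)
        input  wire [33:0] i_cmd_word,  // 34-bit command word
        input  wire        i_bready,    // Master's BREADY (write in progress)
        input  wire        i_rready,    // Master's RREADY (read in progress)
        output wire        o_cmd_busy,  // Stalls the command source
        output wire        cmd_rd,      // Read command accepted
        output wire        cmd_wr,      // Write command accepted
        output wire        cmd_addr     // Set-address command accepted
    );

    // Two-bit command prefixes, as listed in the text
    localparam [1:0] CMD_SUB_RD   = 2'b00,
                     CMD_SUB_WR   = 2'b01,
                     CMD_SUB_ADDR = 2'b10;

    // Idle is encoded as !BREADY && !RREADY, so busy is the opposite
    assign o_cmd_busy = i_bready || i_rready;

    // A command is only accepted when valid and not stalled.
    // 2'b11 (reset/don't care) raises none of the flags.
    wire   accept = i_cmd_stb && !o_cmd_busy;

    assign cmd_rd   = accept && (i_cmd_word[33:32] == CMD_SUB_RD);
    assign cmd_wr   = accept && (i_cmd_word[33:32] == CMD_SUB_WR);
    assign cmd_addr = accept && (i_cmd_word[33:32] == CMD_SUB_ADDR);
endmodule
```

Because the busy signal is purely combinatorial here, a command is consumed on the same clock that the master is idle, which is why no skidbuffer is needed on this side.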
Now we can start figuring out how to process these commands.
Address Processing
The first, and perhaps easiest, command to handle is the address command. If ever we receive an address word, we’ll want to set the bus address. Then, later, when we receive an actual read or write command we’ll acknowledge the address back across the channel. That means we’re going to need to keep track of the current bus address, as well as whether or not we want to acknowledge a new address.
So let’s break down, now, how we’ll handle a new address command. In general, we’ll just set our outgoing address word.
Well, not quite. As it turns out, that’s a nice first pass, but we can do better with just a touch of compression. Let’s use the two lower (unused) address bits as a compression scheme, as illustrated in Fig. 6 above: one bit will indicate an address difference, whereas the second bit will indicate whether or not we increment addresses between commands.
First, bit 1. If bit 1 is set we’ll allow that this command word encodes a difference and we’ll adjust our address by this difference. Otherwise we’ll set it as above.
Synchronizing the initial address will be a task of the software address encoder: the first address given to the hexbus will never be a difference address, whereas difference addresses may be used for subsequent address requests if they reduce the number of bytes that need to be transmitted for any new address.
Bit 0 on the other hand will be an increment indicator. If we leave it zero, then we’ll naturally increment our address from one request to the next. Otherwise, if one, we won’t adjust it from one request to the next at all. Either way, that means we’ll need to store this value away for later.
This also means that you can set an address by hand and have the core mostly just “do the right thing.”
We can also use a flag, newaddr, to indicate that the next results from the bus will be the result of reading or writing to this new address.
Now, any time an address gets accepted by the bus, we’ll increment the address if the stored increment flag, inc, is set (that is, if bit 0 of the address command was left at zero), or otherwise just leave it the same.
Likewise, whenever we get a new read or write command that will use this new address, we’ll send a copy of the address over the link at the same time we issue the bus command. That means we can clear our new address flag at that time as well.
We can also use the same logic for the read address, and so just copy the read address value from the write address register.
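Pulling the address handling together, a sketch under stated assumptions might read as follows. The module and port names are hypothetical, the difference is treated as a word-aligned offset, and the address is advanced on either a write-address or a read-address acceptance since both channels share one register; none of these details are confirmed by the text beyond the behavior of bits 1 and 0 and the newaddr and inc flags.

```verilog
// Hedged sketch: address register with difference and increment support.
// Port names and the shared AW/AR increment are illustrative assumptions.
module hb_axil_addr (
        input  wire        i_clk,
        input  wire        i_reset,     // Active-high reset (assumed polarity)
        input  wire        i_cmd_addr,  // Set-address command accepted
        input  wire        i_cmd_rd,    // Read command accepted
        input  wire        i_cmd_wr,    // Write command accepted
        input  wire [31:0] i_cmd_data,  // Low 32 bits of the command word
        input  wire        i_awvalid, i_awready,
        input  wire        i_arvalid, i_arready,
        output reg  [31:0] o_awaddr,
        output wire [31:0] o_araddr,
        output reg         newaddr
    );

    reg inc;    // Set when bit 0 of the address command was zero

    initial o_awaddr = 32'h0;
    initial inc      = 1'b0;
    always @(posedge i_clk)
    if (i_cmd_addr)
    begin
        if (i_cmd_data[1])
            // Bit 1 set: the word encodes a difference from the
            // current address
            o_awaddr <= o_awaddr + { i_cmd_data[31:2], 2'b00 };
        else
            // Otherwise it is an absolute, word-aligned address
            o_awaddr <= { i_cmd_data[31:2], 2'b00 };
        inc <= !i_cmd_data[0];
    end else if (inc && ((i_awvalid && i_awready)
                      || (i_arvalid && i_arready)))
        // Step to the next 32-bit word once the bus takes the address
        o_awaddr <= o_awaddr + 32'd4;

    // Reads reuse the same address register
    assign o_araddr = o_awaddr;

    // newaddr is raised by an address command and cleared by the
    // first read or write command that uses the new address
    initial newaddr = 1'b0;
    always @(posedge i_clk)
    if (i_reset)
        newaddr <= 1'b0;
    else if (i_cmd_addr)
        newaddr <= 1'b1;
    else if (i_cmd_rd || i_cmd_wr)
        newaddr <= 1'b0;
endmodule
```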
Sometime later, we’re going to need to come back to this and make certain that, upon a read or write command, the address response gets sent back across the bus. We can make a mental note of that to ourselves now by simply adding a formal property to our design:
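The original listing isn’t reproduced here, so what follows is only one hedged way such a property might be written. It is shown as a small checker whose ports would be tied to the master’s own signals (or whose always blocks would be pasted into the master); the RSP_SUB_ADDR encoding, the module name, and the assumption that the formal flow defines FORMAL are all illustrative.

```verilog
// Hedged sketch: a new address must be acknowledged on the clock after
// the first read or write command that uses it. Names are assumptions.
module hb_addr_ack_check (
        input  wire        i_clk,
        input  wire        i_reset,
        input  wire        i_cmd_rd,
        input  wire        i_cmd_wr,
        input  wire        newaddr,
        input  wire        o_rsp_stb,
        input  wire [33:0] o_rsp_word
    );

    // Assumed response prefix for an address acknowledgment
    localparam [1:0] RSP_SUB_ADDR = 2'b10;

`ifdef FORMAL
    // $past() is meaningless on the very first clock
    reg f_past_valid;
    initial f_past_valid = 1'b0;
    always @(posedge i_clk)
        f_past_valid <= 1'b1;

    always @(posedge i_clk)
    if (f_past_valid && !$past(i_reset)
            && $past(newaddr && (i_cmd_rd || i_cmd_wr)))
    begin
        assert(o_rsp_stb);
        assert(o_rsp_word[33:32] == RSP_SUB_ADDR);
    end
`endif
endmodule
```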
This simply states that the first step of processing any read or write command, that is on the first clock following i_cmd_rd || i_cmd_wr, we must acknowledge any new/updated address–but only if the address had been changed since the last read or write command.
Write Processing
The next step is write processing. If you’ve never built an AXI master before, this will be easier than you think. Indeed, the way we’ll build this below it’ll be really easy. We’ll control the valid signals, the write data, and then return an acknowledgment on success or failure. Oh, one more thing–we’ll set BREADY to note that we are no longer idle, and now expecting a BVALID response.
First, we clear everything on reset. This is a necessity. AXI requires a reset, so let’s make certain we implement it here.
The next step is going to look a bit backwards. Chronologically we’d set AWVALID && WVALID && BREADY on any write command. I’m instead going to start with the last half of the operation, and say that if we are waiting on a write response then …
- We should stop waiting if/when we get that response.
- AWVALID and WVALID should also each be cleared independently when their respective xREADY signals go high.
This is really the biggest gotcha of building an AXI4-lite interface: the write address and write data channels aren’t synchronized at all. Sure, we’ll synchronize them both to the start of this transaction, but either one of these two channels may get accepted before the other. This is captured by the fact that both of these signals are handled in the same logic block, although in separate if statements.
That’s the end of processing the write transaction. Seriously? Yeah, it really is that easy. No, we haven’t gotten to the write data yet–but that’s even easier.
For now, let’s step back and look at how we would generate a write request in the first place.
- On any write request from our interface, we set all three signals high, AWVALID, WVALID, and BREADY. Remember, these signals are also encoding our state machine. We won’t return to idle again until BREADY is cleared.
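The write-side handshaking described above might be sketched as below. The module name, port names, and active-high reset are assumptions for illustration; the structure (reset first, then the waiting-on-response case, then the new-request case) follows the order of the discussion.

```verilog
// Hedged sketch: AWVALID, WVALID, and BREADY double as the write state.
// Names and reset polarity are illustrative assumptions.
module hb_axil_write (
        input  wire i_clk,
        input  wire i_reset,    // Active-high; invert ARESETN if needed
        input  wire i_cmd_wr,   // Write command accepted while idle
        input  wire i_awready,
        input  wire i_wready,
        input  wire i_bvalid,
        output reg  o_awvalid,
        output reg  o_wvalid,
        output reg  o_bready
    );

    initial { o_awvalid, o_wvalid, o_bready } = 3'b000;
    always @(posedge i_clk)
    if (i_reset)
    begin
        // AXI requires the valid signals to be low in reset
        o_awvalid <= 1'b0;
        o_wvalid  <= 1'b0;
        o_bready  <= 1'b0;
    end else if (o_bready)
    begin
        // Waiting on a write response: return to idle on BVALID
        if (i_bvalid)
            o_bready <= 1'b0;

        // The two request channels complete independently of
        // each other, hence the two separate if statements
        if (i_awready)
            o_awvalid <= 1'b0;
        if (i_wready)
            o_wvalid <= 1'b0;
    end else if (i_cmd_wr)
    begin
        // New request: raise all three together
        o_awvalid <= 1'b1;
        o_wvalid  <= 1'b1;
        o_bready  <= 1'b1;
    end
endmodule
```

Since i_cmd_wr is only ever true when the master is idle, there is no need to check the read side within this block.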
That leaves only two signals left for the write half, WDATA and WSTRB. In the case of WSTRB, it’s easy: the hexbus only supports full 32-bit word accesses–this is no different from the wbubus or any of my other debugging buses. As a result, there’s no way to access an 8-bit byte within any 32-bit word using the protocol we defined above in Fig. 6. For this reason, we can just leave WSTRB as all ones: any write will write to all four bytes at the same time.
The second piece is almost just as unremarkable: if we aren’t busy, then we can set the write data based upon any incoming command.
There’s just one problem with this: how much downstream logic will get driven every time i_cmd_word changes? There’s a cost in terms of power to every wire that has to change. Thus, although this is a low-logic solution, there is also a low power solution.
Perhaps the ultimate low power solution would be to only update WDATA on a new write request.
I’ve also been experimenting with forcing values to zero when not in use, for much the same reason. In that case, we might try:
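As a hedged sketch of that idea, the block below loads WDATA only on a new write request and, when a hypothetical OPT_LOWPOWER parameter is set, returns it to zero once the data beat has been accepted. The module, port, and parameter names are assumptions for illustration.

```verilog
// Hedged sketch: WDATA changes only on a write request, and is optionally
// zeroed when no longer in use. OPT_LOWPOWER is a hypothetical parameter.
module hb_axil_wdata #(
        parameter [0:0] OPT_LOWPOWER = 1'b1
    ) (
        input  wire        i_clk,
        input  wire        i_cmd_wr,    // Write command accepted
        input  wire [31:0] i_cmd_data,  // Low 32 bits of the command word
        input  wire        i_wvalid,
        input  wire        i_wready,
        output reg  [31:0] o_wdata,
        output wire [3:0]  o_wstrb
    );

    // Only full 32-bit accesses are supported, so all strobes are set
    assign o_wstrb = 4'hf;

    initial o_wdata = 32'h0;
    always @(posedge i_clk)
    if (i_cmd_wr)
        // Capture the data only when a write actually begins
        o_wdata <= i_cmd_data;
    else if (OPT_LOWPOWER && i_wvalid && i_wready)
        // Data accepted: stop driving downstream logic with it
        o_wdata <= 32'h0;
endmodule
```

Setting OPT_LOWPOWER to zero leaves the last written value on the bus, which costs a little less logic at the price of a few more toggling wires on the next write.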
Either way, the point is that following a write request, we want to make certain that we are then driving the bus based upon that request. A simple assertion at this point in the design can help us describe this.
There’s just one thing we’ve skipped, and that’s creating the write return response. We’ll come back to that in a moment, though, following the read state machine.
Read Processing
As it turns out, reads are even easier than writes. On a reset, we clear ARVALID.
While waiting for a response, we’ll clear ARVALID on any ARREADY.
Once we get our read response, we’ll clear RREADY–sending us back to our idle state.
But how shall we begin any reads? Simple! If we are in our idle state, then start a read on any request.
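Put together, the read side might be sketched as follows. As before, the module name, port names, and reset polarity are assumptions; the sketch also clears RREADY on reset so that the master comes up idle.

```verilog
// Hedged sketch: ARVALID and RREADY double as the read state.
// Names and reset polarity are illustrative assumptions.
module hb_axil_read (
        input  wire i_clk,
        input  wire i_reset,    // Active-high; invert ARESETN if needed
        input  wire i_cmd_rd,   // Read command accepted while idle
        input  wire i_arready,
        input  wire i_rvalid,
        output reg  o_arvalid,
        output reg  o_rready
    );

    initial { o_arvalid, o_rready } = 2'b00;
    always @(posedge i_clk)
    if (i_reset)
    begin
        o_arvalid <= 1'b0;
        o_rready  <= 1'b0;
    end else if (o_rready)
    begin
        // Request accepted: stop requesting
        if (i_arready)
            o_arvalid <= 1'b0;
        // Response received: back to idle
        if (i_rvalid)
            o_rready <= 1'b0;
    end else if (i_cmd_rd)
    begin
        // Idle and a read was requested: start the transaction
        o_arvalid <= 1'b1;
        o_rready  <= 1'b1;
    end
endmodule
```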
We can even capture this thought in a simple assertion.
While this sort of ad-hoc assertion isn’t sufficient to pass induction, it’s certainly good enough to get us started when we get there. Actually, when we get there below, I’m going to continue using immediate assertions–they’re a bit more verbose, but they can have the same effect without many of the serious drawbacks associated with formally verifying concurrent assertions.
All that remains is to grab and return the response to then be sent to the rest of the debugging bus design.
Return Processing
Now that we’ve run the bus and accomplished our transaction, it’s important that we return a proper response downstream. In this case, we’ll want to send one of several words down the debugging bus processing chain depending on both our state, and the response we just received from the bus:
- Following a system reset, we’ll immediately send a reset confirmation downstream
- On a write response, we’ll send a write acknowledgment
- On a read response, we’ll need to send the RDATA value that the bus returned
- Finally, on any new address, we’ll send that new address downstream the first time it is used
Now, how shall all these values be encoded? I’ll admit, I spent far more time thinking about this than perhaps I should have.
If you’ll remember, some time ago I discussed minimizing logic usage when the question was how to select from a number of potential sources–each with a valid flag. The answer I came up with at the time was to pre-calculate an index, and then to use a case statement based upon that index to determine a return value.
An alternative approach that came up in a twitter thread with Clifford was to use a for loop, but in such a fashion that it would simply collapse into a sum of products. For example, if you know that only one ACK value will ever be true at a time, you might write:
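The listing from that discussion isn’t reproduced here, so the following is only a sketch of the general technique. The module name, the NS and DW parameters, and the packed i_data layout are assumptions chosen for illustration.

```verilog
// Hedged sketch: selecting among several sources with a for loop that
// collapses into an OR of ANDs. Names and widths are assumptions.
module sop_select #(
        parameter NS = 4,   // Number of sources
        parameter DW = 32   // Data width per source
    ) (
        input  wire [NS-1:0]    i_ack,  // Must be one-hot or all zero
        input  wire [NS*DW-1:0] i_data, // Source k occupies [k*DW +: DW]
        output reg  [DW-1:0]    o_data
    );

    integer k;

    always @(*)
    begin
        // Start from zero, then OR in the one source that is active
        o_data = {(DW){1'b0}};
        for (k = 0; k < NS; k = k + 1)
        if (i_ack[k])
            o_data = o_data | i_data[k*DW +: DW];
    end
endmodule
```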
Notice how the result doesn’t depend upon any multiplexers: it’s just a giant OR statement–a “sum” (i.e. OR) of “products” (ANDs). As long as you, the designer, can ensure that the ack vector will only ever be one hot or zero, then this approach can work well.
Indeed, this is the approach I chose to use for the response word, returning data to our debugging bus processing chain. I started by initializing this response word to zero. Then, on any write return, I set the response word.
Note that there are two possible returns here: either there’s been a bus error, and a bus error return needs to be generated, or we are simply acknowledging that a write has been completed.
Given that this is the first potential value of the response word, there were no “OR” values here–at least, not yet. For the first word, we can just set things independent of any prior value in the chain.
We can then move on to any read response. Here things change subtly. Unlike if BVALID was true above, where I could force the prior value of rsp_word to a known value, in this case of RVALID I might need to set rsp_word to a completely different response word. In this case, the synthesizer would never know that RVALID would only ever be true if BVALID were not. So, I used the “OR” approach outlined above to capture the idea of merging these two return responses.
As a final potential return value, the response word needs to contain any new address the first time we use it. As before, we’ll simply OR this together with the prior values.
The neat thing about these “OR” functions is that they don’t create long multiplexer chains.
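A sketch of building the response word this way is shown below. The RSP_* encodings and the module and port names are assumptions made for illustration, since the text above doesn’t spell them out; the structure (a write return first, then read and address returns ORed in) follows the description.

```verilog
// Hedged sketch: combinational response word built from OR terms.
// All RSP_* encodings and names are illustrative assumptions.
module hb_axil_rspword (
        input  wire        i_bvalid,
        input  wire [1:0]  i_bresp,
        input  wire        i_rvalid,
        input  wire [1:0]  i_rresp,
        input  wire [31:0] i_rdata,
        input  wire        newaddr,
        input  wire [31:0] i_awaddr,
        output reg  [33:0] rsp_word
    );

    localparam [1:0]  RSP_SUB_DATA    = 2'b00,
                      RSP_SUB_ACK     = 2'b01,
                      RSP_SUB_ADDR    = 2'b10,
                      RSP_SUB_SPECIAL = 2'b11;
    localparam [33:0] RSP_WRITE_ACK = { RSP_SUB_ACK, 32'h0 },
                      RSP_BUS_ERROR = { RSP_SUB_SPECIAL, 3'h1, 29'h0 };

    always @(*)
    begin
        rsp_word = 34'h0;

        // First term: nothing to OR with yet, so just set it.
        // xRESP[1] is set for both slave and interconnect errors.
        if (i_bvalid)
            rsp_word = i_bresp[1] ? RSP_BUS_ERROR : RSP_WRITE_ACK;

        // Later terms are ORed in. This relies on BVALID, RVALID,
        // and newaddr never being true on the same clock.
        if (i_rvalid)
            rsp_word = rsp_word | (i_rresp[1] ? RSP_BUS_ERROR
                                 : { RSP_SUB_DATA, i_rdata });
        if (newaddr)
            rsp_word = rsp_word | { RSP_SUB_ADDR, i_awaddr };
    end
endmodule
```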
Because rsp_word is built of “OR” functions, however, the practical reality is that you must build it in an always @(*) block. Within the always @(*) block, rsp_word is allowed to reference the value given to rsp_word earlier in that same block–something that would not work in an always @(posedge CLK) block.
That also means that, now that we’ve built our response word, rsp_word, we now need to register it in a second step.
For those who know me and the logic I write, you’ll know that I don’t normally use two process blocks. The complexity of rsp_word above, however, is enough to force us into a two process implementation.
Hence, here’s the second process. It starts with the reset.
On any system reset, our first response down the processing chain will be to acknowledge that reset.
Otherwise, we’ll send a response downstream on either any response from the bus, or any time we get a read or write request after a new address has been set.
The final step is to set the response word that will be valid if ever o_rsp_stb is also valid. This is the data word, set above, that will be qualified by o_rsp_stb and ignored any time o_rsp_stb is zero.
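This second, registered process might be sketched as follows. The RSP_RESET encoding, the module name, and the choice to start with the strobe already high are assumptions for illustration.

```verilog
// Hedged sketch: registering the response strobe and word.
// RSP_RESET's encoding and the port names are illustrative assumptions.
module hb_axil_rspreg (
        input  wire        i_clk,
        input  wire        i_reset,
        input  wire        i_bvalid,
        input  wire        i_rvalid,
        input  wire        i_cmd_rd,    // Read command accepted
        input  wire        i_cmd_wr,    // Write command accepted
        input  wire        newaddr,
        input  wire [33:0] rsp_word,    // From the combinational block
        output reg         o_rsp_stb,
        output reg  [33:0] o_rsp_word
    );

    localparam [33:0] RSP_RESET = { 2'b11, 3'h0, 29'h0 };

    // The first response out of reset acknowledges that reset
    initial o_rsp_stb  = 1'b1;
    initial o_rsp_word = RSP_RESET;
    always @(posedge i_clk)
    if (i_reset)
    begin
        o_rsp_stb  <= 1'b1;
        o_rsp_word <= RSP_RESET;
    end else begin
        // Respond on any bus return, or when a read or write
        // command is the first to use a new address
        o_rsp_stb  <= i_bvalid || i_rvalid
                    || (newaddr && (i_cmd_rd || i_cmd_wr));
        // Only meaningful when o_rsp_stb is high
        o_rsp_word <= rsp_word;
    end
endmodule
```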
This ends the basic AXI-lite bus master implementation. A couple things to note:
-
We kept this simple, by limiting ourselves to no more than one request at a time. AXI-lite can handle many more, but our goal here was simplicity.
-
We encoded our “state machine”’s state in the various hand shaking signals used by AXI-lite. While this may not feel like a conventional state machine, it is technically a state machine. Even better, the approach is both simple and effective.
Although this design was intended for use with a debugging bus implementation, the unexpected reality is that we could use this approach to script any AXI-lite interaction we wanted to create. In other words, this simple approach is quite a bit more powerful than I had originally intended.
Verification
Let’s do verification the easy way. Any time you need to verify that an AXI-lite implementation “works”, the easy way to verify it is to grab a copy of the formal AXI-lite model and then to simply instantiate it within your design.
There are a couple of configuration notes to setting this up. First, we only need two bits to be able to count up to the maximum number of transactions on the bus. Hence, we’ll set the F_LGDEPTH to 2 and define a couple of values to connect to our model having this width.
We also need to set the address width (C_AXI_ADDR_WIDTH) and data width (C_AXI_DATA_WIDTH) of the property set. We’ll allow the design to assume the existence of a reset (F_OPT_ASSUME_RESET), while also not requiring that reset to be a full 16 clock cycles (F_OPT_NO_RESET). (Xilinx’s AXI implementation notes require a long reset, even though most of their IP does not.)
From here, the rest of instantiating the AXI-lite properties is very straightforward.
At this point, we should be able to start running and passing proofs. Induction will take some more work, but we’ll get to that in a moment. Even better, this design is so simple that 20-40 clock steps should be sufficient for any non-induction proof.
This is also the point where I tend to start throwing assertions at the wall, just to make certain that things I’ve assumed during my design really are true.
For example, we chose above to capture our “state” in BREADY and RREADY. Our goal was that if we were ever working on a write, then BREADY should be true, and if we were ever working on a read then RREADY should be true. If neither is true, then we should be idle. This also means that both should never be true together.
Let’s break this down a bit more, though. If BREADY is false, then we are not in the middle of any write transactions. The number of AWVALIDs that have taken place without seeing any corresponding BVALID are zero, and the same can be said for WVALIDs. Not only that, but if BREADY is false, then both AWVALID and WVALID should be zero–since we’re not in the middle of any write transaction either.
Where things get a bit more interesting is when BREADY is true. In this case, we’ll have a write address request outstanding if AWVALID has been accepted and dropped. The same will be true of a write data request should WVALID have been accepted and then dropped.
Indeed, this is often all I have to do to verify the write half of an AXI-lite interface. It’s pretty easy, and nearly boilerplate.
The read half isn’t all that different either.
If RREADY is low, then we aren’t trying to read and so ARVALID and the number of read requests outstanding should both be zero.
If, on the other hand, we are reading and so RREADY is high, then either ARVALID is one or we have exactly one read request outstanding.
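The state relationships described over the last several paragraphs might be collected into assertions like the ones sketched below. This is written as a checker whose ports would be tied to the master’s signals and to the outstanding-transaction counters produced by a formal property model; the counter names, the module name, and the reliance on a FORMAL macro are assumptions for illustration.

```verilog
// Hedged sketch: tying BREADY/RREADY "state" to the transaction counters.
// The f_axi_* counter names are assumptions about the property model.
module hb_axil_fcheck #(
        parameter F_LGDEPTH = 2
    ) (
        input  wire                 i_awvalid,
        input  wire                 i_wvalid,
        input  wire                 i_bready,
        input  wire                 i_arvalid,
        input  wire                 i_rready,
        input  wire [F_LGDEPTH-1:0] f_axi_awr_outstanding,
        input  wire [F_LGDEPTH-1:0] f_axi_wr_outstanding,
        input  wire [F_LGDEPTH-1:0] f_axi_rd_outstanding
    );

`ifdef FORMAL
    always @(*)
    begin
        // Never reading and writing at the same time
        assert(!i_bready || !i_rready);

        if (!i_bready)
        begin
            // Idle on the write side: nothing requested or pending
            assert(!i_awvalid && !i_wvalid);
            assert(f_axi_awr_outstanding == 0);
            assert(f_axi_wr_outstanding  == 0);
        end else begin
            // Each channel is either still requesting, or has had
            // exactly one request accepted
            assert(f_axi_awr_outstanding == (i_awvalid ? 0 : 1));
            assert(f_axi_wr_outstanding  == (i_wvalid  ? 0 : 1));
        end

        if (!i_rready)
        begin
            assert(!i_arvalid);
            assert(f_axi_rd_outstanding == 0);
        end else
            assert(f_axi_rd_outstanding == (i_arvalid ? 0 : 1));
    end
`endif
endmodule
```

Properties of this form are what let the induction engine rule out unreachable states, such as a pending response with nothing waiting for it.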
How about reset? Following a reset, we should be able to assume that nothing is incoming. Likewise, following a reset, we should be in our idle “state” with both BREADY and RREADY low.
Did you notice how we only checked BVALID above if BREADY were also true? Or likewise we only checked RVALID if RREADY was also true? Let’s add a quick property to help guarantee that neither BVALID nor RVALID will ever be true unless we are expecting them. (This should also be captured by the properties above, but an extra assertion or two won’t hurt anything.)
So far, we’ve focused primarily on the AXI-lite interface. Indeed, the above is really all that’s required to verify an AXI-lite interface. There’s literally nothing more to it.
In the meantime, though, I’d like to assume the stream properties of our incoming interface. This interface is essentially an AXI stream interface, although the labels are a bit different. For example, we used a busy instead of a ready–but the principle remains almost identical. Hence, following any reset, we can assume that the STB (VALID) goes low. Second, following any STB && BUSY (i.e. VALID && !READY), pending requests need to remain that: pending and without change.
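These two input assumptions might be sketched as below, again as a small block whose ports would be tied to the master’s command interface. The module name is hypothetical, and the sketch assumes the formal flow defines FORMAL.

```verilog
// Hedged sketch: stream-style assumptions on the incoming command port.
// The module name is an illustrative assumption.
module hb_cmd_stream_assume (
        input  wire        i_clk,
        input  wire        i_reset,
        input  wire        i_cmd_stb,
        input  wire [33:0] i_cmd_word,
        input  wire        o_cmd_busy
    );

`ifdef FORMAL
    // $past() is meaningless on the very first clock
    reg f_past_valid;
    initial f_past_valid = 1'b0;
    always @(posedge i_clk)
        f_past_valid <= 1'b1;

    always @(posedge i_clk)
    if (!f_past_valid || $past(i_reset))
        // Nothing is incoming right after a reset
        assume(!i_cmd_stb);
    else if ($past(i_cmd_stb && o_cmd_busy))
    begin
        // A stalled request must stay pending, and unchanged
        assume(i_cmd_stb);
        assume(i_cmd_word == $past(i_cmd_word));
    end
`endif
endmodule
```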
That leaves us with one last property: that our BUSY signal will be true any time either BREADY || RREADY.
This, however, is one of those “Do I really need this?” assertions. Why? Because we defined o_cmd_busy as BREADY || RREADY. Why then have an assertion to verify this?
Do we need such an assertion? Probably not. I’ve placed it in here, though, to remind myself that o_cmd_busy has a specific definition. There will be consequences should I ever try to change it in the future. This is just a reminder of that–something to force me to think a touch harder before ever adjusting this value.
Contract checking
Now that we know our AXI-lite interface works, let’s turn our attention to the specific functionality of this design. Specifically, we want to know not just that the design will follow the AXI-lite rules of the road, but also that it will do what we want it to. So, let’s check some contract rules.
For example, we want to assert the newaddr flag following any requested address, but also to guarantee that it returns low after we issue any bus requests.
Following this further, on any request to read or write following a new address request, we should also be producing a downstream response acknowledging the new address.
Finally, the new address flag should be low while any request is pending.
How about resets? Following any reset, we said we wanted to produce a reset response output. Here, we’ll just double check that this happens.
We can then check for write acknowledgments following BVALID and BRESP=OKAY.
Read acknowledgments are also (nearly) identical.
The last response we might return is a bus error. In this case, if xRESP is ever anything other than OKAY, then it’s an error. (AXI-lite doesn’t allow xRESP to ever equal EXOKAY=2'b01.) We don’t care, here, if it’s a slave error, 2'b10, or an interconnect error, 2'b11–a bus error return is a bus error return as far as this protocol goes.
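The response contract described over the last few paragraphs might be sketched as a set of clocked assertions like these. The RSP_* encodings, the module name, and the port names are assumptions for illustration, and the checks rely on BVALID and RVALID only ever arriving while the master is expecting them.

```verilog
// Hedged sketch: checking that each bus return produces the matching
// downstream response. All RSP_* encodings are assumptions.
module hb_rsp_contract (
        input  wire        i_clk,
        input  wire        i_reset,
        input  wire        i_bvalid,
        input  wire [1:0]  i_bresp,
        input  wire        i_rvalid,
        input  wire [1:0]  i_rresp,
        input  wire [31:0] i_rdata,
        input  wire        o_rsp_stb,
        input  wire [33:0] o_rsp_word
    );

    localparam [33:0] RSP_RESET     = { 2'b11, 3'h0, 29'h0 },
                      RSP_BUS_ERROR = { 2'b11, 3'h1, 29'h0 },
                      RSP_WRITE_ACK = { 2'b01, 32'h0 };

`ifdef FORMAL
    reg f_past_valid;
    initial f_past_valid = 1'b0;
    always @(posedge i_clk)
        f_past_valid <= 1'b1;

    always @(posedge i_clk)
    if (f_past_valid && $past(i_reset))
    begin
        // A reset is always acknowledged downstream
        assert(o_rsp_stb);
        assert(o_rsp_word == RSP_RESET);
    end else if (f_past_valid && $past(i_bvalid))
    begin
        assert(o_rsp_stb);
        // xRESP[1] is set for both SLVERR and DECERR
        if ($past(i_bresp[1]))
            assert(o_rsp_word == RSP_BUS_ERROR);
        else
            assert(o_rsp_word == RSP_WRITE_ACK);
    end else if (f_past_valid && $past(i_rvalid))
    begin
        assert(o_rsp_stb);
        if ($past(i_rresp[1]))
            assert(o_rsp_word == RSP_BUS_ERROR);
        else
            // A successful read returns the data, prefixed 2'b00
            assert(o_rsp_word == { 2'b00, $past(i_rdata) });
    end
`endif
endmodule
```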
At this point, we should have good confidence that our design will always return the values downstream that it’s supposed to.
Cover Checks
This leaves us one last verification step. So far, we’ve proven that this design will follow the AXI-lite protocol. We’ve proven this via induction. We’ve also guaranteed that the design will properly return appropriate values downstream based upon what’s going on within.
What we haven’t done is to prove that responses are still possible.
I’ve just had one too many designs where I’ve convinced myself that the design works when, for one reason or another, I’ve made one too many assumptions to keep the design from working. For example, I once assumed reset was always true. It was amazing how quickly the design passed a formal check, and just as disheartening to see that it never worked in simulation or hardware.
A good cover check will help guarantee we haven’t made such mistakes.
Therefore, let’s see if we can complete several writes and reads.
The first step is to count the number of writes that complete. In this case, let’s count how many writes in a row we can go through–while disallowing any reads.
Our goal will be to accomplish four writes before returning to idle.
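One way to set up such a cover check is sketched below. The counter name, the module name, and the choice to restart the count whenever a read is in progress are assumptions for illustration.

```verilog
// Hedged sketch: cover four completed writes, with no reads in between,
// followed by a return to idle. Names are illustrative assumptions.
module hb_cover_writes (
        input  wire i_clk,
        input  wire i_reset,
        input  wire i_bvalid,
        input  wire i_bready,
        input  wire i_rready,
        input  wire o_cmd_busy
    );

`ifdef FORMAL
    reg [2:0] cvr_writes;

    initial cvr_writes = 3'd0;
    always @(posedge i_clk)
    if (i_reset || i_rready)
        // Any read in progress restarts the count
        cvr_writes <= 3'd0;
    else if (i_bvalid && i_bready && !cvr_writes[2])
        // Count completed writes, saturating at four
        cvr_writes <= cvr_writes + 3'd1;

    always @(*)
        cover(cvr_writes == 3'd4 && !o_cmd_busy);
`endif
endmodule
```

The same pattern, counting RVALID && RREADY and restarting on BREADY, would serve for a run of reads.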
You can see how well we did in Fig. 9 below.
In this case, there are four write requests, and six responses forwarded downstream. The first response acknowledges a reset, and the next acknowledges the new address. These two responses are then followed by a regular write acknowledgment, and then (bonus!) three bus error acknowledgments.
This is also the place where I usually measure throughput. In this case, the throughput is horrible: one word can be written every three cycles. It’s worse than that, though, since this doesn’t capture any interconnect latencies.
On the other hand, the purpose of this design was never throughput–it was low logic, and a basic demonstration of an AXI-lite master. We’ll come back to the logic estimate in a moment to see how well we did there.
For now, let’s repeat this test with reads. Can we cover a set of four reads in a row? The first step is to count them–much like we did before.
Now let’s let the formal tool find us a sequence showing how four reads might look in a row, once we’ve returned to idle.
You can see the result of this exercise in Fig. 10 below.
As before, we’re getting about a 33% throughput. There’s a reset acknowledgment, a new address acknowledgment, a read response, and then three read bus errors. The 33% throughput isn’t great, and it’s certainly nothing to write home about. But, as before, our goal is low logic and this is certainly that.
Conclusion
I’ve now mentioned several times that our purpose is low logic. How low, therefore, did we get? A quick Yosys run shows that this simple and basic AXI-lite design requires no more than 148 4-LUTs. Not bad for an iCE40, no? Indeed, the entire AXI-lite version of the hexbus on an iCE40 (minus the serial port) requires no more than 349 4-LUTs.
Surely 349 4-LUTs can be easily hidden in a larger design, no? Surely it’s a small price to pay for ad-hoc, external access to the bus within a design? Other costs, however, will always add up. Don’t forget that, in addition to the missing serial port cost (about 135 4-LUTs), there’s also the cost of adding yet one more master to the internal crossbar–something that can run upwards of 1500 4-LUTs by itself.
Still, this does make for a very low logic AXI-lite master. Remember our last AXI-lite master implementation? That was a bridge from Wishbone to AXI-lite. By comparison, it requires 118 4-LUTs to the 148 4-LUTs used by today’s controller. The big difference with this controller, though, is that this one is intended for scripting. Therefore, there are fewer wires used to control this master.
Better, because this controller can be easily scripted, its uses go well beyond the debug bus implementation it is designed and presented for.
Before leaving, I should also point out that neither the hexbus nor the wbubus is the end-all in debugging bus implementations. The first can transfer, at best, one 32-bit word every 10 bytes (100 baud intervals). The wbubus is better, since it can transfer one 32-bit word in six bytes (60 baud intervals, or 40% faster–before compression). I’m currently working on a newer version of the bus which will be able to transfer one 32-bit word in five bytes (50 baud intervals)–while still reserving one bit so as to multiplex a console channel over the debugging bus. Were I to implement it without console support, then the new bus implementation would be able to transfer (worst case) one word in 45 baud intervals. That’s a full 55% faster than the hexbus, and yes, times do add up when you are transferring large amounts of information. Indeed, those last couple of percentage points can amount to minutes of valuable transfer time.
As you can see, with a little bit of work, performance and throughput can and do improve over time–although getting that last little bit always tends to be somewhat of a challenge. Perhaps that’s just the reality of any engineering endeavor.
The fear of the LORD is the beginning of knowledge: but fools despise wisdom and instruction. (Prov 1:7)