Learning AXI: Where to start?

Someone once asked on Reddit, how should one go about learning the AXI protocol? The following summarizes my basic answer.

First off, don’t start with Xilinx’s example designs. Sorry, but their examples are horribly broken. Even their demo AXI Stream master is broken. It’s a shame that they’ve neither fixed these designs, nor updated their training materials to acknowledge that their basic designs are broken.

Fig. 1, Basic Roadmap for Learning AXI

Others have suggested that the best place to start is by learning handshaking, and I would agree. Therefore, I would recommend that anyone starting out begin by learning AXI’s handshaking rules. The AXI stream protocol is little more than simple handshaking, and you can (mostly) ignore the TID, TSTRB, TKEEP, and TDEST signals. This is also where you’ll discover how Xilinx got their example AXI stream master messed up, and where you’ll learn how easy it would be to fix it.
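To make the basic rule concrete, here’s a minimal sketch in Python of a handshake check run over a recorded trace. It is an illustration only, not the formal properties themselves: the trace format and the names check_handshake and count_transfers are made up for this example, a single integer stands in for all of the payload signals, and reset is ignored. The rule being checked is that once VALID is raised, it must stay raised and the payload must not change until READY is seen.

```python
from typing import List, Optional, Tuple

# One clock cycle of a stream channel: (valid, ready, data).
# "data" stands in for every payload signal (TDATA, TLAST, ...).
Cycle = Tuple[bool, bool, int]


def check_handshake(trace: List[Cycle]) -> Optional[int]:
    """Return the index of the first cycle that breaks the rule, else None."""
    for i in range(1, len(trace)):
        prev_valid, prev_ready, prev_data = trace[i - 1]
        valid, _ready, data = trace[i]
        # A stalled request (VALID && !READY) must be held unchanged.
        stalled = prev_valid and not prev_ready
        if stalled and (not valid or data != prev_data):
            return i
    return None


def count_transfers(trace: List[Cycle]) -> int:
    """A transfer takes place on every cycle where VALID && READY."""
    return sum(1 for valid, ready, _data in trace if valid and ready)


good = [(False, False, 0), (True, False, 5), (True, True, 5),
        (True, True, 6), (False, True, 0)]
dropped_valid = [(True, False, 5), (False, True, 5)]
changed_data = [(True, False, 5), (True, True, 6)]
```

Here, good passes the check and carries two transfers, while the other two traces both fail at cycle 1: one drops VALID while stalled, and the other changes the data while stalled.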

Once you understand AXI handshaking, I’d then recommend learning about skidbuffers. Without them, you’ll never get better than 50% throughput without violating the AXI specification.
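As a rough behavioral sketch of the idea, here’s a skidbuffer modeled in Python. This is an assumption-laden illustration rather than any particular published implementation: the class and function names are invented for this example, the outputs are modeled as combinational, and the only state is a single "skid" register. The key point is that the upstream READY depends only on stored state, yet one item can still pass through per clock.

```python
class SkidBuffer:
    """Behavioral model: upstream READY is a function of state alone."""

    def __init__(self):
        self.r_valid = False   # True when the skid register holds an item
        self.r_data = None

    def up_ready(self):
        # Registered READY: no dependence on this cycle's inputs.
        return not self.r_valid

    def outputs(self, i_valid, i_data):
        # Drain the skid register first, otherwise pass the input through.
        if self.r_valid:
            return True, self.r_data
        return i_valid, i_data

    def clock(self, i_valid, i_data, dn_ready):
        o_valid, _ = self.outputs(i_valid, i_data)
        if i_valid and self.up_ready() and o_valid and not dn_ready:
            # Accepted upstream, but downstream stalled: keep the item.
            self.r_valid, self.r_data = True, i_data
        elif dn_ready:
            self.r_valid = False


def simulate(items, ready_pattern):
    """Feed items through the buffer; ready_pattern is downstream READY."""
    skid = SkidBuffer()
    pending = list(items)
    received = []
    for dn_ready in ready_pattern:
        i_valid = bool(pending)
        i_data = pending[0] if pending else None
        up_ready = skid.up_ready()          # sample before the clock edge
        o_valid, o_data = skid.outputs(i_valid, i_data)
        if o_valid and dn_ready:
            received.append(o_data)
        skid.clock(i_valid, i_data, dn_ready)
        if i_valid and up_ready:
            pending.pop(0)                  # the source moves on
    return received
```

With downstream READY held high, four items cross in four clocks. With stalls mixed in, nothing is lost or reordered.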

The next place I’d go would be to look into AXI-lite. Beware of backpressure! It has caused Xilinx (and others unnamed) no end of headaches, and forms the backdrop for many of the bugs in their example designs. If you want a working example design to start from, check out this example design that I often use myself when working with AXI-lite. You might also wish to look over this post, describing how to fix Xilinx’s (broken) AXI-lite VHDL example.

For most use cases, you can stop there. AXI-lite will get you just about everywhere you need to go. For most of the things you might need the full AXI specification for (DMAs, MM2S, S2MM, virtual FIFOs, video frame buffer reading, video frame buffer writing, etc.), you can already find example open source or vendor designs that’ll work.

If you are interested in moving past AXI-lite, then it’s time to understand AXI addressing: the various FIXED, WRAP, and INCR addressing modes, and how the AxSIZE field impacts them. This is important. Xilinx didn’t even try to get this right in their example, and I’ve seen plenty of ASIC designs that get this addressing messed up as well. You will need to understand this before diving into building your first full AXI slave. Indeed, I’ve used the next AXI address module built and presented in that article in many of my own designs.
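For a feel of what this addressing involves, here’s a small Python sketch of a next-address calculation. It is not the module mentioned above, just a model written from the burst rules as I understand them: the function names are invented for this example, and it ignores the address bus width, the 4kB boundary rule, and the requirement that a WRAP burst start on an aligned address.

```python
FIXED, INCR, WRAP = 0b00, 0b01, 0b10   # AxBURST encodings


def next_addr(addr: int, axsize: int, axburst: int, axlen: int) -> int:
    """Address of the beat following the one at addr."""
    if axburst not in (FIXED, INCR, WRAP):
        raise ValueError("reserved AxBURST value")
    nbytes = 1 << axsize               # bytes per beat
    if axburst == FIXED:
        return addr                    # every beat uses the same address
    # Only the first beat may be unaligned; later beats are size-aligned.
    nxt = (addr & ~(nbytes - 1)) + nbytes
    if axburst == WRAP:
        beats = axlen + 1
        if beats not in (2, 4, 8, 16):
            raise ValueError("WRAP bursts must be 2, 4, 8, or 16 beats long")
        total = nbytes * beats         # size of the wrap window in bytes
        base = addr & ~(total - 1)     # lower wrap boundary
        nxt = base + (nxt - base) % total
    return nxt


def burst_addresses(addr: int, axsize: int, axburst: int, axlen: int):
    """List the address of every beat in one burst."""
    out = []
    for _ in range(axlen + 1):
        out.append(addr)
        addr = next_addr(addr, axsize, axburst, axlen)
    return out
```

As an example, a four beat INCR burst of 4-byte words starting at 0x1003 visits 0x1003, 0x1004, 0x1008, and 0x100C, while the same length WRAP burst starting at 0x1008 visits 0x1008, 0x100C, 0x1000, and 0x1004.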

Once you understand addressing, or at least once you’ve simplified it enough that you can work with it, then the next step would be to build a fully capable AXI slave.

What will you gain by writing an AXI slave over an AXI-Lite slave? Not much. Seriously. There’s not a lot of performance gain to be had by building an AXI (full) slave over that already gained by building an AXI-Lite slave, at least not for most uses. What performance difference might you see? Well, following a good AXI to AXI-lite bridge, you might find yourself losing about 2 clocks of latency per transaction. That’s it. Following a poor AXI to AXI-lite bridge? In that case, you might lose 4-8 clocks of throughput per transaction. Of course, you could always switch back to a better bridge to recover this lost throughput, so there’s really not a lot to be gained by switching from an AXI-lite slave to an AXI (full) one.

When it comes to AXI masters, however, that’s a different story. Still, I would similarly recommend you start with an AXI-Lite master. Technically, such a master should be able to be just as fast as an AXI full master. Practically and sadly, many designs cripple their AXI-lite implementations. (Hello, Xilinx?)

A full discussion of AXI masters gets difficult. It’s hard enough that I haven’t (yet) figured out how to simplify the material enough to write a post on how to build a general purpose AXI master–the addressing is just that hard to get right in general. (It usually takes me a couple of days to get right–even when building my own.) However, you are welcome to examine some of the AXI masters I’ve written and posted if you’d like.

Among those AXI master examples are two worth mentioning here since I’ve written articles about them. The first discusses how to build a memory controller for the ZipCPU using the AXI-Lite protocol, whereas the second discusses the modifications necessary to upgrade that memory controller to AXI (full). This second article goes over the AXI Exclusive Access protocol (AxLOCK, and EXOKAY), and then how to go about building a master that uses it–although I only really know of CPU use cases for such a protocol. It also discusses some of the challenging interactions between AxADDR and AxSIZE.

If you are really going to dive deeply into the AXI protocol, then it will quickly become important to know how to measure AXI performance. What kind of performance are you achieving? What is possible? What can you expect? These are all good questions you’ll want to know how to answer.

The above will get you most of the way. However, it will also leave you with questions about what AxCACHE, AxPROT, and AxQOS are for, or when you should use the AxID field. Indeed, you may leave wondering about AxSIZE as well, and why it’s an important part of the protocol. For a discussion of these, let me point you to a Reddit question of my own from some time ago: is AXI too complicated?

Formally Verifying AXI

Not that long ago, I was asked about the possibility of writing a course on how to formally verify AXI components. At the time, I sketched out the following outline for such a course–an outline that primarily matches most of the progression above.

The course would start with a quick review of formal methods: what are assertions and assumptions, and what are some of the unique challenges associated with induction.

Fig. 2, Lessons that might compose a course in formally verifying AXI components

Indeed, when you get to AXI full, induction becomes a necessity. AXI is sufficiently complicated to limit a bounded model check to somewhere between 20 and 40 cycles. As a result, no bounded model check will be sufficient to verify one or more 256-beat bursts. A complete proof, therefore, requires induction. It’s important here to understand how induction works, and what challenges you might expect when working with it.

As above, the next step would be to look into AXI Stream, and in particular how to handle handshaking and skidbuffers. Specifically, I’d go over the assertions necessary to describe an AXI handshake, and the need for skidbuffers. The lesson would end with a skidbuffer exercise of some type, perhaps simply requiring a student to build their own.

The next step would be AXI-Lite. The big difference between AXI-Lite and the bare handshaking discussion is that you need to count outstanding read and write requests when using AXI-Lite. Every read request requires one (and only one) response. Write requests are similar and also require a single response; however, a write request is not complete until there’s been a request on both the write address and write data channels. In general, though, the only thing we’re adding above and beyond the basic handshaking are some simple counters as well as some assertions tied to those counters.
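The counting itself is simple enough to sketch in a few lines of Python. What follows is a trace checker, not a formal property set, and everything about it is an assumption made for illustration: each cycle is a dictionary naming the channels (aw, w, b, ar, r) that completed a VALID && READY handshake on that clock, and a response is only counted as legal if its request was accepted on an earlier clock.

```python
def check_axil_trace(trace):
    """Return (first_bad_cycle or None, (aw, w, ar) still outstanding)."""
    aw_out = w_out = ar_out = 0   # accepted requests awaiting a response
    for cycle, hs in enumerate(trace):
        # Responses are checked against requests from earlier cycles.
        if hs.get("b"):
            # A write response needs both an address and a data request.
            if aw_out == 0 or w_out == 0:
                return cycle, (aw_out, w_out, ar_out)
            aw_out -= 1
            w_out -= 1
        if hs.get("r"):
            if ar_out == 0:
                return cycle, (aw_out, w_out, ar_out)
            ar_out -= 1
        # Now count the requests accepted on this cycle.
        if hs.get("aw"):
            aw_out += 1
        if hs.get("w"):
            w_out += 1
        if hs.get("ar"):
            ar_out += 1
    return None, (aw_out, w_out, ar_out)


ok_trace = [{"aw": True}, {"w": True}, {"b": True}, {"ar": True}, {"r": True}]
early_b = [{"aw": True}, {"b": True}]      # response before the write data
stray_r = [{"r": True}]                    # response with no request
```

The ok_trace finishes with nothing outstanding. The early_b trace fails at cycle 1, since no write data has been accepted yet, and stray_r fails immediately.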

An exercise for this portion of the course might involve verifying a given AXI-Lite module.

While AXI-lite is a great protocol for register handling, it’s also important to know your design handles registers properly. Therefore, I’d dedicate a lesson to discussing how to verify that registers are read or written correctly.

A good exercise here would be to modify the exercise from the AXI-lite lesson, so that it verifies the register within.

We can now discuss AXI addressing.

It would then be time to dive into AXI full. This topic is so large, however, that it really needs to be broken up. Therefore, I’d start with how to verify handling a single read burst. The basics of single read burst counting are pretty simple: you need to count the number of bursts that are outstanding, and the number of remaining outstanding items in each burst.

This might be where I’d introduce the exercise of verifying an AXI full slave.

This is also where the easy part ends. How, for example, shall you verify that the number of beats returned for a given burst read request is correct? That’s a touch harder, especially when you need to start tracking multiple outstanding bursts.
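To illustrate what "the right number of beats" means, here’s a Python sketch that checks a recorded sequence of read events. It leans on two simplifying assumptions that a real proof could not make: all requests share a single ID, and responses come back in order. The event format and the function name are invented for this example. It’s also a simulation-style checker that keeps a queue of bursts, which is a luxury a set of formal properties doesn’t get so cheaply.

```python
from collections import deque


def check_read_bursts(events):
    """events holds ("AR", arlen) requests and ("R", rlast) return beats.

    Returns the index of the first offending event, or None if all is well.
    """
    remaining = deque()            # beats still owed, one entry per burst
    for i, (kind, value) in enumerate(events):
        if kind == "AR":
            remaining.append(value + 1)    # ARLEN is the beat count minus one
        elif kind == "R":
            if not remaining:
                return i                   # a beat nobody asked for
            remaining[0] -= 1
            is_last = remaining[0] == 0
            if bool(value) != is_last:
                return i                   # RLAST set too early or too late
            if is_last:
                remaining.popleft()
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    return None


good = [("AR", 1), ("AR", 0), ("R", False), ("R", True), ("R", True)]
early_last = [("AR", 1), ("R", True)]
missing_last = [("AR", 0), ("R", False)]
```

The good sequence returns a two beat burst followed by a single beat burst, with RLAST marking the end of each. The other two sequences are flagged at the beat where RLAST disagrees with the count.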

From there, I’d move on to discuss how to verify out-of-order returns, and how to handle verifying packet IDs.

This leads to the task of verifying an AXI full slave that requires multiple beats to process requests given to it. A simple example might be an AXI SRAM controller, where the SRAM requires one (or more) clocks from request to response and where the read command can only be issued once.

Write handling is more challenging than read handling. Specifically, AXI write requests split the write address and data into two channels, and formally verifying both channels can require a bit of synchronization within the formal properties. Once the channels are synchronized, however, verification returns to being as easy as counting bursts and beats again.

Yes, there are a couple of new requirements, although once the channels are synchronized these are minimal. Primarily, the extra requirements deal with those packets for which AWSIZE is less than a full word, or for which the initial AWADDR isn’t word aligned. In these cases, there’s the additional requirement that write beats only contain strobes for the correct bytes. For example, if AWADDR is odd, then WSTRB[0] of the first beat must be zero.
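The strobe requirement can be sketched as a byte lane calculation. The Python below is an illustration under stated assumptions, not a property file: the function names are invented, bus_bytes is the data bus width in bytes, and the address passed in is the address of the particular beat being checked (only the first beat of a burst may be unaligned).

```python
def allowed_strobes(addr: int, awsize: int, bus_bytes: int) -> int:
    """Bit mask of the byte lanes a write beat at addr may use."""
    nbytes = 1 << awsize
    if nbytes > bus_bytes:
        raise ValueError("AWSIZE is wider than the data bus")
    lower = addr % bus_bytes                     # first lane allowed
    aligned = addr & ~(nbytes - 1)               # align down to the beat size
    upper = (aligned + nbytes - 1) % bus_bytes   # last lane allowed
    mask = 0
    for lane in range(lower, upper + 1):
        mask |= 1 << lane
    return mask


def strobe_ok(wstrb: int, addr: int, awsize: int, bus_bytes: int) -> bool:
    """True if WSTRB sets no bits outside of the allowed lanes."""
    return (wstrb & ~allowed_strobes(addr, awsize, bus_bytes)) == 0
```

On a 32-bit bus, a full word write to the odd address 0x1001 may only use lanes 1 through 3, so any beat with WSTRB[0] set fails the check. A 2-byte write to 0x1006 may only use lanes 2 and 3.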
Exclusive Access. Exclusive access is AXI’s method of handling atomic requests. These really aren’t all that hard to understand or model. Indeed, they are easier to deal with than read or write bursts. Still, I would place exclusive access handling late in the course, not because of how easy or difficult it is, but more because of how few things actually need it.

Fig. 3, It can be a real challenge to verify a design containing a FIFO

That brings us to the FIFO Challenge. FIFOs are fairly easy to verify on their own. Sadly, they become much harder to verify when they are used within something. AXI, however, is very much built around the concept of FIFOs. How to verify something that has a FIFO within it is something we’d need to discuss here.

Once you get past the FIFO challenge, it then becomes possible to build AXI components that can handle any number of multiple bursts at a time. When would you need something like this? When building an AXI interconnect or a DMA of some type. Both would make good examples of components that might need this technology.

At least, these are my current thoughts on what lessons I might teach were I to create a course in formally verifying AXI components. Given that every piece of commercial IP I’ve ever built has required some form of AXI interface, I wouldn’t be surprised to find such a course to be an in-demand topic. I would just need to find a way to clear enough time out of my schedule to create it.

AXI Design Exercises

When learning any new topic, it’s important to exercise your new knowledge as you learn it. Here’s a list, therefore, containing a progression of exercises with increasing difficulty that you might find valuable when learning AXI.

Fig. 4, Practice exercises, for use in learning AXI

Build and verify an AXI Stream component. A good example of this might be either a DSP component or perhaps a FIFO of some type. Perhaps the simplest example I might come up with would be a frequency shifter based upon an internal CORDIC.

Other examples include stream processing network packets, such as a stream component that might recognize, encrypt or decrypt a UDP packet.

Build and verify an AXI-lite bus slave.

This project is actually pretty easy. I’d have you start with the EasyAXIL design, and then modify it for some purpose. Perhaps that might be to create a GPIO or UART controller from it, or to turn it into a basic timer peripheral of some type. Any of these exercises would be fairly simple, since the EasyAXIL design is just that easy to work with and from. Even better, it comes with all the details you need to formally verify how well it handles bus transactions.

The simplest bus master to build is a basic, scripted, one request at a time bus master using AXI-lite. This article presents both such a bus master, and a usage description of why you might wish to build one. The master is easy enough to verify, and so might make a good practice start at building your first AXI-lite bus master.

Indeed, a scripted AXI-lite bus master is not really all that much more complex than a basic AXI-Lite CPU memory unit, so building such a memory unit might also fit nicely here. From an exercise standpoint, however, the CPU memory unit has to deal with two protocols, both that of the CPU and that of AXI-lite, meaning that the CPU memory unit might be more complex than this exercise would require.

A more complicated AXI-lite bus master, and certainly one with more interest, might be to add a small FIFO to a CPU instruction fetch unit.

This exercise is only slightly more complex than the scripted memory controller of the last exercise. Specifically, what should the fetch unit do when, mid fetch, the CPU tells it that it no longer wants the values currently being fetched but instead wants to start over from a new address?

Before leaving this example, let me quickly outline the number of uses I’ve found for such a FIFO based AXI-lite bus controller. The most basic use is in scripting a scatter-gather DMA. I’ve also found it useful for scripting I2C or SPI transactions from a script in memory somewhere.

That would roughly exhaust the exercises needed to learn how to work with both AXI handshaking and AXI-lite. From here, it would be time to move on to the full AXI bus protocol.

The first (and simplest) exercise would be to build (and verify) an AXI (full) slave. A specific performance goal would be that this slave should be able to handle a throughput of one beat per clock–even when crossing the boundaries between multiple burst requests.

A classic design example which might work well here would be a single port SRAM controller. Such a controller would require an internal arbiter to select which of the read or write channel would be allowed access to the SRAM.

A bonus exercise might be to make that slave able to handle exclusive access requests, but this would need to be bonus. Not all AXI slaves need or want exclusive access.

A good exercise for building an AXI master might be to build an AXI based cache of some type. At this point, however, there’s no real way around the two protocols required: you’d need to support both a cache-to-CPU protocol as well as the AXI protocol.

The cache would be the first type of component requiring burst access. It can be kept simple enough as to only require a single burst at a time. Bonus points would include using WRAP addressing, or exclusive access (data cache only).

When you want good bus performance, however, you need to be able to build a DMA. The skills involved in building (and verifying) a memory to memory DMA controller should be the last AXI skills you will need to learn.

The key new feature learned when building a DMA controller, not present in any of the prior AXI components, is the simple reality that a DMA needs to be able to blast as much information across the bus as possible in as few clock cycles as possible. This means you’d need to be able to verify that your design can issue and track multiple outstanding burst requests at any given time.

To simplify this project to something that might still be accomplished within a class, I might suggest limiting this DMA to aligned words only. Obviously, a DMA which can handle both unaligned addresses and unaligned lengths is more useful, but the challenge involved in verifying such a DMA might be too much for an introductory course in AXI design.

The neat part of this exercise, however, is that once you can build a basic AXI DMA, you can then build all kinds of specialized data movers, such as stream or packet to memory DMAs, the reverse memory to packet or stream DMAs, video DMAs, virtual FIFOs, and more. None of these items are really all that much more complex than the basic AXI DMA is in the first place.

While these example design exercises start simple, they do end up quite complex, as they should. This isn’t quite as complex as AXI gets, however. More complicated AXI designs might include bus upsizers, downsizers, or even crossbars. While such components are more complex, they aren’t really required for learning AXI. If you can handle the prior exercises, then you should know enough to build any of these more complex components.

The big problem I have with these exercises, however, is that they get fairly challenging by the end. I’m not sure how I would go about fitting the verification of a DMA into an AXI formal verification course of only a couple days–especially since it took me a couple of weeks to verify my own DMA. So … I’ll continue to keep my eyes open for better (simpler) examples to work with and from. Until then, this is still a really good list of exercises for any student to work with in order to learn the basic AXI concepts on his or her own.

Conclusions

I’m not sure I’ve seen a lot of good AXI training material on-line. Most of what I’ve seen, so far, has been Xilinx’s materials–and those materials would have you start with and modify a broken design. Further, there aren’t a lot of materials discussing how to formally verify AXI designs, and it’s that formal part that was required in order to find some really fundamental bugs in Xilinx’s AXI designs.

In the meantime, I offer the roadmap above for learning AXI. Not everyone will need all of the lessons or exercises above. However, the lessons and exercises outlined above should be thorough enough for anyone to fully learn the topic.

Finally, if a course in formally verifying AXI bus components is something you would be interested in, then let me invite you to correspond with me and express your interest. Let me also invite anyone interested to suggest how either the exercises might be simplified, or how the course might be structured so as to make sure everyone has the time and ability to accomplish each of the exercises. Without such exercises, I fear that lecture alone would leave students just as confused as they are or were when they entered the course.