Examples of AXI4 bus masters
A Xilinx forum poster recently asked for some example designs they might reference when building an AXI master. Since Xilinx has asked me not to post too many links in any forum response, I thought I might collect some AXI master examples here that others might find valuable:
- "Building a basic AXI master" discusses how to build an AXI-lite master. The article also presents some useful performance measurements of Xilinx's block RAM controller, explaining why AXI bursts are faster with this controller than the single-beat transactions AXI-lite requires. You might find those comparisons valuable. Xilinx's MIG design, however, isn't so crippled: it will handle AXI bursts as well as single-beat AXI (and AXI-lite) transactions without the block RAM controller's throughput loss, but ... with a tremendous lag. If you are looking for the design itself, you can find it here. You can also find it in use within many of my designs, since I commonly bridge from Wishbone to AXI in order to access DDR3 SDRAM via Xilinx's MIG controller. (Going from Wishbone to the MIG's native interface would be better, but I haven't tried that yet.)
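  To make that single-beat handshake concrete, here's a hypothetical, minimal read-only AXI-lite master, handling one transaction at a time. This is my illustration only, not the article's design; the port and signal names are mine. The one-request, one-response round trip below is exactly the serialization that makes a bursting interface faster.

  ```verilog
  // Hypothetical minimal AXI-lite read master: issue one read,
  // then wait for the response.  One transaction at a time.
  module axilsinglerd #(
  	parameter AW = 32, DW = 32
  ) (
  	input	wire		S_AXI_ACLK, S_AXI_ARESETN,
  	// AXI-lite read address channel
  	output	reg		M_AXI_ARVALID,
  	input	wire		M_AXI_ARREADY,
  	output	reg [AW-1:0]	M_AXI_ARADDR,
  	// AXI-lite read data channel
  	input	wire		M_AXI_RVALID,
  	output	wire		M_AXI_RREADY,
  	input	wire [DW-1:0]	M_AXI_RDATA,
  	input	wire [1:0]	M_AXI_RRESP,
  	// Local request interface
  	input	wire		i_request,
  	input	wire [AW-1:0]	i_addr,
  	output	reg		o_valid,
  	output	reg [DW-1:0]	o_data
  );
  	reg	busy;	// High from request until response

  	always @(posedge S_AXI_ACLK)
  	if (!S_AXI_ARESETN)
  		busy <= 1'b0;
  	else if (i_request && !busy)
  		busy <= 1'b1;
  	else if (M_AXI_RVALID)
  		busy <= 1'b0;

  	// Raise ARVALID, and hold it (with a stable address) until ARREADY
  	always @(posedge S_AXI_ACLK)
  	if (!S_AXI_ARESETN)
  		M_AXI_ARVALID <= 1'b0;
  	else if (M_AXI_ARVALID && M_AXI_ARREADY)
  		M_AXI_ARVALID <= 1'b0;
  	else if (i_request && !busy)
  	begin
  		M_AXI_ARVALID <= 1'b1;
  		M_AXI_ARADDR  <= i_addr;
  	end

  	assign	M_AXI_RREADY = 1'b1;	// Always ready for the response

  	always @(posedge S_AXI_ACLK)
  	begin
  		// RRESP[1] is set on SLVERR or DECERR
  		o_valid <= M_AXI_RVALID && !M_AXI_RRESP[1];
  		if (M_AXI_RVALID)
  			o_data <= M_AXI_RDATA;
  	end
  endmodule
  ```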
- Incidentally, AXI-lite is really easy to convert to full AXI. So one might argue that an AXI-lite master is already an AXI-full master. I've had to adjust my terms for precision, and so I often use the term "bursting AXI master" to capture the distinction that actually matters. I'll let you decide whether "bursting AXI master" is a better or worse term for this purpose, although I'm not sure I have a better one to offer.
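  To see why the conversion is easy, consider what it requires: nothing more than tying off the burst-related fields so that every transaction becomes a single-beat burst. A hypothetical set of tie-offs (widths assuming a 32-bit bus) might look like:

  ```verilog
  // Hypothetical tie-offs for presenting an AXI-lite master on a
  // full AXI4 bus: every burst is declared to be a single beat.
  assign	M_AXI_AWLEN   = 8'h00;	// One beat per burst (AWLEN+1 beats)
  assign	M_AXI_AWSIZE  = 3'b010;	// 4 bytes per beat, for a 32-bit bus
  assign	M_AXI_AWBURST = 2'b01;	// INCR (a don't-care for one beat)
  assign	M_AXI_AWID    = 0;	// A single, constant ID
  assign	M_AXI_AWLOCK  = 1'b0;	// No exclusive access
  assign	M_AXI_AWCACHE = 4'b0011;// Normal, non-cacheable, bufferable
  assign	M_AXI_WLAST   = 1'b1;	// Every beat is the last beat
  // ... with similar tie-offs for the AR* channel
  ```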
- If you need to test an AXI master, then you might want a slave to test it against. The article, "Building the perfect AXI4 slave", discusses how to build a slave that can achieve 100% throughput in either AXI4 (full) or AXI4-lite. Practically, however, the design loses one clock cycle per burst, since it insists that the AW* and W* channels be offset by a beat; otherwise it would have 100% throughput. Further, since writing the article, I've adjusted the design so that 1) the outputs, which drive the externally attached RAM, are registered, and 2) it supports AXI exclusive access. Registering the outputs introduces a cycle of delay, but it doesn't impact the design's throughput.
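  For a flavor of what 100% throughput requires, here's a hypothetical read-side AXI-lite sketch built along the same lines (again, my illustration, not the article's design): a new address is accepted on every cycle in which the return channel isn't stalled.

  ```verilog
  // Hypothetical 100%-throughput AXI-lite read path for a RAM-backed
  // slave: a new address can be accepted every clock cycle, so long
  // as the return channel isn't stalled.
  module axilread #(
  	parameter AW = 8, DW = 32
  ) (
  	input	wire		S_AXI_ACLK, S_AXI_ARESETN,
  	input	wire		S_AXI_ARVALID,
  	output	wire		S_AXI_ARREADY,
  	input	wire [AW-1:0]	S_AXI_ARADDR,
  	output	reg		S_AXI_RVALID,
  	input	wire		S_AXI_RREADY,
  	output	reg [DW-1:0]	S_AXI_RDATA,
  	output	wire [1:0]	S_AXI_RRESP
  );
  	// A small (uninitialized) RAM backing store, word addressed
  	reg [DW-1:0]	mem [0:(1<<(AW-2))-1];

  	// Accept a new address any time the return path isn't stalled
  	assign	S_AXI_ARREADY = (!S_AXI_RVALID || S_AXI_RREADY);

  	always @(posedge S_AXI_ACLK)
  	if (!S_AXI_ARESETN)
  		S_AXI_RVALID <= 1'b0;
  	else if (S_AXI_ARVALID && S_AXI_ARREADY)
  		S_AXI_RVALID <= 1'b1;
  	else if (S_AXI_RREADY)
  		S_AXI_RVALID <= 1'b0;

  	// The RAM read is registered, arriving exactly when RVALID does
  	always @(posedge S_AXI_ACLK)
  	if (S_AXI_ARVALID && S_AXI_ARREADY)
  		S_AXI_RDATA <= mem[S_AXI_ARADDR[AW-1:2]];

  	assign	S_AXI_RRESP = 2'b00;	// Always OKAY
  endmodule
  ```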
- "The hard part of building a bursting AXI master" discusses the designs of several bursting AXI masters, and why they can be difficult to build. (A short sketch of the burst-counting logic they all share follows this list.) These include:
  - Open source AXI DMAs: memory to memory, stream to memory, and memory to stream.
  - Video DMAs: memory to video, as a framebuffer might use, and video to memory, such as a video recorder might require.
  - A "virtual" FIFO, which uses an AXI4 interface to a RAM backing store. This can be useful when you need a REALLY LARGE FIFO, but not necessarily high throughput. Sure, the design can theoretically achieve 100% throughput, but I doubt any slave-interconnect-RAM combination would be able to match it.
  - I've also got an AXI-backed "scope". This follows my basic Wishbone scope implementation, only it uses AXI-lite for register read access and an AXI4 back end to record any data written to it at high (i.e. DMA) speeds. This would be great for digital signal processing work: record something at high speed, then break or stop when you run into some feature of interest that you want to go back and inspect.
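As promised above, here's a sketch of the burst-counting logic all of these bursting masters share in one form or another. This is an illustration of the protocol requirement, not an excerpt from any of the designs listed: the address channel advertises the burst length up front, while the write channel must count beats so that WLAST is raised on exactly the final one.

```verilog
// Hypothetical fragment of a bursting write master: AWLEN advertises
// the beat count (minus one) up front, while the write channel counts
// beats so WLAST lands on the final beat.  Get WLAST wrong, and the
// interconnect may hang.
module burstcount #(
	parameter [7:0] LEN = 8'd16	// Beats per burst
) (
	input	wire		S_AXI_ACLK, S_AXI_ARESETN,
	input	wire		M_AXI_WVALID, M_AXI_WREADY,
	output	wire [7:0]	M_AXI_AWLEN,
	output	wire		M_AXI_WLAST
);
	reg [7:0]	beats_left;

	always @(posedge S_AXI_ACLK)
	if (!S_AXI_ARESETN)
		beats_left <= LEN;
	else if (M_AXI_WVALID && M_AXI_WREADY)
	begin
		if (M_AXI_WLAST)
			beats_left <= LEN;	// Next burst starts fresh
		else
			beats_left <= beats_left - 1;
	end

	assign	M_AXI_AWLEN = LEN - 8'd1;	// AWLEN of 15 means 16 beats
	assign	M_AXI_WLAST = (beats_left == 8'd1);
endmodule
```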
- More recently, I've been building AXI master implementations of the ZipCPU's memory controllers. (You'll currently need to find these in a special branch of the ZipCPU's repository, as they represent a major upgrade in many ways.) These AXI memory controllers are often paired with AXI-lite master equivalents. For example:
  - There's a basic AXI data controller, and its AXI-lite equivalent. The AXI version is unique in that it's my first foray into supporting exclusive access operations from an AXI master standpoint, something AXI-lite doesn't support.
  - There's also a basic pipelined AXI data controller and its AXI-lite equivalent. Again, this AXI master supports exclusive access, something not supported by its AXI-lite equivalent.
  - The basic instruction fetch (i.e. one without a cache) doesn't benefit from AXI full, so the AXI-lite version is all I have for it. Still, it supports an arbitrary number of outstanding requests, which can be tuned at design integration time to match the expected latency within your system. (The counter behind that support is sketched following this list.)
  - The ZipCPU now also features two AXI cache examples: a data cache and an instruction cache. Both are single-way, and the data cache is a write-through design. Both support AXI4 burst transactions. Unlike my other AXI data interfaces, the data cache can't handle unaligned accesses, nor can it handle exclusive access (yet). If the Lord wills, these may be features to be added later, although I'm more likely to add uncached pipelined reads before any other features.
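That outstanding-request support is simple enough to sketch: count requests issued minus responses returned, and stall new requests once a configured maximum is reached. The hypothetical counter below (with LGDEPTH as the integration-time tuning knob) is my illustration of the idea, not the fetch itself.

```verilog
// Hypothetical outstanding-transaction counter for an AXI-lite read
// master: issue new requests until (1<<LGDEPTH)-1 are outstanding.
// Tune LGDEPTH to the expected round-trip latency of your system.
module fetchcount #(
	parameter LGDEPTH = 3
) (
	input	wire	S_AXI_ACLK, S_AXI_ARESETN,
	input	wire	M_AXI_ARVALID, M_AXI_ARREADY,
	input	wire	M_AXI_RVALID,  M_AXI_RREADY,
	output	wire	o_full	// Gate ARVALID with !o_full
);
	reg [LGDEPTH:0]	outstanding;

	always @(posedge S_AXI_ACLK)
	if (!S_AXI_ARESETN)
		outstanding <= 0;
	else case ({ M_AXI_ARVALID && M_AXI_ARREADY,
			M_AXI_RVALID && M_AXI_RREADY })
	2'b10: outstanding <= outstanding + 1;	// Request issued
	2'b01: outstanding <= outstanding - 1;	// Response returned
	default: begin end			// Both or neither: no change
	endcase

	// Stall new requests once the allowed depth has been reached
	assign	o_full = (outstanding >= (1 << LGDEPTH) - 1);
endmodule
```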
- Not all masters are information sources. Often a bridge makes a good example design. For example, my AXI-full to AXI-lite bridge can achieve 100% throughput at the cost of two additional cycles of latency. I also (now) have a similar bridge that can bridge to a smaller AXI (or AXI-lite) interface, although this second bridge has yet to be formally verified. True, these are only AXI-lite masters, and not really AXI-full masters. I have started building an AXI4 full data upsizer, but that design remains confused enough that it won't get past a couple cycles of formal verification. (When I couldn't wait any longer, I built an AXI-lite data upsizer, which is functionally equivalent but won't preserve the burst capability of AXI4. The original, still-in-progress AXI4 upsizer would have preserved that capability.) The basic idea behind the first bridge is sketched below.
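The heart of any such bridge is an address stepper: each beat of an incoming burst becomes its own single-beat request downstream. The hypothetical sketch below handles one INCR burst at a time, and so gives up the 100% throughput of the real design, but it shows the idea. (Real bridges must also handle FIXED and WRAP bursts, narrow beats, and response aggregation.)

```verilog
// Hypothetical AXI4-to-AXI-lite write-address stepper: every beat of
// an incoming INCR burst becomes one AXI-lite request, the address
// advancing by one full-width beat each time.
module axi2axilstep #(
	parameter AW = 32,
	parameter DW = 32	// Bus width; full-width beats assumed
) (
	input	wire		S_AXI_ACLK, S_AXI_ARESETN,
	// Incoming AXI4 burst address
	input	wire		S_AXI_AWVALID,
	output	wire		S_AXI_AWREADY,
	input	wire [AW-1:0]	S_AXI_AWADDR,
	input	wire [7:0]	S_AXI_AWLEN,
	// Outgoing AXI-lite requests
	output	wire		M_AXI_AWVALID,
	input	wire		M_AXI_AWREADY,
	output	reg [AW-1:0]	M_AXI_AWADDR
);
	reg		busy;
	reg [8:0]	reqs_left;

	// Accept a new burst only when idle--hence no 100% throughput
	assign	S_AXI_AWREADY = !busy;
	assign	M_AXI_AWVALID = busy;

	always @(posedge S_AXI_ACLK)
	if (!S_AXI_ARESETN)
		busy <= 1'b0;
	else if (S_AXI_AWVALID && S_AXI_AWREADY)
	begin
		busy         <= 1'b1;
		M_AXI_AWADDR <= S_AXI_AWADDR;
		reqs_left    <= S_AXI_AWLEN + 1;	// AWLEN+1 beats
	end else if (M_AXI_AWVALID && M_AXI_AWREADY)
	begin
		// One AXI-lite request issued: step to the next beat
		M_AXI_AWADDR <= M_AXI_AWADDR + (DW/8);
		reqs_left    <= reqs_left - 1;
		if (reqs_left == 1)
			busy <= 1'b0;			// Burst complete
	end
endmodule
```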
If you’d like example designs that use these controllers, then feel free to consider either my VGASIM or AXI DMA check repositories.
- VGASIM includes demonstrations of the video DMAs, generating video either from a static framebuffer or by first (and continuously) writing a section of the screen to a framebuffer and then reading that section back to the screen.
- The AXI DMA check repository also includes a test bed for the ZipCPU's AXI interfaces, something I've been experimenting with recently. (My goal has been to measure AXI performance, but so far I haven't been pleased with how well my chosen measurements capture what's actually going on.)
Beware: both of these repositories are simulation-only designs. (In the Air Force, we might call them "hangar queens".) They won't necessarily pass timing or fit within the resources of any practical FPGA, but they are sufficient to verify that the core components within them work as designed.
Finally, let me warn anyone attempting to build their own AXI master: AXI4 can be hard to get right. I'm not sure I can emphasize that enough. While many of these designs have "just worked" the first time out (after being formally verified, of course!), I can't say the same for the designs of others. Worse, one AXI bug can easily bring the whole system down while offering you no insight into where the bug took place. If that weren't bad enough, I'm tracking many bugs that have lived in the Xilinx repositories for years simply because they don't get triggered. They weren't triggered by simulation, and they weren't triggered during sign-off, but they can often be triggered by some inconsequential change elsewhere in the design, one that then leaves you looking in all the wrong places for the bug. Hence, when and if these bugs do get triggered, they often don't lead to bug reports, since it can be hard to pinpoint the fault at that late stage in the game. I'll also point out that it's not just Xilinx: even ASIC designers struggle with getting their AXI interfaces right. Indeed, getting an AXI master right can truly be a challenge. Simulation is a good start, but nothing beats a good formal verification check.
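To give a taste of what such a check involves, the most fundamental AXI formal properties are stability rules: once VALID is raised, it must stay raised, with an unchanging payload, until READY is returned. A hypothetical set of assertions for a master's write-address channel (signal names are illustrative) might look like:

```verilog
`ifdef	FORMAL
	// Hypothetical stability checks for the write-address channel:
	// once AWVALID is raised, it must remain raised with unchanged
	// payload until AWREADY.  Every AXI channel needs (at least) this.
	reg	f_past_valid;
	initial	f_past_valid = 1'b0;
	always @(posedge S_AXI_ACLK)
		f_past_valid <= 1'b1;

	always @(posedge S_AXI_ACLK)
	if (f_past_valid && S_AXI_ARESETN && $past(S_AXI_ARESETN)
			&& $past(M_AXI_AWVALID && !M_AXI_AWREADY))
	begin
		assert(M_AXI_AWVALID);		// VALID may not drop ...
		assert($stable(M_AXI_AWADDR));	// ... nor the payload change
		assert($stable(M_AXI_AWLEN));
		assert($stable(M_AXI_AWBURST));
	end
`endif
```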
Still, perhaps one or more of these designs will help you get up and running with your own design needs.
Be ye followers of me, even as I also am of Christ. (1 Cor 11:1)