Your problem is not AXI

The following was a request for help from my inbox. It illustrates a common problem students have. Indeed, the problem is common enough that this blog was dedicated to its solution. Let me repeat the question here for reference:

I’ve read some of your articles and old comments on forums in trying to get something resembling Xilinx’ AXI4 Peripheral to work with my current project in VIVADO for my FPGA. My main problem is that whenever I so much as add a customizable AXI to my block design and connect it to my AXI peripheral, generate a bitstream (with no failures), then build a platform using it in VITIS (with no failures), my AXI GPIO connections which should not be connected to the recently added customizable AXI, do not operate at all (LEDs act as if tied to 0, although I’m sending all 1s). I tried a solution I found online talking about incorrect “Makefile”s but to no avail. I have also tried just adding some of your files you provided on github instead of the Xilinx’ broken IP including “demoaxi.v” and “easyaxi.v” [sp]. The “demoaxi.v” has the exact same problem as Xilinx’ AXI, just adding it to the block design and connecting it to my AXI peripheral causes the GPIO not connect somehow. Your “easyaxi.v” [sp] does not cause this issue right away, however adding an output and assigning it with the slave register “r0” then results in the same issue. I am at a loss for what to do. I’m not very familiar with the specifics of how AXI works, even after re-reading some of your articles multiple times (I’m still a student with very little experience), so I can’t be certain why I am running into this issue. My guess at what is happening is that adding an AXI block with a certain characteristic somehow causes the addresses for my GPIO and other connections to “bug out”. But I have no idea why adding this kind of AXI block does this (or something else that causes my issue). I’m reaching out because I … might as well do something other than making small changes to my design and waiting for 30+ minutes in between tests to see if something breaks or doesn’t break my GPIO. Do you have any idea what might be causing my issue or how to fix it?

Thanks,

(Student)

(Links have been added …)

Let’s start with the easy question:

Do you have any idea what might be causing my issue or how to fix it?

No. Without looking at the design, the schematic, or digging into the design files, I can’t really comment on something like this. Debugging hardware designs is hard work, it takes time, and it takes a lot of attention to detail. Without the details, I won’t be able to find the bug.

That said, let’s back up and address the root problem, and it’s not AXI.

Yes, I said that right: This student’s problem is not AXI.

If anything, AXI is just the symptom. If you don’t deal with the actual problem, you will not succeed in this field.

Iterative Debugging

The fundamental problem is the method of debugging. The problem is that the design doesn’t work, and this student doesn’t know how to figure out why not. This was why I created my blog in the first place–to address this type of problem.

Fig 1. This is not how to do debugging

Here’s what I am hearing from the description: I tried A. It didn’t work. I don’t know why not. So I tried B. That didn’t work either. I still don’t know why not. Let me try asking an expert to see if he knows. It’s as though the student expects me to be able, from these symptoms alone, to figure out what’s wrong.

That’s not how this works. Indeed, this debugging process will lead you straight to FPGA Hell.

As an illustration, and for a fun story, consider the problem I’ve been working on for the past couple weeks. I’m trying to get the FPGA processing working for this video project (fun promo video link).

I got stuck for about two weeks at the point where I commanded the algorithm to start and it didn’t do anything. Now what?

Fig 2. Voodoo computing defined

One approach to this problem would be to just change things, with no understanding of what’s going on. I like to call this “Voodoo Computing”. Sadly, it’s a common method of debugging that just … doesn’t work.

I use this definition because … it’s just so true. Even I find myself doing “voodoo computing” at times, somehow expecting things to suddenly fix themselves. The reality is, that’s not how engineering works.

Engineering works by breaking a problem down into smaller problems, and then breaking those problems into smaller ones at that. In this student’s case, he has a problem where his AXI slave doesn’t work. Let’s break that down by asking a question: Is it your design that’s failing, or the Vivado-created “rest-of-the-system” that’s failing? Draw a line. Measure. Which one is it?

Fig 3. Iterative Debugging

Well, how would you know? You know by adding a test point of some type. “Look” inside the system. Look at what’s going on. Look for any internal evidence of a bug. For example, this student wants to write to his component and see a pin change. Perfect. Now trigger a capture on any write to this component, and see if you can watch that pin change both within the capture and on the board. Does the component actually get written to? Do the AWVALID, AWREADY, WVALID, WREADY, BVALID, and BREADY signals toggle appropriately? How about WDATA and WSTRB? What of AWADDR? (You might need to reduce this to a single bit: mydbg = (AWADDR == mydevices_register);) If all of these are set appropriately, then the problem is in your design. Voila! You’ve just narrowed down the issue.
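To make that concrete, here is a minimal sketch of such a test point, assuming an AXI4-Lite slave port with Vivado-style S_AXI_* names. The module name, the LED_ADDR parameter, and the output names are all hypothetical, and LED_ADDR should be the address as your slave actually sees it (many slaves only receive the low address bits). The idea is to reduce the bus to a few bits that an ILA can trigger on and record:

```verilog
// Hypothetical debug test point: flags AXI4-Lite writes to one register.
// LED_ADDR, the port names, and the module name are illustrative
// assumptions, not taken from any particular design.
module axil_write_probe #(
    parameter [31:0] LED_ADDR = 32'h0000_0000  // hypothetical LED register address
) (
    input  wire        S_AXI_ACLK,
    input  wire        S_AXI_ARESETN,
    input  wire        S_AXI_AWVALID,
    input  wire        S_AXI_AWREADY,
    input  wire [31:0] S_AXI_AWADDR,
    input  wire        S_AXI_WVALID,
    input  wire        S_AXI_WREADY,
    input  wire        S_AXI_BVALID,
    input  wire        S_AXI_BREADY,
    output wire        o_aw_to_led,   // address handshake to the LED register
    output wire        o_w_accepted,  // write data handshake (any address)
    output wire        o_b_accepted,  // write response handshake
    output reg  [7:0]  o_aw_count     // LED-address handshakes seen (wraps at 256)
);
    // The single-bit reduction suggested above: a handshake to our address
    assign o_aw_to_led  = S_AXI_AWVALID && S_AXI_AWREADY
                        && (S_AXI_AWADDR == LED_ADDR);
    assign o_w_accepted = S_AXI_WVALID && S_AXI_WREADY;
    assign o_b_accepted = S_AXI_BVALID && S_AXI_BREADY;

    // A small counter makes "was it ever written?" visible at a glance
    always @(posedge S_AXI_ACLK)
        if (!S_AXI_ARESETN)
            o_aw_count <= 8'd0;
        else if (o_aw_to_led)
            o_aw_count <= o_aw_count + 8'd1;
endmodule
```

One possible use: trigger the ILA on o_aw_to_led and record the remaining outputs alongside the LED pin. If o_aw_count never moves when software writes the LED register, the write never reached your IP, and the problem lies upstream of it.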

Let’s illustrate this idea. You have a design that doesn’t work. You need to figure out where the bug lies. So we first break this design into three parts. I’ll call them 1) the AXI IP, 2) the LED output, and 3) the rest of the design.

Fig 4. Breaking down the problem

I would suggest two test points–although these can probably be merged into the same “scope” (ILA). The first one would be between the AXI IP and the rest of the design. This test point should look at all the AXI signals. The second one should look at the LED output from your design.

Yes, I can hear you say, but of course the problem is within my AXI IP! Ahm, no, you don’t get it. Earlier this year, I shipped a design to a well-paying customer, and they came back and complained that my design wasn’t properly acknowledging write transactions. As I recall, either BID or BVALID was getting corrupted, or some such. What should I say as a professional engineer to a comment like that? Do I tell the customer, gosh, I don’t know, that’s never happened to me before? Do I tell him, not at all, my stuff works? Or do I make random changes for him to try, to see if these would fix his problem? Frankly, none of these answers would be acceptable. Instead, I asked if he could provide a trace or other evidence of the problem that we could inspect together, much like I illustrated above in Fig. 4. When he did so, I was able to clearly point out that my design was working: it was just Vivado’s IP integrator that hadn’t properly connected it to the AXI bus. Yes, these things happen. You, as the engineer, need to narrow down where the bug is, and getting a “trace” of what is going on is one clear way to do this.
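As an illustration of what such evidence might look like, here is a hypothetical checker for an AXI write-response channel. It is a sketch under simplifying assumptions (one clock, active-low reset, W beats not tracked even though AXI also requires the write data before the response), not the check used in that customer exchange. It latches a sticky flag if BVALID drops before BREADY accepts it, or if a response arrives with no write address outstanding:

```verilog
// Hypothetical write-response checker: counts AW handshakes against
// B handshakes and latches sticky error flags on protocol violations.
// Simplifications (assumptions): single clock, active-low reset,
// W-channel beats are not tracked.
module axi_bresp_check #(
    parameter CW = 8                   // outstanding-count width (assumption)
) (
    input  wire i_clk,
    input  wire i_reset_n,
    input  wire i_awvalid,
    input  wire i_awready,
    input  wire i_bvalid,
    input  wire i_bready,
    output reg  o_err_unexpected_b,    // B handshake with no write outstanding
    output reg  o_err_bvalid_drop      // BVALID fell before BREADY accepted it
);
    reg [CW-1:0] outstanding;          // AW handshakes not yet answered by B
    reg          past_bvalid, past_bready;

    wire aw_hs = i_awvalid && i_awready;
    wire b_hs  = i_bvalid  && i_bready;

    always @(posedge i_clk)
    if (!i_reset_n) begin
        outstanding        <= 0;
        past_bvalid        <= 1'b0;
        past_bready        <= 1'b0;
        o_err_unexpected_b <= 1'b0;
        o_err_bvalid_drop  <= 1'b0;
    end else begin
        // AXI rule: once raised, BVALID must stay high until BREADY
        if (past_bvalid && !past_bready && !i_bvalid)
            o_err_bvalid_drop <= 1'b1;

        // A response may only answer a write address already accepted
        if (b_hs && outstanding == 0)
            o_err_unexpected_b <= 1'b1;

        // Track writes in flight (overflow is not checked; sketch only)
        case ({aw_hs, b_hs})
        2'b10:   outstanding <= outstanding + 1'b1;
        2'b01:   if (outstanding != 0) outstanding <= outstanding - 1'b1;
        default: ; // no handshake, or one in and one out
        endcase

        past_bvalid <= i_bvalid;
        past_bready <= i_bready;
    end
endmodule
```

Placed in an ILA next to the raw AXI signals, sticky flags like these give you something to trigger on close to the moment the protocol broke, rather than a symptom hundreds of cycles later.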

Fig 5. Yes, it's hard. Get over it.

This process is often both iterative and time-consuming. Yes, it’s hard. As my Ph.D. advisor used to say, “Take an Aspirin. Get over it.” It’s a fact of life. This field isn’t easy. That’s why it pays well. Personally, that’s also why I find it so rewarding to work in this field. I enjoy the excitement of getting something working!

If we go back to the video processing example I mentioned earlier, I eventually found several bugs in my Verilog IP.

A bus arbiter was broken, and so the arbiter would get locked up following any bus error.

(Yes, this was my own arbiter, and one I had borrowed from another project. It had no problems in that other project.)

Every time the video chain got reset, the memory address was reset to zero, and so the design tried to access a NULL memory pointer. This was then the source of the bus error the arbiter was struggling with.

The CPU was faulting since the video controller was writing video data to CPU instruction memory.

I traced this to using the wrong linker description file. Sure, a simplified block-RAM-only description is great for initial bring-up testing, but there’s no way a 1080p image frame will fit in block RAM in addition to the C library.

A key video component was dropping pixels any time Xilinx’s MIG had a hiccup on the last return beat.

This was a bit more insidious than it sounds. The component in question was the video frame buffer. This component reads video data from memory and generates an outgoing video stream. A broken signaling flag caused the frame buffer to drop the bus transaction while one word was still outstanding. This left the memory request and memory recovery FSMs off by one (more) beat.

If you’ve ever stared at traces from Xilinx’s MIG, you’ll notice that it generates a lot of hiccups. Not only does it need to take the memory offline periodically for refreshes, but it also needs to take it offline, even more often, for return clock phase tracking. This means that the ready wire, in this case ARREADY, will have a lot of hiccups, and consequently so will the RVALID (and BVALID) acknowledgments.

What happens, as it did in my case, when your design is sensitive to such a hiccup at one particular clock cycle in your operation but not others? The design might pass a simulation check, but still fail in hardware.
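To illustrate the bookkeeping involved, here is a minimal sketch that counts requests in flight on a bus and flags a transaction that ends early. It assumes, for illustration only, a pipelined Wishbone-style master (CYC, STB, STALL, ACK); the module and port names are hypothetical. A request counts when STB is accepted (STB && !STALL), an ACK retires one, and dropping CYC while the count is nonzero is exactly the condition that can leave a stray beat behind:

```verilog
// Hypothetical bookkeeping for a pipelined Wishbone-style master: count
// requests accepted but not yet acknowledged, and flag any bus cycle
// that ends (CYC falls) while requests are still in flight.
module wb_outstanding #(
    parameter LGMAX = 4                // counter holds at least 2^LGMAX (assumption)
) (
    input  wire           i_clk,
    input  wire           i_reset,
    input  wire           i_cyc,       // master: bus cycle active
    input  wire           i_stb,       // master: request strobe
    input  wire           i_stall,     // slave: request not accepted this cycle
    input  wire           i_ack,       // slave: one request completed
    output reg  [LGMAX:0] o_outstanding,
    output reg            o_err_early_drop  // CYC fell with requests in flight
);
    wire req = i_stb && !i_stall;      // request accepted this cycle
    wire ack = i_ack;

    always @(posedge i_clk)
    if (i_reset) begin
        o_outstanding    <= 0;
        o_err_early_drop <= 1'b0;
    end else if (!i_cyc) begin
        // Dropping CYC abandons whatever is still in flight. If the slave
        // returns that data anyway, it can be miscounted against the next
        // transaction, leaving request and response logic off by a beat.
        if (o_outstanding != 0)
            o_err_early_drop <= 1'b1;
        o_outstanding <= 0;
    end else begin
        case ({req, ack})
        2'b10:   o_outstanding <= o_outstanding + 1'b1;
        2'b01:   if (o_outstanding != 0)
                     o_outstanding <= o_outstanding - 1'b1;
        default: ; // neither, or one accepted while another completes
        endcase
    end
endmodule
```

The check does not care when the final ACK arrives, so it catches the problem whether that ACK is delayed by one cycle or by twenty, which is the kind of timing variation a memory controller's hiccups produce.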

Fig 6 shows the basic trace of what was going on.

Fig 6. The missing ACK

Notice what I just did there? I created a test point within the design, looked at signals from within that test point, captured a trace of what was going on, and hence was able to identify the problem. No, this wasn’t the first test point–it took a couple to get to this point. Still, this is an example of debugging a design within hardware.

The story of this video development goes on.

Fig 7. The 3-board Stack

At this point, though, I’ve moved from one board to three. On the one hand, that’s a success story: I only moved on once the single board was working. On the other hand, the three boards aren’t talking to each other (yet). I think I’ve now narrowed the problem down to a complex electrical interaction between two of the boards.

How did I do that? The key was to be able to capture a trace of what was going on from within the system. Sound familiar? First, I captured a trace indicating that the I2C master on the middle board was attempting to contact the I2C slave on the bottom board and … the bottom board wasn’t acknowledging. Then I captured a trace from the bottom board showing that the I2C pins weren’t even getting toggled. Indeed, I eventually got to the point where I was toggling the I2C pins by hand using the on-board switches, and even then the boards weren’t showing a connection between them.
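For anyone wanting to try the same kind of experiment, here is a minimal sketch of driving I2C lines from switches. The switch assignment and all names are hypothetical. Since I2C is open-drain, the logic only pulls a line low or releases it to the pull-up resistor, and never drives it high; reading the pins back onto LEDs shows whether something else is holding a line low:

```verilog
// Hypothetical bring-up aid: drive the I2C lines by hand from two board
// switches. I2C is open-drain, so the FPGA either pulls a line low or
// releases it (high-Z) and lets the pull-up resistor raise it.
module i2c_manual (
    input  wire [1:0] i_sw,    // i_sw[0] -> SCL, i_sw[1] -> SDA (assumption)
    inout  wire       io_scl,
    inout  wire       io_sda,
    output wire [1:0] o_led    // what the pins actually read back
);
    // Switch high: release the line; switch low: pull it to ground
    assign io_scl = i_sw[0] ? 1'bz : 1'b0;
    assign io_sda = i_sw[1] ? 1'bz : 1'b0;

    // A released line that still reads low means something else is
    // holding it down (another device, a short, or a missing pull-up)
    assign o_led = { io_sda, io_scl };
endmodule
```

Probing the same nets on the other board (or with a meter) then tells you whether the level ever made it across the connector.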

Generate a test. Test. Narrow down the problem. Continue.

Enumerating Debug Methods

In many ways, debugging can be thought of as a feedback loop, much like Col. Boyd’s OODA loop.

Fig 8. Debugging Feedback Loop

The faster you can go through this loop, the faster you can find bugs, and the better your design will be.

Given this loop, let’s now go back and enumerate the basic methods for debugging a hardware design.

Desk checking. This is the type of debugging where you stare at your design, and hopefully just happen to see whatever the bug was. Yes, I do this a lot. Yes, after a decade or two of doing design it does get easier to find bugs this way. After a while, you start to see patterns and learn to look for them. No, I’m still not very successful using this approach, and I’ve been doing digital design for a living for many years.

In the case of this student’s design, I’m sure he’d stared at his design quite a bit and wasn’t seeing anything. Yeah. I get that. I’ve been there too.

Build time required for desk checking? None.

Test time? This doesn’t involve testing, so none.

Analysis time? Well, it depends. Usually I give up before spending too much time doing this.

Lint, sometimes called “Static Design Analysis”. This type of debugging takes place any time you use a tool to examine your design without simulating it.

I personally like to use verilator -Wall -cc mydesign.v. Using Verilator, I can get my design to have zero lint errors. Since this process tends to be so quick and easy, I rarely discuss bugs found this way. They’re just found and fixed so quickly that there’s no story to tell.
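As a hypothetical illustration, take the student’s report of assigning slave register r0 to a new output. If r0 is 32 bits wide and the LED port is a single bit, verilator -Wall will flag the implicit truncation as a width warning, and once only bit 0 is used it will also point out that the remaining bits go unused. The module and port names below are assumptions; the sketch shows the explicit version, with the deliberately unused bits marked for the linter:

```verilog
// Hypothetical example: exporting slave register r0 to a new LED output.
// Under `verilator -Wall`, the commented-out assignment would draw a width
// warning (32-bit value into a 1-bit port, silently keeping bit 0).
module led_export (
    input  wire        i_clk,
    input  wire        i_we,     // hypothetical write strobe for r0
    input  wire [31:0] i_data,
    output wire        o_led
);
    reg [31:0] r0 = 32'h0;

    always @(posedge i_clk)
        if (i_we)
            r0 <= i_data;

    // assign o_led = r0;   // lint: width mismatch, implicit truncation
    assign o_led = r0[0];   // explicit: the LED follows bit 0 of r0

    // Tell Verilator the upper bits are unused on purpose
    // verilator lint_off UNUSED
    wire unused_bits = |r0[31:1];
    // verilator lint_on UNUSED
endmodule
```

Cleaning up warnings this way keeps the lint output short enough that a genuinely new warning stands out.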

Vivado also produces a list of lint errors (warnings) every time it synthesizes my design. The list tends to be long and filled with false alarms. Every once in a long while I’ll examine this list for bugs. Sometimes I’ll even find one or two.

From the student’s email above, I gather he believed his design was good enough from this standpoint. Still, it’s a place worth looking when things take unexpected turns.

Build time? None.

Test time? Almost instantaneous when using Verilator.

Analysis time? Typically very fast.

Formal methods. Formal methods involve first assuming things about your inputs, and then making assertions about how the design is supposed to work. A solver can then be used to logically prove that if your assumptions hold, then your assertions will as well. If the proof fails, the solver will provide you with a very short trace illustrating how one of your assertions can be made to fail.

You can read about my own first experience with formal methods here, although that’s no longer where I’d suggest you start. Were I to recommend a starting place, it would probably be my Verilog design tutorial.

Many of the bugs I mentioned in the video design I’m working with should’ve been found via formal methods. However, some of the key components didn’t get formally verified. (Yes, that’s on me. This was supposed to be a prototype…) The arbiter, however, had gone through a formal verification process. Sadly, at one point I had placed an assumption into the design that there would never be any bus errors. What do you know? That kept it from finding bus errors! Likewise, the frame buffer’s proof never passed induction, so it never completed a full bus request to see what would happen if the two got out of sync. The excuses go on. I’m now working on formally verifying these components.
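For readers who haven’t seen formal properties before, here is a minimal sketch in the SymbiYosys style: immediate assume and assert statements, plus $past and $stable, all inside an ifdef FORMAL block so that ordinary simulation ignores them. The module and port names are assumptions; the module is meant to be instantiated inside (or alongside) the slave under test. It checks one AXI rule on a write-response channel, and its last comment shows how a single over-broad assumption, much like the no-bus-errors assumption described above, can silently hide a whole class of bugs:

```verilog
// Hypothetical property module for an AXI write-response channel.
// Everything is inside `ifdef FORMAL, so simulators skip it entirely.
module faxi_bresp (
    input wire       i_clk,
    input wire       i_reset_n,
    input wire       i_bvalid,
    input wire       i_bready,
    input wire [1:0] i_bresp
);
`ifdef FORMAL
    reg f_past_valid = 1'b0;
    always @(posedge i_clk)
        f_past_valid <= 1'b1;

    // Begin every trace in reset
    always @(*)
        if (!f_past_valid)
            assume(!i_reset_n);

    // Once a response is stalled, it must stay valid and unchanged
    always @(posedge i_clk)
        if (f_past_valid && $past(i_reset_n) && i_reset_n
                && $past(i_bvalid && !i_bready)) begin
            assert(i_bvalid);
            assert($stable(i_bresp));
        end

    // Pitfall: an assumption like the following would restrict every
    // response to OKAY (2'b00), so no SLVERR/DECERR path is ever explored
    // and error-handling bugs could never fail the proof.
    // always @(*) assume(i_bresp == 2'b00);
`endif
endmodule
```

With the commented-out assumption enabled, the proof could still pass while every error path went unexamined, which is the trap described above.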

In the case of the student above, he mentions using some formally verified designs, but says nothing about whether or not he formally verified the LED output of those designs.

Build time? For formal methods, this typically refers to how long it takes to translate the design into a formal language of some type, such as SMT. When using Yosys, this is usually so quick I don’t notice it.

Test time? We measured formal proof solver time some time ago. Bottom line, 87% of the time a formal proof will take less than two minutes, and only 5% of the time will it ever take longer than ten minutes.

Analysis time? This tends to take only a minute or two. One of the good things about formal proofs is that the solver will lead you directly to the error.

Simulation.

Simulation is a very important debugging tool. It’s one of the easiest ways to find bugs. In general, if a design doesn’t work in simulation, then it will never work in hardware.

However, simulation depends upon models of all of the components in question–both those written in Verilog and those only available via data sheet, from which Verilog (or other) models need to be written and thus only approximated. As a result, there are often gaps between how the models work and what happens in reality.

A second reality of simulation is that it’s not complete. There will always be cases that don’t get simulated. A good engineer will work to limit the number of these cases, but it’s very hard to eliminate them entirely. For example:
- Not simulating jumping to the last instruction in a cache line left me with quite a confusing mix of symptoms.
- Not simulating bus errors led to missing a bus lockup in the arbiter above.
- Not simulating ACK dropping at the last beat in a series of requests led to the frame buffer perpetually resynchronizing.
- Not simulating stalls and multiple outstanding requests led Xilinx to believe their AXI demo worked.

Considering the video processing example I’ve been discussing, I’ll be the first (and proudest) to declare that all of the video algorithms worked nicely in simulation. Yes, they worked in simulation; they just didn’t work in hardware. Why? My simulation didn’t include the MIG or the DDR3 SDRAM. Instead, I had approximated their performance with a basic block RAM implementation. This usually works for me, since I like to formally verify everything, only I didn’t formally verify everything this time. As a result, some bugs slipped through the cracks, and among other things my simulation never fully exercised the design. My simulation also didn’t include the CPU, nor did it have the same type and amount of memory as the final design. These were all problems with my simulation that kept me from catching some of these last bugs.
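One inexpensive way to narrow that gap is to make the simulation’s memory model stall the way real hardware does. Below is a hypothetical, simulation-oriented sketch: a 16-bit Galois LFSR (taps 0xB400, a standard maximal-length choice) drops a ready signal on roughly one cycle in four, so a bus model driven by it stalls pseudo-randomly, including on final beats. The module name, seed, and stall rate are assumptions, and this only approximates the behavior of a real controller such as the MIG:

```verilog
// Hypothetical simulation helper: a pseudo-random "hiccup" generator,
// loosely mimicking refresh and phase-tracking stalls at a memory
// controller's ready signal. Drive a bus model's ready (or stall) from
// it so the testbench sees stalls at every beat position.
module hiccup_gen #(
    parameter [15:0] SEED = 16'hACE1   // any nonzero seed (assumption)
) (
    input  wire i_clk,
    input  wire i_reset,
    output wire o_ready                // low on roughly one cycle in four
);
    reg [15:0] lfsr;

    // 16-bit Galois LFSR, feedback mask 0xB400, period 65535
    always @(posedge i_clk)
        if (i_reset)
            lfsr <= SEED;
        else
            lfsr <= {1'b0, lfsr[15:1]} ^ (lfsr[0] ? 16'hB400 : 16'h0000);

    // Stall whenever the two low bits are both zero
    assign o_ready = (lfsr[1:0] != 2'b00);
endmodule
```

Because the LFSR is deterministic, a failing run can be replayed exactly; changing SEED moves the stalls to different cycles.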

While simulation is the “easiest” type of debugging, it does tend to be slow and resource (i.e. memory and disk) hungry. Traces from my video tests are often 200GB or larger. Indeed, this is one of the reasons why the simulation doesn’t include the MIG DDR3 SDRAM controller, the CPU, the flash, block RAM, or the Wishbone crossbar.

I would be very curious to know if the student who wrote me had fully simulated his design–from ARM software to LED.

Build time? When using Verilator, I’ve seen this take up to a minute or two for a large and complex design, although I rarely notice it.

Test time? The video simulations I’ve been running take about an hour or so when using Verilator. A full ZipCPU test suite can take two hours using Verilator, or about a week when using Icarus Verilog.

Test time gets annoying when using Vivado, since it doesn’t automatically capture every signal from within the design as Verilator will. I understand there’s a setting to make this happen, but … I haven’t found it yet.

Analysis time? This tends to be longer than formal methods, since I typically find myself tracing bugs through simulations of very large and complex designs, and it takes a while to trace back from the evidence of the bug to the actual bug itself. The worst examples of simulation analysis I’ve had to do were of NAND flash simulations, where you don’t realize you have a problem until you read results from the flash. Then you need to first find the evidence of the problem in the trace (expected value doesn’t match actual value), then trace it from the AXI bus to the flash read bus, across multiple flash transactions to the critical one that actually programmed the block in question, back across the flash bus to the host IP, and then potentially back further to the AXI transaction that provided the information in the first place. While doable, this can be quite painful.

Fig 9. Tracing from cause to effect can require a lot of investigation

Debug in hardware. Getting to hardware is painful: it requires building a complete design, handling timing exceptions, and a typically long synthesis process. Once you get there, tests can typically be run very fast. However, such tests are often unrevealing. Trying something else on hardware often requires a design change, a rebuild, and … a substantial stall in your process, which will slow you down. In the case of this student, he measured this stall time at 30 minutes.

This stall time while things are rebuilding can make hardware debugging slow and expensive. Why is it expensive? Because time is expensive. I charge by the hour. I can do that. I’m not a student. Students on the other hand are often overloaded for time. They have other projects to do, and one class (or lab) consuming a majority of their time will quickly become a serious problem on the road to graduation.

Knowing what’s wrong when things fail in hardware is … difficult–else I wouldn’t be writing this note.

However, it’s a skill you need to have if you are going to work in this field. How can you do it? You can use LEDs. You can use your UART. If you are on an ARM based FPGA, you can often use printf. You can use a companion CPU (PC), or even an on-board CPU (ARM or softcore). You can use the ILA, or you can build your own (that’s me). In all cases, you need to be able to extract the key information regarding the “bug” (whatever it might be) from the design. That key information needs to point you to the bug. Is it in Vivado-generated IP? Is it in the Verilog? If it’s in your Verilog, where is it? You need to be able to bisect your design repeatedly to figure this out.

In the case of the video project I’m working on, this is (currently) where I’m at in my development.

In the case of the student above, I’d love to know whether assign led=1; would work, if the LED control wire was mapped to the correct pin, or if the LED’s control was inverted. Without more information, I might never know.
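A minimal sketch of that first sanity check, kept entirely separate from the AXI logic, might look like the module below. The names, the two-LED port, and the 100 MHz clock used for the timing estimate are assumptions. One LED is simply tied high (the assign led=1; test), and the other blinks from a free-running counter, which helps separate a pin-mapping or polarity problem from a dead clock:

```verilog
// Hypothetical sanity-check top level, independent of any AXI logic:
// o_led[0] is tied high, and o_led[1] blinks from a free-running counter,
// which also shows that the clock is running.
module led_sanity #(
    parameter CW = 26                  // counter width (assumption)
) (
    input  wire       i_clk,
    output wire [1:0] o_led
);
    reg [CW-1:0] counter = 0;

    always @(posedge i_clk)
        counter <= counter + 1'b1;

    assign o_led[0] = 1'b1;            // constant: checks pin mapping and polarity
    assign o_led[1] = counter[CW-1];   // heartbeat: MSB toggles every 2^(CW-1) clocks
endmodule
```

With CW = 26 the heartbeat toggles every 2^25 cycles, about a third of a second at 100 MHz. If the constant LED is dark while the heartbeat blinks, suspect an active-low LED; if neither responds, suspect the pin constraints or the build itself.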

Build time? That is, how long does it take to turn the design Verilog into a bit file? Typically I deal with build times of roughly 12-15 minutes. The student above was dealing with a 30-minute build time. I’ve heard horror stories of Vivado taking as long as a day for particularly large designs, but I’ve never had to deal with delays that long myself.

Test time? Most hardware tests take longer to set up than to perform, so I’ll note this as “almost instantaneous.” Certainly my video tests tended to be very quick.

Analysis time? “What just happened?” seems to be a common refrain in hardware testing. Sure, you just ran a test, but … what really happened in it? This is the problem with testing in hardware. It can take a lot of work to get to the “success” or “failure” measure. In the video processing case, processing takes place one pixel at a time at over 80M pixels per second, but the final “success” (once I got there) was watching the effects of the video processing as applied to a 4-minute video. Indeed, I was so excited (once I got there) that I called everyone in my family to come and watch.

While I’d love to say one debugging method is better than another, the reality is that they each have their strengths and weaknesses. Formal methods, for example, don’t often work on medium to large designs. Lint tends to miss things. You get the picture. Still, you need to be familiar with every technique, to have them in your tool belt for when something doesn’t work.

Conclusion

Again, the bottom line is that you need to know how to debug a design to succeed in this field. This is a prerequisite for anything that might follow–such as building an AXI slave. Perhaps a fun story might help illustrate my points.

You might also find the first article I wrote on this hardware debugging topic to be valuable.

Or how about the response from a student who then commented on that article, after struggling with these same issues?

In all of this, the hard reality remains:

- Hardware debugging is hard.
- There is a methodology to it. I might even use the word “methodical”, but that would be redundant.
- You will need to learn that methodology to debug your design.
- Once you understand the methodology of hardware debugging, you can then debug any design, including any AXI design.

Hardware design isn’t for everybody. Not everyone will make it through their learning process, be it college or self-taught. Yes, there are design communities that would love to help and encourage you. On the bright side, hard work pays well in any field.