ORCONF 2019 is coming up, and I’m planning on presenting slides on the topic of formally verifying AXI interfaces. My intent was just to share some of the bugs I’ve found and so to encourage folks to use formal verification tools, such as the SymbiYosys tool that I’ve been using. However, as I started to put the story together, I also started to realize just how important this topic is.
Here’s the bottom line: AXI is very difficult to verify using traditional simulation based methods. How difficult is it? So difficult that many of the major vendors out there have gotten it wrong.
Let me back up, though, and walk you through some of the details.
AXI, what’s that?
It begins with the chip manufacturing industry. There are a lot of companies out there building their own special and secret sauce into digital electronics. The reality, though, is that many of the tasks these chips are responsible for performing are challenging to do in raw digital hardware. It’s not that they cannot be done, but rather that it’s a lot easier to do them in software. That means that if you want to build a custom silicon widget, you are likely to want to include a CPU into your custom widget as well.
While building a basic CPU may be a college student’s class project, building and maintaining the assembler, linker, compiler, debugger, and indeed the whole tool chain is a task few design houses want to take on for themselves. It’s just easier to purchase someone else’s CPU, rather than to handle all of this work yourself.
One of the most popular CPUs in the embedded sub-chip IP market is the ARM. While RISC-V may well give ARM a run for their money, much of the industry has already standardized around a set of ARM bus protocol standards drawn from the AMBA bus protocol set. Of these, the high bandwidth standard is clearly AXI. As a result, if you want to connect your secret sauce to a CPU hidden inside your proprietary integrated circuit technology, you are likely going to be using an AXI bus to connect the pieces together.
Recognizing this reality, and the fact that FPGAs are often used to test the logic within integrated circuits before burning them into actual hardware, the two major FPGA vendors have been offering AXI based tooling for some time. This includes not only the AXI bus fabric that connects everything together, but also several basic design components like SDRAM controllers, memory movers, DMA engines, video controllers, flash controllers, as well as example designs so that their users can get more value from their hardware. These basic AXI designs are offered for free to anyone who uses their tools. As a result, the example designs in particular have become rather ubiquitous–so much so that any error in one or the other tends to show up over and over again in the code of anyone who used that example as a foundation for their own work. Again, this shouldn’t be surprising to anyone.
This was where I entered the picture. In October, 2017, I tried out formal methods for the first time on what I thought was a fairly basic and simple design. My plan was just to write a review about some new “formal verification” fad and then to go back to digital design the way I had been doing it. My plan changed when the tool found bugs in my design. So I then tried applying the tool to another one of my “working” designs. The same thing happened, the tool found bugs in it. Valid bugs. Over and over I applied the formal tools to my designs and found more and more bugs.
To understand how significant these bugs were, consider these: an SDRAM controller that might fetch the wrong memory address, a CPU instruction cache that returned the wrong instruction, or even an FFT that worked for one set of sizes and bit widths but not for others. The list goes on.
Eventually, I stopped verifying my designs after the fact and started going straight to formal verification tools before ever picking up a simulator. Design has never been so fast or so easy as a result. Indeed, the data cache that I had wanted to build for years suddenly came together inside of two weeks.
By the end of 2018, I’d also struggled many times to build my own AXI interface components. In one particularly painful example, I was forced to deliver a design to a customer that didn’t live up to the promises I had made to him. Even now, most of my work has been done using Wishbone–it’s much simpler to use. Sadly, though, if you want to interact with the modern ARM+FPGA SOC chips like either Xilinx’s Zynq or one of Intel’s SOCs, you will (eventually) need to learn to speak AXI.
As I mentioned above, my first attempts at building AXI components dismal failures. My second set of attempts weren’t much better. Finally, somewhere around last Christmas time, I started picking up AXI and formal verification together. I first built a set of formal properties for AXI-lite, and then again later for AXI. These properties make it easy to verify AXI bus components–much easier than simulation alone. The neat thing about this approach is that the property files can then be reused from one AXI design to another with only minimal configuration and set up. Hence, once I built one design, getting the next one right became easier.
Not knowing whether or not I understood the bus standard well enough, my first AXI-lite and AXI verification projects were to verify any publicly available code I could get my hands on. I started with Xilinx’s example code, and then moved on to verifying several projects on Github, and finally looked into Intel’s example AXI code. I was expecting to find examples that met the AXI standard that I could then learn from.
Let me pause at this point to describe what I mean by an “example” design, or even a “demonstration” design. These are vendor provided example logic designs, such as the one illustrated in Fig. 3 on the right and typically written in Verilog, VHDL, or both, that show the proper interaction between a bus component and the bus. Typically, they are written in a fashion so that you can adjust a user logic section however you might like, while another portion of the design provided by the vendor handles all of the more difficult AXI bus interaction logic.
Much to my surprise, most of the cores I examined had bugs in them. I’ve tweeted about many of these, and so now I’m collecting these examples into some slides that I can present at ORCONF in just a couple of weeks.
It’s worse than a software bug
One of the jokes in the flying world is that pilots need to maintain “the ratio”: one safe landing for every take off. AXI bus interactions are in many ways no different, there needs to be one response for every request. Further, just as most airplanes don’t offer ejection seats or parachutes, AXI doesn’t offer a “bus abort” capability. As a result, if you fail to keep the “ratio” then really “bad” things will happen.
What kind of “bad” things are we talking about? In the case of a bus, a bus master makes a request–whether or read or write–and the slave responds. In the case of AXI, the fact that it has no abort capability means that a master must wait for a response until it arrives. The standard doesn’t allowed it to time out and give up after waiting a second, minute, hour, or year. Even if the response gets dropped or otherwise becomes missing, the master must still wait for a response.
Sadly, I discovered the problem with this in my first Cyclone-V SOC design. At one point when I had a mistake within my own logic so that the design didn’t respond to the bus like it was supposed to, the bus and the ARM CPU driving the bus, locked up hard. Only a reset, forcing a CPU reboot, fixed the bug.
In my example, I was lucky. Knowing that there was a bug in my design, I was able to then turn around and simulate my design and find the bug. I haven’t always been this lucky, and things could’ve been much worse. Had I not been able to reproduce the bug in simulation, I might have ended up stuck in FPGA Hell myself, staring at a flawed design with no idea how to fix it.
Others haven’t been so lucky.
Even among those who ask for help there are a lot of problems that can’t be solved by a disinterested stranger looking over unfamiliar logic. It’s only gets worse when you have no clue where to start looking, as might happen with a design that suddenly locks up the whole system.
You need to understand, this isn’t a kernel panic type of bug. We’re not talking about the blue sceen of death here either. Nor are we talking about bugs where your application just failed. No, we are talking about something even more severe. A fault in the AXI bus structure whereby a component fails to respond properly is catastrophic. The result will be a whole system crash so hard that only a power cycle can fix it.
Given that I use the bus for debugging, such a crash leaves me blind to the cause.
So what kind of bugs did I find? I found both types of catastrophic bugs. Not only did I find peripherals that didn’t produce as many responses as requests they had received, but I also found bus masters that would drop requests before the bus had accepted them.
In one CPU I examined from a major vendor, two load or two store instructions, back to back, would cause a transaction to be dropped if the interconnect wasn’t immediately ready to receive to the request.
Worse, I even found an interconnect, something that connects multiple masters to multiple peripherals such as the one shown in Fig. 6, where a request from one master would get sent to the correct slave, but then the response from the slave would be returned back to a different master.
It wasn’t pretty.
Indeed, I found bugs in a very large percentage of the cores I examined.
It didn’t help that many demonstration cores, and how-to blog articles I found had bugs in their example cores that were then copied into various user designs. Neither did it help that the example designs from both major FPGA vendors had catastrophic bugs within them.
As you might imagine, I was rather perplexed by this. In many ways, none of this made any sense. If these problems were really so bad, how was it that no one had noticed them?
One possibility was that each vendor only checked their cores in their own environment, and somehow the environment was masking the bugs. As an example, some of the bugs I found would be only triggered if the bus was ever used in a particular configuration. I also found that one vendor had crippled the throughput of their interconnect–perhaps because no one really knew who or what was responsible for the user bugs.  
So I started browsing and participating in various forums.
Working the forums for answers
Back in the beginning, when I first started to try to sell my services as an FPGA design engineer, I worked Digilent’s forums fairly hard in the hopes that I might gain some free publicity and perhaps even a contract. Even to this date I’m one of their leading all time contributors, even though I’ve stopped working the forum as hard and the forum moderator’s have since caught up and one has passed me. Since I had just left the military service, I was hoping this might be a way to find some commercial work. The technique worked well, since I found two contracts, several long-term business relationships, and even some free hardware as a result of it–such as a Nexys Video board and even a Digital Discovery. This time, however, I worked up the chain on the major vendor’s forums, rather than those of their outlets. Specifically, I was looking for examples of the bugs listed above, or if not then I was hoping to discover why they weren’t being triggered. I was also looking for example designs to check my formal property files against.
While much of what I found was fairly benign, such as brand-new students struggling to figure out their engineering design homework, there were also many professional design engineers participating in these forums and asking for help regarding the problems they had come across.
In one conversation, I came across a user asking for help whose AXI slave design wasn’t working. When I asked if he would try my own formally verified design, he tried it and the bugs went away. Sadly, we never found or identified the bug(s) in his own (VHDL) code–perhaps because he never shared enough of his design. Incidentally, when I offered him my own design to try, I never told him that this was the first time the core had been tried in either hardware or simulation. It “just worked” the first time.
In another conversation, an FPGA design engineer had built a design and delivered it to his product team in a “working” condition for the software programmer to take over. The design contained a MicroBlaze CPU and whatever application sauce they were putting into it, such as is shown in Fig. 8. This FPGA design engineer then left the company. Some time later, the software engineer made some changes to his MicroBlaze software–not the hardware design, but the software for the MicroBlaze CPU within it. The result was that the design locked up following two adjacent store instructions and no one could understand why. This pattern also matched the bugs that I had found.
Someone else posted on a Reddit forum that their design wasn’t working, curious if anyone had seen similar problems before. I asked if he had formally verified his design. Instead of responding to me, he turned around and posted on the vendor’s forum no longer asking but now declaring that their interconnect was broken. Again, I asked if he had formally verified his design or, if not, if he’d release his source code and I’d check it for him. He then proceeded to ignored me for a second time and instead posted a trace that was supposed to show the bug. When you looked closer at the trace, however, you found a bug within his core and not the interconnect: one request was creating two responses. His design wasn’t keeping up the “ratio”. Formal methods would’ve caught that.
Another engineer posted, wondering what was wrong with the vendor tools. When I asked if he could or would share his source, he refused. Sadly, this is a rather common though unfortunate answer. Digital logic is designed and sold, and protecting the market tends to keep corporations tight-lipped about their secret algorithms–since it represents real money to them. Instead, this engineer offered a “sanitized” version of what he was doing. With the experience I had gained from formal methods, I could clearly see the bug in even his sanitized design–although I have to believe it must’ve passed his test bench for him to even open up like that.
When I dug further into the forums, I found several examples of folks who had written in, not knowing why their design was failing, but for whom no one had responded.
I also looked into any example code I could find. I noticed that most of the examples I found followed the vendor’s examples. This made it fairly easy to verify, since once I had verified one example any others that looked like it tended to have the same bugs.
At one point, I even looked into a major vendor’s interconnect structure. I was hoping I might be able to apply the formal tools to it, having verified my own interconnect and also since I had found some severe bugs in an academic’s interconnect–as mentioned above. As I worked through this design to determine if it would even be possible to set up the tools (it wasn’t–not all of the code was available as shown in Fig. 9), I noticed an optional “feature”: The default setting of their interconnect would allocate a channel from the master to the slave and at the same time from the slave to the master. This channel would then remain allocated until all transactions had completed–much like my own crossbar design. An optional setting would activate a different implementation, one that would route transaction requests in packets to the slaves, and then route them back to the masters in a way that required bus arbitration in both directions. As a result of the second “return” arbitration, the slave would experience “back pressure”–a necessary trigger condition for some (but not all) of the bugs I had found. This second implementation would’ve clearly triggered the bugs that I had found, whereas the first might not have. Given the code I examined, it is quite possible that a failure to test all configurations of this vendor’s design might’ve contributed to the problem manifesting in some designs and not others.
In another example from the same interconnect logic, I noticed that the vendor crippled both read and write channels in their AXI-lite to AXI bridge. This surprised me. Bridging from AXI-lite to AXI is really easy to do, requiring almost no logic. Then I realized, their crippled design probably kept some of these AXI-lite bugs from triggering. This appeared to be either a consequence of an engineer trying to fix an ill-defined logic bug, or perhaps it was legacy code remaining from a protocol version (AXI3) that had since fallen out of favor.
I also discovered a more shocking reason why many of these bugs may have stayed hidden, while examining some DMA reset questions.
For those who are not familiar with a DMA, it’s basically a hardware component that copies data from one bus slave to another. My own WB DMA, works by first copying the data to an internal memory, and only then writes it back to the bus. It will do this several times if necessary. If it ever encounters a bus error, such as I sometimes generate from a slave failing to respond, it issues a bus abort, ends the transaction, and reports an error.
This is not as easy to do with AXI.
In this case, several customers had posted to the forum over a couple of days asking asking whether or not it was possible to interrupt an ongoing DMA transaction. Apparently, these customers were trying to copy data between a user logic core and memory and the bus was locking up mid-transaction. They wanted to find out how to abort the transaction. I replied to these individuals that there is no such thing as a bus abort in AXI: every transaction request must receive a response. There’s no way to timeout a transaction.
Well, not quite. There is one way to timeout a transaction. You can use a “firewall” of some type, such as this bus fault isolator, to catch bugs in a faulty AXI slave. If placed between the slave and the rest of the bus logic, as shown in Fig. 11, it will catch faults and then isolate the downstream slave from the rest of the system–keeping those faults from propagating upstream and becoming catastrophic. Of course, once the fault is detected, you still don’t know if a subsequent response was for the missing earlier response or not. As a result, you can even configure the core to then reset the downstream slave as well.
Unfortunately, this approach is not a cure-all. Placing the bus fault isolator between the interconnect and user logic will slow the user logic transactions down to one at a time. As a result, just placing this core between the interconnect and user logic may resolve the problem just by itself–even if it never detects a fault.
Still, I shared this possibility on several of these forum threads. If nothing else, it would’ve helped isolate the cause of the problem.
What surprised me the most about this these DMA reset inquiries was that the vendor then deleted all but one of the three threads when it became apparent that it wasn’t going to be flattering to their product line or to the bus structure they had chosen. This left me wondering, how many other forum posts regarding these bugs have been deleted?
Yes, this problem runs deep. It also appears to be hard to find.
Why haven’t these bugs been caught?
Digital design has been around for a while, so why haven’t these problems been caught before? Why do they remain so hard to find?
I think the answer to this question comes down to how digital designs are tested and verified in the first place. The standard test methodology requires you to write a test bench that exercises your design. Much as I discovered with my first experience with formal methods, such test benches tend to only examine a “normal” design path. In my case, my own FIFO test bench didn’t check all combinations of reading and writing to either full or empty FIFOs.
My best guess is that the same thing is going on here.
I know one individual who wrote in to Xilinx’s forums proudly proclaimed that his core had passed Xilinx’s (AXI VIP/simulation based) verification and so any problem he was having must be Xilinx’s fault. Sadly, their verification step left him believing that his user core worked, even though it still suffered from the same AXI-lite bugs I had already found.
The basic problem is that if you only ever test your AXI user component against one transaction at a time, you’ll miss most of these bugs: You’ll start your simulation with a bus master initiating a transaction while the slave is already waiting for it. The simulated slave will then respond to the waiting master and voila you’ll convince yourself that your design works even when you haven’t checked all conditions. What then happens if the interconnect sends another request before the first one has returned? It wasn’t tested. What happens if either the interconnect or bus the master aren’t yet ready to receive the bus slave’s response? Not tested. Might the slave ever lock up, due to conditions that aren’t simulated, to where it stops processing the bus request? These aren’t things that are normally tested in simulation, but they do happen in real life.
Consider the DMA discussion above. A good DMA will push as much data into and through a core as fast as the bus will allow, yet the simulation approach we just discussed above will never test for this.
Consider the example of the vendor’s interconnect as well, shown in Fig. 9 above. If the interconnect has to arbitrate response packets back to the master that sent them, then it may not be ready to receive a particular response if another slave has the return grant to that master. This didn’t get tested either.
What about the crippled channels? If you had enabled them to be full speed, by packing request after request, many of these example designs would break. Again, this is something you won’t test if you just tested one transaction at a time in a simulation.
Some weeks ago an engineer wrote to me and asked if I had a test bench for one of my cores. You know, he said, something where he might write:
I was floored. Indeed, I didn’t even know where to start. Any test bench that only checked one transaction at a time, or waited for the read transaction to finish before starting a write transaction would mask many if not all of the bugs I had found.
This is why digital designers need formal methods now more than ever–because they actually check the cases you weren’t anticipating.
If you’ve never tried formal methods before, then understand that they are very different from simulation. Instead of trying one test after another in a long chain or sequence of logic, formal methods check all logic paths at once. Yes, “all”, in what typically is a breadth first search. This is both their greatest benefit, since they catch things no one imagines, as well as their greatest curse, since the computation required does not grow with polynomial time.
Another thing that makes formal methods different is that you don’t specify how the external environment should interact with your core. Instead, you specify how the external environment shouldn’t interact with your core. The subtle difference in logic means that, unless you tell the tool not to, it will try all kinds of crazy logic inputs that you might never expect.
One of the common misperceptions about formal verification techniques is that they are only useful or cost-effective for “mission critical” applications–things like aircraft, rockets, and satellites. Were I to buy into this misconception, I wouldn’t touch formal methods at all–since my own little “sandbox” is hardly anything anyone cares about.
That view might have even described me before I tried it for the first time. After using these tools, I’d never go back to the way I was doing digital design before starting with formal methods. For me, formal methods are a big time saver. One of the things I really dislike is trying to sort through a 15GB+ simulation trace file looking for a bug. It takes too long, and wears thin on my patience. Not so with formal methods. Indeed, most of the formal designs I do require less than 20 time steps to find a bug.
Posting to a forum? Personally, I’d never write into a vendor forum asking for help unless I was not only stuck in FPGA Hell, but I had tried every option I knew of to get out and still ended up stuck. I like the ability to do my own work and to call it my own, and so I’m the type of person who will wait until I had tried fixing my own design every way I knew how before ever breaking down and asking for help. Perhaps its a character flaw. However it’s this particular bias of mine, and I don’t think it’s all that uncommon, that makes me believe that those who had posted requesting help, and especially the professionals among them, must have been stuck in FPGA Hell for a while before they risked their pride and profit to ask on a public forum. Now imagine if you could find a logic problem quickly, perhaps even in less time than it took to synthesize your design or even write your post for help, and do it all at your desk.
“Mission critical” or not, formal methods have saved me and my little sandbox of the world a lot of frustration.
So how long does it take? Well, to give you an example, I just built my own AXI stream to memory converter. After writing this core, I included the formal bus properties into the core and started running the tools. This was before simulation–since I haven’t (yet) run the core through a simulation. At first, SymbiYosys founds bugs in seconds. Those first bugs tended to be focused on initial values and reset processing. Once fixed, the tool then took a bit longer and found several bugs in my AXI-lite control logic. Once I cleared those two hurdles, I was then stuck again between about 10-15 time steps for a while. During this time, the tools only required two minutes of processing to find a bug. At this point the bugs found included things like reading from an empty FIFO, initiating a bus burst requests before there was enough data in the FIFO, changing a bus request while it was waiting to be accepted, crossing memory pages within a single request and so forth. Now that I’ve worked with the core for a while, it only takes 56 seconds to know if any changes I’ve made have broken the core.
Think about that for a moment. It takes 56 seconds to know if any simple logic change has placed a bug in the core or not. Not only that, but by means of k-induction, those 56 seconds evaluate not only the first 15-timesteps, but every set of 15-timesteps from the beginning of time through all eternity–or at least until the hardware starts failing.
Good luck trying to get your simulation either to run that fast or to be that complete.
Most of the bugs I’ve found have now been posted, either in a blog article on this site, in Xilinx’s forums, or directly to the authors of any github cores either by filing an issue or sending an e-mail. Many of them have not yet been fixed as of this writing. For example, as of Vivado 2019.1, Xilinx’s bugs remain including one dating back as far as 2016.1. The good news is that their engineers are now not only aware of these bugs, but they’ve also promised to fix them. This is not yet true of the bugs I found in Intel’s design. Sorry, I’ve dropped the ball there. Other than tweeting about Intel’s bugs, I have yet to formally post or write about my findings.
What I will say is that I remember, years ago and before I ever started diving deeply into FPGA technology myself, sitting down with a co-worker to integrate their data collection logic into my mission requirements. Our goal was to demonstrate that we had a useful product to the boss. Instead, I remember this coworker struggling and struggling with the design, hitting the reset key over and over and over again, frustrated and confused at why the design wasn’t starting up like it was supposed to or like it “usually” did. No, I’m not going to repeat his language here, but let’s just say that over the course of time while he was hitting that reset button it became more and more colorful.
Now consider that the training material I’ve found has been broken. Indeed, any one of the AXI bugs I’ve found in these public source vendor training materials might’ve caused this same frustration. These include not only the bugs found in the training material for the vendor’s tools that I’ve already blogged about, but also the training material for the other major vendors on the market.
Now, as I’m slowly picking up my jaw off the floor, I’m starting to get the full impact of what’s been going on.
These bugs can be found and fixed–using formal methods.
If you are interested in repeating any of my work, I’ve done all of my work using SymbiYosys–either the open source or the commercial version. I’ve also posted copies of the Xilinx designs I’ve tested. If you’ve never formal methods, the open source version is capable of processing any Verilog (2005) code you might have with a limited number of SystemVerilog extensions. Feel free to download it and give it a try. You can also find my AXI-lite properties and several example designs on line as well.
You can also start with my beginner’s design tutorial. It’s free. Enjoy it. Even though Xilinx deleted my response recommending this tutorial to new engineers, you are welcome to it. I also teach a course on how to do formal verification. You can find those slides here, and contact me if you are interested in taking the course either on-line or in person on your site. Feel free to check those out too.
Finally, I look forward to meeting many of my readers at ORCONF 2019! Feel free to stop me and say hello.
Buy the truth, and sell it not; also wisdom, and instruction, and understanding. (Prov 23:23)