2019: AXI Meets Formal Verification

It’s a new year! Let’s continue our end-of-year tradition from 2017 and 2018 and take a moment to look back over 2019, from the perspective of the ZipCPU blog, and see what stands out.

Blog History

If you aren’t familiar with the back story, I started the ZipCPU blog back in 2017. Back then, times were tough. It had only been four years since starting Gisselquist Technology, and contacts and jobs were drying up.

Did I know what I was doing when I began Gisselquist Technology back in 2013? While we can argue about whether I understood digital design back then, I clearly did not understand business. I knew it too.

Prior to 2017, I’d had a couple of gift jobs: friends who happened to have just the right job for me. I remember once traveling to visit my mother and then stopping by to see some friends who worked nearby. These friends asked me to come visit them at the office, during business hours, and so I found myself walking into a meeting where the foregone conclusion was that they wanted me to do a job for them.

This is what I consider a “gift”–not because it isn’t valid work, nor because there’s anything untoward going on, but simply because I was the right person for the job at the right time. I walked into someone else’s need. Such jobs are gifts from the Almighty.

That said, it’s hard to plan on gifts, and I needed to learn how to find business the hard way.

My original approach to business development was to build a portfolio of digital designs on OpenCores, and then use them as discussion pieces on various digital forums. Indeed, they made great example designs for that purpose. That said, this approach wasn’t bringing in any business (at the time). (I’ve since gotten several contracts from this work.)

Fig. 1. Inbound Marketing

Then, in May of 2017 my aunt came to visit. She took the whole family to the local used book store to pick out gifts that were to be from Grandma–who was getting too frail to visit everyone. While there, I picked up a book titled, Inbound Marketing: Get Found using Google, Social Media, and Blogs. I had heard of inbound marketing before, and the concept appealed to me. Instead of cold-calling prospective customers, I would promote my work and capabilities on a blog to the point where prospective customers would contact me about what they wanted done, and I could make contacts that way.

So I set up zipcpu.com and began writing blog articles. I also started a twitter feed–all at the suggestion of the Inbound Marketing book.

Thus began the ZipCPU blog. Since that time, my twitter following has grown to over three thousand. Wow. Thanks, everybody!

ZipCPU meets Formal Verification

Later that year, as I was preparing to head to OrConf for the second time, Edmund from SymbioticEDA contacted me. He wanted me to try out SymbiYosys, their new formal verification tool.

What was I to say? Did I need formal verification? Of course not! Why would I need some bright new gadget to help me do what I’d been doing already? What I did need, however, was marketing material for my blog. So I decided to condescend and see how this new formal verification tool worked, and then write a blog article about it.

When pride cometh, then cometh shame: but with the lowly is wisdom. (Prov 11:2)

Much to my surprise, the formal verification tool taught me some desperately needed humility. I applied the formal verification tool to a very basic design, a simple FIFO that I’d used for years, only to discover it had bugs in it that were never found by my test bench.

I then set out to formally verify the rest of my portfolio. Over and over I found bugs, sometimes subtle ones, sometimes not so subtle. I found bugs in all kinds of places, notably in designs that had passed all of my test benches: my prefetch and cache, my CPU, my SDRAM controller, my SD-Card controller, an FFT and much more. Indeed, I’ve since found so many bugs using formal verification that I’m not sure I could go back to what I was doing before–I no longer trust my ability to write a test bench that would be “good enough”.

This has also made the blog quite unique: In a world where no one discusses hardware bugs, where bugs get quietly swept under the rug, I was discussing bugs in my own work.

Yes, I suppose the verse above is worth repeating.

When pride cometh, then cometh shame: but with the lowly is wisdom. (Prov 11:2)

Formal Verification meets AXI

Fig. 2. AXI uses 5 channels, any of which can stall

While most of my designs used a Wishbone bus, every now and again I needed something using AXI. So, back in late 2018, I started building a set of formal properties that could be used to verify an AXI component–much like the formal properties I’d already used for verifying my Wishbone components.

As with any project, I started off simple and just looked at AXI-lite. Unlike the full AXI protocol, AXI-lite doesn’t have nearly as many signals to it, and so it was fairly easy to work with. I began simply with the four basic bus properties I had learned to use when working with Wishbone:

Following a reset, everything should return to idle,
When a request is stalled, its details shouldn’t be changed,
There shall be no responses without prior requests, and
All requests get responses.

These are pretty basic, and in the case of AXI-lite they weren’t all that hard to write out.
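To make the four properties above concrete, here is a minimal sketch in Python that checks them against one recorded AXI-lite write trace. It is not the author's formal property file. The trace format (a list of per-cycle dicts built by the hypothetical sig helper) is an assumption, and the W channel is folded into AW to keep things short. A formal tool proves such properties over every possible trace, rather than checking a single one.

```python
def sig(reset=0, awvalid=0, awready=0, awaddr=0, bvalid=0, bready=0):
    """One clock cycle of a (hypothetical) recorded AXI-lite write trace."""
    return dict(reset=reset, awvalid=awvalid, awready=awready,
                awaddr=awaddr, bvalid=bvalid, bready=bready)


def check_axil_writes(trace):
    """Return (cycle, message) pairs for every violation of the four basic
    bus properties. The W channel is assumed to travel with AW."""
    errors = []
    outstanding = 0      # requests accepted but not yet answered
    prev = None          # previous cycle's signals
    for t, s in enumerate(trace):
        if s["reset"]:
            outstanding = 0
            prev = s
            continue
        # 1. Following a reset, everything should return to idle.
        if prev is not None and prev["reset"] and (s["awvalid"] or s["bvalid"]):
            errors.append((t, "bus not idle after reset"))
        # 2. A stalled request must hold its details until it is accepted.
        if (prev is not None and not prev["reset"]
                and prev["awvalid"] and not prev["awready"]
                and (not s["awvalid"] or s["awaddr"] != prev["awaddr"])):
            errors.append((t, "stalled request changed"))
        # 3. No response (counted on its handshake) without a prior request.
        if s["bvalid"] and s["bready"]:
            if outstanding == 0:
                errors.append((t, "response without request"))
            else:
                outstanding -= 1
        if s["awvalid"] and s["awready"]:
            outstanding += 1
        prev = s
    # 4. All requests get responses (judged at the end of this finite trace).
    if outstanding:
        errors.append((len(trace), f"{outstanding} request(s) never answered"))
    return errors


good = [sig(reset=1), sig(), sig(awvalid=1, awaddr=4),
        sig(awvalid=1, awready=1, awaddr=4), sig(bvalid=1, bready=1)]
dropped = [sig(reset=1), sig(), sig(awvalid=1, awready=1, awaddr=0),
           sig(awvalid=1, awready=1, awaddr=4), sig(bvalid=1, bready=1)]
print(check_axil_writes(good))
print(check_axil_writes(dropped))
```

The second trace mimics a dropped response: two requests are accepted but only one is ever answered.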

Fig. 3. Xilinx's 2016.3 AXI-lite demonstration design drops write acknowledgments. Fixed by 2018.3

I then looked around for a working design to try my new properties on. It didn’t take long before I found Xilinx’s demonstration designs. Much to my surprise, I found bugs. The core would drop transaction responses, as shown in Figs. 3 and 4, where, with just a little back pressure, the second request’s response would get dropped.

The presumption, of course, was that my brand-new, untested formal bus properties were broken. This would be the only sensible conclusion. The Xilinx design I was trying to verify had been around for years. It had been used by many Xilinx customers. Indeed, you’d expect the bugs to have been worked out of them by the time I started working with them.

So I dug into the demonstration designs to see what was going on. Again to my surprise, I was able to verify that the bugs the formal tool found were indeed valid.

Fig. 4. Xilinx's 2016.3 demonstration AXI-Lite drops read acknowledgments. Bug remains in 2019.1

While this took place in late 2018, this was really the start of what I’m going to call the year when AXI met formal.

My surprise at finding bugs in Xilinx’s AXI-lite core only intensified when one of Xilinx’s engineers contacted me to explain that not returning a response to a transaction wasn’t a bug, since the response might yet be returned later. Indeed, from just looking at Figs. 3 and 4 you might not catch that the trace ends in a steady state! However, if you looked at the core, you could tell that the response had been dropped and would never be returned.

I then had to explain to them that this was their IP core I had found bugs in, and not my own. Unfortunately, this took more explaining than I was expecting. Yes, I had modified the core: I had adjusted the white space, removed white space from the ends of lines, and corrected spelling mistakes in the comments. No, the logic wasn’t modified, etc.

Fig. 5. The formal property file makes checking even the code of others easy

Of course, from this point on things only got easier for me. You see, now that I had a formal property file describing an AXI-lite interface, testing and checking other cores became routine. With this property set, plus the Symbiotic EDA Suite, I could now take any AXI-lite design (Verilog, VHDL, or even SystemVerilog) posted to any forum, attach my property file, and verify whether or not the bus interface to that core was working, even before I understood all of the details of how the core was supposed to work. Several cores were forwarded to me at that time to verify. Almost all were broken and, worse, most were broken in the same way. The most notable exception was an Analog Devices core. It was a pleasant surprise along the way since, unlike the other cores I had been checking, it just worked.

Now that I had my own formal property file, I could do more than check the cores of others: I could build my own AXI-lite slave core as well. At this point, doing so was easy. Fig. 6 shows the kind of throughput I was able to achieve on the write channel,

Fig. 6. AXI-lite demonstration, showing 100% write throughput

and Fig. 7 shows the performance on the read channel. In both cases, I was able to achieve 100% throughput, shown at the end of the traces above. This is in contrast to Xilinx’s demonstration cores, which achieved only 50% throughput in Vivado 2016.3 (and less in 2018.3), as well as to many of Xilinx’s full AXI IP cores.

Fig. 7. AXI-lite demonstration, showing 100% read throughput
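The difference between 50% and 100% throughput comes down to when a slave is willing to accept its next request. The Python cycle model below is a hypothetical sketch, not Xilinx's logic or the author's core. It assumes a slave with a single response register and a master that always has another write ready. One version raises AWREADY only once BVALID has cleared; the other also accepts when the pending response is being taken on that same cycle.

```python
def simulate(cycles, ready_rule, bready=lambda t: True):
    """Count writes accepted in `cycles` clocks, with the master always
    presenting a new write (AWVALID and WVALID held high)."""
    bvalid = False       # one-entry write response register
    accepted = 0
    for t in range(cycles):
        br = bready(t)
        awready = ready_rule(bvalid, br)
        if awready:
            # Never overwrite a response the master hasn't taken yet.
            assert not bvalid or br, "response would be dropped"
            accepted += 1
            bvalid = True            # new response (replaces any drained one)
        elif bvalid and br:
            bvalid = False           # response taken, nothing new loaded
    return accepted


def wait_for_idle(bvalid, bready):
    """Accept only once the previous response has cleared."""
    return not bvalid


def accept_while_draining(bvalid, bready):
    """Also accept while the previous response is being handed off."""
    return (not bvalid) or bready


print(simulate(100, wait_for_idle))          # 50 writes in 100 clocks
print(simulate(100, accept_while_draining))  # 100 writes in 100 clocks
```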

I then turned my attention to building a full AXI4 property set, rather than just the AXI-Lite version.

AXI4 was much more of a challenge to formally verify, for several reasons. First, the IDs make things challenging. An AXI slave is allowed to return responses in any order, as long as all of the responses associated with a given ID are returned in order. Second, the burst lengths are a challenge. In particular, it can be a challenge to verify that the RLAST signal is properly set after two or more read address requests have been accepted. Within an implementation, a FIFO would fix this kind of problem nicely. Indeed, AXI processing and FIFOs work well together. This, of course, led to the third problem: verifying properties of the output of a FIFO can be quite a challenge.

When designing an AXI component, these constraints aren’t really all that hard to deal with. The information about each transaction may be placed into FIFOs within the slave and dealt with accordingly, but how shall these extra properties be handled in the context of formal induction?
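The per-ID ordering rule and the RLAST bookkeeping described above can be sketched in Python. This is an illustration under simplifying assumptions, not the author's AXI4 property set: events are fed in one handshake at a time, and the class and method names are hypothetical. Each accepted read request pushes its beat count onto a FIFO for its ID. Each read beat must then belong to the oldest outstanding burst for its RID, with RLAST set on exactly that burst's final beat.

```python
from collections import defaultdict, deque


class ReadOrderChecker:
    """Track outstanding AXI4 read bursts per ID and check each R beat."""

    def __init__(self):
        # For each ID: a FIFO holding the beats left in each outstanding burst.
        self.pending = defaultdict(deque)

    def request(self, arid, arlen):
        """Record an accepted read request; AXI4 bursts carry ARLEN+1 beats."""
        self.pending[arid].append(arlen + 1)

    def beat(self, rid, rlast):
        """Check one R handshake. Return an error message, or None if legal."""
        bursts = self.pending[rid]
        if not bursts:
            return f"data for ID {rid} with no request outstanding"
        bursts[0] -= 1               # bursts for one ID complete in order
        is_last = bursts[0] == 0
        if is_last:
            bursts.popleft()
        if rlast != is_last:
            return f"RLAST={int(rlast)} on ID {rid}, expected {int(is_last)}"
        return None


chk = ReadOrderChecker()
chk.request(arid=1, arlen=1)          # two beats on ID 1
chk.request(arid=2, arlen=0)          # one beat on ID 2
print(chk.beat(rid=2, rlast=True))    # None: a different ID may finish first
print(chk.beat(rid=1, rlast=False))   # None
print(chk.beat(rid=1, rlast=True))    # None
```

Inside a slave, the same per-ID FIFOs are what make the bookkeeping easy. Inside a formal proof, they are exactly the state that induction then has to keep consistent.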

If you haven’t worked with induction before, you should at least know that induction has its own particular challenges. In particular, the formal engine will start in the middle of time–with your design already in some state. Only your assertions and to some extent your assumptions will hold that state consistent. While it is possible to provide assertions to describe every item in a FIFO, it’s typically an expensive and challenging thing to do. But without doing this, it would be easy for the design and the formal properties to get into an inconsistent state.
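Even a toy design shows why induction needs those extra assertions. The Python sketch below is a brute-force stand-in for what a tool like SymbiYosys does symbolically, and the counter is a made-up example rather than any design from this post. A 3-bit counter that wraps from 4 back to 0 never reaches 6 from reset. Even so, the property count != 6 alone is not inductive, because one-step induction may start in the unreachable state 5. Asserting count <= 4 as well keeps the starting state consistent and lets the proof pass.

```python
def step(count):
    """Next-state logic of a 3-bit counter that wraps from 4 back to 0."""
    return 0 if count == 4 else (count + 1) % 8


def induction_failures(invariant, states=range(8)):
    """One-step induction: from ANY state satisfying the invariant (reachable
    or not), the next state must satisfy it too. Returns the failing states."""
    return [s for s in states if invariant(s) and not invariant(step(s))]


def reachable(reset_value=0):
    """All states the counter can actually visit after reset."""
    seen, s = set(), reset_value
    while s not in seen:
        seen.add(s)
        s = step(s)
    return sorted(seen)


def prop(count):               # the property we actually care about
    return count != 6


def strengthened(count):       # the same property plus a helper assertion
    return count != 6 and count <= 4


print(reachable())                        # [0, 1, 2, 3, 4]
print(induction_failures(prop))           # [5]: unreachable, yet breaks induction
print(induction_failures(strengthened))   # []: induction succeeds
```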

Fig. 8. Xilinx's 2018.3 AXI4 demonstration design didn't guarantee the right return ID

With some help from Clifford, I managed to put a set of AXI properties together.

As before, I first turned to one of Xilinx’s demonstration cores to test my properties. Again, to my surprise, Xilinx’s demonstration AXI4 (full) IP was also broken.

First, it didn’t guarantee that the right transaction ID would be returned on either the read (Fig. 8) or write channels. Second, the write channel couldn’t handle backpressure, as shown in Fig. 9.

This left me somewhat perplexed. How could such example designs have been broken for so long? Indeed, Xilinx was using their examples in all of their training material. Surely these examples would’ve been important for them to get right?

Fig 9. Xilinx's core can't handle backpressure

After a bit of digging, I discovered reports, dating back several years, of AXI designs that would hang. Customers trusted Xilinx’s demo designs, and so believed the bugs were elsewhere, but then struggled to find the problem that was causing their design to lock up. Forum moderators typically blamed customer designs, since no one was able to reproduce the bugs in a test bench. Not only that, but not all interconnect configurations or transaction combinations would trigger the bugs. Many of the more common interconnect configurations wouldn’t trigger the bugs at all. However, if you then switched configurations, the bug would get triggered and you’d end up looking in the wrong place.

Fig 10. Xilinx's 2018.3 AXI4 demonstration design checks for WLAST without also checking for WVALID. As a result, WREADY gets dropped before the transaction is complete

I then discovered that Xilinx would delete forum posts from dissatisfied customers, or from posters who complained of broken infrastructure. Indeed, Fig. 11 shows a comment that Xilinx deleted from their forums. It recommended the use of formal methods, the only method that has so far found these sorts of bugs.

Fig 11. Xilinx deletes posts that would lead you to their bugs

No wonder the bugs went so long without ever getting fixed.

During this time, I had the opportunity to speak with Xilinx’s representatives as well. Thankfully, they (eventually) acknowledged the faults in their demonstration cores.

Fig 12. Xilinx's core checks for WLAST without also checking for WVALID

Xilinx’s explanation was that these “IP Packager” cores, the ones I call their demonstration cores, came from an uncertain open source origin and were never placed under Xilinx configuration management, and so they were never verified along with the cores Xilinx considers their IP. I was then assured that Xilinx’s proper IP cores would never have these problems. Those were verified by a “best in class” verification methodology (not formal) every night, so I could rest assured that these other cores were bug free. No, this “best in class” verification methodology did not use their AXI VIP. (I asked.) Apparently, they didn’t even trust their own Verification IP for this purpose.

Fig 13. Intel's demo core also checks WLAST without also checking WVALID and WREADY

I didn’t stop with Xilinx, however. I checked out Intel’s demonstration core too. This one was an AXI3 core, and so not quite the type of AXI4 core my properties had been built to handle. On the other hand, if you limited the proof to looking at one ID only, then it wasn’t hard to use the same properties for both.

Just like Xilinx’s demonstration core, Intel’s was broken as well, in multiple places. Fig. 13 shows a burst request attempting to send AWLEN+1 (that is, two) words of data, where BVALID is set high before the second WDATA element is received. Not only that, but WREADY was dropped. Like the Xilinx bugs above, this would likely cause the design to freeze.
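The WLAST bug shown in Figs. 10, 12 and 13 is easy to reproduce in a cycle model. Under AXI, WLAST only carries meaning while WVALID is high, so a master may leave it high on a cycle where no beat is offered. The two hypothetical slaves below are simplified Python models, not the Xilinx or Intel RTL. They differ only in whether they qualify WLAST with the WVALID && WREADY handshake before lowering WREADY.

```python
def accepted_beats(ends_burst, trace):
    """Count W beats accepted by a slave that starts with WREADY high and
    lowers it once ends_burst(wvalid, wready, wlast) is true."""
    wready = True
    beats = 0
    for wvalid, wlast in trace:
        if wvalid and wready:
            beats += 1
        if ends_burst(wvalid, wready, wlast):
            wready = False           # registered: takes effect next cycle
    return beats


def buggy(wvalid, wready, wlast):
    return bool(wlast)               # looks at WLAST alone


def fixed(wvalid, wready, wlast):
    return bool(wvalid and wready and wlast)


# A two-beat burst as (WVALID, WLAST) per cycle. WLAST is a don't-care (here
# high) on the idle middle cycle, and the master then holds the final beat
# until it is accepted.
trace = [(1, 0), (0, 1), (1, 1), (1, 1), (1, 1)]
print(accepted_beats(buggy, trace))  # 1: WREADY dropped early, burst never ends
print(accepted_beats(fixed, trace))  # 2
```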

Nor was this the only bug. Fig. 14 shows an example where just a little bit of back-pressure from the first burst would cause Intel’s core to drop the second response.

Fig 14. Intel's demo core can't handle backpressure on BREADY either

The story didn’t stop there, however. Now that I had a formal property set to describe AXI4 transactions, I could verify just about any AXI4 interface. Doing so was as easy as creating a wrapper, attaching both the formal property set and the core in question to that wrapper, and then running the formal tools. Running the test rarely required more than a lot of typing.

As an example, I recently applied the Symbiotic EDA Suite to Xilinx’s AXI Ethernet-Lite IP core. Here’s what I discovered:

Contrary to spec, Xilinx’s RVALID logic requires RREADY to be set

Fig 15. Xilinx's Ethernet-Lite, RVALID depends upon RREADY

This means that the design will hang if the interconnect doesn’t hold RREADY high during any read request.
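The AXI specification forbids a source from waiting for READY before raising VALID, precisely because the destination is allowed to wait for VALID before raising READY. The Python sketch below is a simplified model, not the Ethernet-Lite RTL. It pairs such a legal master, one that raises RREADY only on the cycle after it sees RVALID, with two slaves: a compliant one, and one whose RVALID requires RREADY.

```python
def cycles_to_complete(rvalid_rule, max_cycles=16):
    """Return the cycle on which the read data handshake completes,
    or None if it never completes within max_cycles."""
    pending = True       # the slave owes the master one read beat
    rready = False       # a legal master may wait for RVALID first
    for cycle in range(max_cycles):
        rvalid = rvalid_rule(pending, rready)
        if rvalid and rready:
            return cycle
        rready = rvalid  # master raises RREADY the cycle after it sees RVALID
    return None


def compliant(pending, rready):
    return pending                   # RVALID independent of RREADY


def needs_rready(pending, rready):
    return pending and rready        # violates AXI: VALID waits on READY


print(cycles_to_complete(compliant))     # 1
print(cycles_to_complete(needs_rready))  # None: both sides wait forever
```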

Fig 16. Xilinx's Ethernet-Lite, reads will never set RVALID if !RREADY

Not only will the design hang waiting for the master to raise RREADY, but it will also accept new requests during this time. The resulting returns might then have the wrong RID. Fig. 17, for example, shows a request for ARLEN+1 (that is, one) data value using ID 3'b101. The response then comes back with an RID of 3'b100. This is an error, since the 3'b100 response needed 8'h91 values before getting a return with RLAST set.

Fig 17. Xilinx's Ethernet-Lite, reads will never set RVALID if !RREADY

Returns might even be given the wrong RLAST. While Fig. 17 hinted at this problem, you can see it clearly in Fig. 18 below. In this case, two requests are made for ARID=3'b000. The first request is for a single beat, the second for 1+8'h3E beats. However, when the core responds to the first request, RLAST is still low. The cause? Primarily the simple fact that this core can’t handle backpressure.

Fig 18. Xilinx's Ethernet-Lite, reads will never set RVALID if !RREADY

As if all of that weren’t bad enough, writes accepted at the same time as reads will write their values to the address given on the read channel. You can see this by examining the code from their design. If you want to check your own install, check out the axi_ethernetlite_v3_0/hdl/axi_ethernetlite_v3_0_vh_rfs.vhd file in your Vivado {INSTALL}/data/ip/xilinx/data/ip/xilinx/axi_ethernetlite_v3_0/hdl directory.

Fig 19. Xilinx's Ethernet-Lite, reads will never set RVALID if !RREADY

To keep this from happening, their design prohibits reads during writes and writes during reads. The only problem is, they never check for read and write requests being made on the same clock cycle.
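Closing that gap takes arbitration on the cycle when both requests arrive together. Both functions below are hypothetical Python sketches, not the Ethernet-Lite logic. The first blocks each channel only while the other kind of access is already in progress. The second also breaks same-cycle ties, using a prefer_read flag that a real design might toggle after every tie so that both channels get served.

```python
def grant_busy_only(awvalid, arvalid, write_busy, read_busy):
    """Blocks a new access only while the other kind is already in progress.
    Returns (grant_write, grant_read)."""
    return (awvalid and not read_busy, arvalid and not write_busy)


def grant_arbitrated(awvalid, arvalid, busy, prefer_read):
    """Grants at most one new access per cycle. Returns (grant_write, grant_read)."""
    if busy:                         # an access of either kind is in progress
        return (False, False)
    if awvalid and arvalid:          # same-cycle collision: pick exactly one
        return (not prefer_read, prefer_read)
    return (awvalid, arvalid)


# Both requests arrive on the same clock while the core is idle.
print(grant_busy_only(True, True, write_busy=False, read_busy=False))  # (True, True)
print(grant_arbitrated(True, True, busy=False, prefer_read=True))      # (False, True)
```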

Fig 20. Xilinx's Ethernet-Lite, doesn't keep AWVALID & ARVALID from both starting accesses at the same time

Apparently, Xilinx’s professional “best in class” AXI property checker doesn’t include a formal property check. Just like my own first experiences with formal methods, they’ve now been burned by designs that passed a test bench without being specification compliant.

I’ve also applied formal methods to their Block RAM controller. Along the way, I discovered that it could only handle reads or writes, never both at the same time, despite the fact that AXI has channels for both. (This seems like a common theme, no?) Not only that, but my own example design achieved better throughput on single channels. Here’s their best block RAM read performance, requiring N+3 clocks to read N elements.

Fig 21. Xilinx's block RAM controller, requires N+3 clocks to read N elements

Poor burst performance wasn’t limited to reads; it affected the write channel as well.

Fig 22. Xilinx's block RAM controller, requires N+3 clocks to read N elements
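To put the N+3 read figure in perspective, the fraction of clock cycles that carry data for an N-beat burst is N/(N+3). The short Python calculation below assumes that every burst pays the full three-cycle overhead and that bursts are never overlapped, which may not hold for every configuration of the core. Under that assumption, a single-beat read uses only 25% of the available clocks, while a 256-beat burst reaches about 98.8%.

```python
from fractions import Fraction


def read_efficiency(beats, overhead=3):
    """Fraction of clocks carrying data when N beats take N + overhead clocks."""
    return Fraction(beats, beats + overhead)


for n in (1, 4, 16, 256):
    print(f"N = {n:3d}: {float(read_efficiency(n)):.1%} of clocks carry data")
```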

Let’s now think this through. These bugs were found in just a few of the Xilinx IP cores for which Xilinx has made the design source publicly available. How many bugs would you now expect from IP that hasn’t been posted publicly?

This is where and why open source becomes so important. When the design source is open, you can verify the existence of any bugs on your own.

To this end, I’ve managed to verify and demonstrate several IP cores of my own using this AXI4 property set:

An AXI Crossbar
Data movers: AXIMM2S and AXIS2MM.

These two are my own. They bear no internal resemblance to Xilinx’s (encrypted) data mover cores–or shall I say they bear no resemblance that I am aware of.

No, I haven’t verified whether Xilinx’s data movers work or not. Unlike their data movers, however, 1) these two cores work within Verilator, and 2) they can both achieve 100% AXI throughput.

An AXI to AXI-lite bridge. Better yet, an AXI to AXI-lite bridge that gets 100% throughput–meaning you can write an AXI-lite slave that can still process AXI transactions without slowing down.

Fig 23. Read performance of my own AXI to AXI-Lite bridge

An AXI to Wishbone bridge

A Wishbone to AXI bridge

An AXI Firewall, which can detect any of the bugs discussed above, forcing a slave to either be compliant or to be reset. As a special bonus, the slave can be reset and re-integrated into the design–without either hanging or propagating any non-compliant responses upstream.
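An AXI to AXI-lite bridge has to turn each AXI burst into a sequence of single-beat AXI-lite requests. The Python sketch below computes the per-beat addresses for INCR bursts using the address arithmetic from the AXI specification. FIXED and WRAP bursts, and the rule that a burst may not cross a 4kB boundary, are deliberately left out. The function name is hypothetical, and this is not a description of the internals of the bridge listed above.

```python
def incr_burst_addresses(axaddr, axlen, axsize):
    """Per-beat addresses of an AXI4 INCR burst.
    axlen: number of beats minus one (AxLEN); axsize: log2 of bytes per beat (AxSIZE)."""
    nbytes = 1 << axsize
    aligned = (axaddr // nbytes) * nbytes
    # The first beat uses the (possibly unaligned) start address; later
    # beats step up from the aligned address by one beat size each.
    return [axaddr] + [aligned + n * nbytes for n in range(1, axlen + 1)]


print([hex(a) for a in incr_burst_addresses(0x1002, axlen=3, axsize=2)])
# ['0x1002', '0x1004', '0x1008', '0x100c']
```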

None of these insights would’ve been possible without both the Symbiotic EDA Suite and the formal AXI4 property set for verifying AXI cores.

The New Tutorial

In the middle of all of this, I also built a beginner’s Verilog tutorial. My work on this tutorial started in 2018, although it took until May of 2019 for me to finish it. The tutorial was initially intended to be something that could be used as a set of lecture slides for a class. As a result, it consists of a series of PDF files and some partially completed (and deliberately broken) homework exercises.

Unlike many other approaches, my own approach doesn’t teach the full Verilog test-bench syntax. Instead, I chose to use Verilator and C++ design wrappers. My reason was simply that I’d seen so many students get confused when attempting to synthesize what should’ve been test-bench only code.

The second big difference with my approach was that I taught how to apply formal verification to every design, starting in lesson three.

The third big difference was that I tried to be hardware agnostic. All you needed was a simulator–in this case, Verilator. As a bonus, if you had an FPGA (any FPGA with nothing more than a serial port, a button or switch, and several LEDs), you could build all of the designs for your board. Indeed, I avoided proprietary design components like the plague in order to keep the tutorial fully generic.

The course has been well received, albeit with caveats:

Students with some Verilog background have balked at my liberal usage of C++ and Makefiles. Why, they’ve asked, should they be required to learn a new language? This is understandable. On the other hand, students with more of a software background have likely felt quite at home with this approach.
Since the rest of the industry uses Verilog test benches (or SystemVerilog, or VHDL …), the tutorial has often left students either without this valuable skill or wondering how they should be using it.
Since I used Verilator and the open version of SymbiYosys for all of the projects, there was no ability to add a parallel VHDL tutorial. Many students have asked for one. This is currently something that I am unable to provide using free tools.
Because I used PDF files, I can’t track downloads. This makes it hard to know if students are really interacting with the tutorial itself, or perhaps just with the formal verification courseware slides further down on the same page. I suppose it doesn’t matter, since both would be good things; I’d just love to know and understand more about my readers.

Finally, several individuals have asked for a course that goes into the next step–an intermediate design course. Such a course would teach design in the context of a system with either a Wishbone or an AXI-Lite internal bus.

Fig 24. Proposed intermediate tutorial structure

At this point, however, my plans for world tutorial domination have slowed down. Specifically, I want my intermediate design tutorial to remain vendor agnostic, while still being useful on SOC (FPGA+ARM) chips. That means that the tutorial will need to teach students how to connect bus components to a design using only open source tools.

As of today, I think I’ve finally got AutoFPGA lined up for that purpose. It now has an (untested, and quite likely buggy) development branch that supports not only Wishbone (pipeline), but also AXI-lite and AXI–with an appropriate set of crossbars, bridges, and bus simplifiers to make certain things work together properly.

Fig 25. Simplifying SOC component development using Wishbone

If the Lord is willing, I look forward to finally getting some of the lessons associated with this course written in 2020.

Viewership in 2019

With all that background aside, it’s now time to turn our attention to some statistics from 2019. Care to see how well the blog has done? As you can see from the chart below, the ZipCPU blog has really taken off this last year.

Fig 26. 2019 ZipCPU Page Views

Last year, the blog had 183,281 page views. This year, we’ve had 332,735 page views, an increase of about 82%. Readership is definitely up.

Even better, the blog has gone from a maximum of 647 page views per day within a week, shown on the far left of Fig. 26 above, to 1,984 page views in one day during one week in December. This is just over a three-fold increase in daily page views.

If you are new to the blog, then, welcome!

That said, if you want to sell me web software to help my blog get noticed by the big search engines, then no, thank you. The blog is doing quite nicely on its own.

Another fascinating thing about this chart is that most of the page views take place between Monday and Friday. This tells me that the ZipCPU blog isn’t just read by hobbyists–apparently professionals find this information quite relevant as well.

Welcome, professionals!

Third, you’ll notice that readership slowed somewhat during June and July. Initially, I attributed this to the fact that I was working on so many contracts that it was difficult to write new articles. Now, looking over the months since then, I’m not so sure. Instead, I’m more tempted to believe that this slump is due to the end of the school year and either students not reading the articles, or professionals going on vacation.

Finally, I think the blog took off this year in large part because of the AXI work above. I was pleasantly surprised to see how many hits the various AXI articles received, as I’ll discuss in the next section.