Xilinx deleted this post

I was originally going to title this article, “Digital design is not Lego design”, but then Xilinx deleted the forum post it was based upon, so I’ve since changed the title. Here’s the story:

Recently, someone posted some fairly benign recommendations for beginners on Xilinx’s forums. I’d point you to the forum post so you could read these in context, but like I mentioned above–Xilinx was apparently so upset by my comments (copied below) that they deleted the whole discussion.

Fig 1. Deleted!

Maybe I should start recording all of my discussions from now on …

The first individual to comment (sorry, his username has now been lost in the web somewhere) shared his recommendations for a beginning designer. In particular, he noted how much time you’d need to spend using the simulator for a project and asked, if that was the case, why not spend your time learning to use the simulator well before purchasing a dev board?

For a real project, you will have no choice but to simulate the $@ out of it because it won’t work and your physical debugging will be meaningless. So why not focus on that? The HDL and simulation environment are already a lot to learn. Then when you literally can’t do any more without hardware, you will be in a better position to buy a board.

In many ways, I agree with the sentiment–but with a few caveats that I’ll share in a moment.

A well-known and often very helpful user, u4223374, responded with his own perspective, recommending a design approach I’ll call “Lego design”. He writes,

Increasingly these days an FPGA designer can get away without any hardware knowledge. With Xilinx (and others) providing IP cores for most of the common interfaces, the unique processing hardware doesn’t tend to need any external-to-the-FPGA communications at all – it just talks to the I/O IP cores

… without any hardware knowledge? Incredible. Let’s just say I disagree.

After I responded, Xilinx then deleted the whole post. The next section contains my response, which is what I will assume offended them. I didn’t copy and paste it, though, but rather fat-fingered it back in, so you might find subtle spelling differences. I’ve also added hyperlinks to the discussion for anyone wanting background information, since I’ve been scolded by a Xilinx forum moderator for offering too many links to my blog and my github IP. Specifically, I’ve been told to keep the hyperlinks in my forum replies at two or fewer. The reply quoted below originally had only the one link within it.

My Response

If you read through the forums, you’ll see that this [u4223374’s] view doesn’t hold water.

A very common question is, “Why doesn’t my design work?” This question is asked over and over by those who have tried this approach, understand none of what’s going on under the hood, and then can’t figure out what’s wrong with their design when it doesn’t work. They are forever posting here [on Xilinx’s forums] needing to be rescued, unable to narrow down the problem to the component at fault, and so they tend to blame Xilinx’s cores for not working when it is their own code at fault.

While not always the case, it doesn’t help when the code block they are using actually has bugs in it already. Worse, because the beginner is never taught the fundamentals, but rather the idea that digital design is just like connecting Legos, they have no idea where to even start when trying to find the problem.

The problem gets worse for the designer who wants to make use of older hardware, hardware that might no longer be supported by Xilinx. Equivalently, FPGA design is often seen as a gateway to ASIC design. If all you learn is how to connect pre-built Lego blocks, you’re not likely to get a job where you actually need to build something to sell.

HLS was designed to make HDL even easier. From what I’ve seen, it only makes HDL design easier for the individual who already understands HDL design in the first place. Many students I know who’ve tried it [HLS] have noted that small, subtle, and seemingly insignificant code changes will take a “working” design and render it non-synthesizable.

As to the recommendation from the original author,

I would suggest if you have no hands on experience and aren’t really itching to see sometimes [sic] happen in a circuit that you can focus on learning your HDL of choice and learn to write test benches to validate your designs.

I would agree somewhat, but with caveats based upon the mistakes I’ve seen students make who have followed this recommendation.

I’ve seen students confuse test-bench only code, code that cannot be synthesized, with synthesizable code. Often students don’t even know the difference after starting from this standpoint.
I’ve also seen way too many bugs pass bench tests, only to fail in real hardware. (This includes first and foremost my own bugs …)

This leaves me with two additional recommendations:

A beginner needs to learn how to debug their design in hardware as well as with the test bench–especially if their design hits hardware after only ever seeing a test bench. (Yes, there is a better way.)
I recommend using the synthesizer along the way. Even if the student doesn’t have any real hardware, it is possible to use a synthesizer to 1) recognize unsynthesizable code, and 2) discover what kinds of code can be synthesized within any particular clock speed and what kinds cannot.
Even better, a beginner should learn to use formal methods.

Why?

Because formal methods do a better job of finding bugs than test benches alone.

Since I believe so firmly in this approach, I’ve put together a tutorial for beginners. It’s a hardware agnostic tutorial, so you should be able to learn regardless of what hardware you have as long as you have 1) a serial port, 2) a button, and 3) a couple of LEDs.

Why would Xilinx delete this comment?

I think Xilinx is working very hard to sell the idea that hardware design is as simple as playing with Legos: Connect this block to that block and Voila! Magically a working design appears. While I hope that this is often the case for those who use this approach, I have a different perspective: I tend to read the forum posts that were written by those for whom this simplified design methodology doesn’t work. (You might argue that my sampling universe is skewed as a result.)

I also tend to be rather hard on Xilinx for the bugs in their example AXI designs–those that they recommend beginners start from. [1] [2] Sadly, these faulty designs have caused a lot of problems for Xilinx’s users:

One software engineer picked up from where the hardware engineer left off–after the hardware engineer “finished” the project, delivered a “working” design, and left the company … This software engineer made a very minor adjustment to his MicroBlaze code and the FPGA portion of his design stopped working. A key feature of his new design was a pair of adjacent store instructions just before the design failed. This could’ve been caused by either of the bugs I had found in their training material (cited above).
Several other users have been frustrated that Xilinx’s AXI DMA core has “caused” their design to hang. They’ve asked how to “reset” the core, so they can go on.

The problem here is important: AXI, by design, cannot recover from dropped transactions. If the master is no longer interested in the bus response, it has no way to drop the response without resetting the entire bus or just ignoring the response.

In the bugs I demonstrated above, transaction responses would get dropped in their demo code. This will cause the design (and the attached bus) to hang–unable to be reset until a power cycle.
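
To see why a single dropped response is so damaging, consider a toy model of the bookkeeping a bus master does. This is only a sketch in Python, not AXI itself and not Xilinx’s code: the `ToySlave` class, its `drop_back_to_back` flag, and the `outstanding_after` helper are all made up for illustration, and “lose the response for the second of two adjacent writes” is an assumed stand-in for the kind of bug described above.

```python
from typing import Iterable


class ToySlave:
    """Hypothetical slave that answers each write one cycle after accepting it."""

    def __init__(self, drop_back_to_back: bool) -> None:
        self.drop_back_to_back = drop_back_to_back  # assumed bug switch
        self.prev_request = False  # was a write presented on the previous cycle?
        self.pending = 0           # responses owed but not yet returned

    def step(self, request: bool) -> bool:
        """Advance one clock cycle; return True if a response is sent."""
        response = self.pending > 0
        if response:
            self.pending -= 1
        if request:
            # The assumed bug: a write arriving right behind another one is
            # accepted, but its response is never scheduled.
            dropped = self.drop_back_to_back and self.prev_request
            if not dropped:
                self.pending += 1
        self.prev_request = request
        return response


def outstanding_after(requests: Iterable[bool], slave: ToySlave,
                      idle_cycles: int = 20) -> int:
    """Count the writes still awaiting a response after a long idle period."""
    outstanding = 0
    for request in list(requests) + [False] * idle_cycles:
        if request:
            outstanding += 1
        if slave.step(request):
            outstanding -= 1
    return outstanding


if __name__ == "__main__":
    spaced = [True, False, True]   # two writes with an idle cycle between them
    adjacent = [True, True]        # two writes on back-to-back cycles
    print(outstanding_after(spaced, ToySlave(drop_back_to_back=True)))
    print(outstanding_after(adjacent, ToySlave(drop_back_to_back=True)))
```

In this model, writes separated by an idle cycle always complete, so the design appears to work. Two adjacent writes leave the master with one response outstanding no matter how long it waits, and nothing in the handshake lets it give up on that response. That is the hang.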

One solution to this problem is to use a wrapper core with some kind of safety feature: Xilinx offers what they call an “AXI firewall” IP to check for AXI failures and set a flag that the ILA might notice. Of course, that IP would be forever broken until the next reset (i.e. power cycle). An alternative approach is to use my own bus fault isolator, and to configure it to reset the downstream peripheral once any error is detected. Sadly, while these two approaches will allow you to recover from a fault, it may well be that a faulty core doesn’t trigger either of the tests within these cores. I know from my own experience that the logic within my own core simplifies (and slows down) the AXI transaction, and so there’s a non-zero likelihood that this might prevent a design from failing downstream of this fault isolator.
It’s not that uncommon for an engineer to post to the forum, declare Xilinx’s code is broken, but then refuse to share his own code.

In many of these examples, I’ve asked if users will post their code so that I can test my formal properties on their core. While some have taken me up on the offer, there haven’t been that many takers. Of those who have taken me up on the offer, verifying the code is often quite easy, since most Xilinx users copy from Xilinx’s demo designs. As a result, the proof is usually as simple as one of the ones I’ve done before. Of those who haven’t posted their code, some have at least posted traces showing dropped acknowledgements.
There are “right” and “wrong” answers to this kind of criticism. The “right” response is to 1) acknowledge the problem (Yes, it’s broken), 2) identify the problem (it does this when it should do that, it’s been broken since 2016 or whenever), 3) promise a fix, and then 4) announce when the demonstration code works again. Finally, they should 5) point out to users who might not know of the problem that a fix has been issued.

Were they to do this, I’d either get quiet or repeat the corporate message. Up until now, Xilinx has been (nearly) silent on this topic.

Not commenting is one thing, but … deleting posts?

Sadly, getting rid of the messenger doesn’t change the truth of the message. It also leaves those who desire to build reliable designs in the lurch, since they still don’t know why their designs are failing.

At a glance, the problem is simply this: When a newcomer comes to the Xilinx forums to ask how to build an AXI design, the official answer is to point him to the demo designs. Sadly, when I pointed one such user to the bugs in Xilinx’s demonstration designs, they were quite disheartened.

Thank you, Dan, but I need something easier. …

My response below helps to highlight the core issue:

In many ways, it’s a shame Xilinx chose AXI as their protocol for connecting everything together. AXI is a very complicated protocol, and a hard one to get right. I like to use Wishbone, and find it much easier to work with. There’s also an AXI-lite protocol that’s easier to work with than AXI.

That said, I really don’t know any way to make this “easy” as you would like. The complexity comes with the territory.

Sorry, folks, but we’re not building Legos here. First, if you want things to work, and to reliably work well, then you need to learn the fundamentals. Second, if you want to build AXI interfaces with confidence, you need formal methods. Neither Xilinx’s test benches nor their Verification IP found the bugs I’ve blogged about, the ones many users have been struggling with.
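
For readers who haven’t seen the difference between a test bench and a formal check, here is a rough illustration in Python. It is not a formal tool and it has nothing to do with AXI: it only mimics the idea by brute-force trying every input sequence up to a given depth, where a hand-written test tries just the sequences its author thought of. The toy FIFO counter, its deliberate bug, and every name in the block (`step`, `run`, `directed_test`, `find_counterexample`) are assumptions made up for this sketch.

```python
from itertools import product
from typing import Iterable, Optional, Tuple

DEPTH = 2  # capacity of the hypothetical FIFO

Inputs = Tuple[bool, bool]  # (write, read) for one clock cycle


def step(count: int, write: bool, read: bool) -> int:
    """Next fill level of a toy FIFO counter with a deliberate bug."""
    if write and not count > DEPTH:  # bug: the check should be count < DEPTH
        count += 1
    if read and count > 0:
        count -= 1
    return count


def run(sequence: Iterable[Inputs]) -> int:
    """Apply a sequence of (write, read) inputs starting from an empty FIFO."""
    count = 0
    for write, read in sequence:
        count = step(count, write, read)
    return count


def directed_test() -> bool:
    """A typical hand-written test: fill the FIFO, then drain it."""
    fill_then_drain = [(True, False)] * DEPTH + [(False, True)] * DEPTH
    return run(fill_then_drain) == 0


def find_counterexample(depth: int) -> Optional[Tuple[Inputs, ...]]:
    """Try every input sequence of up to `depth` cycles.

    Returns the first sequence whose final fill level exceeds DEPTH, or None
    if the property "fill level never exceeds DEPTH" holds to that depth.
    """
    choices = list(product([False, True], repeat=2))
    for length in range(1, depth + 1):
        for sequence in product(choices, repeat=length):
            if run(sequence) > DEPTH:
                return sequence
    return None


if __name__ == "__main__":
    print("directed test passes:", directed_test())
    print("counterexample:", find_counterexample(3))
```

The directed test passes, yet `find_counterexample(3)` returns three consecutive writes with no reads, a sequence that overflows the counter. A real formal tool searches far more cleverly than this loop does, but the payoff is the same: it hands you the trace you never thought to write a test for.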