Re: What does your design flow look like?
Someone recently asked me on reddit, “Do you have a blog post or something like that which explains the high-level view of your toolchain?”
The quick answer is that my preferred tools are shown in Fig. 1 on the right. Each of these tools has its purpose.
GNU make is fundamentally a software build management tool. It tries to build components by only rebuilding those pieces that have changed. It’s generic enough that it can be tuned for generating hardware build products as well.
AutoFPGA is the tool I use to compose a design built from multiple bus components. I’ve so far managed to (mostly) avoid vendor tools when doing this. As an example, I don’t (normally) use the Vivado IP integrator at all.
I say “mostly”, because I’m currently working with my first Zynq Ultrascale+ board and I’m still using the IP integrator for that design. I don’t expect to use it for long. Indeed, I’m not expecting to use it longer than it’s required to transition to an AutoFPGA based design, like I did with the Intel Platform Designer on my Cyclone-V design.
SymbiYosys is what I use for formally verifying any cores I work with. I’ve also used the Symbiotic EDA Suite quite successfully for full SV designs.
Verilator is my tool of choice for design simulation. I find it ideal for when I wish to integrate component emulation, such as a multi-gigabyte SD card or a VGA display, with an integrated design test.
iverilog is a nice open source Verilog simulation tool. While I don’t normally use the simulation subset of Verilog, when I do I’ve used iverilog.
Yosys is an open source synthesis tool that has replaced several vendor synthesis tools. I’ve now used it successfully on my ArrowZip project, and all of my ECP5 and iCE40 projects.
NextPNR is an open source place and route tool that has also replaced several vendor place and route tools in my design implementation flow.
Vendor tools typically include Vivado or Quartus.
My recent attempts to work with Vitis have resulted in either the tool crashing (when following the vendor tutorial of all things!) or some other unexpected behaviors. Unfortunately, it seems like the product isn’t really ready for market despite being placed on the market. Forum support for bugs is also spotty at best.
I’m also not sure if I will use Quartus again. My last attempt to install Quartus failed due to a version incompatibility with KDE that had been years in the making. I haven’t gone back to see if they’d fixed it yet, but the bug report that I found describing my problem was months old when I found it and not only was it still unfixed, but Intel hadn’t announced any plans to fix it at that time.
GTKWave is my favorite waveform viewing tool. It’s simple and easy to use.
Wavedrom is my go-to program for drawing waveforms for sharing in either blog posts or in tweets.
Tikz-timing is my favorite program for generating waveforms when writing formal LaTeX documents. While I would prefer using Wavedrom to generate SVG files for inclusion in LaTeX (they look nicer), I like using underscores in variable names and my favorite SVG to eps converter, inkscape, leaves behind LaTeX files that can’t handle underscores very well.
GCC is what I use for all software compiling, including any programs for the ZipCPU. This also includes binutils under the hood. If I have to build a program for MicroBlaze, Nios II, or even ARM, my preference would be to do so using GCC.
git is currently my preferred choice for managing source control. I’ve used Subversion (SVN) and Mercurial as well, but I use git often enough that I don’t (normally) have to check the manual for usage information. Put simply, when used well git provides the user the ability to “undo” any massive design changes they may have made since the last commit.
This also means that I dislike editors or other IDEs that store design information in a file I cannot review line-by-line in order to unroll any changes that the tool might make to my design.
zipload isn’t really a project in its own right. Instead, it’s a program that I rebuild into each new project I make. As mentioned above, I use it to load software into block RAM, flash or SDRAM as desired. Many versions will also load FPGA configurations into flash.
dbgbus is a fundamental piece of any design I’ve ever built. It gives me an external debugging port into the design in the form of a (typically) UART to Wishbone bridge. I currently have two debugging buses I work with: the hexbus–a simple bus we build here on this blog, and a more complicated wbubus offering compression and more buffering capabilities. Recent upgrades to each bus have provided them with UART to AXI4 bus mastering capabilities.
wbscope is the tool I use for any hardware debugging I find that I need to do.
It was fun, some time back, to be able to present to Xilinx support staff a 4-second trace showing AXI interactions between my own design and the MIG-based memory core I was working with. Their MIG core was producing xVALID signals I wasn’t expecting, and their best guess was that I had requested something prior to the trace window. The only way to prove I hadn’t was to get a full 4-second trace from the initial state to the bug. How did I do it? There’s a version of the wbscope that compresses its observations. This allowed me to record long periods of idle time before the test of interest, and so I could tell Xilinx that their MIG core was actually producing xVALID signals that my design hadn’t requested. (The problem turned out to be a mis-identified DDR chip in the end, not the MIG core at all–but that’s another story for another day.)
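To give a feel for why compression makes such a long capture possible, here is a minimal Python sketch. It is not the wbscope’s actual encoding. It simply assumes, for illustration, that a run of identical samples is stored as one (value, count) record, and the name compress_trace is hypothetical.

```python
def compress_trace(samples):
    """Collapse runs of identical samples into (value, count) records."""
    runs = []
    for sample in samples:
        if runs and runs[-1][0] == sample:
            runs[-1][1] += 1          # same value as last time: extend the run
        else:
            runs.append([sample, 1])  # value changed: start a new record
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    # A long idle period, two cycles of activity, then idle again
    trace = [0] * 1_000_000 + [0x5, 0x7] + [0] * 1_000
    records = compress_trace(trace)
    print(len(trace), "samples stored as", len(records), "records")
```

In this toy case, just over a million samples collapse into four records, since the idle time on either side of the activity costs one record each.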
The longer answer is that my design flow really depends upon what I’m doing. Am I 1) building a component, 2) integrating many components together into a design, 3) bringing up a new hardware interface, or 4) iterating on an existing design to narrow down the location of a bug?
Design components are special, primarily because they tend to be of a small enough complexity that they can be fully verified using formal tools. “Component”, in this case, usually refers to a piece of a design having a bus interface of some type. Example “components” include the ZipCPU, bus crossbars, bus masters and bridges, flash controllers, spectrum analyzers and more.
After dealing with several painful errors earlier in my FPGA journey, I don’t connect anything to the bus that doesn’t pass a formal verification check of the bus interface. Why not? Well, for a few reasons. First, it’s notoriously difficult to debug a larger design containing a broken bus within it. It’s also much easier to debug a bus component using formal methods than it is via simulation. Second, as I mentioned above, I use a bus based scope for debugging hardware interactions. If the bus ever breaks, I can’t use my scope and, well, that slows me down. (I’ll hang my head in dismay and close up shop for a day to cry about it. Eventually, I come back after some serious prayer time.) Third, you can often recover pieces of a broken design and still use it–as long as everything on the bus works.
So, rule #1 is that every bus component must have its bus interactions formally verified. I allow no exceptions–primarily because I get burned with every exception I make to this rule.
Rule #2 is to run verilator -Wall on everything early on in your process. Given that a Verilator lint check takes only seconds to accomplish and that it can find a lot of bugs, this is simply a time-saver. Vivado might take a minute to start up, or ten minutes to find a bug. While SymbiYosys isn’t bad, Yosys doesn’t necessarily flag as many problems as Verilator does. As a result, running a lint check with Verilator is my first step when designing any new component. It’s a quick check, and anything that doesn’t pass can usually be rewritten and fixed in very short order. Even better, unlike the Verific based parsers like Vivado and Quartus, it is reasonably possible to generate a design with Verilator that doesn’t have any warnings.
After that first step, though, my next step in component design is a formal verification step. This usually starts with either a cover() check or a bounded model check (BMC). I may often switch back and forth between the two as I work with my draft design. If everything goes well, the cover() check will tell me how long it takes my core to accomplish whatever fundamental operation it does, while also producing traces showing it doing that operation.
The bounded model check is an even easier formal check to run. With nothing more than a modest set of bus properties, it’s usually easy to run a bounded model check. Even better, BMC failures are easy to diagnose and identify from the VCD trace generated by SymbiYosys. Indeed, BMC failures are much easier to diagnose and identify than the phantom bugs in Xilinx’s example IP that have left users dealing with hung designs for the last five years, but that’s another story.
During this time of cover() and BMC checking, I’ll be adding any (additional) external interface requirements the component needs in terms of assume() statements, until the traces I’m looking at start to look reasonable. Then, once the traces start looking realistic, I’ll keep going until the BMC proof passes.
Of course, what good is a BMC proof that passes? All it does is tell you that your design doesn’t have any bugs in the first N timesteps. What about timestep N+1?
How many timesteps do I use for BMC? That depends upon my patience. I’ll start with however long it takes my core to complete an operation or two–hopefully shorter than forty timesteps or so. This usually takes a couple of seconds of processing at first, but then as the design and the proof complexity increase, I’ll trim that down while trying to keep the entire proof at less than 3-4 minutes. In the case of my AXI DMA core, that meant a proof of only four steps. My crossbar’s proof depth was five steps. Four or five steps don’t offer you much context when examining any failing traces.
During this time frame, as BMC starts to pass at a level equal to my patience level, I’ll switch the formal tool to doing unbounded proofs. Unbounded proofs are more difficult to accomplish than bounded model checks since they typically require more assertions. At the same time, without the unbounded proofs, you might never find counter overflow bugs or recover from the time-step trimming you did to your BMC proof to keep it within your patience limit.
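A toy model helps show the difference between the two kinds of proof. The Python sketch below is only an illustration: the 8-bit counter, the safe() property and the function names are all hypothetical, and it brute-forces every state rather than handing the problem to a solver the way SymbiYosys does. The induction() function also models only the inductive step, not the base case that accompanies it.

```python
def step(count):
    """Hypothetical design: a free-running 8-bit counter."""
    return (count + 1) & 0xFF


def safe(count):
    """Hypothetical assertion: the counter never reaches 200."""
    return count != 200


def bmc(depth):
    """Bounded check: follow the design from reset for `depth` timesteps.

    Returns the first failing timestep, or None if none fails in that window.
    """
    count = 0
    for t in range(depth):
        if not safe(count):
            return t
        count = step(count)
    return None


def induction(k):
    """Inductive step: start from ANY state, reachable or not.

    If k consecutive states are safe, the next one must be safe as well.
    Returns a starting state that breaks this, or None if the step holds.
    """
    for start in range(256):
        state = start
        window_ok = True
        for _ in range(k):
            if not safe(state):
                window_ok = False
                break
            state = step(state)
        if window_ok and not safe(state):
            return start
    return None
```

With these definitions, bmc(40) returns None even though the design is broken: the failure sits at timestep 200, well outside the window. The inductive step isn’t bound to the reset state, so induction(5) returns 195, a state only five steps away from the failure.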
As an example, my AXI DMA core has roughly 330 assertions within it. Only 75 of those come from the bus property set. The other 252 are necessary for induction and to make certain that the bus property set properly describes the action of the core over time.
Once a design passes the induction check, the task is usually just about done. The last step is the cover() check. This check makes certain that any desired transactions remain possible after all of the interface assumptions you have made. The cover() check will also generate a set of waveforms demonstrating the performance of your core that you can use to advertise the capabilities of the core to others.
Indeed, if you’ve been following me on Twitter, you may have noticed that I’d tweet examples of these cover traces once the design got this far. If you’d like to check out some of these examples, feel free to browse the cover check results I’ve posted for my AXI cores. They should all give you a decent idea of what you might expect from the respective logic cores–and more importantly what you might expect from a cover check.
A good cover check will verify the complete operation of the core from start to the completion of its operation followed by a return to idle. If the core can handle multiple transactions, the cover check should demonstrate the core with every pipeline stage full–possibly covering two or more transactions completed. Don’t forget to cover the return of your core to its idle state! (See Fig. 6)
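Conceptually, a cover check is a search for the shortest trace that reaches some goal. Here is a rough Python sketch of that idea, assuming a made-up core that takes three busy cycles per request. The state machine, the goal and the name cover_trace are all hypothetical, and a real formal tool solves this symbolically rather than by breadth-first search.

```python
from collections import deque

IDLE, BUSY, DONE = "IDLE", "BUSY", "DONE"


def step(state, start):
    """Hypothetical core: a request, once accepted, takes three busy cycles."""
    fsm, remaining = state
    if fsm == IDLE:
        return (BUSY, 3) if start else (IDLE, 0)
    if fsm == BUSY:
        return (BUSY, remaining - 1) if remaining > 1 else (DONE, 0)
    return (IDLE, 0)  # DONE always returns to idle


def cover_trace(max_depth=20):
    """Shortest input sequence that completes a transaction and returns to idle."""
    reset = ((IDLE, 0), False)  # (core state, has DONE been seen yet?)
    queue = deque([(reset, [])])
    visited = {reset}
    while queue:
        (state, seen_done), inputs = queue.popleft()
        if seen_done and state[0] == IDLE:
            return inputs  # goal reached: this is the cover trace
        if len(inputs) >= max_depth:
            continue
        for start in (False, True):
            nxt = step(state, start)
            node = (nxt, seen_done or nxt[0] == DONE)
            if node not in visited:
                visited.add(node)
                queue.append((node, inputs + [start]))
    return None  # goal unreachable within max_depth
```

Here cover_trace() returns a five-step input sequence that begins with the request. Since the goal includes the return to idle, the trace doesn’t stop at the moment the operation completes. If the search depth is too short, as with cover_trace(max_depth=3), no trace is found at all.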
Integrating the design
Once all of the components of a given hardware design have been individually verified, then it’s time to move to integration. This is the step where I bring in both AutoFPGA and Verilator.
This is where I will start (and hopefully stop) using any graphical integration tool that I can’t get rid of. If I have to use one, this is the step where I would generate a giant user design that connects to whatever SoC bus interconnects are available.
This also marks the end of any vendor IP integration flow for me. I might still use a synthesis or place-and-route capability, but I find designs are much easier to debug from RTL than they are with hidden wires and connection logic throughout.
The next step in my design flow is done using AutoFPGA. For this step, I generate a Makefile that does nothing more than create and execute a command to call AutoFPGA on the various configuration files. The master Makefile then copies these build products (if changed) into various locations in the rest of the design’s directory tree. If nothing changes, the design doesn’t need to be rebuilt.
This is why I like (GNU) make over a lot of vendor solutions. Using make, I don’t need to “export” any board design–AutoFPGA has already created any design products I might need. Indeed, at this point I should be able to start building all my support software–even before synthesis starts or completes–since the bus components and all address assignments will have been laid out.
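The “copy if changed” step matters because make decides what to rebuild from file timestamps. A generated file that gets overwritten with identical contents still looks new, and so triggers a needless rebuild of everything downstream. As a minimal sketch of the idea, written in Python rather than as the Makefile recipe itself and with a hypothetical function name:

```python
import filecmp
import os
import shutil


def copy_if_changed(src, dst):
    """Copy src to dst only when dst is missing or its contents differ.

    Returns True if a copy was made, False if dst was left untouched.
    """
    if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
        return False  # identical contents: leave dst and its timestamp alone
    shutil.copyfile(src, dst)
    return True
```

Calling this twice in a row on the same pair of files copies the first time and does nothing the second, so anything that depends on the destination file stays up to date as far as make is concerned.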
Then, before ever touching a synthesis tool, I run a Verilator lint check on the entire design. As before, this is a quick and easy check to perform. Synthesis might take half an hour, depending upon the design, but the Verilator lint check will only take seconds.
Well, not quite. I’m actually using Verilator to convert my design into a C++ simulation at this step. It’s still quite fast and easy.
If this is a Yosys/NextPNR design, this is also where I build the design implementation and check timing. If not, I might start a vendor build tool at this point in the flow.
Now that Verilator has built a software library consisting of my design, I’ll build simulation software to test that design.
Each hardware component (that I wish to test in simulation) needs to have an emulator written for it. This is typically done by hand.
My simulation script is usually quite sparse at this point. It typically involves receiving commands from a TCP/IP port and converting them to an emulated serial port. The actual simulation scripts are programs that would interact with the actual board. The TCP/IP port helps to convince those programs that they are interacting with the actual hardware instead of the simulation.
Are these test scripts complete? Do they achieve 100% coverage? Not usually. That was the purpose of the formal test in the last step. Instead, the purpose of these tests is just to make certain that everything still works when it’s all connected together.
As a final check, or if the Verilator test takes too long, I’ll switch to running the same tests in hardware. If the test passes, that’s the end of the story. If not, I have to slow down and reconstruct the bug in Verilator (if possible).
Fig. 10 discusses some of the struggles I had trying to use this approach to check a high resolution FFT. If you are curious at all about this approach to building an FFT, feel free to check out the academic references on this page.
Bring up new hardware
It’s not always possible to emulate hardware you’ve never seen. There’s always the possibility that you’ve misunderstood the specification and built the solution to the wrong problem. The only way to check this is to make certain that you can actually interact with the real hardware. From there, you can back up and tune the Verilator based emulator to match reality.
So, when working with a new hardware component, the last thing I do is to write ZipCPU software to interface with that component.
Perhaps the best example of this is working with a new flash controller.
My first step is to start the design. This typically sends a series of commands to the flash from the controller. I turn these commands off, then I shut down the main read port of the flash. All of this is done by command line.
Then I issue commands, one at a time, from the command line using a program I’ve called wbregs. I’ll then examine the results using a WBScope to see how the flash responds to the commands I’ve sent.
Note that I’m not changing my design–I’m just issuing commands to my design from an external port. As a result, I can change my command script arbitrarily without needing to rebuild anything. I’m the one determining the “script” at run time as it is.
In the case of the flash, I’m usually first trying to read the device’s serial number, then to place it into QSPI mode to read the first several words from it. If all goes well, I should be able to read the synchronization word from the beginning of the bitstream stored in the first couple addresses of the flash–right where the vendor provided bit stream still (hopefully) resides.
Only after I start issuing the same commands over and over again in testing will I set those commands up into a script to spare myself the typing. In the case of the flash, my script takes the flash off line and out of QSPI mode, (or whatever mode it might have been in), reads the device ID register, returns the device to QSPI mode and then reads the first several words from the flash. Indeed, you’ll often find this script still lying around in many of my designs as a piece of abandonware.
From the script, I’ll move to a piece of host software that does the same thing, only faster and (IMHO) more professionally.
Once the whole test starts to work, and only then, will I start to create a ZipCPU program that then tries to access and/or control the device.
Hardware design and debugging is of necessity a very methodical process. FPGA Hell is very real. If you want to avoid it, you can’t really skip steps. You should always 1) verify all of your components individually, 2) check that your design will work in simulation, before ever 3) attempting to get it to work in hardware. Even then, the first step in hardware shouldn’t be the final software step. Instead, the process should remain methodical as each piece of your design is tested, verified, and brought on line separately.
Some time ago I discussed which should come first, the CPU or the peripherals (and memory). I’m a strong believer that the memory and peripherals should always come before the CPU–for the sole reason that a design with a CPU within it is just that much harder to test and verify since there’s that much more that can go wrong.
Of course, some of this bias might have something to do with the fact that I’ve built (and had to debug) my own CPU, but I’ll let you draw your own conclusions on whether or not that’s the case.
If the iron be blunt, and he do not whet the edge, then must he put to more strength: but wisdom is profitable to direct. (Eccl 10:10)