ZipCPU highlights from 2018
It’s hard to believe that the first full year of the ZipCPU blog has now come and gone. Last year I only blogged for a half a year, starting in May of 2017. 2018, however, contained a full year of blogging. The pace of the blog has settled down a touch, adjusting from moving quickly early on, when I had nothing better to do and a lot of designs to discuss, to the current pace as I’ve had to spend my time building new designs to blog about and even managing to make a touch of money along the way.
Shall we look together across the last year and take stock in what has taken place?
Formal Verification
Towards the end of 2017, I tried my hand at formal verification for the very first time. This was at the request of the SymbiYosys developers. Since I was a poor contractor with no contract at the time, I figured I had nothing to lose. Much to my surprise, I quickly discovered I wasn’t the awesome Verilog developer I imagined myself to be. Working my way from core to core within the ZipCPU, I found and fixed bug after bug in components that I had thought were “working”.
I would’ve never found these bugs without using formal methods.
As a result, I became a fast believer in formal methods. I moved quickly, and even created a formal verification course which I was pleased to teach both at multiple corporate facilities, and to several individuals online. If this is something that interests you, I’m currently charging $4,000 plus travel expenses to come to your facility and teach formal verification to up to six individuals at a time. Alternatively, I have been known to teach one or two individuals at a time via video chat.
Perhaps what surprises me the most about my new-found love for formal methods is that I’ve earned a bit of a reputation for being a formal verification evangelist. While others may see me this way, this is not how I envision either myself or my own journey. My goal has never been formal verification as an end in itself. My goal has always been to earn a wage by producing working designs. To date, I know of no better method to do this than to use formal verification along the way. It’s fast. It’s easy. It will lead you right to any bugs.
This is also why the blog has shifted over the last year to placing so much emphasis on formal verification: it works, it works well, and I also find it to be a lot of fun.
ZipCPU Development
While the ZipCPU’s ISA has remained constant over the last year, that doesn’t mean that there’s been no work done on the ZipCPU. The biggest news on this front is that I was able to spend some time to formally verify the ZipCPU as a whole, and then present some of the lessons I’d learned at ORCONF2018. Perhaps you’ve seen the video of my presentation?
Since that time, I’ve had the opportunity to go back and rework the proof of the ZipCPU. The proof is now faster, cleaner, and “cheaper” (fewer clock cycles) than ever before.
That’s not all, though. The ZipCPU now runs on its first iCE40 board–the ICO board.
Other big improvements include:
-
The ZipCPU now has a (formally verified) data cache!
This is one of the first of several rather complex cores I’ve built where I started with formal. In hind sight, I’m not sure I could’ve built this data cache without formal methods. If the Lord wills, I’ll have the opportunity to present this cache implementation here on this blog.
I later came back to this data cache and optimized it for single cycle reads and writes, although peripheral writes are still multicycle.
-
The CPU proof now uses SymbiYosys throughout. The result is faster and cleaner, and easier to verify multiple separate configurations.
-
I finally fixed the bug in the CPU that was preventing the simulation script from starting the CPU at any location other than the reset address.
-
I found a way to optimize both the divide, and a regular shift and add multiply. The optimized multiply uses less than 50% of the original logic, whereas the (now optimized) divide uses about 20% less than before.
-
The ZipCPU toolchain now supports soft floating point via the libgcc library
-
The ZipCPU remains a highly configurable CPU. Over this last year, configuration options have been carefully changed from internal macros to parameters. Such parameters are easier to control from the external environment, meaning that the
cpudefs.v
file now only sets the CPU default options–the environment may still override them. -
All of the proofs have been rebuilt so that only the multiple clock domain cores (not found internal to the ZipCPU) use the (now superseded)
clk2fflogic
yosys option option, or itsmulticlock on
SymbiYosys replacement
If nothing else, I’d like to think that formally verifying a CPU now ranks me among the category of those who know how to do formal verification.
Ri5cy!
Thanks to SymbioticEDA, I’ve now had the opportunity to formally verify a CPU that wasn’t one of my own designs. Using riscv-formal, I verified the Ri5cy soft core CPU. Yes, I did find discrepancies between the operation of the Ri5cy core and the riscv-formal formal property list. Those have been delivered to the Ri5cy team, and hopefully they have been fixed by now. (I haven’t checked.)
It still floors me that I’ve managed to go from knowing nothing about formal verification to formally verifying not only my own CPU, but also someone else’s CPU in my first year of working with SymbiYosys.
Three thoughts come to mind. First, I’m just getting started. Second, this has been a lot of fun. Third, if a beginning FPGA developer can do it, so can a more experienced one.
AutoFPGA
If you aren’t familiar with AutoFPGA, then just remember: this is the tool I use to compose designs together. It is designed to make it easy to reconfigure a design, by adding or removing a peripheral. Unlike its Vivado or Platform Designer counterparts, AutoFPGA is open source, generates an open source interconnect, and does this in a way where you can examine the source code of the process at every step of the way.
While I use AutoFPGA religiously within my own projects, I’ll also be the first to admit that the project isn’t all that mature. It works well once a project is set up that uses it, and it works well for pipelined Wishbone buses. It does not (yet) work well for any other types of buses, and I already know that the interconnect it generates would not pass formal verification (yet).
Still, AutoFPGA has seen quite a few upgrades over 2018.
-
I recently blogged about the AutoFPGA linker script generation upgrade, allowing it to create and support multiple types of linker scripts.
-
While AutoFPGA was designed from the beginning to allow a sort of script-inheritance, I only started playing with it recently in my OpenArty project. I like it! It makes creating WB scope scripts as simple as referencing the prior script and only overriding the parts you need to.
-
Constraint insertions
In one of my designs, the ethernet core runs at one clock speed and the core system clock speed at another. This has required the addition of many false paths into the constraint file in order to support the Wishbone Scope that crosses both clock domains. Even though the Wishbone Scope handles all of the clock-domain crossings properly, Vivado complains about them. These false paths keep Vivado from complaining.
AutoFPGA can now copy false path constraints from a configuration file to the constraint file when the core is inserted into the design, or remove them from the constraint file when the core is removed from the design. This keeps Vivado from complaining, and makes it easier to reconfigure the design with a cross-clock scope.
-
Multiclock support
Some time ago, we discussed what it took to get multi-clock support from Verilator. This technique is now working very nicely with AutoFPGA in more than one design.
Most of my AutoFPGA work has involved the WB bus, version B4 in its simple pipelined mode. Currently, AutoFPGA only really supports this WB bus.
That said, I’ve had to work with other bus structures as well. As I’m sure you will reflect, there is nothing more painful to debug than a broken bus structure: it can take down the entire design, to include any debug structures you have for finding and catching bugs. Therefore, my bus bag of tricks now includes formal properties for not only WB, but also the Avalon and AXI-lite buses. This also includes various formally verified bridges between these buses.
Perhaps you recall how easy it was to verify an AXI-lite peripheral, and to find bugs within Xilinx’s demonstration design?
If the Lord is willing, I hope to add the full AXI bus to my repertoire of buses I can formally verify with confidence.
I’d also like to build a set of formal properties that can be used to formally verify my AutoFPGA bus interconnect. I’m aware that there are some subtle bugs within the interconnect as built, and I’d love to build a better interconnect that could be formally verified to be bug free. Specifically, if you access more than one peripheral in the same bus cycle, you might get your acknowledgments mixed or lost upon return. If done properly, this would place AutoFPGA one step above its competitors which can compose designs together, but cannot formally verify that the sum of the parts even works in the end.
Others
I’ve now counseled quite a few individuals on-line as they’ve worked through their bachelor or even graduate theses. Of those who have taken the time to join me learning Formal Verification, none have been disappointed. Here’s a quote from one particular individual who was building first just a line-associative register set, and then a full-blown CPU.
There’s always a reason for formal verification in digital design. This person was the lucky one. His design worked when he handed it in.
One other individual I remember started his journey on IRC and asked what language he should learn. I counseled him to learn Verilog, since the free version of SymbiYosys works with Verilog. He chose VHDL instead, since “Europeans use VHDL more than Verilog”. (Really??) Two days after he started, he came back on-line asking what was wrong with his design. I then told him, had he chosen to use the free formal verification tools with Verilog, he would have had his answer right now.
What do you think? Do you think I might be a bit biased? Perhaps. But my bias is towards what has worked for me, and I’ve now found more bugs in my own designs using SymbiYosys than I found with any of my prior testing approaches.
Beginners Tutorial
While this remains a work in progress, I have put together the first six of what I hope will be about ten lessons for beginning digital designers. These lessons tie beginning level Verilog lessons together with Verilator simulation and co-simulation lessons and even introduce the topic of formal verification as well.
My thesis behind this work has been that too many students take a course in Verilog that teaches them only the semantics of the language and nothing about the skills they actually need to do design. Worse, the new student spends too many hours learning Verilog, and as a result learns the Verilog syntax to describe simulation as well. On top of that, the student then gets the two confused when he comes to building his first complex design. So rather than taking separate courses in Verilog, simulation, formal verification, and then (finally) FPGA design, the tutorial I’m working on mixes lessons from all four of these topics together in order to teach FPGA design.
While I haven’t had that much feedback from the tutorial, those who have contacted me have either said it was wonderful, or they’ve given me the feedback necessary to make it better.
I’m looking forward to completing this tutorial in the new year, should the Lord be willing.
Top Ten
It’s now time to present the top ten articles on the ZipCPU blog from 2018!
The fact that this is my second year with the blog means there are now two different measures I can use to determine what the top ten articles are. I could list the top ten articles written this year by their number of hits, or I could list the top ten articles by number of hits regardless of when they were written. Even better, can’t we do both?
So let’s start with the top ten articles written in 2018. Along the way, I’ll share how many hits each article has had, and how the article fairs under a web-search for it. I’ll use DuckDuckGo for this purpose, since I’m afraid Google knows me well enough to tailor the results and so show the ZipCPU blog articles at the top of its list. Starting with number ten then:
-
Debugging a Cyclone-V, with 935 hits
This year marked my first work with a Cyclone-V SoC+FPGA design. I’m sure it will surprise no one to learn that it took a little more creativity than normal to debug the design. Having done that, there was quite a surprise I found along the way.
Searching DuckDuckGo for “cyclonev debugging” yielded this article as the number one result.
-
Interpolation is just a special type of convolution, with 940 hits
This was one of my big DSP hits for the year, proving that a signal processing interpolator was nothing more than a convolution. This is a really big result for anyone into signal processing, simply because it means that you can study your interpolation method in terms of the Fourier transform of the filter representation of your interpolator.
Searching DuckDuckGo for “interpolation convolution” yielded this article as the number one result.
-
ZipTimer: A simple countdown timer, with 1,091 hits
While I know I dig into some deep, difficult, and complex topics from time to time, blogging about the ZipTimer was light and fun. This article followed the development of a counter all the way from being a simple count-down counter, to an interval timer, to a programmable interrupt timer peripheral. As such, it has something for everyone from the beginning designer on up to the SoC designer.
Oh, and yes, I even went so far as to show how you might formally verify the timer peripheral as well.
This article was harder to find on DuckDuckGo. It didn’t show up on either the first or second page after searching for either “FPGA interrupt timer” or “programmable FPGA interrupt timer”. It did show up as the number one hit for “ZipTimer”, but that’s kind of cheating, so we won’t count that.
-
Reasons why Synthesis might not match Simulation, with 1,143 hits
Those who’ve followed the blog know that I’ve dedicated it to keeping digital design engineers out of FPGA Hell. One of the more frustrating positions to find yourself in is the one where everything works in the simulation, but nothing works on the hardware.
I’ll admit, writing about this article today feels like DeJa Vu all over again, simply because I’m struggling with this problem even today on a design I’m working with.
A DuckDuckGo search for “Synthesis simulation mismatch” yield’s this article as number eleven.
-
A Quick Introduction to the ZipCPU Instruction Set, with 1,326 hits
Thank you to all who have indicated an interest in the ZipCPU’s ISA! This article presents a basic introduction to that ISA, explaining the basics of how it is laid out.
Unlike many RISC-based CPUs, the ZipCPU only has a 5-bit opcode space. This limits the number of possible instructions to roughly 32. Of course, a CPU is more than just its opcodes, so the article also presents the register set of the ZipCPU, the basic form of an instruction’s operands, how most instructions can be executed conditionally, and much more.
It should come as no surprise that this is the top article returned following a DuckDuckGo search for “ZipCPU instruction set.”
-
How to build a SPI Flash Controller for an FPGA, with 1,378 hits
Although it doesn’t show it on the hits for this page, this article really starts out with a description of what formal development looks like in practice. That description goes over the development of this SPI flash driver. It was only some time later, after I’d worked with the driver for a bit, that I wrote up this essay describing the SPI flash driver, and what the important parts to it are.
In many ways, I’m rather surprised and flattered that this article was so well received.
Searching DuckDuckGo for “spi flash controller” yielded a touch of a surprise. Their number one article referenced my Quad SPI Flash Controller on OpenCores. The number five reference was this article.
-
Crossing clock domains with an Asynchronous FIFO, with 1,434 hits
To me, this asynchronous FIFO article was or at least should have been one of the big hits of the year. In many ways, I’m surprised this article didn’t take off with more hits than a fourth place finisher. An asynchronous FIFO is just so useful in so many designs, that any digital designer should really understand how one is built. Further, unlike other presentations you might come across, this presentation also includes properties that can be used to formally verify that an asychronous FIFO even works in the first place.
If you’ve ever needed to move lots of data across a clock domain boundary, and eventually we all need to do it, then you will want to read and understand this article.
Searching DuckDuckGo for “asynchronous FIFO” yielded this article as number twelve on its list.
-
An Open Source Pipelined FFT Generator, with 2,288 hits
Some of the readers of this blog had indicated that it felt like I was only writing for hobbyists. Because the focus of the blog has been on the ZipCPU, a design built for low logic implementations, they felt that the tools and techniques discussed here didn’t really apply well to the more industrial applications they had come across in their experience.
While this is far from true, I can understand how they came to this conclusion. One of my difficulties, as an author, is to present a complete topic in each article. This limits the things I might post about to things that can fit, from beginning to end, within a single article.
This FFT article pushed those limits. While I didn’t discuss the FFT implementation in its full gory detail, I did point out that 1) it had been formally verified, 2) many of the pieces weren’t all that hard to do, 3) one or two were very hard, and 4) it discussed at a broad level how I went about solving these issues.
DuckDuckGo apparently thought this article was a hit, since it comes up as number two following a search for “FFT generator.” I’m not sure this is a good or bad thing, since I have to imagine that most folks looking for an “FFT generator” are looking for something like FFTW. Still, I’m flattered.
-
Verilog Beginner’s Tutorial, with 2,706 hits.
In many ways, this isn’t a fair comparison: this web-page isn’t really an blog or article on it’s own, but rather a tree node pointing to several tutorial PDFs. Unlike the blog pages, someone might visit this tutorial page many times over. However, I’m including it here because the number of hits on this page was relatively high compared to all of my other blog articles.
Looking for this page by searching on “verilog tutorial” on DuckDuckGo found nothing. On the other hand, a search for “beginning Verilog tutorial” found this tutorial as number seven on their search results.
If the Lord wills, we’ll bump this result higher next year.
-
About the ZipCPU, with 12,710 hits
Like the Verilog Beginner’s Tutorial, this isn’t really a blog post. However, it did go viral this year after someone posted it to Hacker News. Because it got so much attention, it only makes sense to list it here.
How viral did this page become? Well, you can see in Fig. 13 below. Fig. 13 shows the distribution of ZipCPU.com page views across all of 2018. The hits for this ZipCPU about page form the largest spike in page views over the course of the entire year.
It should come as no surprise that a search for “ZipCPU” using DuckDuckGo yield’s the blog first, and the about page for the ZipCPU second.
Let’s now look at our top hits overall for the year, this time including articles written in 2017 as well as those written in 2018.
-
Implementing the Moving Average (Boxcar) filter, with 3,653 hits
As you may recall, a boxcar filter is a very basic filter that just averages values together. It’s a very basic filtering component, one that is easy to build and easy to use. This article describes that component, and was apparently well received last year.
If you search DuckDuckGo for “Moving average filter, verilog”, this comes up as the number two answer.
-
Minimizing FPGA Resource Utilization, with 3,658 hits
This article traces many of the design choices I needed to make in order to get a multi-processing system running on a Spartan 6/LX9. It’s a fun article, explaining the basics of what LUTs are and how to count LUTs when reading your code.
As I understand things, I think this particular article went viral last year before I was counting page views. By the time I started counting page views, it was no longer the highest hitting page. This year, it comes back again as number nine.
A quick search on “minimizing LUTs” using DuckDuckGo turns this up this article as its number one search result.
-
Using a CORDIC to calculate sines and cosines in an FPGA, with 3,967 hits
If you are doing DSP, you need a sine wave. The standard “textbook” way of generating a sine wave is to use a CORDIC. In 2017, I spent a lot of time developing both what a CORDIC was, how to use it to generate a sine or cosine wave, how to use it to evaluate an arctangent, and even how to verify it through simulation.
I’m looking forward to demonstrating next year, Lord willing, that there’s a better, lower logic approach that you can use that just requires two FPGA DSP elements (i.e. multiplies).
This is the nineteenth search result for “CORDIC” using DuckDuckGo. The fact that the github repository containing the CORDIC code presented here is number twelve was a bit of a surprise.
-
FPGAs vs ASICs, with 4,536 hits
Every now and then I like to thump my chest and think that I’m doing the same things ASIC designers do. Then I have to come back and review this article to get put back in my place. This article is popular enough that I have to believe there are others out there who are sharing my delusions.
Searching for “FPGA ASIC” on DuckDuckGo yielded this article as number seven.
-
Taking a New Look at Verilator, with 4,622 hits
This is my basic introduction to Verilator article. I reference it a lot. It’s also referenced from the Verilator web-site, so that might explain the number of page views here.
DuckDuckGo surprised me with this one too. Searching for “Verilator” yielded this article as number seventeen in its list. The surprise was that number 16 was my article on how to handle multiple clocks using Verilator.
-
The simplest sinewave generator within an FPGA, with 4.850 hits
Didn’t I say generating sinewaves was a fundamental DSP task? This article looks at the easy ways of doing that. These aren’t necessarily the best ways, but that depends upon your application.
Searching DuckDuckGo for “simple sinewave” yielded this article as number seven.
-
Getting the basic FIFO right, with 5,613 hits
I really need to revisit this article. While I think it’s basically a good article, it presents a FIFO with
2^N
storage elements as having(2^N)-1
usable elements. The asynchronous FIFO article shows how to get that last item into the FIFO for a minimal cost. This is another item on my to-do list–but not on my list of new year’s resolutions.I’m sure there’s a better DuckDuckGo search term, but neither “FIFO” nor even “FPGA FIFO” are turning up a reference to this page.
-
Arrow’s Max-1000: A gem for all the wrong reasons, with 5,850 hits
I’d like to think that this article was written before Trenz and Arrow got their act together with the Max-1000 FPGA development board. I’m looking forward to coming back to my Max-1000 project and running the ZipCPU on this board. It hasn’t happened yet. This article shares my frustrations with the board.
Perhaps I should be concerned that this article is listed as number three overall for the year?
If you search for “max-1000 fpga review” on DuckDuckGo, this article comes up as the number seven search result. Strangely enough, my TinyFPGA BX review comes up as result number eight.
-
Some Simple Clock-Domain Crossing Solutions, with 7,074 hits
Every FPGA designer needs to learn about clock-domain crossings. This article is my entry article into describing the basics of and the problems with clock domain crossings. I’m honored that it shows this favorably.
This one makes the number eleven search result position, following a search of DuckDuckGo for “clock domain crossing”.
-
About the ZipCPU, with 12,710 hits
The about page for the ZipCPU remains the number one hit for the year overall. Thank you to everyone who made that possible!
The Audience
Let me share one other fascinating chart with you before closing. Fig. 20 below shows the number of page views in December 2018 against the number of page views in December 2017.
The first obvious conclusion is that web hits are up. The actual number reflects a 60% increase in web hits over last year. Thank you.
The next very fascinating conclusion is that the number of hits is very cyclic on a weekly basis. The blog gets many more hits Monday through Friday than it does on Saturday or Sunday. This tells me that the majority of individuals looking at the ZipCPU blog are professional developers, and not so much the hobbyists. Thank you again. This is encouraging, although it wasn’t at all what I was expecting.
Happy New Year
To all of have chosen to read and share this blog, and to all who have been encouraged by it, let me wish you a hearty, “Thank You” and a “Happy New Year”! I’m looking forward to the year coming, and to continuing my blogging work here. Let me also wish you success in your chosen FPGA endeavors this year. I pray that your work would be blessed by the Lord as well, lest you find yourself stuck and wasting time in FPGA Hell.
The blessing of the LORD, it maketh rich, and he addeth no sorrow with it. (Prov 10:22)
May God bless you and your efforts.
Rejoice in the Lord alway: and again I say, Rejoice. (Phil 4:4)