In bullets, the ZipCPU is:
A 32-bit CPU. All registers are 32 bits wide, addresses are 32 bits wide, instructions are 32 bits wide, etc.
A RISC CPU. It implements only a minimum set of instructions, a much smaller set than most other “RISC” CPUs.
A Load/Store architecture. Only load and store instructions can access memory.
A Von Neumann architecture. Both instructions and data share a common bus.
A pipelined architecture, having stages for prefetch, decode, read-operand, execute, and write-back. The execute stage is implemented by one of four blocks: an arithmetic logic unit (ALU), a memory unit, a divide coprocessor, and a (not yet implemented) floating point coprocessor. The last two blocks, the divide and floating point coprocessors, will only ever be optional parts of the CPU. Whether to include them depends upon your implementation needs.
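To make the five pipeline stages listed above concrete, here is a minimal sketch of an ideal pipeline schedule. It assumes one instruction enters the pipeline per clock and that nothing ever stalls, which a real CPU (the ZipCPU included) will not always manage. The function names are hypothetical and are not taken from the ZipCPU sources.

```python
# Toy model of an ideal, stall-free five-stage pipeline.
# Stage names follow the list above; everything else is an assumption.
STAGES = ["prefetch", "decode", "read-operand", "execute", "write-back"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each clock cycle to the instruction index held by each stage."""
    schedule = {}
    for instr in range(n_instructions):
        for depth, stage in enumerate(stages):
            cycle = instr + depth  # instruction i reaches stage d on cycle i + d
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def total_cycles(n_instructions, stages=STAGES):
    """Cycles needed to retire n instructions when nothing stalls."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(stages) - 1


if __name__ == "__main__":
    sched = pipeline_schedule(4)
    for cycle in sorted(sched):
        row = ", ".join(f"{stage}=I{i}" for stage, i in sched[cycle].items())
        print(f"cycle {cycle}: {row}")
```

In this idealized model the pipeline fills after four cycles, and from then on one instruction completes per clock. Memory waits, divides, and branches would add stalls that the sketch deliberately ignores.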
The ZipCPU was designed to be implemented on a low cost FPGA board. Reasons for this include:
I couldn’t afford the FPGA board that I really wanted, the VC707, much less the license I would need to program it. Instead, I could afford the much smaller boards ($50-$150), such as the ones Digilent sells.
Prior to purchasing any boards or licenses, I simulated my designs using Verilator. However, Verilator only works with Verilog source, not encrypted proprietary IP components. Hence, when I wanted to simulate an FFT with neither hardware nor proprietary IP, I was forced to build my own. The same became true for the ZipCPU. Incidentally, the “simulate before you buy” technique worked so well for my first board that I had my initial design running within two days of receiving the board.
Unique Features of the ZipCPU
The only limit to this simulation is the power of the computer running it. As an example, you could, if you wished to, run the CMod-S6 simulation all the way from power up through several rounds of 4x4x4 Tic-Tac-Toe. (Just don’t run it in debug mode all night, at the peril of filling up your disk drive.)
Since the ZipCPU was designed around the pipelined Wishbone bus, found within the Wishbone B4 specification, the ZipCPU enjoys memory accesses that are between three and thirty times faster than those of the OpenRISC core. (Their cache implementation is still better than mine, though …) Further, Wishbone avoids the many, many options and channels required to implement the AXI bus used by Xilinx’s IP, so it is simpler, easier to work with, and requires less logic to use.
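As a rough illustration of why a pipelined bus helps, the sketch below compares cycle counts for a burst of transfers in two toy models. It assumes a fixed slave latency (cycles from request to acknowledgement), no stalls, and a master that can issue one request per clock in pipelined mode. These are modelling assumptions only. They are not measurements of the ZipCPU or of OpenRISC, and the function names are hypothetical.

```python
# Toy cycle-count model: one-at-a-time transfers versus pipelined requests.
# "latency" is the assumed number of cycles from request to acknowledgement.

def classic_cycles(n_transfers, latency):
    """Each transfer must be acknowledged before the next request is issued."""
    return n_transfers * latency


def pipelined_cycles(n_transfers, latency):
    """Requests go out on consecutive clocks; acknowledgements follow later."""
    if n_transfers == 0:
        return 0
    return (n_transfers - 1) + latency


def speedup(n_transfers, latency):
    """Ratio of the two toy cycle counts for the same burst."""
    return classic_cycles(n_transfers, latency) / pipelined_cycles(n_transfers, latency)


if __name__ == "__main__":
    for n, lat in [(1, 4), (8, 4), (64, 4)]:
        print(n, lat, classic_cycles(n, lat), pipelined_cycles(n, lat))
```

Under these assumptions the benefit grows with burst length and with slave latency, and a single transfer sees no gain at all.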
Because Gisselquist Technology, LLC, owns all of the code for the ZipCPU and its peripherals, proprietary licenses may be purchased. This sets the ZipCPU apart from other open source soft core CPUs, such as OpenRISC, whose IP may not be owned by any single entity with whom one might negotiate a purchase.
You can find out many of the details of this CPU within the ZipCPU repository on GitHub. There, you will find the specification for the CPU which contains not only the obligatory description of its instruction set, but also examples of how to program with it as well as my ongoing “honest assessment” of it as a CPU.
Example ZipCPU designs
Gisselquist Technology has publicly released several designs that use the ZipCPU, all available on GitHub for you to examine. These include the S6SoC project, which fits on the Xilinx Spartan-6LX4 found within a Digilent CMod S6; the OpenArty, which fits on a Digilent Arty; and the xulalx25soc, which fits on Xess.com’s XuLA2-LX25 board. Upon customer request, the xulalx25soc now has a build option for the Spartan-6LX9 found on the XuLA2-LX9 board, which Xess.com used to sell.
Another ZipCPU design you may wish to look at is the basic ZipCPU design called zbasic. This design has support for a flash, block RAM, and a serial port. It’s also my testing grounds for getting the SD Card controller to work.
The ZipCPU has undergone several instruction set revisions, going from a 4-bit opcode supporting only 32-bit bytes, to a 5-bit opcode, and finally to a 5-bit opcode with support for 16-bit compressed instructions and 8-bit bytes. I see no reason at this time to adjust the instruction set any more.
The current instruction set has newlib, GCC, and binutils support, although the soft floating point emulation support is lagging a touch.
A minimal O/S exists for the ZipCPU. Further O/S support is expected, but it is lagging behind the work on more peripheral support.
Here’s a summary of the clock rates the ZipCPU can achieve on a variety of commercial boards:
|Ico-Board||40 MHz||?||This icoboard design is still in development|
|S6SoC||80 MHz||18||When running from flash|
|Arty||82 MHz||1||Clock speed limited by SDRAM clock|
The XuLA design was used in the summer of 2016 to benchmark the ZipCPU using the Dhrystone benchmark. At the time, the ZipCPU only supported 32-bit bytes, so it wasn’t a proper fit for the Dhrystone benchmark. Still, it was able to accomplish 0.744 DMIPS/MHz packing one character in each 32-bit byte. If you instead packed four characters into the 32-bit bytes used by the ZipCPU at the time, the CPU could achieve 0.95 (modified) DMIPS/MHz. These results were presented at ORCONF, 2016.
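The difference between the two packing schemes described above can be sketched as follows. This is only an illustration of the idea, not the code used in the benchmark: the helper names are hypothetical, and the byte order within each word (most significant character first) is an assumption.

```python
# Two ways of storing text on a machine whose smallest addressable unit
# is a 32-bit word. Byte order within a word is an assumption here.

def pack_one_per_word(text):
    """One character per 32-bit word: simple, but uses four times the memory."""
    return [b for b in text]


def pack_four_per_word(text):
    """Four characters per 32-bit word, zero padded to a whole word."""
    padded = text + b"\x00" * (-len(text) % 4)
    return [int.from_bytes(padded[i:i + 4], "big")
            for i in range(0, len(padded), 4)]


def unpack_four_per_word(words, length):
    """Recover the original text from packed words."""
    raw = b"".join(w.to_bytes(4, "big") for w in words)
    return raw[:length]


if __name__ == "__main__":
    msg = b"DHRYSTONE"
    print(len(pack_one_per_word(msg)), "words, one character each")
    print([hex(w) for w in pack_four_per_word(msg)])
```

With four characters per word, a string copy or compare touches a quarter as many words, which is consistent with the packed variant scoring higher.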
Since then, the CPU has been modified to support 8-bit bytes. I have not returned to the Dhrystone benchmark, though, to update its performance measure.
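For readers unfamiliar with the unit, a DMIPS/MHz figure is conventionally derived by dividing the measured Dhrystones per second by 1757 (the score of the VAX 11/780 reference machine) and then by the clock rate in MHz. A minimal sketch of that conversion follows. The function name is hypothetical and the inputs in the usage line are made-up values for illustration, not ZipCPU measurements.

```python
# Conventional Dhrystone normalization: 1757 Dhrystones/second == 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips_per_mhz(dhrystones_per_second, clock_hz):
    """Convert a raw Dhrystone rate and a clock rate into DMIPS/MHz."""
    dmips = dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND
    return dmips / (clock_hz / 1e6)


if __name__ == "__main__":
    # Hypothetical inputs: 100,000 Dhrystones/s measured at a 100 MHz clock.
    print(round(dmips_per_mhz(100_000, 100e6), 3))
```

Because the result is normalized by clock rate, it lets designs running at different speeds be compared on the work done per clock.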
As of ORCONF, 2016, the ZipCPU used between 1286 and 4926 6-LUTs, depending upon how it is configured. This number is out of date, though, for the same reason that the Dhrystone benchmark measure is out of date: the ZipCPU has since been modified to support 8-bit bytes. I can say, though, that I was able to pack more logic into the CMod-S6 as a result than I could pack into it before, suggesting that the new and improved CPU uses even fewer resources.