If you aren’t familiar with the ZipCPU, then you should know that it is my attempt at improving the publicly available softcore CPU architectures. It has been designed from the ground up to be a truly Reduced instruction set computer, or RISC machine, to have a simple pipeline implementation, and yet to be able to run a multi-tasking operating system if desired. Unlike many of the other more common soft-core CPUs, such as MicroBlaze or the NiosII, the ZipCPU has been created in a completely open source fashion.
The ZipCPU was also designed to run on the cheaper, more commodity, FPGA hardware platforms. Indeed, in many ways this has always been the philosophy behind the ZipCPU: be small and simple, yet fully and completely functional. I judged, as I built it this way, that not only would it be easier to build and debug a simpler CPU, but also that it would be easier to add to an FPGA project as an afterthought if it was small.
Consider, for a moment: if you bought an FPGA, you did so for a purpose. If you wanted a CPU instead, then there are many other CPUs that you could have bought that would have run faster, and cost less, than the FPGA you purchased. As an example, Fig 2 shows a picture of the TeensyLC, a small CPU that sells for only $15 USD. Since you didn’t purchase a TeensyLC, you must have purchased that FPGA for a reason: to perform a task that you couldn’t do with an off-the-shelf CPU. Indeed, I would imagine you want as much of your FPGA available to complete that task as possible. If, in the process, you find yourself needing a CPU on the same chip as your FPGA, then you want that CPU to stay out of the way, and to consume as few resources as possible.
This is, and was, the purpose of the ZipCPU.
We’ve already discussed several of the parts and pieces of the ZipCPU across many articles over the last year. For example, we discussed the divide unit when we discussed minimizing FPGA resource allocation. We discussed the ALU unit when describing how a simple ALU might be structured. We discussed the debugging needs of a CPU in general, as well as how to meet those needs in both simulation and in the hardware. More recently, we presented and formally verified a simple prefetch engine for the Wishbone bus. Indeed, my recent post about the ugliest bug I’ve ever encountered was also based upon my experiences with the ZipCPU.
Today, let’s take a look at how the ZipCPU instruction set is laid out, and discuss a few of the ways it is different from some of the other, more common, soft-core CPUs of today. My intention is by no means to present a complete description of the ISA, but rather an overview. The ZipCPU specification should provide any missing details. If you find something missing there, please let me know and I can add it in.
The Basic Operations
The ZipCPU was designed around a set of instructions all having the very simple form,
OP.X #+Rb,Ra
You can read this generic instruction as: if X is true, then OP is applied to Ra and to the number # plus the value of register Rb, and the result is placed into Ra. Here I’m using # to refer to an immediate value: a fixed number encoded within the instruction stream.
Fig 3 attempts to show this operation graphically. Two registers are read from the register file, noted here as Ra and Rb. An immediate is added to Rb, or alternatively the immediate replaces Rb, and the result joins Ra to be operated upon.
For ALU instructions, the result is only written back if the condition is true.
Memory instructions are just a touch different. In the case of a memory instruction, the #+Rb value (immediate number plus the value of register Rb) is used as the address for the memory operation. Further, the operation only begins if the condition is true. Ra is used as the data source for a store operation, or receives the data result of a load operation.
The encoding for this and other ZipCPU instructions is shown in Fig 4 on the right. Four bits are used to encode the destination register, Ra, five bits are used to encode the opcode, OP, three bits are used to encode the condition, X, and the remaining bits are used to encode whether Rb is used and what immediate offset is used by the instruction.
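To make the field widths concrete, here is a small C sketch that packs and unpacks these fields. The exact bit positions are my assumption from reading the specification: bit 31 clear for a full-width instruction, then Ra, OP, and X, then an Rb-present bit, followed by either Rb plus a 14-bit immediate or a single 18-bit immediate. Check them against Fig 4 before relying on them. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed field layout (verify against Fig 4 and the ZipCPU spec):
 *   bit  31     0 for a full 32-bit instruction
 *   bits 30:27  Ra, the destination register
 *   bits 26:22  OP, the opcode
 *   bits 21:19  X, the condition
 *   bit  18     1 if Rb is used
 *   bits 17:14  Rb (only when bit 18 is set)
 *   bits 13:0   14-bit signed immediate (when Rb is used)
 *   bits 17:0   18-bit signed immediate (when Rb is not used)
 */
typedef struct {
    unsigned ra, op, cond, rb;
    int has_rb;
    int32_t imm;
} zip_insn;

/* Sign-extend the low `bits` bits of v */
static int32_t sext_bits(uint32_t v, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    v &= (1u << bits) - 1u;
    return (int32_t)((v ^ sign) - sign);
}

static uint32_t zip_encode(const zip_insn *in) {
    uint32_t w = ((uint32_t)(in->ra & 0xfu) << 27)
               | ((uint32_t)(in->op & 0x1fu) << 22)
               | ((uint32_t)(in->cond & 0x7u) << 19);
    if (in->has_rb)
        w |= (1u << 18) | ((uint32_t)(in->rb & 0xfu) << 14)
           | ((uint32_t)in->imm & 0x3fffu);
    else
        w |= (uint32_t)in->imm & 0x3ffffu;
    return w;
}

static zip_insn zip_decode(uint32_t w) {
    zip_insn in;
    in.ra     = (w >> 27) & 0xfu;
    in.op     = (w >> 22) & 0x1fu;
    in.cond   = (w >> 19) & 0x7u;
    in.has_rb = (int)((w >> 18) & 1u);
    in.rb     = in.has_rb ? (w >> 14) & 0xfu : 0u;
    in.imm    = in.has_rb ? sext_bits(w, 14) : sext_bits(w, 18);
    return in;
}
```

The Rb-present bit trades four immediate bits for a register. This is where the 14-bit versus 18-bit immediate ranges, mentioned later for the MOV instruction, come from.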
Two instructions have special formats: the MOV (move) and LDI (load immediate) instructions. We’ll come back to these further on.
The ZipCPU can also support compressed instructions, with their format shown at the bottom of Fig 4. While we’ll only touch on the Compressed Instruction Set today, you can read all about these two-for-one instructions in the ZipCPU specification if you are interested.
The Basic Operations, in more detail
The ZipCPU was designed to be a truly Reduced instruction set computer. As a result, it doesn’t have nearly as many instructions as its competitors: the lm32 processor, OpenRISC, RISC-V, NiosII, and MicroBlaze. We can go over some of these differences later.
For now, let’s take a quick look at the ZipCPU instruction cheat sheet, shown in Fig 5. From here, you can see that the ZipCPU supports 25 basic instructions. It has four special instructions, and another six instructions reserved for a floating point co-processor, the FP instructions. Further, eight instructions have been chosen to also have a compressed representation.
That’s it. There are no other, hidden instructions, although many of the instructions within this list have some special functionality.
Shall we walk through these instructions, and discuss what each does in turn?
SUB, subtract. This subtracts #+Rb from the value in Ra, leaving the result in Ra. I’ll write this as Ra <= Ra - (#+Rb) to keep the notation simple, since just about all of the instructions will have this form.
AND, bitwise and: Ra <= Ra & (#+Rb)
ADD: Ra <= Ra + (#+Rb)
OR, bitwise or: Ra <= Ra | (#+Rb)
XOR, bitwise exclusive or: Ra <= Ra ^ (#+Rb)
LSR, logical shift right: Ra <= Ra >> (#+Rb) (assumes Ra is unsigned)
In all of the ZipCPU’s shift instructions, the last bit shifted out of Ra is placed into the Carry flag. Further, these shift instructions accept requests for shifts outside of the reasonable range of 0–31, permitting shift amounts as large as 2^31 instead (not that you’d need these extra amounts).
LSL, logical shift left: Ra <= Ra << (#+Rb)
ASR, arithmetic shift right: Ra <= Ra >> (#+Rb)
This instruction implements an arithmetic right shift. It does this by first assuming that Ra is signed, and then propagating the sign bit from the MSB down.
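The shift rules above, including the carry and the out-of-range shift amounts, can be modeled in a few lines of C. This is only a sketch. It assumes that a shift by zero clears the carry, and that shifts of 32 or more behave as though bits simply keep falling off the end. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t value; bool carry; } shift_result;

/* LSR: the carry is the last bit shifted out of the bottom */
static shift_result zip_lsr(uint32_t ra, uint32_t amt) {
    shift_result r = { ra, false };           /* assumed: zero shift clears C */
    if (amt == 0) return r;
    r.carry = (amt <= 32) && ((ra >> (amt - 1)) & 1u);
    r.value = (amt >= 32) ? 0u : ra >> amt;
    return r;
}

/* LSL: the carry is the last bit shifted out of the top */
static shift_result zip_lsl(uint32_t ra, uint32_t amt) {
    shift_result r = { ra, false };
    if (amt == 0) return r;
    r.carry = (amt <= 32) && ((ra >> (32 - amt)) & 1u);
    r.value = (amt >= 32) ? 0u : ra << amt;
    return r;
}

/* ASR: the sign bit is copied down from the MSB */
static shift_result zip_asr(uint32_t ra, uint32_t amt) {
    shift_result r = { ra, false };
    bool neg = (ra >> 31) != 0u;
    if (amt == 0) return r;
    if (amt >= 32) {                          /* every bit becomes the sign */
        r.value = neg ? 0xffffffffu : 0u;
        r.carry = neg;
        return r;
    }
    r.carry = (ra >> (amt - 1)) & 1u;
    r.value = neg ? ~(~ra >> amt) : ra >> amt;  /* portable arithmetic shift */
    return r;
}
```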
BREV, the “bit-reverse” instruction. For this instruction, Ra is assigned the value of (#+Rb), but not until (#+Rb) has been “bit-reversed”. That is, bit 0 of (#+Rb) becomes bit 31 of Ra, bit 1 becomes bit 30, etc.
This instruction is fairly unique to the ZipCPU, and yet it is also very fundamental to how the ZipCPU operates. By using a BREV instruction, the ZipCPU can load any 18-bit value into the upper bits of a register. If it is then followed by an LDILO, the pair of instructions can load any 32-bit value into a register.
The BREV instruction is also very useful for bit-reversed addressing and bit-manipulation functions–such as counting trailing zeros in a number. It’s also used for the CLR (clear register) derived instruction.
LDILO, or Load Immediate Lo, sets the lower 16 bits of Ra to the lower 16 bits of (#+Rb), leaving the upper 16 bits of Ra unchanged.
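Here is a rough C model of how a BREV and LDILO pair builds an arbitrary 32-bit constant. The helper brev_immediate_for() is a hypothetical stand-in for what an assembler would compute. Because it only uses the low 16 bits of the BREV immediate, sign extension of the 18-bit immediate never comes into play.

```c
#include <stdint.h>

/* BREV: bit 0 of the operand becomes bit 31 of the result, etc. */
static uint32_t brev32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* LDILO: replace the low 16 bits of Ra, keep its upper 16 bits */
static uint32_t ldilo(uint32_t ra, uint32_t operand) {
    return (ra & 0xffff0000u) | (operand & 0xffffu);
}

/* Hypothetical assembler helper: the BREV immediate whose reversal
 * places the upper 16 bits of `value` into bits 31..16. */
static uint32_t brev_immediate_for(uint32_t value) {
    return brev32(value) & 0xffffu;
}

static uint32_t load_const(uint32_t value) {
    uint32_t ra = brev32(brev_immediate_for(value)); /* BREV  #imm,Ra */
    return ldilo(ra, value);                         /* LDILO #lo,Ra  */
}
```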
MPYUHI, or multiply unsigned values and return the upper 32 bits, sets Ra <= (Ra*(#+Rb)) >> 32. The multiplication involved assumes both Ra and (#+Rb) are unsigned numbers.
MPYSHI, or multiply signed values and return the upper 32 bits, is identical to MPYUHI, with the exception that the multiplication is done assuming both Ra and (#+Rb) are signed numbers.
MPY, a 32x32-bit multiply which returns the lower bits of the result: Ra <= Ra * (#+Rb), where Ra is set to the lower 32 bits of the 64-bit product.
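The three multiply instructions can be described in C using a 64-bit product, as in this sketch. It assumes ordinary two's complement behavior for the signed conversions. It also shows that MPY and MPYUHI together recover the full unsigned 64-bit product. The function names are hypothetical.

```c
#include <stdint.h>

/* MPY: low 32 bits of the product (identical for signed and unsigned) */
static uint32_t zip_mpy(uint32_t ra, uint32_t b) {
    return (uint32_t)((uint64_t)ra * b);
}

/* MPYUHI: upper 32 bits, both operands treated as unsigned */
static uint32_t zip_mpyuhi(uint32_t ra, uint32_t b) {
    return (uint32_t)(((uint64_t)ra * b) >> 32);
}

/* MPYSHI: upper 32 bits, both operands treated as signed */
static uint32_t zip_mpyshi(uint32_t ra, uint32_t b) {
    int64_t p = (int64_t)(int32_t)ra * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* An MPYUHI/MPY pair yields the whole unsigned 64-bit product */
static uint64_t full_unsigned_product(uint32_t a, uint32_t b) {
    return ((uint64_t)zip_mpyuhi(a, b) << 32) | zip_mpy(a, b);
}
```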
DIVU, a 32x32-bit unsigned divide: Ra <= Ra / (#+Rb), treating both operands as unsigned.
DIVS, a 32x32-bit signed divide: Ra <= Ra / (#+Rb), treating both operands as signed.
CMP, compare. Sets the flags according to Ra - (#+Rb). This instruction is implemented identically to the SUB instruction above, except that a CMP instruction affects only the flags: Ra is not written back to the register file.
TEST is identical to the AND instruction, except that, like the CMP instruction, TEST only sets the flags register and leaves Ra unchanged.
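To illustrate what CMP and TEST leave behind, here is a C sketch of the flag computation. The Z, N, and overflow rules are the usual ones for a subtraction. Having C mark an unsigned borrow follows the unsigned less-than use of the C flag described below. Clearing C and V on TEST is my assumption. The names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool z, c, n, v; } zip_flags;

/* CMP: computes Ra - (#+Rb) like SUB, but only the flags survive */
static zip_flags zip_cmp(uint32_t ra, uint32_t operand) {
    uint32_t diff = ra - operand;
    zip_flags f;
    f.z = (diff == 0u);
    f.c = (ra < operand);                               /* assumed: unsigned borrow */
    f.n = (diff >> 31) != 0u;
    f.v = (((ra ^ operand) & (ra ^ diff)) >> 31) != 0u; /* signed overflow */
    return f;
}

/* TEST: computes Ra & (#+Rb) like AND, but only the flags survive */
static zip_flags zip_tst(uint32_t ra, uint32_t operand) {
    uint32_t r = ra & operand;
    zip_flags f = { r == 0u, false, (r >> 31) != 0u, false }; /* C, V assumed clear */
    return f;
}
```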
Two more basic instructions have subtly different forms.
MOV, a move instruction: Ra <= (#+Rb). In this case, the move instruction always has an Rb register. If you want to move just a constant into a register, then use the LDI instruction instead.
The MOV instruction has the additional capability of moving values between register sets, something we’ll get to later. As a result, the range of immediate values supported by the move instruction (13 bits) is a bit smaller than that supported by the rest of the instructions above (either 14 or 18 bits).
LDI, or load immediate, has a slightly different form. The LDI instruction has no Rb register option. It is used for loading arbitrary values into Ra, and is written as Ra <= #.
This instruction has also been stripped to its bare essentials so that it can load as large a value into a register as possible. As a result, it can load any 23-bit signed value into a register. Anything more requires a combination of a BREV instruction and an LDILO instruction.
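A hypothetical assembler helper for choosing between the two forms might look like the following C sketch. The range check simply encodes the 23-bit signed limit stated above. The function names are illustrative only.

```c
#include <stdint.h>
#include <stdbool.h>

/* LDI's immediate is 23 bits, signed: -2^22 .. 2^22 - 1 */
static bool fits_ldi(int32_t value) {
    return value >= -(1 << 22) && value < (1 << 22);
}

/* Instructions a (hypothetical) assembler would emit to load `value`:
 * a single LDI if it fits, otherwise a BREV/LDILO pair. */
static int load_instruction_count(int32_t value) {
    return fits_ldi(value) ? 1 : 2;
}
```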
The next six instructions are memory instructions. These are written a little differently, but they still read from left to right. For example,
SW.X Ra,#(Rb)
stores the value of Ra into the address given by # plus the value of Rb, and
LW.X #(Rb),Ra
loads Ra with the contents of memory at the address given by Rb plus the offset, #. Both of these execute only if the condition X is true. (More on that later.)
Both of these instructions operate on a 32-bit word, hence their mnemonics: SW for store word and LW for load word. The ZipCPU supports four other memory instructions:
- LH or load halfword. This instruction loads a 16-bit value from memory into Ra and then clears the upper 16 bits so the result fits into 32 bits.
- SH or store halfword. This instruction stores the bottom 16-bits of a register into memory.
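- LB or load byte. This loads an 8-bit value from memory into Ra. The upper 24 bits are cleared.
- SB or store byte. This stores the bottom 8 bits of a register into memory.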
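The register side of the byte instructions is easy to model: a store keeps only the low 8 bits of Ra, and a load zero-extends the byte into all 32 bits. The sketch below uses a toy 16-byte memory and hypothetical names. LH and SH work the same way with 16 bits once you pick a byte order for the halfword, which this sketch deliberately avoids.

```c
#include <stdint.h>

static uint8_t mem[16];   /* toy byte-addressed memory; addresses wrap */

/* SB: only the low 8 bits of Ra reach memory */
static void zip_sb(uint32_t ra, uint32_t addr) {
    mem[addr & 15u] = (uint8_t)ra;
}

/* LB: the byte is zero-extended, so the upper 24 bits are cleared */
static uint32_t zip_lb(uint32_t addr) {
    return (uint32_t)mem[addr & 15u];
}
```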
These are all of the basic ZipCPU instructions.
See anything missing? If you are familiar with other CPUs, you may notice a lot of missing instructions. None of these, however, are truly required as combinations of the instructions above can be used to implement almost any instruction you might need. For example, Fig 6 shows several examples of instructions the assembler understands, yet whose implementation is derived from the instructions above.
Let’s back up a bit, though, and discuss the registers on the ZipCPU.
The Basic ZipCPU Register Set
The ZipCPU supports two sets of sixteen 32-bit registers, but we’ll come back to the issue of the different register sets later. For now, each set of sixteen registers is organized into registers R0 through R15. From the hardware standpoint, all but the last two are general purpose, whereas the compiler treats all but four of these registers as general purpose.
Of these sixteen registers, the hardware treats the last two as special purpose registers. R15 is the program counter, PC. This register holds the address of the next instruction the ZipCPU will execute.
R14 also has a special purpose: it is the condition code register, CC. Flags, such as whether or not the result of the last operation was zero, are stored in the bottom four bits of the CC register.
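Because the PC is just another register as far as the instruction set is concerned, jumps and branches are simply instructions that write to it. For example:
LDI #,PC loads a 23-bit signed immediate value into the program counter, PC. It can be used any time the absolute address of the destination is known by the assembler before linking, which isn’t very often.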
ADD #,PC adds an amount to the program counter, PC. This will execute a local branch, causing the CPU to read its next instruction # words earlier or later depending on the sign of #. Since this is such a common instruction, it is often abbreviated as a branch always instruction, BRA <address label>.
The BRA instruction is often used for jumping between locations within a given function, such as when executing a loop or an if statement. It is used any time the assembler can tell that the distance to the target will fit within 18 bits.
LW (PC),PC, followed by a new address, reads a new value of the program counter from the next word in instruction memory. This instruction is used heavily by the linker, since the next address in memory can later be set to any value once that value is known. This is also known as a long jump instruction, and so the assembler understands the LJMP <address label> mnemonic, where <address label> is an assembly label of where you wish to jump to.
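The choice between BRA and LJMP comes down to whether the displacement fits. Here is a C sketch of that decision. It assumes the displacement is a signed 18-bit value measured from whatever address the PC holds when the ADD executes. That base address is an assumption, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed: BRA adds a signed 18-bit displacement to the PC */
static bool fits_bra(int32_t displacement) {
    return displacement >= -(1 << 17) && displacement < (1 << 17);
}

typedef enum { USE_BRA, USE_LJMP } jump_form;

/* pc: the value the PC holds when the branch executes (assumed base) */
static jump_form choose_jump(uint32_t pc, uint32_t target) {
    int32_t disp = (int32_t)(target - pc);   /* two's complement wrap */
    return fits_bra(disp) ? USE_BRA : USE_LJMP;
}
```

A target that isn't known until link time would still need the LJMP form, whatever its eventual distance.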
In addition to the PC and CC, the compiler treats R0, the stack pointer, and (sometimes) the frame pointer as special registers.
R0 is known as the link register, LR, where the return address to a subroutine is kept. This leads to two other instructions:
MOV 8(PC),R0 followed by LJMP <function> is how the ZipCPU implements a long jump to subroutine command, LJSR. A shorter jump can be implemented with the BRA instruction, but only if the destination is known at assembly time to be in range. The assembler handles all of the constants and selects between the instruction forms for you, so you may find these instruction pairs written in assembly as a single jump-to-subroutine mnemonic, such as LJSR.
MOV R0,PC loads the link register back into the program counter. This is often the last instruction in any function. Indeed, it is so common that the assembler will also accept the RETN mnemonic for this instruction.
The GCC back end treats R13 as the stack pointer, SP. Hence, you may see instructions such as SW R0,(SP), which will store R0 onto the stack.
If you choose not to optimize your code, and sometimes even if you do, the compiler will use R12 as a frame pointer, FP. This register is similar to the stack pointer, and is used to reference local variables within a function. In general, though, I’ve tried to keep the compiler from using R12 as a frame pointer, since doing so further limits the 14 general purpose registers.
For these reasons, at the beginning of a function you’ll often see code that moves the stack pointer down by three words of four bytes each. It then uses the first two of those words to store R0 and R1 respectively, presumably because the compiled routine is going to clobber those registers and wants to use their values or restore them later. At the end of that function, you may then find code that loads those registers back from the stack, releases the three words, and returns.
When I introduced the form of the ZipCPU instructions above, I mentioned that almost all instructions had the form OP.X #+Rb,Ra. We’ve examined the various operations, OP, and the various registers that Ra and Rb can take on, but we haven’t discussed the conditions, X. It is the condition X that allows almost every instruction to be executed conditionally.
The ZipCPU supports 8 conditions, or possible values for X, as shown in Fig 8.
If no condition is specified with the instruction, then the ZipCPU will always perform the indicated instruction.
Z, or the zero condition, will cause an instruction to execute only if the Z flag in the condition codes is set.
If you are not familiar with condition codes, the basic idea is that if the result of the last instruction was Zero, then the Z bit will be set. Hence, if you compare (i.e. subtract) two registers and the result is zero, then you know the registers are equal, and you can use the Z bit to do logic that assumes the registers were equal.
LT, the less than condition, will cause an instruction to execute only if the result of the last instruction was less than zero. This is a signed comparison result, focusing on the
Negative bit in the condition codes.
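C, the carry condition, will cause an instruction to execute only if the carry bit is set. This is also how the compiler implements an unsigned less than condition.
V, the overflow condition, will cause an instruction to execute only if the overflow bit is set.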
The last three conditions are just negations of the earlier ones: there’s NZ, or not zero, GE, or greater than or equal to, and there’s NC to test if the carry bit is not set (i.e., unsigned greater than or equal).
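The eight conditions can be summarized as a small C function over the four flags. This sketch follows the descriptions above: LT looks only at N, per the text. It assumes the remaining condition is V, which tests the overflow flag. The enum order is illustrative, not the hardware encoding.

```c
#include <stdbool.h>

typedef struct { bool z, c, n, v; } zip_flags;

typedef enum {
    COND_ALWAYS, COND_Z, COND_LT, COND_C, COND_V,
    COND_NZ, COND_GE, COND_NC
} zip_cond;

/* Returns true if an instruction with condition x would execute */
static bool zip_cond_true(zip_cond x, zip_flags f) {
    switch (x) {
    case COND_ALWAYS: return true;
    case COND_Z:      return f.z;
    case COND_LT:     return f.n;   /* per the text: signed, looks at N */
    case COND_C:      return f.c;   /* unsigned less than */
    case COND_V:      return f.v;   /* assumed: overflow */
    case COND_NZ:     return !f.z;
    case COND_GE:     return !f.n;
    case COND_NC:     return !f.c;  /* unsigned greater than or equal */
    }
    return false;
}
```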
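The most common use of these conditions is in conditional branches. For example, ADD.Z #,PC will cause the CPU to jump only if the Zero bit is set. Since this is also a very common operation, the assembler accepts BZ (branch if zero) for it, as well as BNZ (branch if not zero), BLT (branch if less than), BGE (branch if greater than or equal), BC (branch if carry is set), BNC (branch if carry is not set), and finally BV (branch if overflow is set).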
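You can also use these conditions to test multiple things at once. For example, suppose you wanted to know if registers R1, R2, and R3 were all zero, and you wish to branch to some target if they are all zero. In this case, you might compare the first register against zero, and then compare each of the others against zero using CMP.Z, so that each comparison only happens while the Z flag is still set. You would then finish with a BZ to the target. You could also test whether just one of them was zero by using CMP.NZ for the later comparisons instead, so that each comparison only happens if no zero has been found yet.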
These work because the CMP and TST instructions still set the flags when executed conditionally. Other instructions, when executed conditionally, don’t affect the flags, allowing strings of conditional instructions to all depend upon the same condition.
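The all-zero and any-zero tests can be modeled in C by tracking just the Z flag. The comments show the instruction sequence I’d expect. This is my own sketch, not a listing from the compiler or assembler. The point is that a conditional CMP that does not execute leaves the flags alone.

```c
#include <stdint.h>
#include <stdbool.h>

/* CMP 0,R1 ; CMP.Z 0,R2 ; CMP.Z 0,R3 ; BZ target */
static bool all_zero(uint32_t r1, uint32_t r2, uint32_t r3) {
    bool z = (r1 == 0u);        /* CMP 0,R1                      */
    if (z) z = (r2 == 0u);      /* CMP.Z 0,R2: only runs while Z */
    if (z) z = (r3 == 0u);      /* CMP.Z 0,R3                    */
    return z;                   /* is the BZ taken?              */
}

/* CMP 0,R1 ; CMP.NZ 0,R2 ; CMP.NZ 0,R3 ; BZ target */
static bool any_zero(uint32_t r1, uint32_t r2, uint32_t r3) {
    bool z = (r1 == 0u);        /* CMP 0,R1                          */
    if (!z) z = (r2 == 0u);     /* CMP.NZ 0,R2: only if no zero yet  */
    if (!z) z = (r3 == 0u);     /* CMP.NZ 0,R3                       */
    return z;
}
```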
A good example of multiple instructions depending upon a single condition would be an integer absolute value calculation. Suppose you wanted to calculate the absolute value of R0 and leave the result in R0. You might then write TEST R0, followed by XOR.LT -1,R0 and ADD.LT 1,R0.
The first instruction tests R0 against -1, the default value if no other value is given to TEST. Since this is a TEST instruction, R0 is left unchanged and only the flags are affected.
In this case, the N flag will be set if R0 is negative. We can then complement every bit and add one to R0 to negate it.
Notice how, in this process, the XOR instruction didn’t affect the flags. This makes it possible to add the ADD instruction to the same chain, with all of these instructions executing only if R0 was negative.
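The same three-instruction sequence, translated into C, looks like this sketch. Each conditional instruction becomes an if on the saved N flag. Since neither conditional instruction touches the flags, both of them test the same value. The function name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Models:  TEST R0        ; N <- sign of (R0 & -1), R0 unchanged
 *          XOR.LT -1,R0   ; flags untouched
 *          ADD.LT 1,R0    ; flags untouched                      */
static uint32_t zip_abs(uint32_t r0) {
    bool n = ((r0 & 0xffffffffu) >> 31) != 0u;  /* TEST R0      */
    if (n) r0 ^= 0xffffffffu;                   /* XOR.LT -1,R0 */
    if (n) r0 += 1u;                            /* ADD.LT 1,R0  */
    return r0;
}
```

As with any two's complement absolute value, 0x80000000 maps to itself.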
Why are conditional instructions a good thing? There is a real method and purpose to this madness. Conditional branches on the ZipCPU cost about 5 clocks, whereas conditionally executed ALU instructions still cost only one clock. Hence, the absolute value calculation above costs 3 clocks (ignoring prefetch stalls). By contrast, the alternative of branching around the XOR and ADD would cost four clocks if R0 needed to be negated, and six clocks if it didn’t. The conditional version costs three clocks in both cases.
You may notice that, for all of the extra functionality in this section and the last, the ZipCPU still only offers the same 25 basic instructions. Branches, jumps, and subroutine calls are just special cases of these same instructions.
There are some subtle details here as well. For example, some instructions aren’t allowed to set the flags. These include the LDILO instruction, among others, and anything that writes to the PC or to the condition code register, CC. In a similar fashion, any conditionally executed instruction, with the exception of CMP and TEST, will not affect the flags.
Well, not quite. The ZipCPU does have four more special instructions that we need to discuss in the next section.
The ZipCPU also supports four special instructions: BREAK, LOCK, SIM, and NOOP. Other special instructions, such as RTU, are really derived instructions built from the basic instructions listed above. We’ll come back to the RTU instruction in the next section when we discuss the purpose of the two separate register sets.
The BREAK instruction was built for the debugger. If you replace any instruction with a BREAK instruction, the currently running code will halt at that instruction without executing it. This leaves the CPU in a state where the debugger can examine what’s going on within it, single step over the break, and then continue until the next break.
The LOCK instruction is used to support atomic operations. Atomic instructions are ones where you want to read something from the bus, operate upon it, and then return the modified value. For example, an atomic increment might consist of a LOCK instruction followed by an LW (load word), an ADD, and an SW (store word). The LOCK function works by disabling interrupts, and then making sure that the bus CYC line is not lowered between the LW and SW instructions. After three instructions, the lock is released.
The NOOP and SIM instructions are very similar, although they look different on the surface. NOOP is a simple no-operation instruction: an instruction that doesn’t do anything. When the ZipCPU encounters a NOOP, it does nothing. When the ZipCPU encounters a SIM instruction while running in hardware, it halts with an illegal instruction error.
These two instructions have some other capabilities when used within the simulator: they can be used to send values to the simulation terminal via SOUT (a SIM) or NOUT (a NOOP).
For example, using the lower bits of these commands, you can print a single character to the terminal (NOUT 'c'), a register’s value (NDUMP R0), or even the full register bank (NDUMP). The assembler also understands mnemonics, such as SSTR, that allow you to string together multiple characters into a single string to print to the terminal.
In the case of the NOOP instructions, once placed onto the actual hardware, these simulation-only capabilities will be quietly ignored.
Let’s now come back to those two register sets, since they are used to help the ZipCPU handle interrupts. Indeed, the ZipCPU has a fairly unique interrupt architecture. For example, the ZipCPU only recognizes one type of interrupt. When the CPU recognizes an interrupt, the ZipCPU just switches from user to supervisor register sets.
Basically, it works like this: upon any reboot, the ZipCPU boots into supervisor mode. This mode uses one set of sixteen registers—the supervisor set. When the CPU is ready to enable interrupts, it switches to user mode where the other set of registers are used—the user set. Then, on any interrupt, user trap, or processing exception, the CPU returns to supervisor mode.
To make this possible, the MOV instruction has been given a special capability. It can be used to MOV registers between the two register sets, but only when the CPU is in supervisor mode.
By comparison, on most processors an interrupt will:
- Automatically place a couple of user registers (the program counter, stack pointer, etc.) into a special place. This may be onto the stack, in older ISAs, or into a couple of special purpose registers, as on more recent CPUs.
- Jump to an interrupt service routine (ISR) whose address is looked up in an interrupt vector table. This table needs to be carefully set up by the software, often in a special memory location or special purpose register. Any mistake in this process and the CPU will try to execute instructions from a non-existent memory address.
- Return to the previously running program when a special instruction, such as an IRET (interrupt return), is issued at the end of the ISR.
As shown above in Fig 9, the ZipCPU starts its processing in supervisor mode. Before the CPU can switch to user mode, it creates a set of registers for user mode. These are loaded either via MOV instructions, or by the zip_restore_context(int *) C-language built-in. This latter function call loads and sets all of the ZipCPU user-mode registers from a memory array.
It can then switch to user mode via an RTU instruction. The RTU instruction itself is implemented by an OR instruction that just sets the global interrupt enable (GIE) bit in the CC register. A C-language built-in, zip_rtu(), can also be used to execute this jump. Once the RTU instruction is issued, the ZipCPU starts executing instructions using the user register set.
If the user program needs to return to supervisor mode, it can clear the GIE bit with either an AND or an LDI instruction. This will also send the ZipCPU back into supervisor mode.
The WAIT instruction will cause the CPU to enter user mode (if it isn’t in user mode already), but then sleep until the next interrupt. This instruction is also implemented via a basic OR instruction into the CC register.
The HALT instruction acts in an identical fashion when executed in user mode. When executed in supervisor mode, it will actually halt the CPU.
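As a toy model of the two register sets, the C sketch below selects the active set from a single gie flag. It ignores the real CC bit positions, the sleep and halt behavior, and everything else the CPU does. The names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t sreg[16];   /* supervisor register set */
    uint32_t ureg[16];   /* user register set */
    bool gie;            /* stands in for the GIE bit: true means user mode */
} zip_cpu;

/* The register set that instructions currently see */
static uint32_t *active(zip_cpu *cpu) {
    return cpu->gie ? cpu->ureg : cpu->sreg;
}

static void rtu(zip_cpu *cpu)       { cpu->gie = true;  }  /* RTU: set GIE */
static void interrupt(zip_cpu *cpu) { cpu->gie = false; }  /* interrupt, trap, or exception */

/* MOV from a supervisor register into a user register;
 * only allowed while in supervisor mode */
static bool mov_to_user(zip_cpu *cpu, unsigned from, unsigned to) {
    if (cpu->gie)
        return false;
    cpu->ureg[to & 15u] = cpu->sreg[from & 15u];
    return true;
}
```

Because an interrupt only flips which set is active, no user registers need to be saved before the supervisor starts running.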
Once the CPU returns from user mode, it will return to the supervisor code where it left off. You can see this in the multi-tasking code found in the S6Soc kernel software. At a high level, that code is a loop built around the zip_rtu() built-in, which issues the return-to-userspace RTU instruction. Between when this instruction is issued and when it returns, any user space program might run.
How about interrupts? After zip_rtu() returns, the line pic = _sys->io_pic grabs the current state of the interrupt controller (an external module). That state can then be queried to see whether zip_rtu() returned because of an interrupt.
Indeed, once I realized how easy it was to swap between different tasks in a multi-task concept, I found myself personally rather excited by the possibilities that the ZipCPU offered for studying Operating System fundamentals from C.
Differences from other CPUs
Okay, so that’s what the ZipCPU instruction set looks like. But how does it compare to other soft processors? In particular, the ZipCPU instruction set could easily be compared to many other soft-core CPUs, such as the lm32 processor, OpenRISC, NiosII, and MicroBlaze. Let’s take a look at some of the key differences between the ZipCPU and some of these other processors.
The first big difference is that the ZipCPU does not support three operand instructions. An example of such an instruction might be to set register Rd to the sum of Ra and Rb, ADD Rd,Ra,Rb. Did you notice how this instruction reads right to left? This is common with other instruction sets as well.
Why doesn’t the ZipCPU offer three operand instructions? Simply because they would complicate the instruction decoder. In particular, you’d need to decode more than just the four basic instruction formats above. Most of these processors, for example, have instructions that take zero operands, instructions that take one operand and an immediate (such as LDI), instructions that take two registers and an immediate, and instructions that take three registers and then have barely any room for any immediate values (11 bits).
The next thing you’ll notice is that the ZipCPU has a 5-bit opcode to select among the various instructions. These other processors use a 6-bit opcode, and when that isn’t enough they steal bits (as in the case of the MicroBlaze CPU) from their immediate space. The resulting reality is that the ZipCPU actually has a more Reduced instruction set than these other processors.
When it comes to special registers, the ZipCPU is actually quite unique. In contrast to MicroBlaze’s 25 special registers, or the 65+ special registers of either OpenRISC or RISC-V, the ZipCPU has only two special hardware registers: the program counter and the condition codes register. Other functionality, such as the interrupt controller, or even the direct memory access (DMA) engine’s control registers, is placed on an external bus near the CPU. That way these pieces may be added (or removed) according to the needs (and logic scarcity) of your particular environment and application.
A fifth way the ZipCPU is unique is in the number of registers. The ZipCPU offers 14 general purpose registers to user space. Most of these other processors offer 32 registers, but only with a lot of caveats. For example, you can’t use R0, since the compiler depends upon it being equal to zero. Another register may be used to form constants in the assembler, and so it’s off limits to the compiler. By the time you drill down further, you’ll discover that perhaps only 24 registers are available. Of these 24, roughly half are assumed to be clobbered on any function call and need to be saved on the stack anyway. Further, saving registers to the stack is really the limiting factor in any choice of register count. As a result, the 14 general purpose registers really don’t limit the ZipCPU’s performance significantly in comparison to these other processors.
When you start looking at actual instructions, the ZipCPU might initially appear less capable. For example, the ZipCPU doesn’t offer any ADDC or SUBC instructions (add or subtract with carry). Nor does it offer any SEXT sign extension instructions, CLZ count leading (or trailing) zero instructions, rotate left (or right) instructions, and more. However, these are all fairly rare instructions, and workarounds are easy to come by. Indeed, the ZipCPU once had a rotate left instruction. That instruction was later removed because 1) the compiler never used it, 2) very simple alternative instruction combinations were already available, and 3) the ZipCPU needed to support 8-bit bytes in order to be POSIX compliant.
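Here are two of those workarounds, sketched in C. The first sign-extends a byte with a left shift followed by an arithmetic right shift (LSL 24 then ASR 24 on the ZipCPU). The second builds a rotate from two shifts and an OR. These are my illustrations of the kind of instruction combinations meant above, not necessarily what the compiler emits. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low byte, as LSL 24,Ra followed by ASR 24,Ra would */
static uint32_t sext8(uint32_t ra) {
    ra <<= 24;                                    /* LSL 24            */
    return (ra & 0x80000000u) ? ~(~ra >> 24)      /* ASR 24, written   */
                              : ra >> 24;         /* portably in C     */
}

/* Rotate left by n using two shifts and an OR */
static uint32_t rol(uint32_t v, unsigned n) {
    n &= 31u;
    if (n == 0)
        return v;                 /* avoid an undefined shift by 32 */
    return (v << n) | (v >> (32u - n));
}
```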
Of course, the next place the ZipCPU shines is with its simplified bus architecture. I’m not sure if you saw this thread or not, but it shows that the ZipCPU, even without a data cache, can still outperform a MicroBlaze simply due to (what I believe is) its simplified bus architecture.
There are actually many more features within the ZipCPU, and even more differences between it and other softcore CPUs, than this simple post could discuss. For example, the ZipCPU can single step code from either supervisor or user mode, and more.
Further, time wouldn’t permit discussing the various I/O peripherals that can be optionally added to the ZipCPU–peripherals such as an interrupt controller, performance counters, a DMA controller, simplified timers, and more. At least, time today won’t permit it. These components are all fair game for future blog posts.
Some parts of the ZipCPU, however, remain a work in progress. For example, while an MMU exists, I have yet to integrate it into the rest of the CPU. In particular, the prefetch cache will need to know when to invalidate cache lines due to writes, something I haven’t gotten to yet. Likewise, while a data cache implementation exists, it also has yet to be integrated and has since become a touch out of date. Once those two are integrated, my next plan is to host Linux from the ZipCPU–I just haven’t gotten that far yet. Perhaps the reason is … I haven’t needed to.
Know ye not, that to whom ye yield yourselves servants to obey, his servants ye are to whom ye obey; whether of sin unto death, or of obedience unto righteousness? (Rom 6:16)