Getting the basic FIFO right
NOTE: Since designing the logic for this FIFO, I’ve discovered several bugs in my own FIFO implementations using formal methods. I no longer trust the code below to properly handle reads on underflows, or writes on overflows. For a more reliable development, please see the tutorial. Even better, the tutorial example shows how to handle having 2^N elements in the FIFO, how to verify that the FIFO works properly, and how to handle backpressure on the two channels.
If you’ve ever waited in a line, then you understand what a FIFO is about. A FIFO refers to a First in, First out (i.e. FIFO) data structure that has many applications in both computers and electronics. The data structure is designed to support the digital equivalent of waiting in line.
Perhaps I need to underline that. FIFO’s don’t have one application, they have many applications in both software and electronic digital design.
Let’s see if we can find some examples, shall we?
-
Interrupting a CPU costs time and performance. It’s not a simple thing to do, in spite of the hard work of many men to make interrupts fast and efficient. Hence, at some data rates, handling requests one at a time can just cripple a CPU–it’s just spending too much time handling the interrupt.
To deal with this, peripheral hardware is often created so that the CPU can mange many items at the same time–spreading the cost of the interrupt across many accesses.
Perhaps a good analogy here might be an airport taxi line. Only one person (family) can get into any taxi at a time, and then all the taxi’s move forward so the next person can get into the next taxi. In this case, when an airplane arrives (the CPU gets busy), the line suddenly swells as many people get off and start waiting in line for a taxi.
-
Memory is also another candidate for a FIFO. Unlike the airport taxi line, Memory acts more like a school cafeteria buffet line. Only one class is allowed to use the line at a time, yet there are several stations that each student needs to visit to get his lunch. Hence, in the case of memory, you want to make many transactions at once, fill the pipeline, and then get the whole data through the line as soon as possible so the next class can come through.
In this case, you want a FIFO that can be filled, or nearly filled, and then make all its transactions at once and release the memory so that something else can use it.
(Example: DMA Controller, and a Wishbone-UART debugging bus FIFO)
-
A Universal Asynchronous Receiver Transmitter (UART), sometimes known as a serial port.
Computers like to load a serial port with data. However, the serial port can typically only handle one item every so many clocks. Often, the serial port can operate much faster than the computer can interrupt, but slower than the computer can issue characters to the port.
In this case, a FIFO can be used to allow the computer to write many characters to the port. These characters will then “wait in line” to be transmitted, and when the line is (nearly) empty, the computer can then send a whole bunch more characters to the port–keeping the port busy at all times.
A similar case for a FIFO can be made on receive as well. The computer just waits for the line to fill up, before it processes all of the elements in the line quickly.
-
Audio
Audio is very much like the UART above. The CPU can write at one speed, but the audio may only be able to read at another. A FIFO can allow a CPU the ability to write a lot of things to the audio port at once, after which the audio hardware reads the samples out one by one.
Audio receiving is in many ways like some buses I’ve traveled on. The line for the bus grows and grows. Once it reaches a certain length (in the case of the FIFO), the “bus” comes along and clears the line.
-
Video
Unlike UARTs and Audio, the video FIFOs I’ve worked with have rarely (if ever) required the CPU’s attention. They usually work in the background. When receiving video data, once a FIFO buffer fills, the buffer is then dumped into memory. Likewise on the transmit: the video controller reads from memory, fills its buffer, and then slowly sends it to the display one pixel at a time.
These are all examples of a FIFO.
Basically, you need a FIFO anytime something is going to be produced (written) at one rate, and consumed (read) at another. The buffer in the FIFO, then, adjusts like any line as items are added, or removed, from it.
It’s really a fundamental digital design component.
Let’s see what we need to do to build one.
The Goal
For our simple example here, we’ll assume there are three operations that can be done on a FIFO, and some parts and pieces of FIFO status that we may wish to know.
-
The FIFO may need to be reset. This clears any items from the line, and empties the buffer regardless of how many items were in line or how full the buffer ever was.
-
You can add someone to the end of the line. This is what happens when data is written to the FIFO.
In digital electronics, though, there is a maximum length to any line. Hence, if the line is too long, attempts to add one more item to the line will fail.
-
You can process the first item in the line. This is what happens when data is read from the FIFO.
Any attempt to read from an empty FIFO must of necessity fail.
Those are the operations that can take place on a FIFO.
For our discussion below, we’ll support three pieces of status:
-
We’ll want to know how many items are in the line. That way we can announce that the line is empty, full, half-full or … whatever condition is necessary to move forward.
Eventually, we’ll want to know if the line is full or empty, but we’ll just start with trying to know how many items are within it.
-
We’ll want to know if we ever spilled an item by adding too many items to the line (overflow), or if
-
We ever tried to read from the FIFO when it was empty (underflow).
Further, to outline the necessary logic, we’ll start with a software example of this FIFO, and then convert it to digital logic (i.e. Verilog).
Circular Buffer
Wikipedia’s FIFO article discusses creating a FIFO using dynamically allocated objects and pointers.
Where this fails is in digital logic, because physical logic resources are limited–dynamically allocated memory is unavailable.
Hence, we’re going to use a different form of algorithm here. We’ll use instead a circular buffer with a fixed size.
In this buffer, there are two pointers: the write pointer and the read pointer.
Whenever the read and write pointers are identical, as in Fig 1 below, we’ll use that as the indication that the buffer is empty. Initially, both pointers will point to the same address in the buffer (zero). Further, we’ll return both pointers to zero any time the user requests that the buffer be reset.
When an item is written to the buffer, the write pointer is incremented. This pointer then always points to the item not yet written to.
When an item is read from the buffer, the read pointer is incremented. This pointer always references the next item to be read.
If you look at a diagram of this, such as Fig 2, it looks like the read pointer is chasing the write pointer.
What happens when the write or read pointer gets to the end of the buffer? The pointer in question simply wraps around to the beginning of the buffer. Because the buffer pointers just wrap around, this type of buffer is called a circular buffer.
You can tell when this type of FIFO is full, because write pointer one plus will equal the read pointer, as in Fig 3 below.
Further, we’re going to use a special property of FIFO’s that are 2^N
in length. That is, the read and write pointers will always fit into
N
bits, so no boundary checking is required. Likewise, the number of elements
within the FIFO is given by the bottom N
bits of the difference between the
write and read pointers.
A Simple Software FIFO
Our focus today is going to be on implementing this operation in Verilog. Sometimes, though, it helps for formalism of expression to first implement something like this in C++. Hence, we’ll use the following C++ code to illustrate how a FIFO works.
First, we’ll declare what we stated above: our FIFO will have three operations. These are reset, write, and read. We’ll also want to be able to query how many items are in the FIFO.
Resetting the FIFO will simply set both read and write pointers back to zero. It will also clear any error flags. Note that we aren’t going to clear the memory here–mostly because we won’t be able to do that in Verilog very easily later.
Writing an item is almost as simple as setting the memory and incrementing
the pointer. We’re going to add two more pieces to this logic, though. The
first is required: the address of the next write pointer, nxtaddr
may need
to wrap around the buffer to be valid. So, we’ll check that first. Second,
if the newly calculated next write pointer, nxtaddr
, is equal to our read
address, that’s an indication that the buffer is full. Here, we’ll cowardly
refuse to write to a full FIFO.
Reading from the FIFO is just about as simple: we want to return the next item in our buffer and increment the read pointer, wrapping if necessary. As before, though, there’s a twist: we only want to increase our buffer pointer if the FIFO wasn’t empty. Attempts to read from an empty buffer should create an underflow (error) condition, while leaving the FIFO empty.
Finally, our last operation is to return the number of items found within this FIFO. This is given by the write pointer minus the read pointer, but done in such a way as to never return anything less than zero or greater than the size of our buffer.
In many ways, this set of operations isn’t really complete. We haven’t returned the number of empty (unused) values in the buffer, we haven’t created any boolean values to return whether or not the buffer is non-empty, half-empty, or half full, neither have we made any methods to read whether or not an error condition has occurred. We’ll leave these exercises to the student.
First Verilog Cut
If we just tried to translate this C++ code to Verilog, we might end up with the always block shown below:
With a little thought, it’s not too hard to see that this initial approach is going to have some problems.
-
What happens when a read and write request come in together?
-
wraddr + 1'b1
needs to be limited toN
bits, otherwise we don’t get the proper wrap around the end of the buffer effect.
We’ll need to make some changes therefore.
Next Verilog Cut
Let’s see if we can’t improve on our first attempt to write the Verilog code for a FIFO with four basic changes.
The first change we’ll make is to separate the read and write tasks, and likewise the various variables, into separate always blocks. This will make it easier to handle concurrent reads and writes.
Our second change will be to capture the logic associated with testing whether
or not the FIFO is full or empty into two flags: full
and empty
. This
will help us figure out how to optimize things later.
Third, to deal with concurrent reading and writing, we’ll allow the following: Concurrent reads and writes are allowed, as long as the FIFO isn’t empty. This is to allow the memory a full clock to handle the write before the memory is available to be read.
Finally, we’ll separate the calculation of the number of items in the buffer into its own logic section.
So, to walk through the code, on every clock we will write to memory.
If there’s no write taking place on this clock, the memory write will just change the contents of an unused data item.
Likewise on every clock we will read from memory and create one of our outputs.
As with the memory write, if the FIFO isn’t being read on this clock then it won’t hurt us to set the output anyway.
Many FPGA’s will also allow you to gate these memory operation for no additional logic cost, as in:
While such gating may be useful, it’s not required in this example.
The next step in writing to the FIFO is handling the FIFO write pointer,
wraddr
.
Since the logic used to adjust the FIFO write pointer is almost identical to the
logic to adjust the overrun flag, we’ll place the two into the same always
block.
Other than separating this into its own always block, the biggest change
between this version and our prior one is the introduction of the !full
flag. This flag replaces the (wraddr+1'b1)!=rdaddr
calculation from before.
The other more subtle change is that we will now write to the FIFO any time the FIFO is not full, or any time the FIFO is full and a read is taking place on the same clock.
The read address, rdaddr
, and underrun
error flag calculation below
follow a very similar form to that of the write address and overflow
calculation above.
The big difference is that the FIFO must be non-empty to read, even if a write is taking place on the same clock. This gives the memory a clock to store the value, before trying to read it.
The next and final trick is determining how full the FIFO is.
You might be tempted to set o_fill <= wraddr-rdaddr
just like we did in our
C++ implementation. This, however, doesn’t work. Such a fill measure will
always be one clock out of date.
Instead, we’ll count how many items are in our buffer as those items are added or removed from the buffer. That means, though, that we’ll need to pay attention to not only the read and write request lines, but also whether or not the FIFO is full or empty when the request is made. This simple logic fits nicely into a case statement.
That leaves as our final task determining whether or not the FIFO is full, as well as what the next write address would be, were we to write to our FIFO on this clock.
The above approach should yield a working FIFO.
It just won’t be a high speed FIFO, or a low logic FIFO implementation.
The problem is specifically the cost of calculating full
and empty
.
Remember when I discussed keeping the conditions on an if-then-else
branch
simple? We just
violated that rule.
We’ll have to try again, therefore.
High Speed Verilog Cut
This time, let’s keep everything the same as before, with the exception of the
full
and empty
flags. Let’s spend some time focusing on those.
The problem with these two flags is that they are calculated within the
nested if’s above. Further, they are calculated within the nested if of a
large amount of logic, since each of the address lines sets N
bits.
What this means is that once the logic has been calculated, it then needs
to be distributed among many elements–an additional timing cost.
We’ll solve this problem by pushing back the full
and empty
logic one
clock earlier. This will keep the full
and empty
designations
synchronous to the actual state of the FIFO on any given clock.
We’ll get a warning for adding a 32-bit number to an N
bit number when
calculating dblnext, but rather than complicating the logic to get rid of
the warning, we’ll leave it as is so you can see and understand what’s going
on.
That’s it, though. Now you know how to build a simple (application independent) FIFO in Verilog.
Next Steps
Knowing how to build a FIFO is really the first step in many steps. It’s a required part of many component designs, and so being able to build one is a good skill to have.
Indeed, now that we’ve discussed how to build a FIFO, we can return to our debugging bus and create an asynchronous capabilities. Such a capability will be required to demonstrate the pipeline modes of the wishbone bus.
If you ever want to reference other examples, you’ll find many FIFO implementations posted on OpenCores.
Strangely, from my own experience, I tend not to reuse the FIFO’s I’ve built from one design need to the next. Sure, the FIFO I built for a UART tends to stay with that UART, but the FIFO’s I build for audio or video tend to be different from the one for the UART–even though the design outline above is roughly the same.
Instead, whenever I need a FIFO, I tend to copy from a working FIFO and then adjust the code to fit my new needs. The reason is simple: it seems that every application requires a different interface to the FIFO.
For example, a CPU will want to know from a
UART receiver
how many items can be read (i.e. o_fill
), but from the transmitter it wants
to know how many items can be written to it (N-1-ofill
). Likewise, the CPU
might wish to wait until the receive queue is halfway full (o_fill > N/2
),
whereas the transmit FIFO would want to wait until the transmit queue is half
empty (ofill < N/2-1
).
I guess this means that FIFO’s are one of the few places where the question of whether the glass is half full or half empty becomes important.
Another strange FIFO I’ve needed is the one to handle AXI bus requests. In this case, instead of a single set of read and write pointers, I’ve needed to use multiple pointers so as to properly know where a transaction request is in the process, and whether or not the AXI ready signals need to be dropped.
The point is just that, in spite of the basics, FIFO’s tend to differ from one context to another.
Let me know your thoughts and experiences below!
He hath made everything beautiful in his time: also he hath set the world in their heart, so that no man can find out the work that God maketh from the beginning to the end.