NOTE: Since designing the logic for this FIFO, I’ve discovered several bugs in my own FIFO implementations using formal methods. I no longer trust the code below to properly handle reads on underflows, or writes on overflows. For a more reliable development, please see the tutorial. Even better, the tutorial example shows how to handle having 2^N elements in the FIFO, how to verify that the FIFO works properly, and how to handle backpressure on the two channels.

If you’ve ever waited in a line, then you understand what a FIFO is about. A FIFO refers to a First in, First out (i.e. FIFO) data structure that has many applications in both computers and electronics. The data structure is designed to support the digital equivalent of waiting in line.

Perhaps I need to underline that. FIFO’s don’t have one application, they have many applications in both software and electronic digital design.

Let’s see if we can find some examples, shall we?

A CPU, such as the ZipCPU

Interrupting a CPU costs time and performance. It’s not a simple thing to do, in spite of the hard work of many men to make interrupts fast and efficient. Hence, at some data rates, handling requests one at a time can just cripple a CPU–it’s just spending too much time handling the interrupt.

To deal with this, peripheral hardware is often created so that the CPU can mange many items at the same time–spreading the cost of the interrupt across many accesses.

Perhaps a good analogy here might be an airport taxi line. Only one person (family) can get into any taxi at a time, and then all the taxi’s move forward so the next person can get into the next taxi. In this case, when an airplane arrives (the CPU gets busy), the line suddenly swells as many people get off and start waiting in line for a taxi.
SDRAM Memory

Memory is also another candidate for a FIFO. Unlike the airport taxi line, Memory acts more like a school cafeteria buffet line. Only one class is allowed to use the line at a time, yet there are several stations that each student needs to visit to get his lunch. Hence, in the case of memory, you want to make many transactions at once, fill the pipeline, and then get the whole data through the line as soon as possible so the next class can come through.

In this case, you want a FIFO that can be filled, or nearly filled, and then make all its transactions at once and release the memory so that something else can use it.

(Example: DMA Controller, and a Wishbone-UART debugging bus FIFO)
A Universal Asynchronous Receiver Transmitter (UART), sometimes known as a serial port.

Computers like to load a serial port with data. However, the serial port can typically only handle one item every so many clocks. Often, the serial port can operate much faster than the computer can interrupt, but slower than the computer can issue characters to the port.

In this case, a FIFO can be used to allow the computer to write many characters to the port. These characters will then “wait in line” to be transmitted, and when the line is (nearly) empty, the computer can then send a whole bunch more characters to the port–keeping the port busy at all times.

A similar case for a FIFO can be made on receive as well. The computer just waits for the line to fill up, before it processes all of the elements in the line quickly.

(Verilog example code)
Audio

Audio is very much like the UART above. The CPU can write at one speed, but the audio may only be able to read at another. A FIFO can allow a CPU the ability to write a lot of things to the audio port at once, after which the audio hardware reads the samples out one by one.

Audio receiving is in many ways like some buses I’ve traveled on. The line for the bus grows and grows. Once it reaches a certain length (in the case of the FIFO), the “bus” comes along and clears the line.

(Verilog example code)
Video

Unlike UARTs and Audio, the video FIFOs I’ve worked with have rarely (if ever) required the CPU’s attention. They usually work in the background. When receiving video data, once a FIFO buffer fills, the buffer is then dumped into memory. Likewise on the transmit: the video controller reads from memory, fills its buffer, and then slowly sends it to the display one pixel at a time.

These are all examples of a FIFO.

Basically, you need a FIFO anytime something is going to be produced (written) at one rate, and consumed (read) at another. The buffer in the FIFO, then, adjusts like any line as items are added, or removed, from it.

It’s really a fundamental digital design component.

Let’s see what we need to do to build one.

The Goal

For our simple example here, we’ll assume there are three operations that can be done on a FIFO, and some parts and pieces of FIFO status that we may wish to know.

The FIFO may need to be reset. This clears any items from the line, and empties the buffer regardless of how many items were in line or how full the buffer ever was.
You can add someone to the end of the line. This is what happens when data is written to the FIFO.

In digital electronics, though, there is a maximum length to any line. Hence, if the line is too long, attempts to add one more item to the line will fail.
You can process the first item in the line. This is what happens when data is read from the FIFO.

Any attempt to read from an empty FIFO must of necessity fail.

Those are the operations that can take place on a FIFO.

For our discussion below, we’ll support three pieces of status:

We’ll want to know how many items are in the line. That way we can announce that the line is empty, full, half-full or … whatever condition is necessary to move forward.

Eventually, we’ll want to know if the line is full or empty, but we’ll just start with trying to know how many items are within it.
We’ll want to know if we ever spilled an item by adding too many items to the line (overflow), or if
We ever tried to read from the FIFO when it was empty (underflow).

Further, to outline the necessary logic, we’ll start with a software example of this FIFO, and then convert it to digital logic (i.e. Verilog).

Circular Buffer

Wikipedia’s FIFO article discusses creating a FIFO using dynamically allocated objects and pointers.

Where this fails is in digital logic, because physical logic resources are limited–dynamically allocated memory is unavailable.

Hence, we’re going to use a different form of algorithm here. We’ll use instead a circular buffer with a fixed size.

In this buffer, there are two pointers: the write pointer and the read pointer.

Whenever the read and write pointers are identical, as in Fig 1 below, we’ll use that as the indication that the buffer is empty. Initially, both pointers will point to the same address in the buffer (zero). Further, we’ll return both pointers to zero any time the user requests that the buffer be reset.

Fig 1: Empty FIFO

When an item is written to the buffer, the write pointer is incremented. This pointer then always points to the item not yet written to.

When an item is read from the buffer, the read pointer is incremented. This pointer always references the next item to be read.

If you look at a diagram of this, such as Fig 2, it looks like the read pointer is chasing the write pointer.

Fig 2: FIFO with data

What happens when the write or read pointer gets to the end of the buffer? The pointer in question simply wraps around to the beginning of the buffer. Because the buffer pointers just wrap around, this type of buffer is called a circular buffer.

You can tell when this type of FIFO is full, because write pointer one plus will equal the read pointer, as in Fig 3 below.

Fig 3: A full FIFO

Further, we’re going to use a special property of FIFO’s that are 2^N in length. That is, the read and write pointers will always fit into N bits, so no boundary checking is required. Likewise, the number of elements within the FIFO is given by the bottom N bits of the difference between the write and read pointers.

A Simple Software FIFO

Our focus today is going to be on implementing this operation in Verilog. Sometimes, though, it helps for formalism of expression to first implement something like this in C++. Hence, we’ll use the following C++ code to illustrate how a FIFO works.

First, we’ll declare what we stated above: our FIFO will have three operations. These are reset, write, and read. We’ll also want to be able to query how many items are in the FIFO.

class FIFO {
	bool	m_overflow, m_underflow;
	int	*m_mem, m_len, m_rdaddr, m_wraddr;
public:
	FIFO(const int lglength) {
		// On any allocation of a new FIFO,
		// allocate memory to hold our buffer,
		// and record how big the buffer is.
		m_len = (1<<lglength);
		m_mem = new int[m_len];
		reset();
	}

	void	reset(void);
	void	write(const int item);
	int	read(void);
	int	fill(void) const;
};

Resetting the FIFO will simply set both read and write pointers back to zero. It will also clear any error flags. Note that we aren’t going to clear the memory here–mostly because we won’t be able to do that in Verilog very easily later.

void FIFO::reset(void) {
	// Reset our pointers to the beginning of the memory
	m_wraddr = m_rdaddr = 0

	// Reset the error flags to false
	m_overflow = m_underflow = false;
}

Writing an item is almost as simple as setting the memory and incrementing the pointer. We’re going to add two more pieces to this logic, though. The first is required: the address of the next write pointer, nxtaddr may need to wrap around the buffer to be valid. So, we’ll check that first. Second, if the newly calculated next write pointer, nxtaddr, is equal to our read address, that’s an indication that the buffer is full. Here, we’ll cowardly refuse to write to a full FIFO.

// Produce an item / Write one item to the FIFO
void FIFO::write(const int item) {
	// First, calculate the address of the next
	// write pointer.
	int nxtaddr = m_wraddr + 1;

	// Adjust for any wrapping around the ends of the buffer
	if (nxtaddr >= m_len)
		nxtaddr = 0;

	// If this next pointer is the same as the read pointer,
	// the FIFO is full: cowardly refuse to write to the FIFO
	// in that casej
	if (nxtaddr != m_rdaddr) {

		// Actually write an item to the FIFO
		m_mem[m_wraddr] <= item;
		m_wraddr = nxtaddr;
	} else
		// Set an error flag on any attempt to write to a full FIFO
		m_overflow = true;
}

Reading from the FIFO is just about as simple: we want to return the next item in our buffer and increment the read pointer, wrapping if necessary. As before, though, there’s a twist: we only want to increase our buffer pointer if the FIFO wasn’t empty. Attempts to read from an empty buffer should create an underflow (error) condition, while leaving the FIFO empty.

// Read/consume one item from the FIFO
int FIFO::read(void) {
	int	item;

	// We'll return the next item to be read
	item = m_mem[m_rdaddr];

	// Only consume if the FIFO is not empty
	if (m_rdaddr != m_wraddr) {

		// Increment the read pointer
		m_rdaddr += 1;

		// Check to see if it wraps ppast the end of the buffer
		if (m_rdaddr >= m_len)
			m_rdaddr = 0;
	} else
		m_underflow = true;

	return item;
}

Finally, our last operation is to return the number of items found within this FIFO. This is given by the write pointer minus the read pointer, but done in such a way as to never return anything less than zero or greater than the size of our buffer.

int	FIFO::fill(void) const {
	int count;

	count = m_wraddr - m_rdaddr;
	if (count < 0)
		count += m_len;
	return count;
}

In many ways, this set of operations isn’t really complete. We haven’t returned the number of empty (unused) values in the buffer, we haven’t created any boolean values to return whether or not the buffer is non-empty, half-empty, or half full, neither have we made any methods to read whether or not an error condition has occurred. We’ll leave these exercises to the student.

First Verilog Cut

If we just tried to translate this C++ code to Verilog, we might end up with the always block shown below:

parameter	LGLEN = 8;
reg	[(LGLEN-1):0]	rdaddr, wraddr;
reg	[(INPUTWIDTH-1):0]	mem	[0:((1<<LGLEN)-1)];

always @(posedge i_clk)
	if (i_reset)
	begin
		rdaddr <= 0;
		wraddr <= 0;
		overrun  <= 0;
		underrun <= 0;
		o_fill   <= 0;
	end else if (i_write)
	begin
		if (wraddr + 1'b1 != rdaddr)
		begin
			mem[wraddr] <= i_item;
			wraddr <= (wraddr + 1'b1);
			o_fill   <= o_fill + 1'b1;;
		end else
			overrun <= 1'b1;
	end else if (i_read)
	begin
		if (wraddr != rdaddr) // If not empty
		begin
			o_item <= mem[rdaddr]
			rdaddr <= rdaddr + 1'b1;
			o_fill   <= o_fill - 1'b1;;
		end else
			underrun <= 1'b1;
	end /// Ooops!  What about ((i_write)&&(i_read))

With a little thought, it’s not too hard to see that this initial approach is going to have some problems.

What happens when a read and write request come in together?
wraddr + 1'b1 needs to be limited to N bits, otherwise we don’t get the proper wrap around the end of the buffer effect.

We’ll need to make some changes therefore.

Next Verilog Cut

Let’s see if we can’t improve on our first attempt to write the Verilog code for a FIFO with four basic changes.

The first change we’ll make is to separate the read and write tasks, and likewise the various variables, into separate always blocks. This will make it easier to handle concurrent reads and writes.

Our second change will be to capture the logic associated with testing whether or not the FIFO is full or empty into two flags: full and empty. This will help us figure out how to optimize things later.

Third, to deal with concurrent reading and writing, we’ll allow the following: Concurrent reads and writes are allowed, as long as the FIFO isn’t empty. This is to allow the memory a full clock to handle the write before the memory is available to be read.

Finally, we’ll separate the calculation of the number of items in the buffer into its own logic section.

So, to walk through the code, on every clock we will write to memory.

always @(posedge i_clk)
	mem[wraddr] <= i_item;

If there’s no write taking place on this clock, the memory write will just change the contents of an unused data item.

Likewise on every clock we will read from memory and create one of our outputs.

always @(posedge i_clk)
	o_item <= mem[rdaddr];

As with the memory write, if the FIFO isn’t being read on this clock then it won’t hurt us to set the output anyway.

Many FPGA’s will also allow you to gate these memory operation for no additional logic cost, as in:

always @(posedge i_clk)
	if (i_write)
		mem[wraddr] <= i_item;
always @(posedge i_clk)
	if (i_read)
		o_item <= mem[rdaddr];

While such gating may be useful, it’s not required in this example.

The next step in writing to the FIFO is handling the FIFO write pointer, wraddr. Since the logic used to adjust the FIFO write pointer is almost identical to the logic to adjust the overrun flag, we’ll place the two into the same always block.

always @(posedge i_clk)
	if (i_reset)
	begin
		wraddr <= 0;
		overrun  <= 0;
	end else if (i_write)
	begin
		// Update the FIFO write address any time a write is made to
		// the FIFO and it's not FULL.
		//
		// OR any time a write is made to the FIFO at the same time a
		// read is made from the FIFO.
		if ((!full)||(i_read))
			wraddr <= (wraddr + 1'b1);
		else
			overrun <= 1'b1;
	end

Other than separating this into its own always block, the biggest change between this version and our prior one is the introduction of the !full flag. This flag replaces the (wraddr+1'b1)!=rdaddr calculation from before.

The other more subtle change is that we will now write to the FIFO any time the FIFO is not full, or any time the FIFO is full and a read is taking place on the same clock.

The read address, rdaddr, and underrun error flag calculation below follow a very similar form to that of the write address and overflow calculation above.

// Set wraddr and underrun
always @(posedge i_clk)
	if (i_reset)
	begin
		rdaddr <= 0;
		underrun <= 0;
	end else if (i_read)
	begin
		// On any read request, increment the pointer if the FIFO isn't
		// empty--independent of whether a write operation is taking
		// place at the same time.
		if (!empty)
			rdaddr <= rdaddr + 1'b1;
		else
			// If a read is requested, but the FIFO was full, set
			// an underrun error flag.
			underrun <= 1'b1;
	end

The big difference is that the FIFO must be non-empty to read, even if a write is taking place on the same clock. This gives the memory a clock to store the value, before trying to read it.

The next and final trick is determining how full the FIFO is.

You might be tempted to set o_fill <= wraddr-rdaddr just like we did in our C++ implementation. This, however, doesn’t work. Such a fill measure will always be one clock out of date.

Instead, we’ll count how many items are in our buffer as those items are added or removed from the buffer. That means, though, that we’ll need to pay attention to not only the read and write request lines, but also whether or not the FIFO is full or empty when the request is made. This simple logic fits nicely into a case statement.

// Calculate the fill
always @(posedge i_clk)
	if (i_reset)
	begin
		o_fill <= 0;
	end else casez({ i_write, i_read, !full, !empty })
	4'b01?1: o_fill <= o_fill - 1'b1;	// A successful read
	4'b101?: o_fill <= o_fill + 1'b1;	// A successful write
	4'b1110: o_fill <= o_fill + 1'b1;	// Successful write, failed read
	// 4'b11?1: Successful read *and* write -- no change
	default: o_fill <= o_fill;	// Default, no change
	endcase

That leaves as our final task determining whether or not the FIFO is full, as well as what the next write address would be, were we to write to our FIFO on this clock.

assign	nxtaddr = wraddr + 1'b1;
assign	full  = (nxtaddr == rdaddr);
assign	empty = (wraddr  == rdaddr);

The above approach should yield a working FIFO.

It just won’t be a high speed FIFO, or a low logic FIFO implementation.

The problem is specifically the cost of calculating full and empty. Remember when I discussed keeping the conditions on an if-then-else branch simple? We just violated that rule.

We’ll have to try again, therefore.

High Speed Verilog Cut

This time, let’s keep everything the same as before, with the exception of the full and empty flags. Let’s spend some time focusing on those.

The problem with these two flags is that they are calculated within the nested if’s above. Further, they are calculated within the nested if of a large amount of logic, since each of the address lines sets N bits. What this means is that once the logic has been calculated, it then needs to be distributed among many elements–an additional timing cost.

We’ll solve this problem by pushing back the full and empty logic one clock earlier. This will keep the full and empty designations synchronous to the actual state of the FIFO on any given clock.

wire	[(N-1):0]	dblnext, nxtread;
assign	dblnext = wraddr + 2;
assign	nxtread = rdaddr + 1'b1;

always @(posedge i_clk)
	if (reset)
	begin
		full <= 1'b0;
		empty <= 1'b1;
	end else casez({ i_write, i_read, !full, !empty })
	4'b01?1: begin	// A successful read
		full  <= 1'b0;
		empty <= (nxtread == wraddr);
	end
	4'b101?: begin	// A successful write
		full <= (dblnext == rdaddr);
		empty <= 1'b0;
	end
	4'b11?0: begin	// Successful write, failed read
		full  <= 1'b0;
		empty <= 1'b0;
	end
	4'b11?1: begin	// Successful read and write
		full  <= full;
		empty <= 1'b0;
	end
	default: begin end
	endcase

We’ll get a warning for adding a 32-bit number to an N bit number when calculating dblnext, but rather than complicating the logic to get rid of the warning, we’ll leave it as is so you can see and understand what’s going on.

That’s it, though. Now you know how to build a simple (application independent) FIFO in Verilog.

Next Steps

Knowing how to build a FIFO is really the first step in many steps. It’s a required part of many component designs, and so being able to build one is a good skill to have.

Indeed, now that we’ve discussed how to build a FIFO, we can return to our debugging bus and create an asynchronous capabilities. Such a capability will be required to demonstrate the pipeline modes of the wishbone bus.

If you ever want to reference other examples, you’ll find many FIFO implementations posted on OpenCores.

Strangely, from my own experience, I tend not to reuse the FIFO’s I’ve built from one design need to the next. Sure, the FIFO I built for a UART tends to stay with that UART, but the FIFO’s I build for audio or video tend to be different from the one for the UART–even though the design outline above is roughly the same.

Instead, whenever I need a FIFO, I tend to copy from a working FIFO and then adjust the code to fit my new needs. The reason is simple: it seems that every application requires a different interface to the FIFO.

For example, a CPU will want to know from a UART receiver how many items can be read (i.e. o_fill), but from the transmitter it wants to know how many items can be written to it (N-1-ofill). Likewise, the CPU might wish to wait until the receive queue is halfway full (o_fill > N/2), whereas the transmit FIFO would want to wait until the transmit queue is half empty (ofill < N/2-1).

I guess this means that FIFO’s are one of the few places where the question of whether the glass is half full or half empty becomes important.

Another strange FIFO I’ve needed is the one to handle AXI bus requests. In this case, instead of a single set of read and write pointers, I’ve needed to use multiple pointers so as to properly know where a transaction request is in the process, and whether or not the AXI ready signals need to be dropped.

The point is just that, in spite of the basics, FIFO’s tend to differ from one context to another.

Let me know your thoughts and experiences below!