Some time ago, an individual wrote into Digilent’s forums asking why their FFT wasn’t working. You can read that interchange here [1] [2] [3] [4] [5].

My advice to anyone working on a problem like this is to build the debug infrastructure first, before you try to implement an FFT. Here on this blog, we've already discussed most of the pieces needed to do just that:

  1. You start by getting a simple means of communicating with the device working.

    We discussed an example of getting a basic serial port up and running here.

  2. You use that communications channel to get some kind of bus up and running on your FPGA. (I prefer wishbone.) You then use this bus to read from the internal variables of your FPGA, or set variables within it.

  3. Once the bus is built, you can use one of your peripheral registers to control a stepping signal, so as to step all of your logic under test forward by one clock.

    We discussed how to turn a serial port into such a debug peripheral here. We’ll discuss it more in the context of an FFT below.

  4. This approach works for debugging Xilinx's FFT as well as my own. Indeed, as we'll see in this post, debugging an FFT is no more difficult than debugging any other component.

Why you need to break up the problem

The first step, though, is to break the problem into pieces, and to debug each piece individually.

FFTs are rarely found all alone. Usually, they are found within a larger context: they are often connected to a sampling device, there may be other processing in front of them, and the whole chain often runs faster than your debug interface can follow. Put together, a simple FFT architecture might chain a sampler, some pre-FFT processing, the FFT itself, and whatever logic reports the results.

The problem with this simple architecture is that, unless you can isolate the FFT component by itself, you will never know which of the components in this processing chain is failing. This was the problem the Digilent poster had when trying to get his FFT working.

This post will discuss how to isolate just the FFT.

Ideally, you could build a simulation that would allow you to see how this FFT works. However, if, like me, you prefer to build your simulations using only open source tools such as Verilator, then you'll be stuck, unable to simulate a proprietary IP core anywhere other than on the FPGA itself. Hence, we're going to run our test bench on the FPGA hardware itself.

To do that, we’re going to add a wishbone slave interface to our FFT, so that you can debug the FFT, and only the FFT, just like you would read and write from any wishbone slave peripheral.

My example

The following is an example piece of code, cut from a time when I needed to debug Xilinx’s FFT within one of my designs (I was comparing their implementation to my own at the time). Minimal edits have been made to simplify the presentation.

As with any test, you want to start from known conditions. This test is no different. Hence, our first step will be to reset the FFT. We'll do that by setting the reset line any time the user writes to address zero of the FFT's register space.

// Writes to the control register reset the FFT.  Note that the reset lasts
// only one clock
always @(posedge i_clk)
	fft_reset<=(i_wb_stb)&&(i_wb_we)&&(i_wb_addr[3:0]== 4'h0);

The register names should be familiar from either the wishbone spec, or our discussion on how to build a simple wishbone slave.

The next step is to set up the input values for each FFT clock. In our case, we'll set one input value any time someone writes to the bus. Well, almost. In my example, I have two input registers, because I was testing an FFT that accepts two input samples per clock.

// Writes to the FFT control logic.  Each input register keeps only the
// top FFTBITS bits of each 16-bit half-word of the bus data.
always @(posedge i_clk)
	if ((i_wb_stb)&&(i_wb_we))
	begin
		case(i_wb_addr[3:0])
		// 4'h0: the FFT reset is handled separately, above
		4'h2:	fft_in_left <= { i_wb_data[31:(32-FFTBITS)],
					i_wb_data[15:(16-FFTBITS)] };
		4'h3:	fft_in_right<= { i_wb_data[31:(32-FFTBITS)],
					i_wb_data[15:(16-FFTBITS)] };
		default: begin end
		endcase
	end

Since I was setting the values two at a time, you'll notice the FFT input values are named fft_in_left and fft_in_right: the even and odd inputs to the FFT, respectively. You may also notice that I accepted FFTBITS bits per input. This allowed me to experiment with input samples having fewer than 16 bits each. Each 32-bit write carries one complex sample, with the real part in the upper half-word and the imaginary part in the lower half-word, and the FFT keeps only the upper FFTBITS bits of each half-word.
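As a worked example of that packing, suppose FFTBITS is 12 and the host writes 32'h1234_5678. The upper half-word, 16'h1234, holds the real part, and the lower half-word, 16'h5678, holds the imaginary part. The FFT keeps only the top 12 bits of each, so it receives 12'h123 and 12'h567, and the low nibble of each half-word is discarded. A 12-bit sample therefore needs to be left-justified within its half-word. The module below is a minimal sketch that isolates just this slicing so it can be checked on its own; the module and port names are hypothetical, not taken from the original design.

```verilog
// Hypothetical helper isolating the input-packing arithmetic used above.
// One 32-bit bus word carries one complex sample: real part in [31:16],
// imaginary part in [15:0].  Only the top FFTBITS bits of each half-word
// are passed on to the FFT; the remaining low bits are dropped.
module fft_input_pack #(
	parameter	FFTBITS = 12	// assumed: 16 or less
) (
	input	wire	[31:0]		i_word,
	output	wire	[2*FFTBITS-1:0]	o_sample
);
	assign	o_sample = { i_word[31:(32-FFTBITS)],	// real part
			i_word[15:(16-FFTBITS)] };	// imaginary part
endmodule
```

With FFTBITS set to 16 the slicing keeps the whole word, so the same write logic serves full-width samples unchanged.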

Now that the FFT's inputs have been assigned, we need to step the FFT forward by one clock tick, and one tick only. To do this, we'll use the clock enable (ce) line found on each FFT. We connect this clock enable line to the bus via a bus write: any time the user writes to address 3 of our bus, the clock enable line will be set for one clock tick.

// Writes to register #3 step the FFT
always @(posedge i_clk)
	fft_ce <= (i_wb_stb)&&(i_wb_we)&&(i_wb_addr[3:0] == 4'h3);

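You may notice that address 3 is also the register for one of our inputs (fft_in_right above). Since fft_in_right and fft_ce are both set on the same clock edge, the FFT sees the new right-hand value on the very clock it is stepped. In this fashion, writing the left input first and then the right one is all it takes to step the FFT forward by one clock tick.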

The last step is to read the results from the FFT.

// Reads from the FFT control logic
always @(posedge i_clk)
case(i_wb_addr[3:0]) // Read
	// Read from a control register: the FFT's sync flag
	4'h0: o_wb_data <= { 28'hff0000, 3'h0,
				fft_sync };
	// Read from the input port for the left channel
	4'h2: o_wb_data <= fft_in_left;
	// Read from the input port for the right channel
	4'h3: o_wb_data <= fft_in_right;
	// Read the FFT outputs, from first the left then the right ports
	4'h4: o_wb_data <= fft_out_left;
	4'h5: o_wb_data <= fft_out_right;
	// And ... set every other register to zero
	default: o_wb_data <= 32'h0000;
endcase

The first register allowed me to read back the status of the FFT itself. In particular, the FFT sets a synchronization flag on its first valid output. To align our results with the FFT's output, we need to read that flag.

Reads from addresses two and three allowed us to verify that the bus was working, by simply reading back the values we'd written to the input channels.

Reads from addresses four and five allowed us to read the result from the FFT.

The o_wb_ack and o_wb_stall lines can use the same logic as we used for our simple wishbone slave implementation.
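For reference, here is one way the pieces above might be gathered into a single module. This is a sketch, not the original design: the module name, the o_fft_*/i_fft_* port names, and the OW output-width parameter are assumptions, and the FFT core itself is left outside the module so the wrapper can be wired to Xilinx's FFT or any other. The acknowledgement logic assumes the simple slave's approach of acknowledging every request one clock after it is made and never stalling.

```verilog
// Sketch: the register logic above gathered into one wishbone slave.
// The FFT under test sits outside: wire o_fft_* to its inputs and its
// outputs to i_fft_*.  Names and the OW parameter are hypothetical.
module fft_debug_wrapper #(
	parameter	FFTBITS = 16,	// input bits per real/imag part
	parameter	OW = 32		// width of each FFT output word (assumed)
) (
	input	wire			i_clk,
	// Pipelined wishbone slave interface
	input	wire			i_wb_stb, i_wb_we,
	input	wire	[3:0]		i_wb_addr,
	input	wire	[31:0]		i_wb_data,
	output	reg			o_wb_ack,
	output	wire			o_wb_stall,
	output	reg	[31:0]		o_wb_data,
	// Connections to the FFT under test
	output	reg			o_fft_reset, o_fft_ce,
	output	reg	[2*FFTBITS-1:0]	o_fft_in_left, o_fft_in_right,
	input	wire	[OW-1:0]	i_fft_out_left, i_fft_out_right,
	input	wire			i_fft_sync
);
	wire	bus_write = (i_wb_stb)&&(i_wb_we);

	// A write to address 0 resets the FFT for one clock
	initial	o_fft_reset = 1'b0;
	always @(posedge i_clk)
		o_fft_reset <= (bus_write)&&(i_wb_addr == 4'h0);

	// A write to address 3 sets the right input and steps the FFT
	initial	o_fft_ce = 1'b0;
	always @(posedge i_clk)
		o_fft_ce <= (bus_write)&&(i_wb_addr == 4'h3);

	// Input registers: the top FFTBITS of each 16-bit half-word
	always @(posedge i_clk)
	if (bus_write)
	case(i_wb_addr)
	4'h2: o_fft_in_left  <= { i_wb_data[31:(32-FFTBITS)],
				i_wb_data[15:(16-FFTBITS)] };
	4'h3: o_fft_in_right <= { i_wb_data[31:(32-FFTBITS)],
				i_wb_data[15:(16-FFTBITS)] };
	default: begin end
	endcase

	// Read-back register map
	always @(posedge i_clk)
	case(i_wb_addr)
	4'h0: o_wb_data <= { 28'hff0000, 3'h0, i_fft_sync };
	4'h2: o_wb_data <= o_fft_in_left;
	4'h3: o_wb_data <= o_fft_in_right;
	4'h4: o_wb_data <= i_fft_out_left;
	4'h5: o_wb_data <= i_fft_out_right;
	default: o_wb_data <= 32'h0;
	endcase

	// Acknowledge every request one clock later; never stall
	initial	o_wb_ack = 1'b0;
	always @(posedge i_clk)
		o_wb_ack <= i_wb_stb;
	assign	o_wb_stall = 1'b0;
endmodule
```

Because o_wb_data is registered on the same clock as o_wb_ack, the read data is valid exactly when the acknowledgement arrives.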

That’s it! You can now debug an FFT as a wishbone slave component, feed it with your test data, and single step it to see what it does and how it works!


So, we’ve now discussed how to debug an FFT isolated from everything else. With a little ingenuity, you should be able to figure out how to debug any other DSP logic on your FPGA in a similar fashion. This approach should get you to the point of being able to debug your processing flow all the way from the Pre-DSP component through to your reported results.

Where this approach fails is when you have real-time inputs to your FFT that you cannot slow down, such as the samples coming from an analog-to-digital converter. There are two approaches to that problem:

  1. You can copy the outputs of your sampler directly into a buffer, record that buffer, and then use the data from that buffer as inputs to your FFT. That will allow you to continue using this debugging approach.

  2. You can also use some form of a scope to capture a snapshot of the real-time data as it runs through the FFT. This is the approach used by the wishbone scope, and an approach we'll slowly work up to within this blog.
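The first approach might start from something like the following sketch: a capture buffer that records 2^LGMEM consecutive samples whenever i_ce marks a new sample from the sampler, stops once it is full, and then holds its contents so the host can read them back and replay them through the FFT one step at a time. The module, its ports, and its parameters are illustrative assumptions, not part of the original design.

```verilog
// Hypothetical capture buffer for real-time samples.  While not full,
// every sample marked by i_ce is written to memory.  After 2^LGMEM
// samples o_full rises and capture stops until i_restart re-arms it.
module sample_buffer #(
	parameter	LGMEM = 10,	// log2 of the number of samples (assumed)
	parameter	DW = 32		// one packed sample per word (assumed)
) (
	input	wire			i_clk,
	// Sampler side: i_ce marks a new sample, as from an A/D
	input	wire			i_ce,
	input	wire	[DW-1:0]	i_sample,
	// Control: a one-clock pulse restarts the capture
	input	wire			i_restart,
	// Read-back side, e.g. driven from a bus read address
	input	wire	[LGMEM-1:0]	i_raddr,
	output	reg	[DW-1:0]	o_rdata,
	output	reg			o_full
);
	reg	[DW-1:0]	mem	[0:(1<<LGMEM)-1];
	reg	[LGMEM-1:0]	waddr;

	initial	o_full = 1'b0;
	initial	waddr  = 0;

	// Advance the write address; flag full after the last slot
	always @(posedge i_clk)
	if (i_restart)
	begin
		waddr  <= 0;
		o_full <= 1'b0;
	end else if ((i_ce)&&(!o_full))
	begin
		waddr <= waddr + 1'b1;
		if (&waddr)
			o_full <= 1'b1;
	end

	// Record each new sample while capturing
	always @(posedge i_clk)
	if ((i_ce)&&(!o_full)&&(!i_restart))
		mem[waddr] <= i_sample;

	// Registered read port, one clock of latency
	always @(posedge i_clk)
		o_rdata <= mem[i_raddr];
endmodule
```

Stopping when full, rather than wrapping around, keeps the first 2^LGMEM samples intact, so the recorded sequence can be replayed from a known starting point after an FFT reset.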

Which solution should you use? Both! But … we’ll get back to that in a later post.