Even I get stuck in FPGA Hell

This site is dedicated to keeping students and other digital design developers out of FPGA Hell: that state in the design process where your design doesn’t work, and you have absolutely no clue why not.

I’d like to present myself as immune to that problem. How else can I be respected as someone who teaches others how to avoid it?

Today, though, I have a confession to make: I get stuck in FPGA Hell from time to time as well.

By the grace of the Almighty, I’ve recently received three reprieves, so that I can now tell you both how I got stuck, and how I got unstuck.

HDMI Video

The background: One of my ongoing projects is an HDMI Video project. This project has two components. The first is the receiver. This component is supposed to receive an HDMI signal and stuff the pixels into memory. The second component is the transmitter. This component is supposed to read an image frame from memory, and transmit it to my monitor. The ultimate goal of this project is to be able to process the HDMI signals associated with 3D head-sets, such as the Oculus Rift.

I’m doing my development on a Digilent Nexys-Video board. This board has not only the required HDMI input and output ports, but also enough memory to tackle the data transfer. (We can talk later about memory bandwidth, which for this application will require some creative solutions.)

I had the project working some time ago to the point where I could lock on to the received HDMI signal and calculate its pixel clock rate, frame rate, and even the number of lines per frame, pixels per line, horizontal sync length, vertical sync length, front porches, back porches, etc. In other words, from the incoming data, I could calculate all of the required video parameters to set an associated mode line.
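As a rough illustration of how such parameters can be measured, the sketch below counts pixel clocks between rising edges of the horizontal sync. This is not the code from my project: the module and signal names (linemeasure_sketch, i_pixclk, i_hsync) are made up for the example, and it assumes an already-decoded, active-high hsync within the pixel clock domain.

```verilog
// Hypothetical sketch: measure horizontal timing from a decoded hsync.
// Assumes i_hsync is active high and already synchronous to i_pixclk.
`default_nettype none
module linemeasure_sketch (
	input	wire		i_pixclk,
	input	wire		i_hsync,
	output	reg	[15:0]	o_line_len,	// Pixel clocks per line
	output	reg	[15:0]	o_sync_len	// Pixel clocks of sync
);
	reg		last_hsync;
	reg	[15:0]	count;

	initial	last_hsync = 1'b0;
	initial	count      = 16'h0;
	initial	o_line_len = 16'h0;
	initial	o_sync_len = 16'h0;

	always @(posedge i_pixclk)
	begin
		last_hsync <= i_hsync;
		if (i_hsync && !last_hsync)
		begin
			// Rising edge: a full line has passed since the last one
			o_line_len <= count + 16'd1;
			count <= 16'h0;
		end else begin
			count <= count + 16'd1;
			// Falling edge: the sync pulse has just ended
			if (!i_hsync && last_hsync)
				o_sync_len <= count + 16'd1;
		end
	end
endmodule
```

The first value reported after start-up is meaningless, since the counter didn’t start on a sync edge. The same structure, clocked once per line and watching the vertical sync instead, would count lines per frame.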

I can also read and process the EDID information using the wbi2c project.

Then I tore the project apart so that I could incorporate the 512MB DDR3 SDRAM memory into the design.

Fig 1: Broken HDMI

The symptoms: After finishing the restructuring necessary to get the DDR3 SDRAM memory to work, I couldn’t get the incoming video to lock at all, and I struggled to figure out what was wrong. Fig 1 shows my test setup. I was using my wishbone scope to capture frames of video data. I could then feed these captures to a home-made HDMI simulator to simulate my code. The resulting data just didn’t contain the synchronization pattern that I knew was there. The data was somehow wrong, but I just couldn’t figure out what logical transformation would correct it.

The problem: Just this week, I figured out what was going on.

Fig 2: Broken HDMI bug found

Since adding the SDRAM, I had changed my system clock from the incoming 100MHz clock rate, to the 100MHz clock used by the Xilinx MIG generated SDRAM controller. (I’d still like to use my own DDR3 SDRAM controller, but that project is currently on hold.) This controller clock is subtly different from the 100MHz input clock, even though the two are at the same frequency: it takes some time for the PLL to settle, and there’s a phase difference due to the distribution network. (These are only the differences I know of.)

You can see what the broken configuration looked like in Fig 2.

These differences were apparently enough that a reset line I was setting with logic on the 100MHz clock was failing to reset the ISERDESE2 component on the 148.5MHz HDMI pixel clock.

Fig 3: Broken HDMI bug fixed

How did I find the problem? The worst way to find a problem like this is by desk-checking your code. It is, however, how I ended up finding it. I knew, from pixel captures, that the problem had to be in the ISERDESE2 component. I had traced it there via my wishbone scope. Then, thanks to the Almighty, in one sudden piece of inspiration I realized the problem.

The code in question now includes an asynchronous reset, with a synchronous release, such as is shown in Fig 3.

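wire		async_reset;
reg	[2:0]	reset_pipe;
// Assert the reset asynchronously, as soon as i_ce drops, but release it
// synchronously, on the third i_clk (pixel clock) edge after i_ce returns
always @(posedge i_clk, negedge i_ce)
	// The !i_ce signal is our reset indication
	if (!i_ce)
		reset_pipe[2:0] <= 3'h7;
	else
		// Shift in zeros until the top bit finally clears
		reset_pipe[2:0] <= { reset_pipe[1:0], 1'b0 };
assign	async_reset = reset_pipe[2];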

I also added a separate piece of logic to “synchronize” the CE signal to the HDMI pixel clock:

wire		lcl_ce;
reg	[1:0]	syncd_ce;
// Two-flop synchronizer: bring i_ce into the i_clk (pixel clock) domain
always @(posedge i_clk)
	syncd_ce <= { syncd_ce[0], i_ce };
assign	lcl_ce = syncd_ce[1];

These two changes fixed the problem.
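The assert and release behavior of the reset logic can at least be checked in simulation, even though metastability itself cannot. The test bench below is only a sketch: the module name and the stimulus timing are arbitrary choices for the example, and the clock period is an approximation of the 148.5MHz pixel clock.

```verilog
// Hypothetical test bench for the reset logic above
`timescale 1ns/1ps
module reset_release_tb;
	reg		i_clk, i_ce;
	reg	[2:0]	reset_pipe;
	wire		async_reset;

	// Logic under test, copied from above
	always @(posedge i_clk, negedge i_ce)
	if (!i_ce)
		reset_pipe[2:0] <= 3'h7;
	else
		reset_pipe[2:0] <= { reset_pipe[1:0], 1'b0 };
	assign	async_reset = reset_pipe[2];

	// Roughly 148.5MHz: a 6.734ns period
	initial	i_clk = 1'b0;
	always	#3.367 i_clk = !i_clk;

	initial begin
		$monitor("%t: i_ce=%b async_reset=%b",
			$time, i_ce, async_reset);
		i_ce = 1'b0;		// Start in reset
		#20 i_ce = 1'b1;	// Release between clock edges
		#40 i_ce = 1'b0;	// Assert again, also between edges
		#11 i_ce = 1'b1;
		#40 $finish;
	end
endmodule
```

In the resulting trace, async_reset should rise in the same time step that i_ce falls, no matter where the clock is, yet it should only ever fall on a rising edge of i_clk.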

This, by the way, is one of those reasons why metastability can be so problematic. The symptoms of metastability tend not to make any sense. They draw you away from looking for clock domain transfer problems, convincing you that your logic is somehow strangely at fault. In the end, I’ve only ever found metastability and other clock-domain crossing related problems by desk-checking my code. (If you know of another way, please feel free to share …)

ICO Board Parallel Port

The background: The second problem I was struggling with was on my ICO Board project. The ICO Board is designed to demonstrate the utility of a full open-source tool-chain for FPGA development. It is based upon an iCE40 FPGA with 8k logic cells. My goal with this project was to create a series of beginner demonstration designs that others could reference. Indeed, the board was given to me by the project team as a gift for that purpose.

I had been struggling for some time to get a debugging bus running over the parallel port between the FPGA and the Raspberry Pi.

You can see how far I had gotten in Fig 4 below.

Fig 4: Broken IcoZip Project

The symptoms: The bus worked fine in simulation, but somehow struggled any time I tried to actually place it on the board. I placed eight extra LEDs onto the board, but still couldn’t figure out what was going wrong. In particular, I couldn’t tell whether the Raspberry Pi was talking too fast, or whether something was wrong within the ICO Board. At one time I was afraid I was creating a short across the parallel port interface, and so somehow the ICO Board was losing power in the middle of a transaction.

I even went so far as to connect a PMod USBUART to the board, so that I could use a known working debug-bus, based upon my proven UART code, to find the fault, and … even my proven code didn’t work right.

How did I find the problem? In this case, a heart-felt thank you goes out to both the Almighty, and to the Digilent sales team, who were looking for some feedback on their Digital Discovery device, shown in Fig 5.

Fig 5: A Digital Discovery logic analyzer

Out of the blue, they asked if I’d like to review the device for them, and send them back my thoughts. Of course I’d be interested, I said.

If you aren’t familiar with the Digital Discovery, it’s an FPGA-based external logic analyzer. Digilent sells the device for $200 USD. It boasts the ability to capture and analyze data signals at up to 800MHz. My ICO board design, however, was only running at 40MHz. I had picked a slow clock rate since I couldn’t tell what was wrong with the design, and I wasn’t certain how well I could trust my timing analyzer. (In the end, the timing analyzer wasn’t the problem, and I could’ve trusted the icestorm tools just fine.)

To use this device, I connected one of the PMod ports of the ICO Board to the Digital Discovery, and started making and examining captures.

One particular capture showed one of my logic signals holding a value for less than 10ns. This didn’t make any sense, since the clock period was supposed to be 25ns. Further, it didn’t make sense as a spurious value that hadn’t yet settled, since the design then continued as though this value had been high for a full clock cycle. (To the extent that I could tell anything about what this erratic design was doing …) That left me wondering what the actual clock rate was, so I decided to dump the system clock to one of the output pins and examine it.
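Dumping a clock to a pin takes very little logic. The sketch below is illustrative only, with made-up module and port names. It assumes a spare PMod pin or two, and offers a divide-by-two copy of the clock alongside the raw one, since a toggling register is something any tool flow will happily route to a pin.

```verilog
// Hypothetical debug aid: expose the system clock on spare output pins
`default_nettype none
module clkdebug_sketch (
	input	wire	i_clk,		// The clock under suspicion
	output	wire	o_dbg_clk,	// Raw clock, straight to a pin
	output	reg	o_dbg_half	// Toggles every cycle: i_clk / 2
);
	// Some tool flows may prefer a dedicated output primitive for this
	assign	o_dbg_clk = i_clk;

	initial	o_dbg_half = 1'b0;
	always @(posedge i_clk)
		o_dbg_half <= !o_dbg_half;
endmodule
```

With a 25ns clock period, o_dbg_half should show steady 25ns high and low times on the logic analyzer. Anything irregular points at the clock rather than at the logic.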

You can see the final test setup in Fig 5 below. It was enough to see the problem.

Fig 5: IcoZip Problem found

The problem: As with the HDMI problem, this one also turned out to be clock related. In this case, I had given the wrong parameters to the iCE40’s PLL primitive. Sure, I read through the manual, but the manual didn’t explain all of the configuration parameters very well. As a result, I hadn’t set all of the PLL parameters correctly. The iCE40 documentation recommends using the vendor’s proprietary wizard. However, since I never managed to get their proprietary software installed, I was using the open source yosys toolchain instead and instantiating the primitive directly.

The result was that the iCE40’s PLL primitive (SB_PLL) wasn’t locking, and so my system clock was unstable.

Eventually, I found the icepll open-source program which told me what the PLL parameters needed to be set to in order to get a stable clock.
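For illustration, here is roughly what a direct PLL instantiation can look like, together with a reset that is held until the PLL reports lock. Treat it as a sketch under stated assumptions rather than as my actual design: it assumes a 100MHz reference, uses the SB_PLL40_CORE variant of the primitive (a clock arriving straight from a pad may need SB_PLL40_PAD instead), and the divider values were worked out by hand from the formula in the comment, so confirm them with icepll before trusting them. The module and port names are hypothetical.

```verilog
// Sketch only: 100MHz in (assumed), 40MHz out, reset held until LOCK
`default_nettype none
module pll_sketch (
	input	wire	i_clk_100mhz,	// Assumed reference frequency
	output	wire	o_clk,		// 40MHz system clock
	output	wire	o_locked,	// Worth routing to an LED or pin
	output	wire	o_reset		// High until the PLL has locked
);
	// SIMPLE feedback: F_out = F_ref * (DIVF+1) / ((DIVR+1) * 2^DIVQ)
	//	100MHz * 32 / (5 * 16) = 40MHz, with the VCO at 640MHz
	SB_PLL40_CORE #(
		.FEEDBACK_PATH("SIMPLE"),
		.DIVR(4'd4),
		.DIVF(7'd31),
		.DIVQ(3'd4),
		.FILTER_RANGE(3'd2)
	) plli (
		.REFERENCECLK(i_clk_100mhz),
		.PLLOUTCORE(o_clk),
		.LOCK(o_locked),
		.RESETB(1'b1),
		.BYPASS(1'b0)
	);

	// Asynchronous assert, synchronous release, as in the HDMI fix:
	// the design stays in reset for as long as the PLL is unlocked
	reg	[2:0]	reset_pipe;
	always @(posedge o_clk, negedge o_locked)
	if (!o_locked)
		reset_pipe <= 3'h7;
	else
		reset_pipe <= { reset_pipe[1:0], 1'b0 };
	assign	o_reset = reset_pipe[2];
endmodule
```

Had the lock signal been brought out to an LED from the beginning, an unlocked PLL would have been visible at a glance.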

Now that my debugging bus is up and running over the parallel port within my ICO Board project, I can start to add (and debug) other capabilities. This will include the ZipCPU implementation for the board, the SRAM, as well as (hopefully) the flash on the board. My goal is to get to the point where I can play 4x4x4 tic-tac-toe on the board, using only the standard C-library. Others, I imagine, will be more interested in the extensibility offered by an AutoFPGA based platform making it easy to add and remove functionality from the design.

Arbitrary clock rate generator

The background: The third design that I got stuck on is an arbitrary clock rate generator. This is an FPGA-only design, requiring no external clock other than the 100MHz clock already provided to the board. It is also one of those designs that demonstrates something few people think possible: the ability to create a suitable clock signal from logic alone.

Indeed, I wasn’t certain if it was possible myself.

For those interested, the design is based upon the logic presented here. Using that logic, together with the input 100MHz clock, I can request a clock frequency within 0.2Hz or so of any desired clock frequency, and then create a clock that accurately matches that frequency. Further, if I use the PMod GPS, I should be able to generate a clock frequency with absolute stability at any known frequency. (“Should be able to” means I haven’t tried to yet.)
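To give a flavor of the approach without reproducing the core itself, the sketch below shows a generic phase accumulator that produces eight output bits per clock, suitable for feeding an 8:1 serializer. Everything specific here is an assumption made for the example: the 32-bit accumulator, the eight bits per 100MHz clock, and the names clkgen_sketch, i_step and o_word. Under those assumptions the frequency resolution would be 800MHz / 2^32, or about 0.19Hz.

```verilog
// Hypothetical sketch of a phase-accumulator clock generator.
// Assumes eight serializer bits per i_clk cycle (800Mb/s at 100MHz).
`default_nettype none
module clkgen_sketch (
	input	wire		i_clk,		// 100MHz
	// i_step = round(2^32 * f_out / 800MHz): phase advance per bit
	input	wire	[31:0]	i_step,
	output	reg	[7:0]	o_word		// Bit 0 is the earliest bit
);
	reg	[31:0]	phase;
	wire	[31:0]	ph	[0:8];

	assign	ph[0] = phase;

	genvar	k;
	generate for(k=0; k<8; k=k+1)
	begin : SAMPLE
		// Each output bit sees the phase one step later than the last
		assign	ph[k+1] = ph[k] + i_step;

		// The top phase bit is a square wave at the requested rate
		always @(posedge i_clk)
			o_word[k] <= ph[k][31];
	end endgenerate

	// Carry the phase forward by all eight steps for the next word
	initial	phase = 32'h0;
	always @(posedge i_clk)
		phase <= ph[8];
endmodule
```

Each edge of such a clock can only land on one of the eight bit positions, so the raw output carries up to one bit period of jitter. A chain of eight adders may also be too slow to meet timing as written; pipelining it, or precomputing multiples of the step, would be the obvious next change.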

As with the HDMI project above, the hardware I was using was the Digilent Nexys-Video board.

My goal with this project was to be able to create an output pixel clock, to be sent via the HDMI output port, of an arbitrary frequency so that it could support any reasonable display timing.

Fig 6: Broken arbitrary clock generator

Symptoms: Normally, I wouldn’t think twice if this design didn’t work. It requires that I can get my hardware to work in a way that it wasn’t designed to work in, and so I was never certain it could work in the first place. However, the first time I fired up the design it appeared to work. Then, after making several changes (without git backups), I lost that appearance of working and … I struggled to understand why.

Indeed, the clock would appear to lock onto frequencies such as 50MHz, 75MHz, 100MHz, 125MHz, and so forth, but never lock onto the frequency I was requesting, such as 131.415928MHz.

I was stuck. I desk-checked and desk-checked my code. I read through the Xilinx clocking guide. I found and fixed several “problems”, but never fixed the problem.

Definition: Voodoo computing. A noun describing the process of fixing what isn’t broken in an attempt to find and fix what is. It is usually characterized by a complete lack of understanding as to what is causing the problem, and so the “fixes” applied tend to be quite irrelevant to the actual problem at hand.

How did I find the problem? In this case, I turned again to the Digital Discovery. As with the icozip project, I routed my generated clock signal to a PMod port.

Much to my surprise, my code wasn’t generating the clock that I thought it was generating.

The problem: This sent me back to my clock generation code, where I was able to find the problem. In this case, my problem was associated with the Xilinx OSERDESE2 primitive. The primitive was mis-configured. (I had set the DATA_RATE_TQ parameter to “DDR” instead of “SDR”, for a functionality I wasn’t using.) Once fixed, the whole design started working.
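For reference, an OSERDESE2 set up as an 8:1 serializer with its tristate path unused might be instantiated as in the sketch below. The parameter and port names are those of the Xilinx primitive. The wrapper module, its signal names, and the clocking (a 400MHz bit clock used at double data rate alongside the 100MHz word clock) are assumptions made for the example, not a copy of my design.

```verilog
// Hypothetical 8:1 output serializer built around an OSERDESE2
`default_nettype none
module serialize8_sketch (
	input	wire		i_clk,		// 100MHz word clock (assumed)
	input	wire		i_fastclk,	// 400MHz bit clock (assumed)
	input	wire		i_reset,
	input	wire	[7:0]	i_word,		// Bit 0 is sent first
	output	wire		o_pin
);
	OSERDESE2 #(
		.DATA_RATE_OQ("DDR"),
		// The tristate path is unused here, yet setting this
		// to "DDR" is what broke the design described above
		.DATA_RATE_TQ("SDR"),
		.TRISTATE_WIDTH(1),
		.DATA_WIDTH(8),
		.SERDES_MODE("MASTER")
	) serdesi (
		.CLK(i_fastclk), .CLKDIV(i_clk), .RST(i_reset),
		.OCE(1'b1),
		.D1(i_word[0]), .D2(i_word[1]),
		.D3(i_word[2]), .D4(i_word[3]),
		.D5(i_word[4]), .D6(i_word[5]),
		.D7(i_word[6]), .D8(i_word[7]),
		.OQ(o_pin),
		// Tristate and cascade inputs are tied off
		.TCE(1'b0),
		.T1(1'b0), .T2(1'b0), .T3(1'b0), .T4(1'b0),
		.TBYTEIN(1'b0),
		.SHIFTIN1(1'b0), .SHIFTIN2(1'b0),
		// Unused outputs
		.OFB(), .TQ(), .TFB(), .TBYTEOUT(),
		.SHIFTOUT1(), .SHIFTOUT2()
	);
endmodule
```

The lesson I took from this one is that parameters belonging to an unused feature can still matter, so they deserve the same review as the ones in use.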

As a result, I can now create arbitrary clock frequencies within my Nexys-Video board, without requiring any additional hardware.

Conclusion

In each of these examples, the easiest part of the design to get right was the logic. The hardest part, the part which had sent me to FPGA Hell in the first place, was dealing with those parts and components of my design which I could not simulate. Further, in two out of three of these examples, an external logic analyzer, Digilent’s Digital Discovery, rescued me.

My point, though, is simply this: even those who have been designing digital logic for years can still get stuck. If you work with an old hand, ask for some of their stories over lunch. You might find that lunch can actually be entertaining, even without discussing either religion or politics.

Perhaps next time I’ll know to check the PLL-locked output signal, though, rather than assuming that any PLL will always lock.

The code for all three projects is available for those sufficiently interested. The ICO Board project, together with its debugging bus and support infrastructure, can be found on Github. The clock generation core is available upon request for any of my Patreon sponsors who support me for $10 USD or more, and will probably eventually be included within the HDMI video project. It’s not well documented (yet), but with sufficient interest that can be changed. (It’s only about 300 lines of code or so …)

The HDMI video project, though, needs sponsors in order to bring it to completion. It still needs more development work before it will be released on Github, and that work isn’t (yet) paid for. If you are interested in this project, please consider supporting me on Patreon, and then sending me a note to let me know that it is a project you are interested in.