When I first learned digital design, I never simulated any of my designs: I just placed them directly onto the hardware and debugged them there.

I’ve since become convinced of the value of simulation, for several reasons. First, simulation can be faster than synthesizing a design. Indeed, any time I run Verilator I can find many syntax errors in my design before Vivado fully starts up and shows me one bug. But that’s just the syntax check: even for a complete run, simulating a small design is still faster than synthesizing it. Of course, ultimately, the hardware is always faster, but in the time it takes to get there, you might manage to get an answer via simulation.

The second reason why I like simulation is that a simulation-generated trace will contain every wire within the design. For this reason, when something doesn’t work in hardware, I’ll almost always return to simulation and try to reproduce the same failure there. That allows me to turn around quickly and find the bug.

Or … not so quickly. On one recent design, I read the entire 16MB from a SPI flash memory, only to have the design fail when reading the last word from the flash. Not knowing where to start, I started with simulation, but then had to trim down the trace before it filled up every bit of my computer’s disk drive.

But what happens when you cannot simulate the problem? When your design works perfectly in simulation, but fails on the hardware?

I’ll admit this happened to me recently as well. I think it happens to everyone at some point.

Therefore, to help keep you from FPGA Hell, I asked on Reddit for a list of things that might cause your simulation not to match reality. When I asked, I thought I knew most of the reasons. To my surprise, the kind Reddit readers were glad to share with me many more reasons why simulation might not match actual hardware performance.

Let me try to list and explain the reasons I’ve found here, and see if I managed to (finally) get all of the reasons given to me on Reddit.

Timing

Digital designs don’t work if the time between clock pulses isn’t long enough for all of the logic between one flip flop’s output and the next flip flop’s input to settle before that next flip flop needs its input held constant. This is often what the word “timing” means in this context. Here are some reasons why a design might fail because of a timing problem.

  • Design failed to pass timing, yet was used anyway

    Following place and route, you need to check whether the resulting design ensured that all the setup and hold requirements for all of the flip flops within (or external to) your design were met. Usually the tools will do this for you automatically. However, if you fail to check this result and use the design anyway … then it is likely to have some problems. Worse, the behavior you see might masquerade as a completely different problem.

    For this reason, whenever I have a design that doesn’t work, I first double check the timing report.

  • The timing checker wasn’t given the right clock rate

    If you tell the timing checker either that you have no clock in your design (yes, I did this once) or the wrong frequency, your design may appear to pass the timing check, even though the check is invalid.

  • Using delays in test bench design

    This is one reason why I avoid the “#” syntax in Verilog, such as a <= #2 b;. Just because you tell the Verilog simulator that something will happen “2ns” later (assuming a 1ns timescale) doesn’t mean it will achieve that “2ns” result in hardware. Worse, the synthesizer ignores these delays entirely. Hence, if you use them, keep them in your test bench and out of any code you intend to place on actual hardware.

  • Just being wrong about the clock frequency on the board

    This is subtly different from giving the timing analyzer the wrong rate. For example, if you think your clock rate is 100MHz, and get your design to pass the timing check for 100MHz, even though the clock rate is really 50MHz, then any logic that depends upon that number, such as a baud rate generator or a timer, is not likely to work.
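To make that last point concrete, here’s a minimal sketch of logic whose correctness depends upon knowing the real clock rate. The module name ppsgen and its CLOCK_RATE_HZ parameter are hypothetical, chosen only for illustration. If the parameter says 100MHz but the board’s clock is really 50MHz, the design still passes timing, yet its “once per second” pulse arrives only once every two seconds.

```verilog
// Hypothetical example: a pulse-per-second generator whose period is only
// right if CLOCK_RATE_HZ matches the clock actually on the board.
module ppsgen #(
	parameter	CLOCK_RATE_HZ = 100_000_000
) (
	input	wire	i_clk,
	output	reg	o_pps
);
	// Enough bits to count from 0 up to CLOCK_RATE_HZ-1
	localparam	CW = $clog2(CLOCK_RATE_HZ);
	reg	[CW-1:0]	counter;

	initial	counter = 0;
	initial	o_pps   = 1'b0;
	always @(posedge i_clk)
	if (counter >= CLOCK_RATE_HZ-1)
	begin
		counter <= 0;
		o_pps   <= 1'b1;	// One pulse per (assumed) second
	end else begin
		counter <= counter + 1'b1;
		o_pps   <= 1'b0;
	end
endmodule
```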

Metastability

We’ve discussed metastability a couple of times on this blog, mostly associated with crossing clock domains. Metastability is caused when a signal input to a flip flop is changing right as the flip flop’s clock arrives. In that case, the design might have a value that is neither “1” nor “0”, causing unpredictable results in subsequent logic. Because metastability is only caused if the signal changes right at the clock edge, it is a rare event, but often not rare enough. Either way, the simulator will rarely if ever notice it.

Here are some examples of things that might cause metastability.

  • No synchronization of async signal

    Inputs to a design may be asynchronous. A good example is a button press, or a serial port input. Such inputs need to be synchronized before use! This is actually a common problem among beginners–they’ll use a value without synchronizing it, ignorant that this might cause problems.

  • Improperly managed clock domain crossing

    This is another classic problem. When you cross from one clock domain to another, you need to manage the clock crossing with either a synchronizer or an asynchronous FIFO–which will use synchronizers internally.

  • Any time a register is clocked by two different clocks in the same process

    I haven’t personally come across this one, but imagine a process that is sensitive to @(posedge i_clk, negedge i_reset, posedge something_else). This can be a recipe for a metastability disaster.

    You can read how we handled this with the asynchronous reset here. However, I tend to try to avoid this situation by just not writing code of this type. This was one of those reasons why I recommended to beginners that only clock edges should ever be in the sensitivity list.

  • Timing errors due to incorrect multipath constraints that are not checked in simulation

    When crossing clock domains, it’s not exactly clear upon which edge of the next clock a particular signal will arrive. Hence, if a signal from one clock domain crosses into another through two separate synchronizers, each feeding its own piece of logic, the two synchronizers may resolve the change on different clock edges. You may then be surprised to find that, in the real hardware, the two pieces of logic don’t do the same thing.

Let’s illustrate this last problem with an example.

module test(i_clk_a, i_clk_b, i_ina, o_outb);
	input	wire	i_clk_a, i_clk_b;
	input	wire	i_ina;
	output	wire	o_outb;

	// Here's our first synchronizer
	reg	threesync, threein;
	initial	threein = 0;
	initial	threesync = 0;
	always @(posedge i_clk_b)
		{ threein, threesync } <= { threesync, i_ina };

	// Here's some logic dependent upon its result
	reg	[15:0]	bythrees;
	initial	bythrees = 0;
	always @(posedge i_clk_b)
	if (threein)
		bythrees <= bythrees + 3;

	// That's the first path, now let's look at the second path
	// It starts with a separate synchronizer
	reg	fivesync, fivein;
	initial	fivein = 0;
	initial	fivesync = 0;
	always @(posedge i_clk_b)
		{ fivein, fivesync } <= { fivesync, i_ina };

	
	reg	[15:0]	byfives;
	initial	byfives = 0;
	always @(posedge i_clk_b)
	if (fivein)
		byfives <= byfives + 5;

	assign	o_outb = byfives[0] ^ bythrees[0];
endmodule

Now, let’s assume that i_ina is some logic that is set on i_clk_a’s positive edge. You’d expect o_outb to be zero at all times, right? (Both registers toggle their lowest bit on every clock where their synchronized copy of i_ina is high.) You might be surprised by the hardware when it isn’t.
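One way to avoid this, sketched here as a hypothetical test_fixed module, is to synchronize i_ina exactly once and let every consumer in the i_clk_b domain use that single synchronized copy. Both counters then see the same value on the same clock edge, so o_outb really should stay at zero. This is a sketch of the idea rather than a recipe for every crossing: multi-bit values still need a handshake or an asynchronous FIFO.

```verilog
// Hypothetical fix: one synchronizer, fanned out to both consumers
module test_fixed(i_clk_b, i_ina, o_outb);
	input	wire	i_clk_b;
	input	wire	i_ina;		// Set on i_clk_a, asynchronous to i_clk_b
	output	wire	o_outb;

	// A single two flip-flop synchronizer, shared by both paths below
	reg	pipe, syncd;
	initial	{ syncd, pipe } = 2'b00;
	always @(posedge i_clk_b)
		{ syncd, pipe } <= { pipe, i_ina };

	reg	[15:0]	bythrees, byfives;
	initial	bythrees = 0;
	initial	byfives  = 0;
	always @(posedge i_clk_b)
	if (syncd)
	begin
		// Both counters now respond on the very same clock edge
		bythrees <= bythrees + 3;
		byfives  <= byfives  + 5;
	end

	// Both LSBs toggle together, so this XOR stays at zero
	assign	o_outb = byfives[0] ^ bythrees[0];
endmodule
```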

Blocking vs Non-blocking assignments

Every now and again I need to remind myself why blocking assignments are so bad. A blocking assignment sets the value of a register immediately, whereas a non-blocking assignment waits until every always block triggered by the clock edge has been evaluated before updating the register with its new result.

So, tell me, what would happen in the following code?

initial	a = 0;
always @(posedge i_clk)
	a = a + 5;

initial	b = 0;
always @(posedge i_clk)
	b <= a + 5;

After the first clock tick, what will the value of b be? Will it be five, or will it be ten?

In hardware the result will always be five. In simulation, the answer is … it depends. Specifically, it depends upon which of the two always blocks the simulator decides to evaluate first.
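For comparison, here’s a minimal sketch, using a hypothetical nbdemo module, with both registers updated by non-blocking assignments. Now b always sees the value a held before the clock edge, no matter which always block the simulator evaluates first, so after the first clock both a and b are five in simulation and hardware alike.

```verilog
// Hypothetical race-free version: non-blocking assignments everywhere
module nbdemo(i_clk, o_a, o_b);
	input	wire		i_clk;
	output	reg	[7:0]	o_a, o_b;

	initial	o_a = 0;
	always @(posedge i_clk)
		o_a <= o_a + 5;

	// Reads the value o_a held *before* this clock edge
	initial	o_b = 0;
	always @(posedge i_clk)
		o_b <= o_a + 5;
endmodule
```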

Poor simulation model

This one seems to hit the beginner the first time he uses simulation, when the inputs to his simulation don’t quite match how the real hardware acts. You can read one student’s account of how this problem bit him here on this site.

  • Buttons may be the most classic example

    Buttons tend to be the first thing a beginner works with. They are easy and simple to work with, and seem to impact your design in a very reliable way.

    The beginner quickly learns about buttons, and the next step is a counter. He wants to know if his counter is working, so he creates an example piece of code much like the following. (We’ll assume he gets the synchronizer right, although this does tend to be rare.)

module btnled(i_clk, i_btn, o_led);
	input	wire	i_clk, i_btn;
	output	reg	o_led;

	// Let's synchronize the button, to avoid two issues
	reg	syncd, last, value;
	initial	syncd = 0;
	initial	value = 0;
	initial	last  = 0;
	always @(posedge i_clk)
		{ last, value, syncd} <= { value, syncd, i_btn };

	initial	o_led = 0;
	always @(posedge i_clk)
	if ((value)&&(!last)) // i.e. the button was just pressed
		o_led <= !o_led;
endmodule

This beginner will be surprised when his LED doesn’t necessarily toggle on every button press. The problem? Buttons bounce! Feel free to take a look at this article for an illustration of the problem.

  • I’ve also personally struggled with Xilinx’s ICAPE2 interface

    Yes, I know Xilinx described the interface in their Configuration user’s guide. But how often have you misunderstood the specification and built your simulation component to simulate the wrong interface?

  • My own I2C story

    Buried within the repository for my wishbone scope is an article about how I once seriously misunderstood the I2C specification. I built a simulation model for the wrong specification, and managed to get my design to work with it. When I moved to hardware, … it didn’t work like I thought it should.

    That’s all fixed now, though. You should find my simulation model for I2C fully working … now.

  • Example: a vendor model for an SDRAM didn’t perform under burst access like the hardware did

    This one hasn’t happened to me yet. Yet. However, it follows the same basic idea. You have a design that matches a simulation specification, but that simulation was only partially accurate. Perhaps it didn’t implement every mode of the device.

    Either way, you’ll be surprised when your design doesn’t work, and then stuck debugging your design in hardware–assuming you didn’t immediately get stuck in FPGA Hell.
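Returning to the button example above, one common remedy is to accept a new button value only after it has held steady for a while. The sketch below is one way to do that, under a stated assumption: the hypothetical TIMER_BITS parameter sets the settling time to roughly 2^TIMER_BITS clocks, or about 10ms at 100MHz with the default of 20. Whether that is long enough depends upon your particular buttons.

```verilog
// Hypothetical debounced version of btnled
module btndebounce #(
	parameter	TIMER_BITS = 20	// Assumed settling time: ~2^TIMER_BITS clocks
) (
	input	wire	i_clk, i_btn,
	output	reg	o_led
);
	// Two flip-flop synchronizer, as before
	reg	pipe, syncd;
	initial	{ syncd, pipe } = 2'b00;
	always @(posedge i_clk)
		{ syncd, pipe } <= { pipe, i_btn };

	// Accept a new button value only once it has stayed put
	reg	[TIMER_BITS-1:0]	timer;
	reg				debounced, last;
	initial	timer = 0;
	initial	debounced = 1'b0;
	always @(posedge i_clk)
	if (syncd == debounced)
		timer <= 0;		// Nothing pending (or it bounced back)
	else if (&timer)
	begin
		debounced <= syncd;	// Stable long enough: accept it
		timer <= 0;
	end else
		timer <= timer + 1'b1;

	initial	last  = 1'b0;
	initial	o_led = 1'b0;
	always @(posedge i_clk)
	begin
		last <= debounced;
		if (debounced && !last)	// One toggle per debounced press
			o_led <= !o_led;
	end
endmodule
```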

Asynchronous Reset triggered by spurious RF

I wouldn’t have believed this one myself if I hadn’t come across it while browsing Xilinx’s forums. You can read the article I found here. The gist of it is that the reset wire can act as a high frequency antenna, and so send spurious reset signals through your design. Ouch.

This just happens to be one more reason to use a synchronous reset within an FPGA design.
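As a minimal sketch of what that might look like, assuming an active-high external reset pin (i_ext_reset, a hypothetical name), the pin below passes through two flip flops and is then used only as a synchronous reset. A short glitch on the wire can then only matter if it happens to be present at a clock edge, rather than resetting the logic the instant it arrives.

```verilog
// Hypothetical example: external reset synchronized, then used synchronously
module syncreset(i_clk, i_ext_reset, o_count);
	input	wire		i_clk, i_ext_reset;
	output	reg	[7:0]	o_count;

	// Start in reset; the pin must pass two flip flops to take effect
	reg	rst_pipe, sync_reset;
	initial	{ sync_reset, rst_pipe } = 2'b11;
	always @(posedge i_clk)
		{ sync_reset, rst_pipe } <= { rst_pipe, i_ext_reset };

	// Only the clock edge is in the sensitivity list
	initial	o_count = 0;
	always @(posedge i_clk)
	if (sync_reset)
		o_count <= 0;
	else
		o_count <= o_count + 1'b1;
endmodule
```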

Failure to start the design in a known configuration

Over the years I’ve discovered that values not initialized on a Xilinx device default to all ones. They may start out as something different in simulation. For example, I had one simulation environment that would initialize all values to zero. Indeed, the formal tools based upon Yosys assume all unspecified memory has an initial value of zero.

This problem also highlights one of the differences between FPGA development and ASIC development: ASIC designs need that initial reset to set their values. They have no problems setting all RAM values to zero or one. FPGA’s on the other hand truly honor the initial conditions given in the design.

  • Failure to set initial values for registered outputs

    This is pretty much what I just described: when you don’t give your design an initial value, it will still start with an initial value–it just might not be the one you are intending.

  • Failure to match reset values to initial values

    Would it surprise you if I told you this was one of the most common, and yet simple, bugs I find with formal tools? It’s so common that I’ve gotten into a rut testing for it.

	reg	f_past_valid;
	initial	f_past_valid = 1'b0;
	always @(posedge i_clk)
		f_past_valid <= 1'b1;

	always @(posedge i_clk)
	if ((!f_past_valid)||($past(i_reset)))
	begin
		// For each input constrained by an initial or a reset
		assume(some_input == its_initial_value);
		// .. Repeat as necessary

		// Likewise for each local register or output
		assert(some_register == its_initial_value);
	end

Perhaps you’ll find this pattern useful in your own designs as well–it helps to guarantee that both the reset and the initial value do the same thing.

As another example, on some designs I’ll assume it starts with a reset.

	initial	assume(i_reset);
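Outside of the formal tools, one simple way to keep the two in agreement is to name the value once and use it for both. Here’s a minimal sketch, where the module matched and its RESET_VALUE parameter are hypothetical: the same constant drives both the initial statement and the reset branch, so power-up and i_reset leave the design in the same state.

```verilog
// Hypothetical example: one constant for both initial and reset values
module matched(i_clk, i_reset, i_ce, o_state);
	parameter	[3:0]	RESET_VALUE = 4'h5;
	input	wire		i_clk, i_reset, i_ce;
	output	reg	[3:0]	o_state;

	initial	o_state = RESET_VALUE;	// Power-up value ...
	always @(posedge i_clk)
	if (i_reset)
		o_state <= RESET_VALUE;	// ... matches the reset value
	else if (i_ce)
		o_state <= o_state + 1'b1;
endmodule
```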

Insufficient test bench cases

Sadly, this one was common for me–especially before I started using formal verification. Perhaps you may remember the problem I had with the test bench for my initial FIFO implementation? Sure, I had built a test bench for my FIFO, it just didn’t quite test all of the possible paths through my FIFO’s logic.

This hit me hard with my first I-cache design as well. Sure, the design worked in my simulation test bench. It just didn’t work one day when I placed it onto the hardware. It wasn’t the first day I had placed the cache into hardware either: it had worked before. What was the problem? That is a story in itself.

Perhaps I just don’t have the imagination to think of every way a design component might be accessed, correctly or incorrectly, in order to truly test every path through a design.

In many ways this isn’t really a failure of simulation to match the synthesized design in hardware, rather it’s a failure to completely test the design in simulation. As a result, the solution is to go back and to simulate the design in the same way it just failed on the hardware (assuming you can), and to see if you can try to find the bug.

An even better solution is to turn to formal methods …

I found myself in just this situation this last week: after reading 128MB less the last four bytes from a flash device, the reader received a bus timeout error on the very last word. No, I hadn’t simulated that test case because … well, who wants to simulate reading 128MB from a flash device over a slow debugging bus? That said, simulating that case was the only way I found the problem. (The bug was a misconfigured bus arbiter. Yes, the arbiter itself had been formally verified. From that perspective it wasn’t the arbiter’s fault: I had just hooked it up wrong and never verified the parent module.)

Symbols left out of the sensitivity list

I don’t normally use sensitivity lists, but let’s see if we can build an example of this problem.

always @(a)
if (a)
	b = c;
else
	b = !c;

See the problem? If a changes, b will also change. However, the simulator won’t adjust the value of b if c changes–even though the value of b will change in hardware upon any change of c.
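The usual cure is the Verilog-2001 wildcard sensitivity list, always @(*), which tells the simulator to re-evaluate the block whenever any signal it reads changes. Here’s the same logic wrapped into a hypothetical senslist module:

```verilog
// Hypothetical wrapper: @(*) includes both a and c automatically
module senslist(input wire a, input wire c, output reg b);
	always @(*)
	if (a)
		b = c;
	else
		b = !c;
endmodule
```

Now a change on c alone updates b in simulation, just as it would in the hardware.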

Latches

Remember the latch we placed into our clock switch design? Here’s what one Reddit user wrote about latches:

Latches. Definitely more of a problem only beginners will run into but still good to be aware of. Depending on synthesis settings it may fail or it may just produce warnings but this was the most common problem I helped students with when I was a TA for our intro to digital logic class.

Not familiar with a latch? Here’s an example:

always @(*)
if (A)
	B = C;

Notice how B isn’t being set on a clock, yet it’s required to hold its value if !A is true. This is a latch.

A latch is what the synthesis tools will infer anytime you don’t set the value of a combinational result for all combinations. The rule of thumb I’ve been taught to make sure you avoid this is to always set the value at the beginning of the block–then the value is set no matter how ugly the following logic gets.

always @(*)
begin
	B = 0;
	if (A)
		B = C;
end

Another user recommended I beware of the full_case and parallel_case directives. I’d never heard of these before! However, you can read more about misusing these directives here.
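Rather than asserting full_case, one alternative is to give every case statement a default. Here’s a minimal sketch of a hypothetical 2-to-3 decoder. Without the default (and with full_case), synthesis may treat i_sel == 2'b11 as a don’t care while the simulator holds o_onehot at its old value, which is exactly the kind of mismatch this article is about. With the default, both agree on every input, and no latch is inferred.

```verilog
// Hypothetical decoder: a default instead of a full_case directive
module decode(i_sel, o_onehot);
	input	wire	[1:0]	i_sel;
	output	reg	[2:0]	o_onehot;

	always @(*)
	case(i_sel)
	2'b00:	o_onehot = 3'b001;
	2'b01:	o_onehot = 3'b010;
	2'b10:	o_onehot = 3'b100;
	default: o_onehot = 3'b000;	// Covers 2'b11, so every path assigns
	endcase
endmodule
```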

Forgetting to assign pin locations

What happens if you don’t assign an output pin to a physical location? Some tools will pick a location for you. How much do you want to bet that they don’t pick the right location?

A related bug is not forgetting the pin assignment, but rather assigning the wrong pin to your logic.

The solution? Always double and triple check your pin assignments. The master xdc, ucf, pcf or whatever file is very likely going to need to be changed for your design from the one given you by the manufacturer of the board.

Comparing with ‘X’ values

(False in simulation, might be true in H/W)

I’m told that the ARM development team once got themselves caught in an ugly way with this bug. According to the story, that happened some time ago, but since then the story has become ingrained into their culture: don’t use x assignments!

Why not? Well, a 1'bx value has a different meaning between synthesis and simulation. In synthesis, 1'bx is a don’t care–the synthesis tool is allowed to set the value to whatever it would wish. In simulation, 1'bx is a specific value that a register might contain. (Verilator doesn’t support 1'bx, so I don’t use them often.)

What happens when a=1'bx and b=1'b0? In simulation, a==b evaluates to 1'bx, which an if statement treats as false. Worse, a!=b also evaluates to 1'bx, and so is also treated as false. However, in hardware the result will be based upon the actual voltage achieved, whether it be a 1 or a 0, so exactly one of the two comparisons will be true. See the different result? Avoid setting any values to 1'bx to keep yourself from this bug.
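If you want to see this for yourself, here’s a small simulation-only sketch (xcompare is a hypothetical module name). Neither branch gets taken, even though in hardware exactly one of them would be. The case-equality operator, ===, does compare X values literally, but it has no hardware equivalent, so it belongs in test benches only.

```verilog
// Hypothetical simulation-only demonstration of comparing against X
module xcompare;
	reg	a, b;
	reg	eq_taken, ne_taken;

	initial begin
		a = 1'bx;
		b = 1'b0;
		eq_taken = 1'b0;
		ne_taken = 1'b0;
		if (a == b)	// Evaluates to 1'bx: treated as false
			eq_taken = 1'b1;
		if (a != b)	// Also 1'bx: also treated as false
			ne_taken = 1'b1;
		$display("a==b taken: %b, a!=b taken: %b", eq_taken, ne_taken);
	end
endmodule
```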

You can read more about the problems with x values here.

Tool problem

Yes, it is possible that the tools might not work for you. There are bugs within most if not all tool suites, they just tend to take a special design to trigger. Don’t believe me? Read the forum posts associated with each vendor’s tool suite. Sometimes bugs get fixed. Sometimes the fixes create other bugs. At other times, they are reported and the vendor does nothing.

Asynchronous Systems

I don’t usually design asynchronous systems, although I have done so once or twice. What happens when you need the asynchronous system to operate in an ordered fashion?

Here’s what one Reddit user wrote:

When you design asynchronous systems with matched delay elements between each sequential stages. In that case, logic delay is part of the system behaviour. The alternative to synthesis is to use a not synthesizable model for delay chains based on « transport … after » statements. Although I must say, synthesis of asynchronous system is also a pain.

Generics

Here’s one I’ve struggled with personally: using one set of top level generic values (VHDL term for what would be called a parameter in Verilog) for simulation, and another for synthesis.

I worry about this one when using formal methods especially. Sometimes the design is just too complicated to fully verify–a 12x12 multiply might be such an example, or a delay by 2047 time-steps. So I’ll limit the design, using a Verilog parameter (VHDL generic) to a smaller/simpler design that I can then prove–for example, a delay by 7 time-steps instead of 2047. I try to convince myself that the proof will be equivalent, but … will it be?
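Here’s a minimal sketch of that approach, using a hypothetical delayline module. DELAY defaults to the full 2047 steps for synthesis, while a proof might override it with 7. The hope, and it is only a hope, is that nothing in the logic depends upon the particular value of DELAY beyond the width of the shift register.

```verilog
// Hypothetical parameterized delay line; requires DELAY >= 2
module delayline #(
	parameter	DELAY = 2047	// A proof might override this with 7
) (
	input	wire	i_clk, i_in,
	output	wire	o_out
);
	// One shift register stage per clock of delay
	reg	[DELAY-1:0]	sreg;

	initial	sreg = 0;
	always @(posedge i_clk)
		sreg <= { sreg[DELAY-2:0], i_in };

	// Input sampled on one edge appears here DELAY edges later
	assign	o_out = sreg[DELAY-1];
endmodule
```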

Using different source files for simulation and synthesis

I do this all the time. I simulate the main module, which is a subset of the toplevel module. I place into my toplevel all of the hardware specific items that Verilator can’t simulate.

What happens when one of my bugs is in that top level? You can read about my struggles with that here.

This is why you want to do everything you can to make certain that the design you simulate is also the same design you intend to synthesize.

Block RAM’s with other than power of two sizes

I try to only ever use block RAM’s with a power of two size. I often forget why.

Once when I used a non-power of two block RAM, I wrote to an address that wasn’t in the RAM and crashed Verilator. Why? Verilator only allocated, in C++, the number of elements I told it were in the array.

Even if you don’t write beyond the array, you might read and get a different answer than you were expecting from simulation alone.
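One defensive pattern, sketched here for a hypothetical 1000-word RAM named oddram, is to guard every access with an explicit range check. Ten address bits can name 1024 words, so addresses 1000 through 1023 don’t exist. The guard keeps a stray write from running off the end of the array Verilator allocated, and gives out-of-range reads a defined answer of zero, at the cost of a comparator on the address.

```verilog
// Hypothetical non-power-of-two RAM with guarded accesses
module oddram #(
	parameter	MEMSZ = 1000,
	parameter	AW    = 10		// Address bits: $clog2(MEMSZ)
) (
	input	wire			i_clk, i_we,
	input	wire	[AW-1:0]	i_addr,
	input	wire	[31:0]		i_data,
	output	reg	[31:0]		o_data
);
	reg	[31:0]	mem	[0:MEMSZ-1];

	// Ignore writes to addresses that don't exist
	always @(posedge i_clk)
	if (i_we && (i_addr < MEMSZ))
		mem[i_addr] <= i_data;

	// Give reads of missing words a defined answer
	initial	o_data = 0;
	always @(posedge i_clk)
	if (i_addr < MEMSZ)
		o_data <= mem[i_addr];
	else
		o_data <= 0;
endmodule
```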

VHDL Specific

If you know me, you’ll know I don’t work in VHDL. Others who do were kind enough to offer me the following examples specific to VHDL.

  • clk'event and clk = '1' doesn’t behave the same between synthesis and simulation. Always use rising_edge(clk) instead. (This is another one of those issues where clk might be neither 0 nor 1, such as the 1'bx example we discussed above.)

  • Forgetting to add if rising_edge(clk) in a clocked process. I think this would then fit under both the latch example above as well as the signals left out of the sensitivity list. Feel free to correct me here if I am wrong.

  • Comparisons with a null range vector are “true” in Aldec and “false” in Synplify. (Null ranges often occur with extensive use of generics.)

  • Any time a different architecture is used between synthesis and simulation

Sense a recurring theme?

Verilog

I’ve been surprised as I’ve worked with Verilog to discover that the system model for a Verilog based design is specified to be the same as if all the files were concatenated together, and then that single concatenated file were synthesized. A `define in one file can therefore impact another: the modules are no longer independent. If multiple files define the same macro differently, and the order of the files changes between simulation and synthesis … then you’ll get different results between the two.

This was not something I was expecting, and I was a bit surprised to learn it. Once it was explained to me, it made sense, but it sure seems like a backwards way to do things–especially for someone like me who was first trained in C.
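One way to keep each module’s constants to itself is to use a localparam (or a parameter) instead of a `define, since those are scoped to the module that declares them. In the sketch below, the hypothetical modules narrow and wide each define their own WIDTH, and neither can disturb the other, whatever order their files are read in.

```verilog
// Hypothetical modules: each WIDTH is local to its own module
module narrow(o_max);
	localparam	WIDTH = 8;
	output	wire	[WIDTH-1:0]	o_max;
	assign	o_max = {(WIDTH){1'b1}};
endmodule

module wide(o_max);
	localparam	WIDTH = 16;	// Doesn't collide with narrow's WIDTH
	output	wire	[WIDTH-1:0]	o_max;
	assign	o_max = {(WIDTH){1'b1}};
endmodule
```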

SystemVerilog

One user wrote the following:

I’ve found when using some of the more new SystemVerilog features [that] simulation and synthesis can differ. I read about unions in Vivado being an issue here.

Personally using unpacked arrays and passing them between modules and accidentally writing something like this:

moduleA has output logic bus [3:0]
moduleB has input bus[4]
// connection between them was logic[3:0]

Modelsim and Quartus produced different results

Since I don’t use any of SystemVerilog’s special features beyond the formal properties we’ve already discussed on this blog, I haven’t come across this one personally yet. For those who do use SystemVerilog, look out for this bug!

Hardware Failures

Here’s a set of problems most software engineers will be surprised by: hardware failures. Why do I say it that way? Because with all my own years of working on software, I could reliably depend upon the fact that the hardware always worked–unless in very rare cases it didn’t. Sherlock Holmes’ logic makes the most sense here, “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.” (Arthur Conan Doyle)

That said, here are two hardware problems I’ve suffered from.

  • Noisy or insufficient power supply

    In one RF design, the noisy power supply crept through the device into the powered antenna and … well, the result wasn’t the pretty sampled data I was expecting.

    In another design, this one for motors, the design failed because the motor PMod required more power than the board could supply. In that case, the FPGA was powered from a Raspberry Pi and the power supply just didn’t cut it for what we needed.

  • PLL’s haven’t converged

    Remember when I wrote about this earlier?

    At one time I assumed that PLL’s will always converge. Then I tried to build a design for the iCE40 that used a PLL. No, I didn’t use icepll; I should have. Instead, I just assumed that the PLL converged. For the life of me, I couldn’t figure out why my design wasn’t working. I spent months scratching my head until some friends at Digilent were kind enough to provide me with one of their Digital Discovery’s. Yes, it took that external logic analyzer for me to figure out what the problem was.

Conclusion

You may want to keep this list in your back pocket, and remember these reasons the next time your design doesn’t work. Some of these reasons require good desk checking: check your pin outs, double check your timing, etc. Other items require an external scope, such as the PLL that hadn’t converged. Still others need a good internal scope, such as when the simulation model doesn’t quite match how the hardware actually works. Finally, it might also be that you haven’t fully simulated the design.

The bottom line is that hardware design isn’t like software design. There are a lot more things that can go wrong, and figuring out the problem can require more sleuthing than you plan upon.

This is also why I like working with hardware. Sure, it’s a greater challenge, but so too is the joy and excitement when everything works as designed on the hardware.