In many ways, metastability is the big boogeyman within FPGA design. It is hard to see when desk-checking a design, it doesn’t show up on all simulations (certainly not with Verilator), your synthesis tool can’t solve it, and timing analysis often just gets in the way of dealing with it. Metastability, though, can make your design unreliable. If your design has a problem with metastability, then it might never work. It might work today and not tomorrow. It might work perfectly for months, and then have a fatal flaw.

In many ways, metastability problems are the worst of all errors. They are hard to trace. You might deliver to a customer a design that passes all of your internal tests, only to have that (now) disgruntled customer tell you it doesn’t work. Then, to add insult, when you get the hardware back to examine, it works again. This is the nature of an unpredictable problem such as metastability.

Ouch!

So, what causes metastability? Metastability is caused when the setup and hold time requirements of a flip-flop aren’t met. The flip-flop then enters a state which is neither zero nor one, neither high nor low. It may be read by some of your logic as a zero, and by other parts of your logic as a one. Metastability, therefore, can cause your logic to do some very unpredictable and (apparently) illogical things.

For the digital designer, metastability can take place any time a signal crosses from one clock domain to another. This is called a “Clock Domain Crossing”, or CDC, and it needs some special engineering to be done properly.

Today, therefore, let’s look at several basic solutions to CDC issues.

What is a clock domain

If we need to pay special attention to clock domain crossings, the first question that we need to answer is, just what is a “clock domain”?

A “Clock Domain” is that portion of your circuitry that is generated and processed by a single clock. I like to build my component IPs to use a single master clock that I call i_clk. All of the registers, then, that are set within such components on the positive edge of this i_clk clock signal form a single clock domain. Indeed, all of the registers set within an entire design on the same edge of the same clock form a single clock domain. Combinatorial logic based upon this register set is also within this same clock domain.

Fig 1: Four clock domains
Blobology image, showing four separate clock domains: asynchronous inputs, posedge clock_one, negedge clock_one, and posedge clock_two

As an example, Fig 1 shows four separate clock domains within a design. Perhaps this might make more sense, though, if we looked at how to recognize these examples within some Verilog RTL.

Let’s examine the positive edge of clock_one. Any register set on the positive edge of clock_one is within one clock domain–the yellow domain in Fig 1. Hence, r_reg_one in the example below will be within this clock domain.

reg	r_reg_one;
always @(posedge clock_one)
	r_reg_one <= (some_logic);

Any logic can be created, based upon r_reg_one and transitioning on the positive edge of clock_one, without crossing a clock domain. Hence, r_pipe_one below is still within the same clock_one clock domain.

reg	[(N-1):0]	r_pipe_one;
always @(posedge clock_one)
	r_pipe_one <= { r_pipe_one[(N-2):0], r_reg_one };

This applies to combinatorial logic as well, not just register logic. Any combinatorial logic depending only upon inputs created within the same clock domain is also within that clock domain. Hence, w_wire_one below remains within the clock_one clock domain.

wire	w_wire_one;

assign	w_wire_one = |r_pipe_one;

Anything depending upon another clock, though, is in a different clock domain. For example, r_reg_two below is in the posedge clock_two clock domain, shown in red in Fig 1 above.

always @(posedge clock_two)
	r_reg_two <= (some_other_logic);

Likewise, the negative edge of a clock is a separate clock domain from the positive edge of the same clock.

always @(negedge clock_one)
	r_reg_three <= ...;

So if that’s a clock domain, what’s a clock domain crossing?

Fig 2: Clock Domain Crossings (CDCs)
Blobology image, showing four separate clock domains: asynchronous inputs, posedge clock_one, negedge clock_one, and posedge clock_two

A clock domain crossing (CDC) takes place anytime the inputs to a given flip-flop were set based upon something other than the clock edge used by that flip-flop. Fig 2 illustrates three examples of this that we’ll discuss below.

The clearest example of a CDC is when the inputs to a register, say r_reg_two, are set based upon one clock, clock_one, yet the output is set based upon a second clock–in this case clock_two.

always @(posedge clock_two)
	r_reg_two <= (some_function_of r_reg_one);

This also applies to crossing from the positive edge of a clock to the negative edge of any clock. Hence, the following is a CDC, since r_reg_one was set on the posedge of clock_one whereas r_reg_three is set on the negative edge of the same clock.

always @(negedge clock_one)
	r_reg_three <= (some_function_of r_reg_one);

The third type of clock domain crossing you are likely to see is the asynchronous input. If you have an input whose value isn’t changing on your clock, then setting any register based upon it represents a clock domain crossing:

always @(posedge clock_one)
	r_input <= i_value;

Classic examples of asynchronous inputs that need carefully engineered CDCs are buttons, switches, UART receivers, and incoming SPI clocks, although other examples abound. You may even remember the struggle I had crossing from an HDMI pixel clock domain to my memory clock domain, as I discussed in a previous post.

The problem with all of these CDCs is that they need to be managed, so as to mitigate the risk of any flip-flops being placed into a metastable state. How to mitigate these CDCs is the topic of the rest of this article.

Re-synchronizing a slow logic signal

Chances are that, if you’ve only superficially looked into CDCs before, you’ve been told that the way to synchronize a value going from one clock to another is to pass it through two flip-flops clocked with the new clock, as shown in Fig 3.

Fig 3: CDC solution: Two Flip-flops
Crossing clock domains via two flip-flops

Each of the two flip-flops in this figure is clocked with the clock from the new clock domain, whereas the input to the first one was created within the old clock domain. While the result of the first one may have a high probability of metastability, the output of the second flip-flop has a much lower probability of metastability.

Some engineers will even recommend not two flip-flops but three. In many ways, how many flip-flops you use is dependent upon your application space, and how catastrophic any metastability problems would be.

Example code for this might look like,

always @(posedge new_clock)
	{ new_val, xfer_pipe } <= { xfer_pipe, i_val };

where xfer_pipe is either one or two bits wide.

The trick to remember in this process is that none of your code should reference the output of the flip-flop(s) in the metastable region of the transfer pipe, referred to as xfer_pipe above. The value of xfer_pipe is the value in the metastability danger region, shown in Fig 3 above in red. Instead, you should wait one more clock and use new_val (in this example).

This works nicely for cases where the value from the old clock domain changes slowly, much more slowly than the new clock domain’s clock ticks. Not all CDC problems, though, are that simple. Other problems require different solutions, but almost all such solutions depend upon this first basic method.
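If you’d like to reuse the flip-flop chain, one way to package it is as a small module. What follows is only a minimal sketch: the module name sync_bit, its port names, and the STAGES parameter are my own choices for illustration, and the initial values assume an FPGA target that honors them.

```verilog
// A hedged sketch of the flip-flop chain as a reusable module.
// sync_bit, i_new_clk, o_val and STAGES are hypothetical names.
module sync_bit #(
	parameter	STAGES = 2	// Total flip-flops in the chain, 2 or more
) (
	input	wire	i_new_clk,	// Clock of the receiving domain
	input	wire	i_val,		// Slowly changing bit from the old domain
	output	reg	o_val		// Safe to use within the new domain
);

	// Every bit of xfer_pipe lies in the metastability danger region.
	// No other logic should ever read it.
	reg	[STAGES-2:0]	xfer_pipe;

	// Assumption: the target honors initial register values
	initial	{ o_val, xfer_pipe } = 0;

	always @(posedge i_new_clk)
		{ o_val, xfer_pipe } <= { xfer_pipe, i_val };

endmodule
```

Setting STAGES to 3 gives the three flip-flop variant mentioned above, at the cost of one more clock of latency.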

Asynchronous Reset Assertion, Synchronous Release

One particularly common example of a CDC is an asynchronous reset. I know I’ve given beginners the advice of never using an asynchronous reset. I stand by this advice for beginning FPGA designers. [Xilinx Ref] There comes a time, though, when you need an asynchronous reset.

The last time I found myself needing an asynchronous reset was when I needed to reset a circuit whose clock I had stopped. Without the clock, I couldn’t reset the circuit and so I needed an asynchronous reset.

The problem isn’t so much entering the reset state, the problem is exiting the reset state. The exit needs to be done synchronous to the clock. If it isn’t, you not only risk metastability problems, but you also risk some parts of your design getting released from the reset state before (or after) others. (This Xilinx Ref discusses some of that)

Creating an asynchronous reset line with a synchronous release isn’t really all that hard to do in Verilog. Indeed, you might argue that the Verilog solution below looks a lot like our two flip-flop solution above.

In this example, we assume an active low asynchronous reset, and we synchronize the exit from this reset state using another active low signal, s_reset_n.

reg	s_reset_n, r_pipe;
always @(posedge i_clk, negedge i_reset_n)
	if (!i_reset_n)
		{ s_reset_n, r_pipe } <= 2'b00;
	else
		{ s_reset_n, r_pipe } <= { r_pipe, 1'b1 };

You can then use the s_reset_n as an asynchronous active low reset signal throughout your design.
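To make that concrete, here is a minimal sketch of the reset synchronizer feeding one downstream register. The module name reset_sync_example and the 8-bit o_counter are hypothetical, chosen only so there is something to reset.

```verilog
// A hedged sketch: asynchronous assertion, synchronous release, plus
// one example user of the resulting reset.  reset_sync_example and
// o_counter are illustrative names only.
module reset_sync_example (
	input	wire		i_clk,
	input	wire		i_reset_n,	// Asynchronous, active low
	output	reg	[7:0]	o_counter
);

	reg	s_reset_n, r_pipe;

	// Enter reset immediately, leave it only on a clock edge
	always @(posedge i_clk, negedge i_reset_n)
		if (!i_reset_n)
			{ s_reset_n, r_pipe } <= 2'b00;
		else
			{ s_reset_n, r_pipe } <= { r_pipe, 1'b1 };

	// Downstream logic uses s_reset_n as its asynchronous reset
	always @(posedge i_clk, negedge s_reset_n)
		if (!s_reset_n)
			o_counter <= 8'h00;
		else
			o_counter <= o_counter + 1'b1;

endmodule
```

The counter is cleared as soon as i_reset_n falls, clock or no clock, yet it only starts counting on a clock edge after s_reset_n has been released.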

Engineers have argued about whether active high or active low resets are better. In general it doesn’t make a difference within an FPGA, although the peripherals the FPGA controls may have specific requirements. ASIC designers, on the other hand, tend to like the active low reset. Verilog designed to handle both FPGA and ASIC implementations will therefore likely need this circuitry.

The Cross-clock Handshake

One of the problems with the flip-flop chain method of handling CDCs is that nothing guarantees that the input is stable long enough to know that the output was received. Sometimes, for example, you want a CDC method that can handle going from a slow clock to a fast clock, from a fast clock to a slow clock, or even from one clock of unknown speed to another clock having no known relationship to the first. This calls for another approach: the handshaking method, shown in Fig 4.

Fig 4: A Request-acknowledgement hand-shake
Crossing clock domains with a handshake. First a request crosses, then the acknowledgement returns. When both are clear, a new request can take place

In this figure, two clock domains are shown, yellow and green, together with the metastable region between the two shown in pink or red. Time goes from top to bottom, showing messages being passed back and forth from the two sides.

The figure also shows how the hand-shaking method works. The first thing that happens is that a request is made from the old clock domain and then passed to the new clock domain. The request goes through the metastability region using the flip-flop chain method described above. Once it gets to the new clock domain, an acknowledgement is sent back–also going through the same flip-flop chain method, but this time with the old clock driving the flip-flop chain. Once the acknowledgement has been received, the request signal may be dropped (cleared), at which point the new clock domain drops its acknowledgement flag.

One trick of this method, though, is that no new request can be made until the acknowledgement has been cleared. This “I’m busy” region is shown in Fig 4 as a bright red bar, during which time no new requests may be sent.

Let’s walk through this approach in Verilog, shall we?

First, something starts this off by setting the req register.

always @(posedge old_clock)
	req <= (some_logic);

This logic isn’t quite complete, but we’ll come back to it in a moment.

Then, on the new clock, a flip-flop chain is used to receive this CDC request from the old clock.

always @(posedge new_clock)
	{ new_req, xreq_pipe } <= { xreq_pipe, req };

Once the request has been received, it is immediately sent back to the original clock in the form of an acknowledgement.

always @(posedge old_clock)
	{ old_ack, xack_pipe } <= { xack_pipe, new_req };

Only when this acknowledgement is low again are we ready to send any subsequent requests. Hence, we are busy from the time the original request is sent until the time later when the acknowledgement is cleared.

assign	busy = (req)||(old_ack);

This then gives us the final logic we need to create our request in the first place. We can send a request any time we are not busy, and some event has happened that we wish to place on the other clock domain. Once the acknowledgement has been received, we drop our request line and wait for the acknowledgement to be dropped as well.

always @(posedge old_clock)
	if ((!busy)&&(some_event))
		req <= 1'b1;
	else if (old_ack)
		req <= 1'b0;

Indeed, this is the basic handshaking method used within my Wishbone scope in order to send information, such as a reset scope command, from the bus clock domain to the data clock domain and back again.

Signaling a (rare) event

From here, we can build upon this idea to send an “event” from one domain to another. By “event”, I mean something that will only ever be true for a single clock cycle. To work, though, the event will need to be rare enough that the CDC circuit isn’t still busy handling the last event.

Fig 5: Passing an event across clocks
An event crossing clock domains: first it triggers the request, and then the rising change in the acknowledgement triggers the event in the new domain

The basic method, shown in Fig 5 above, is almost identical to what we did before. There are only two basic differences.

The first difference is that we trigger off of a specific single-clock event, rather than off of some yet to be defined logic.

always @(posedge old_clock)
	if ((!busy)&&(i_stb))
		req <= 1'b1;
	else if (old_ack)
		req <= 1'b0;
assign	busy = (req)||(old_ack);

The second difference is that we need to recognize a rising request signal within the new clock domain. This means that we need to keep track of the last state of the request signal, last_req, in our pipeline request logic as well as the rest of the pipeline request logic we used before.

always @(posedge new_clock)
	{ last_req, new_req, xreq_pipe } <= { new_req, xreq_pipe, req };

Then, any time last_req is low but new_req is high, we know we need to generate the event in the new clock domain.

always @(posedge new_clock)
	o_stb <= (!last_req)&&(new_req);

You can use this approach to send triggers from one clock domain to another. An example might be sending the video refresh interrupt, created within the video clock domain, to an interrupt controller within the CPU’s clock domain.
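Putting the pieces of this section together, the whole event crossing might look something like the sketch below. The module name cdc_event, the o_busy port, and the initial values are my assumptions for illustration; the logic itself is just the fragments above gathered in one place.

```verilog
// A hedged sketch of the event crossing, assembled from the fragments
// above.  cdc_event and o_busy are hypothetical names.
module cdc_event (
	input	wire	old_clock,
	input	wire	i_stb,		// One-cycle event, old_clock domain
	output	wire	o_busy,		// Events arriving while high are lost
	input	wire	new_clock,
	output	reg	o_stb		// One-cycle event, new_clock domain
);

	reg	req, old_ack, xack_pipe;	// old_clock domain
	reg	new_req, last_req, xreq_pipe;	// new_clock domain

	// Assumption: the target honors initial register values
	initial	{ req, old_ack, xack_pipe } = 3'b000;
	initial	{ last_req, new_req, xreq_pipe } = 3'b000;
	initial	o_stb = 1'b0;

	assign	o_busy = (req)||(old_ack);

	// Raise the request on an event, drop it once acknowledged
	always @(posedge old_clock)
		if ((!o_busy)&&(i_stb))
			req <= 1'b1;
		else if (old_ack)
			req <= 1'b0;

	// Bring the request into the new domain, remembering its last value
	always @(posedge new_clock)
		{ last_req, new_req, xreq_pipe } <= { new_req, xreq_pipe, req };

	// A rising request becomes a one-cycle event
	always @(posedge new_clock)
		o_stb <= (!last_req)&&(new_req);

	// Return the acknowledgement to the old domain
	always @(posedge old_clock)
		{ old_ack, xack_pipe } <= { xack_pipe, new_req };

endmodule
```

Anything driving i_stb should check o_busy first, since this sketch quietly drops any event that arrives while a previous one is still in flight.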

Sending a group of wires at once

You can also use this transfer approach to send a group of wires at once.

This was the approach I used when I wanted to send a byte’s worth of data at once from the RPi to the iCE40 on top of my ICO board. Since I was designing the interface with no idea of how fast the two clocks would be relative to one another, I needed a handshaking method to control the transfer of all 8 bits at once. You can see the project that uses this approach here. The RPi software can be found in sw/host/netpport.cpp, whereas the Verilog half of the interface can be found in pport/pport.v.

How would we handle that?

The first step would be to copy the data to a transfer region, and then to set a “pre-transfer” signal indicating that the data was valid.

always @(posedge old_clock)
	if ((!busy)&&(!valid_data))
	begin
		transfer_data <= i_data;
		valid_data <= 1'b1;
	end else if (old_ack)
		valid_data <= 1'b0;

This needs to be done on the clock before the transfer starts.

You can then perform a handshake to the other clock domain.

always @(posedge old_clock)
	if ((!busy)&&(valid_data))
		req <= 1'b1;
	else if (old_ack)
		req <= 1'b0;
assign	busy = (req)||(old_ack);

Once the rising request has been detected in the new clock domain, the data can be copied into flip-flops controlled by the new clock.

always @(posedge new_clock)
	if ((!last_req)&&(new_req))
		o_data <= transfer_data;

No flip-flop chain is necessary to receive the transfer data, since the hand shake approach has already guaranteed that the transfer data is valid.

One other difference, though, is that you don’t want to acknowledge the other side until the transfer has been completed. Hence, you would have the following for the acknowledgement,

always @(posedge old_clock)
	{ old_ack, xack_pipe } <= { xack_pipe, last_req };

This guarantees that the data will remain stable throughout the copy.
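Gathered into one place, the multi-bit transfer might look something like the following sketch. It is not the pport.v implementation: the module name cdc_bus, the width parameter W, and the i_stb, o_busy, and o_stb ports are all assumptions of mine. In particular, I’ve gated the capture on an incoming i_stb strobe, so that a new word is only sent when the source asks for one.

```verilog
// A hedged sketch of a handshaken multi-bit transfer.  cdc_bus, W,
// i_stb, o_busy and o_stb are hypothetical names, not taken from pport.v.
module cdc_bus #(
	parameter	W = 8
) (
	input	wire		old_clock,
	input	wire		i_stb,		// Request to send i_data
	input	wire	[W-1:0]	i_data,
	output	wire		o_busy,		// Requests are ignored while high
	input	wire		new_clock,
	output	reg		o_stb,		// High for one clock with new o_data
	output	reg	[W-1:0]	o_data
);

	reg	[W-1:0]	transfer_data;
	reg		valid_data, req, old_ack, xack_pipe;	// old domain
	reg		new_req, last_req, xreq_pipe;		// new domain
	wire		busy;

	// Assumption: the target honors initial register values
	initial	{ valid_data, req, old_ack, xack_pipe } = 4'h0;
	initial	{ last_req, new_req, xreq_pipe, o_stb } = 4'h0;

	assign	busy   = (req)||(old_ack);
	assign	o_busy = (busy)||(valid_data);

	// Step one: capture the data, one clock before the request
	always @(posedge old_clock)
		if ((!busy)&&(!valid_data)&&(i_stb))
		begin
			transfer_data <= i_data;
			valid_data <= 1'b1;
		end else if (old_ack)
			valid_data <= 1'b0;

	// Step two: the handshake itself
	always @(posedge old_clock)
		if ((!busy)&&(valid_data))
			req <= 1'b1;
		else if (old_ack)
			req <= 1'b0;

	always @(posedge new_clock)
		{ last_req, new_req, xreq_pipe } <= { new_req, xreq_pipe, req };

	// transfer_data has been held stable since before req was raised
	always @(posedge new_clock)
		if ((!last_req)&&(new_req))
			o_data <= transfer_data;

	always @(posedge new_clock)
		o_stb <= (!last_req)&&(new_req);

	// Acknowledge only after the copy has taken place
	always @(posedge old_clock)
		{ old_ack, xack_pipe } <= { xack_pipe, last_req };

endmodule
```

Since transfer_data is only rewritten when the old side is idle, it never changes between the request rising and the acknowledgement returning.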

Sending a stream of values

What we haven’t discussed is the idea of sending a stream of values from one clock to the next.

Perhaps you are trying to write video from a camera to memory. In this case, you might be receiving 24-bit pixels at a high (pixel) clock rate, but need to write 128 bits of data at a time to your DDR3-SDRAM memory at a lower rate.

Fig 6: A FIFO can cross clock domains
A FIFO can cross clock domains

Alternatively, you might want to store and forward results from a high speed analog to digital converter (A/D) across an Ethernet port. Indeed, the example applies for a low speed A/D Converter as well! Perhaps you are consistently sending data to the new clock domain, but you are only reading it out rarely, and in high speed bursts at that.

In both examples, you have a stream of data that needs to be moved from one clock domain to another.

The solution to this problem is to use a FIFO, such as the one shown in Fig 6 above. You may recall that we’ve discussed FIFOs before. Using such a FIFO, you can drive the input at one rate and the output at another.

Fig 7: FIFOs need to manage other signals as well
A FIFO can cross clock domains

The problem, though, is how do you build a FIFO that crosses clock domains? Specifically, you need to keep the source from writing if the FIFO is full, and you need to keep the sink from reading the output when the FIFO is empty–and both of these conditions depend on knowing information from the other side of the FIFO. Not only that, but you need to be able to handle the CDC with your reset circuitry as well.

This, however, will need to remain the topic of a future post.

Until then, if you are more interested in the topic, Clifford E. Cummings of Sunburst Design has written a wonderful, and rather extensive, paper on the topic. I suspect you will find it to be very valuable–I certainly have.