In many ways, metastability is the big boogeyman within FPGA design. It is hard to see when desk-checking a design, it doesn’t show up on all simulations (certainly not with Verilator), your synthesis tool can’t solve it, and timing analysis often just gets in the way of dealing with it. Metastability, though, can make your design unreliable. If your design has a problem with metastability, then it might never work. It might work today and not tomorrow. It might work perfectly for months, and then have a fatal flaw.
In many ways, metastability problems are the worst of all errors. They are hard to trace. You might deliver to a customer a design that passes all of your internal tests, only to have that (now) disgruntled customer tell you it doesn’t work. Then, to add insult, when you get the hardware back to examine, it works again. This is the nature of an unpredictable problem such as metastability.
So, what causes metastability? Metastability is caused when the setup and hold time requirements of a flip-flop aren’t met. The flip-flop then enters a state which is neither zero nor one, neither high nor low. It may be read by some of your logic as a zero, and by other parts of your logic as a one. Metastability, therefore, can cause your logic to do some very unpredictable and (apparently) illogical things.
For the digital designer, metastability can take place any time a signal crosses from one clock domain to another. This is called a “Clock Domain Crossing”, or CDC, and it needs some special engineering to be done properly.
Today, therefore, let’s look at several basic solutions to CDC issues.
What is a clock domain
If we need to pay special attention to clock domain crossings, the first question that we need to answer is, just what is a “clock domain”?
A “Clock Domain” is that portion of your circuitry that is generated and processed by a single clock. I like to build my component IPs to use a single master clock that I call i_clk. All of the registers that are set within such components on the positive edge of this i_clk clock signal, then, form a single clock domain. Indeed, all of the registers set within an entire design on the same edge of the same clock form a single clock domain. Combinatorial logic based upon this register set is also within this same clock domain.
As an example, Fig 1 shows four separate clock domains within a design. Perhaps this might make more sense, though, if we looked at how to recognize these examples within some Verilog RTL.
Let’s examine the positive edge of clock_one. Any register set on the positive edge of clock_one is within one clock domain: the yellow domain in Fig 1. Hence, r_reg_one in the example below will be within this clock domain.
Any logic based upon r_reg_one and transitioning on the positive edge of clock_one can be created without crossing a clock domain. Hence, r_pipe_one below is still within the same clock_one clock domain.
This applies to combinatorial logic as well, not just register logic. Any combinatorial logic depending only upon inputs created within a given clock domain is also within that clock domain. Hence, combinatorial logic computed only from r_reg_one and r_pipe_one remains within the clock_one clock domain.
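As a minimal sketch, assuming an 8-bit datapath, the clock_one domain described above might be written as follows. The names clock_one, r_reg_one, and r_pipe_one come from the text; the module name, the input i_data, the increment, and the combinatorial w_comb_one are hypothetical placeholders.

```verilog
// Hypothetical module: every register below changes only on the
// positive edge of clock_one, so everything here is one clock domain.
module clock_one_domain (
	input	wire		clock_one,
	input	wire	[7:0]	i_data,	// assumed to come from posedge clock_one logic
	output	reg	[7:0]	r_reg_one,
	output	reg	[7:0]	r_pipe_one,
	output	wire	[7:0]	w_comb_one
);
	initial	r_reg_one  = 8'h00;
	initial	r_pipe_one = 8'h00;

	// Set on the positive edge of clock_one: the clock_one domain
	always @(posedge clock_one)
		r_reg_one <= i_data;

	// Depends only on r_reg_one and is set on the same edge of the
	// same clock, so no clock domain is crossed
	always @(posedge clock_one)
		r_pipe_one <= r_reg_one + 8'h01;

	// Combinatorial logic depending only on clock_one registers
	// belongs to the clock_one domain as well
	assign	w_comb_one = r_reg_one ^ r_pipe_one;
endmodule
```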
Anything depending upon another clock, though, is in a different clock domain. For example, r_reg_two below is in the posedge clock_two clock domain, shown in red in Fig 1 above.
Likewise, the negative edge of a clock is a separate clock domain from the positive edge of the same clock.
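In the same spirit, a hypothetical sketch of the two other domains just described follows. The posedge clock_two block and the negedge clock_one block each belong to their own domain; the module, the port names, and r_neg_one are assumptions, while clock_one, clock_two, and r_reg_two follow the text.

```verilog
// Hypothetical module: each always block belongs to a different clock
// domain, even though the second one uses the same clock_one wire as
// the clock_one domain shown earlier.
module other_domains (
	input	wire		clock_one,
	input	wire		clock_two,
	input	wire	[7:0]	i_two_data,	// assumed from the posedge clock_two domain
	input	wire	[7:0]	i_neg_data,	// assumed from the negedge clock_one domain
	output	reg	[7:0]	r_reg_two,
	output	reg	[7:0]	r_neg_one
);
	// posedge clock_two domain (red in Fig 1)
	always @(posedge clock_two)
		r_reg_two <= i_two_data;

	// negedge clock_one: a separate domain from posedge clock_one
	always @(negedge clock_one)
		r_neg_one <= i_neg_data;
endmodule
```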
So if that’s a clock domain, what’s a clock domain crossing?
A clock domain crossing (CDC) takes place anytime the inputs to a given flip-flop were set based upon something other than the clock edge used by that flip-flop. Fig 2 illustrates three examples of this that we’ll discuss below.
The clearest example of a clock domain crossing is when the inputs to a register, say r_reg_two, are set based upon clock_one, yet the register itself is set based upon a second clock, in this case clock_two.
This also applies to crossing from the positive edge of one clock to the negative edge of any clock. Hence, using r_reg_one, which was set on the positive edge of clock_one, to set a register on the negative edge of the same clock is also a clock domain crossing.
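A hypothetical sketch of the two crossings just described follows. Both are deliberately unprotected, so this illustrates what to avoid rather than what to build. The names clock_one, clock_two, r_reg_one, and r_reg_two follow the text; i_bit, r_neg_one, and the module name are assumptions.

```verilog
// Hypothetical, deliberately UNSAFE examples of clock domain crossings.
// Neither destination register is protected against metastability.
module cdc_examples (
	input	wire	clock_one,
	input	wire	clock_two,
	input	wire	i_bit,		// assumed from the posedge clock_one domain
	output	reg	r_reg_one,
	output	reg	r_reg_two,
	output	reg	r_neg_one
);
	always @(posedge clock_one)
		r_reg_one <= i_bit;

	// CDC: input set by clock_one, register clocked by clock_two
	always @(posedge clock_two)
		r_reg_two <= r_reg_one;

	// CDC: input set on posedge clock_one, register clocked on the
	// negative edge of that same clock
	always @(negedge clock_one)
		r_neg_one <= r_reg_one;
endmodule
```

An RTL simulation of this module will show clean transfers, since simulators such as Verilator do not model metastability. That is exactly why these crossings are so easy to miss.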
The third type of clock domain crossing you are likely to see is the asynchronous input. If you have an input whose value isn’t changing on your clock, then setting any register based upon it represents a clock domain crossing.
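For example, registering a button directly, as in the hypothetical module below, is such a crossing. The module name, i_btn, and r_btn are assumptions. Because i_btn may change at any moment relative to i_clk, the register can sample it during its setup or hold window.

```verilog
// Hypothetical example: i_btn is an external button with no timing
// relationship to i_clk.  This single register is an UNSAFE crossing.
module async_input_cdc (
	input	wire	i_clk,
	input	wire	i_btn,	// asynchronous: not generated by i_clk
	output	reg	r_btn
);
	always @(posedge i_clk)
		r_btn <= i_btn;	// setup/hold may be violated here
endmodule
```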
Classic examples of asynchronous inputs that need carefully engineered CDCs are buttons, switches, UART receivers, and incoming SPI clocks, although other examples abound. You may even remember the struggle I had crossing from an HDMI pixel clock domain to my memory clock domain, as I discussed in a previous post.
The problem with all of these CDCs is that they need to be managed, so as to mitigate the risk of any flip-flops being placed into a metastable state. How to mitigate these CDCs is the topic of the rest of this article.
Re-synchronizing a slow logic signal
Chances are that, if you’ve only superficially looked into CDCs before, you’ve been told that the way to synchronize a value going from one clock to another is to pass it through two flip-flops clocked with the new clock, as shown in Fig 3.
Each of the two flip-flops in this figure is clocked with the clock from the new clock domain, whereas the input to the first one was created within the old clock domain. While the output of the first flip-flop may well be metastable, it has a full clock period in which to settle before the second flip-flop samples it. As a result, the output of the second flip-flop has a much lower probability of being metastable.
Some engineers will even recommend not two flip-flops but three. In many ways, how many flip-flops you use is dependent upon your application space, and how catastrophic any metastability problems would be.
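In Verilog, this chain can be written as a single shift register clocked by the new clock. The first flip-flops of the chain, xfer_pipe, form either a one or a two bit register, depending on whether two or three flip-flops are used in total.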
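A minimal sketch of that chain follows. The module name ff_sync, the parameter NFF (the total number of flip-flops), and the input i_val are assumptions; xfer_pipe and new_val follow the names used in the text.

```verilog
// Minimal flip-flop chain synchronizer sketch (names partly assumed)
module ff_sync #(
	parameter	NFF = 2		// total flip-flops in the chain: 2 or 3
) (
	input	wire	i_clk,		// the new clock domain's clock
	input	wire	i_val,		// a slowly changing old-domain value
	output	reg	new_val		// safe to use within the new domain
);
	// The danger region: NFF-1 bits, so one or two bits wide.
	// Nothing outside this module should ever read xfer_pipe.
	reg	[NFF-2:0]	xfer_pipe;

	initial	{ new_val, xfer_pipe } = 0;

	// Shift the old-domain value one stage per new-domain clock
	always @(posedge i_clk)
		{ new_val, xfer_pipe } <= { xfer_pipe, i_val };
endmodule
```

With NFF=2, a change on i_val appears on new_val two clocks of i_clk later; with NFF=3, three clocks later.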
The trick to remember in this process is that none of your code should reference the outputs of the flip-flops in the metastable region of the transfer pipe, referred to as xfer_pipe above. The value of xfer_pipe is the value in the danger region, shown in red in Fig 3 above. Instead, you should wait one more clock and use new_val (in this example).
This works nicely for cases where the value from the old clock domain changes slowly, much slower than the frequency of the new clock domain’s clock. Not all CDC problems, though, are that simple. Other problems require different solutions, but almost all such solutions are dependent upon this first basic method.
Asynchronous Reset Assertion, Synchronous Release
One particularly common example of a CDC is an asynchronous reset. I know I’ve given beginners the advice of never using an asynchronous reset. I stand by this advice for beginning FPGA designers. [Xilinx Ref] There comes a time, though, when you need an asynchronous reset.
The last time I found myself needing an asynchronous reset was when I needed to reset a circuit whose clock I had stopped. Without the clock, I couldn’t reset the circuit, and so I needed an asynchronous reset.
The problem isn’t so much entering the reset state; it is exiting the reset state. The exit needs to be synchronous to the clock. If it isn’t, you not only risk metastability problems, but you also risk some parts of your design getting released from the reset state before (or after) others. (This Xilinx Ref discusses some of that.)
Creating an asynchronous reset line with a synchronous release isn’t really all that hard to do in Verilog. Indeed, you might argue that the Verilog solution below looks a lot like our two flip-flop solution above.
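A minimal sketch of such a circuit follows, assuming the raw reset arrives on a hypothetical input named i_areset_n and that the first stage of the chain is a hypothetical register named last_reset_n. Only s_reset_n is a name taken from the text.

```verilog
// Sketch: asynchronous reset assertion, synchronous release
module reset_sync (
	input	wire	i_clk,
	input	wire	i_areset_n,	// raw asynchronous, active low reset
	output	reg	s_reset_n	// asserts at once, releases on i_clk
);
	reg	last_reset_n;	// first stage of the two flip-flop chain

	// Entering reset happens immediately, with or without a clock.
	// Leaving reset shifts a one through two flip-flops, so the
	// release is synchronous to i_clk.
	always @(posedge i_clk or negedge i_areset_n)
	if (!i_areset_n)
		{ s_reset_n, last_reset_n } <= 2'b00;
	else
		{ s_reset_n, last_reset_n } <= { last_reset_n, 1'b1 };
endmodule
```

Asserting i_areset_n clears both flip-flops immediately, even with i_clk stopped, while releasing it takes two rising edges of i_clk before s_reset_n rises.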
In this example, we assume an active low asynchronous reset, and we synchronize the exit from this reset state using another active low register as the first stage of a two flip-flop chain. You can then use s_reset_n as an asynchronous active low reset signal throughout your design.
Engineers have argued about whether active high or active low resets are better. In general, it doesn’t make a difference within an FPGA, although the peripherals the FPGA controls may have specific requirements. ASIC designers, on the other hand, tend to prefer the active low reset. Verilog designed to handle both FPGA and ASIC implementations will therefore likely need this circuitry.
The Cross-clock Handshake
One of the problems with the flip-flop chain method of handling CDCs is that nothing guarantees that the input is stable long enough to know that the output was received. Sometimes, for example, you want a CDC method that can handle going from a slow clock to a fast clock, from a fast clock to a slow clock, or even from one clock of unknown speed to another clock having no known relationship to the first. This calls for another approach: the handshaking method, shown in Fig 4.
In this figure, two clock domains are shown, yellow and green, together with the metastable region between the two shown in pink or red. Time goes from top to bottom, showing messages being passed back and forth between the two sides.
The figure also shows how the handshaking method works. First, a request is made from the old clock domain and passed to the new clock domain. The request goes through the metastability region using the flip-flop chain method described above. Once it gets to the new clock domain, an acknowledgement is sent back, also going through the same flip-flop chain method, but this time with the old clock driving the flip-flop chain. Once the acknowledgement has been received, the request signal may be dropped (cleared), at which point the new clock domain drops its acknowledgement flag.
One trick of this method, though, is that no new request can be made until the acknowledgement has been cleared. This “I’m busy” region is shown in Fig 4 as a bright red bar, during which time no new requests may be sent.
Let’s walk through this approach in Verilog, shall we?
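First, something starts this off by setting the request signal within the old clock domain. That request then passes through a flip-flop chain, clocked by the new clock, into the new clock domain.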
This logic isn’t quite complete, but we’ll come back to it in a moment.
Once the request has been received, it is immediately sent back to the original clock in the form of an acknowledgement.
Only when this acknowledgement is low again are we ready to send any subsequent requests. Hence, we are busy from the time the original request is sent until the time later when the acknowledgement is cleared.
This then gives us the final logic we need to create our request in the first place. We can send a request any time we are not busy, and some event has happened that we wish to place on the other clock domain. Once the acknowledgement has been received, we drop our request line and wait for the acknowledgement to be dropped as well.
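Putting these pieces together, a minimal sketch of the complete handshake might look like the following. The module, its ports, and the a_ (old domain) and b_ (new domain) prefixes are assumptions made for illustration; each direction simply reuses the two flip-flop chain from earlier.

```verilog
// Minimal cross-clock handshake sketch.  All names are hypothetical:
// a_ marks the old (source) clock domain, b_ the new one.
module cdc_handshake (
	input	wire	i_a_clk,
	input	wire	i_a_event,	// a domain: something to signal
	output	wire	o_a_busy,	// a domain: no new request allowed
	input	wire	i_b_clk,
	output	reg	o_b_req		// b domain: the synchronized request
);
	reg	a_req, a_ack_pipe, a_ack;
	reg	b_req_pipe;

	initial	{ a_req, a_ack_pipe, a_ack } = 3'b000;
	initial	{ o_b_req, b_req_pipe } = 2'b00;

	// b domain: the request crosses via a two flip-flop chain
	always @(posedge i_b_clk)
		{ o_b_req, b_req_pipe } <= { b_req_pipe, a_req };

	// a domain: the received request is returned, as the
	// acknowledgement, through a second two flip-flop chain
	always @(posedge i_a_clk)
		{ a_ack, a_ack_pipe } <= { a_ack_pipe, o_b_req };

	// Busy from when the request is raised until the
	// acknowledgement has been cleared again
	assign	o_a_busy = a_req || a_ack;

	// Raise the request on an event when not busy; drop it once
	// the acknowledgement arrives
	always @(posedge i_a_clk)
	if (!o_a_busy && i_a_event)
		a_req <= 1'b1;
	else if (a_ack)
		a_req <= 1'b0;
endmodule
```

Since o_a_busy is computed combinatorially from a_req and a_ack, the old domain may start a new request on the very first clock after the acknowledgement clears.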
Indeed, this is the basic handshaking method used within my Wishbone scope in order to send information, such as a reset scope command, from the bus clock domain to the data clock domain and back again.
Signaling a (rare) event
From here, we can build upon this idea to send an “event” from one domain to another. By “event”, I mean something that will only ever be true for a single clock cycle. To work, though, the event will need to be rare enough that the CDC circuit isn’t still busy handling the last event.
The basic method, shown in Fig 5 above, is almost identical to what we did before. There are only two basic differences.
The first difference is that we trigger off of a specific single-clock event, rather than off of some yet to be defined logic.
The second difference is that we need to recognize a rising request signal within the new clock domain. This means that we need to keep track of the last state of the request signal, last_req, in addition to the rest of the pipeline request logic we used before.
Then, any time the last_request is low, but a new_request has been made, we know we need to generate the event within the new clock domain.
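A minimal sketch of this event transfer follows. It reuses the handshake above, triggers directly off a single-cycle event, and compares the synchronized request against its previous value. The module, ports, and the name b_last_req (standing in for the text’s last_req) are assumptions.

```verilog
// Sketch: pass a rare single-cycle event from domain a to domain b
module cdc_event (
	input	wire	i_a_clk,
	input	wire	i_a_event,	// a domain: single cycle, assumed rare
	output	wire	o_a_busy,	// a domain: an event now would be lost
	input	wire	i_b_clk,
	output	wire	o_b_event	// b domain: single-cycle event
);
	reg	a_req, a_ack_pipe, a_ack;
	reg	b_req_pipe, b_req, b_last_req;

	initial	{ a_req, a_ack_pipe, a_ack } = 3'b000;
	initial	{ b_last_req, b_req, b_req_pipe } = 3'b000;

	assign	o_a_busy = a_req || a_ack;

	// a domain: trigger off of the single-clock event itself
	always @(posedge i_a_clk)
	if (!o_a_busy && i_a_event)
		a_req <= 1'b1;
	else if (a_ack)
		a_req <= 1'b0;

	// a domain: acknowledgement returns through two flip-flops
	always @(posedge i_a_clk)
		{ a_ack, a_ack_pipe } <= { a_ack_pipe, b_req };

	// b domain: synchronize the request, keeping its last value too
	always @(posedge i_b_clk)
		{ b_last_req, b_req, b_req_pipe } <= { b_req, b_req_pipe, a_req };

	// A request that is high now, but was low last clock, is the event
	assign	o_b_event = b_req && !b_last_req;
endmodule
```

Any event arriving while o_a_busy is high is simply dropped in this sketch, which is why the events need to be rare.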
You can use this approach to send triggers from one clock domain to another, such as sending the video refresh interrupt created within the video clock domain to the interrupt controller, which may be within the CPU’s clock domain.
Sending a group of wires at once
You can also use this transfer approach to send a group of wires at once.
This was the approach I used when I wanted to send a byte’s worth of data at once from the RPi to the iCE40 on top of my ICO board. Since I was designing the interface with no idea of how fast the two clocks would be relative to one another, I needed a handshaking method to control the transfer of all 8 bits at once. You can see the project that uses this approach here. The RPi software can be found in sw/host/netpport.cpp, whereas the Verilog half of the interface can be found in pport/pport.v.
How would we handle that?
The first step would be to copy the data to a transfer region, and then to set a “pre-transfer” signal indicating that the data was valid.
This needs to be done on the clock before the transfer starts.
You can then perform a handshake to the other clock domain.
Once the request has been received within the new clock domain, the data can be copied into flip-flops controlled by the new clock domain.
No flip-flop chain is necessary to receive the transfer data, since the handshake has already guaranteed that the transfer data is valid and stable.
One other difference, though, is that you don’t want to acknowledge the other side until the transfer has been completed. Hence, the acknowledgement should only be raised once the data has been copied.
This guarantees that the data will remain stable throughout the copy.
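The steps above can be sketched as a single module, under the assumption of a parameterized word width DW. All module, port, and register names here, including a_xfer_data for the transfer region and a_pre_xfer for the “pre-transfer” flag, are hypothetical.

```verilog
// Sketch: transfer a DW-bit word from domain a to domain b
module cdc_word #(
	parameter	DW = 8
) (
	input	wire		i_a_clk,
	input	wire		i_a_stb,	// a domain: i_a_data is valid
	input	wire	[DW-1:0]	i_a_data,
	output	wire		o_a_busy,	// a domain: i_a_stb would be ignored
	input	wire		i_b_clk,
	output	reg		o_b_stb,	// b domain: o_b_data just updated
	output	reg	[DW-1:0]	o_b_data
);
	reg	[DW-1:0]	a_xfer_data;	// the transfer region
	reg		a_pre_xfer, a_req, a_ack_pipe, a_ack;
	reg		b_req_pipe, b_req, b_last_req, b_ack;

	initial	{ a_pre_xfer, a_req, a_ack_pipe, a_ack } = 4'b0000;
	initial	{ b_req_pipe, b_req, b_last_req, b_ack, o_b_stb } = 5'b00000;
	initial	a_xfer_data = 0;
	initial	o_b_data    = 0;

	assign	o_a_busy = a_pre_xfer || a_req || a_ack;

	// a domain, the clock before the transfer: copy the data into
	// the transfer region and flag it as valid
	always @(posedge i_a_clk)
	if (!o_a_busy && i_a_stb)
	begin
		a_xfer_data <= i_a_data;
		a_pre_xfer  <= 1'b1;
	end else
		a_pre_xfer  <= 1'b0;

	// a domain: start the handshake, holding the request until acked
	always @(posedge i_a_clk)
	if (a_pre_xfer)
		a_req <= 1'b1;
	else if (a_ack)
		a_req <= 1'b0;

	always @(posedge i_a_clk)
		{ a_ack, a_ack_pipe } <= { a_ack_pipe, b_ack };

	// b domain: synchronize the request and remember its last value
	always @(posedge i_b_clk)
		{ b_last_req, b_req, b_req_pipe } <= { b_req, b_req_pipe, a_req };

	// b domain: on a rising request, copy the transfer region with no
	// flip-flop chain, since the handshake holds a_xfer_data stable
	always @(posedge i_b_clk)
	begin
		o_b_stb <= b_req && !b_last_req;
		if (b_req && !b_last_req)
			o_b_data <= a_xfer_data;
	end

	// b domain: acknowledge only on the clock after the copy
	always @(posedge i_b_clk)
		b_ack <= b_req && b_last_req;
endmodule
```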
This approach has worked quite well for me for some time. I did, however, come back to this process in a later article to discuss how it can be improved upon and made even faster by removing the clearing pass through the synchronizers.
Sending a stream of values
What we haven’t discussed is the idea of sending a stream of values from one clock to the next.
Perhaps you are trying to write video from a camera to memory. In this case, you might be receiving 24-bit pixels at a high (pixel) clock rate, but need to write 128 bits of data at a time to your DDR3-SDRAM memory at a lower rate.
Alternatively, you might want to store and forward results from a high speed analog to digital converter (A/D) across an ethernet port. Indeed, the example applies to a low speed A/D converter as well! Perhaps you are consistently sending data to the new clock domain, but only reading it out rarely, and in high speed bursts at that.
In both examples, you have a stream of data that needs to be moved from one clock domain to another.
The solution to this problem is to use a FIFO, such as the one shown in Fig 6 above. You may recall that we’ve discussed FIFOs before. Using such a FIFO, you can drive the input at one rate and the output at another.
The problem, though, is how to build a FIFO that crosses clock domains. Specifically, you need to keep the source from writing if the FIFO is full, and you need to keep the sink from reading the output when the FIFO is empty. Both of these conditions depend on knowing information from the other side of the FIFO. Not only that, but you need to be able to handle the CDC with your reset circuitry as well.
This, however, will need to remain the topic of a future post.
Until then, if you are more interested in the topic, Clifford E. Cummings of Sunburst Design has written a wonderful, and rather extensive, paper on the topic. I suspect you will find it to be very valuable; I certainly have.
We know that we have passed from death unto life, because we love the brethren. He that loveth not his brother abideth in death. (1John 3:14)