Most FPGA vendor libraries include hardware I/O elements for driving a signal on both edges of a clock. I like to refer to these elements as ODDR modules, since they create a hardware output on both edges of a clock, i.e. at double data rate. If you are building a design for an FPGA, then I highly recommend that you use such a module.
Sadly, I recently found myself in a position where I couldn’t use a pre-built ODDR on a project. My favorite FPGA vendor I/O libraries weren’t available to me.
I needed to build my own hardware DDR output element.
Without a second thought, I scribbled out the following implementation:
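A minimal sketch of such a naive implementation might look like the following. The module name, the port names `i_clk`, `i_data` and `o_pad`, and the choice to send `i_data[1]` first are all assumptions made for illustration.

```verilog
// Sketch only: names and bit ordering are assumptions.
module naive_oddr (
	input	wire		i_clk,
	input	wire	[1:0]	i_data,	// i_data[1] is sent first
	output	wire		o_pad
);
	reg	[1:0]	ddr_data;

	// Capture both output bits once per clock cycle
	always @(posedge i_clk)
		ddr_data <= i_data;

	// The clock itself is the mux select: the first bit goes out
	// while the clock is high, the second while it is low
	assign	o_pad = (i_clk) ? ddr_data[1] : ddr_data[0];
endmodule
```

Note that the mux select (`i_clk`) and the mux data (`ddr_data`) both change around the rising edge, but not at exactly the same instant.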
It wasn’t until some time later that I started wondering if this was really the best way to handle the problem.
Here’s the missing subtlety: we imagine data changing on a clock tick. In reality, the clock rises and then only some fractional amount of time later the actual data changes. During this time between the clock rising and the `ddr_data` register changing, there’s the chance such a circuit might create a glitch, such as the one shown in Fig. 1 below.
That wasn’t what I wanted.
So I did some internet searching, and came across this lecture on hazards. I quickly learned:
That left me wondering, could I build something better by just paying attention to a bit of math?
My first draft attempted to follow the same logic as my naive approach above. First, I copied the data into a special register.
Then I needed to find a way to switch between halves of that register.
I thought I might use a `cycle` variable to capture which half of the clock cycle I was in. Someone suggested something like the following:
This would require clocking logic on both the positive and negative edges of a given clock. That is generally quite bad. However, you are kind of stuck with doing something on both edges of the clock to build a circuit like this in the first place. At least I was only looking at one register.
The last step would then select between the two bits based upon whether these two `cycle*` registers matched.
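Put together, this first draft might be sketched as below. The register names `cycle_p` and `cycle_n` are assumptions (the text only calls them `cycle*`), as are the port names and the bit ordering.

```verilog
// Sketch only: cycle_p/cycle_n and the port names are assumptions.
module cycle_oddr (
	input	wire		i_clk,
	input	wire	[1:0]	i_data,	// i_data[1] is sent first
	output	wire		o_pad
);
	reg	[1:0]	ddr_data;
	reg		cycle_p, cycle_n;

	initial	cycle_p = 1'b0;
	initial	cycle_n = 1'b0;

	// Copy the data into a local register
	always @(posedge i_clk)
		ddr_data <= i_data;

	// Toggle on every positive edge ...
	always @(posedge i_clk)
		cycle_p <= !cycle_p;

	// ... and catch up on the following negative edge.  This is
	// the only register clocked on the negative edge.
	always @(negedge i_clk)
		cycle_n <= cycle_p;

	// The two registers differ during the first half of the clock
	// and match during the second half
	assign	o_pad = (cycle_p != cycle_n) ? ddr_data[1] : ddr_data[0];
endmodule
```

The clock no longer feeds the output mux directly, but `cycle_p` and `ddr_data` still change together on the positive edge.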
But was this better?
At this point, I tried applying the math from the slides I had found. This required expressing my design as either a sum of products, or a product of sums. So, I rewrote the logic for `o_pad` as a sum (OR) of products (ANDs).
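As an illustration only, here is what a sum of products form of the output mux looks like with the textbook hazard cover added. This assumes the technique in question is the classic redundant consensus term. The signal names carry over from the sketch assumptions used earlier and are hypothetical.

```verilog
// Sketch only: a two-input mux in sum of products form, plus the
// redundant consensus term.  All names are assumptions.
module sop_oddr (
	input	wire		i_clk,
	input	wire	[1:0]	i_data,
	output	wire		o_pad
);
	reg	[1:0]	ddr_data;
	reg		cycle_p, cycle_n;
	wire		sel;

	initial	cycle_p = 1'b0;
	initial	cycle_n = 1'b0;

	always @(posedge i_clk)
		ddr_data <= i_data;
	always @(posedge i_clk)
		cycle_p <= !cycle_p;
	always @(negedge i_clk)
		cycle_n <= cycle_p;

	// High during the first half of the clock cycle
	assign	sel = cycle_p ^ cycle_n;

	assign	o_pad = (sel  & ddr_data[1])		// first half
		| (~sel & ddr_data[0])			// second half
		| (ddr_data[1] & ddr_data[0]);		// consensus term
endmodule
```

The third product is logically redundant: it never changes the truth table. Its only purpose is to hold the output high while `sel` switches and both data bits are one.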
I was so proud of myself, that I wanted to see how this circuit would look. So, I ran Yosys to get a peek at it.
The result was Fig. 3 below.
If I stared hard enough at the figure, I could see all of my logic carefully laid out within it.
But then I got to wondering, what would the logic optimizer do to a circuit like this?
The result was that all my special glitch-reducing logic had been removed. It was redundant. Sure, the optimizer did what I would expect, but how should I now fix my circuit?
Worse, as I looked over my draft further, I could quickly see that this combinatorial equation was much more complicated than I would ever want to have driving the output of a chip. There was just too much room for error.
I needed something simpler.
A Better Approach
Here was the rub: no matter how I built the circuit, I was going to need some combinatorial logic past the last clock. Further, if I wanted to avoid using the clock itself in my output, then I was going to need to transition on both edges of the clock. There was no way around that.
This left me with the following structure.
- First, I would need to do something to my inputs, to transform them somehow into local registers to this module. This would happen on the positive edge of the clock–and would work like any other piece of logic I might use in my design.
- I would then need to move one of those pieces of logic from the positive to the negative edge of the clock.
Although I’ve shown this as a predicate logic function, `f1`, in reality this needs to be just a register copy, since time is important and I want to make certain that nothing gets lost in this translation, especially since I’m not sure how well I trust my tools to handle this logic.
- The third step would then be some form of logic function on the two halves: the logic generated on the positive edge of the clock and the logic created on the negative edge.
I also wanted to keep this third function simple: something that would not only be glitch free, but would also be simple enough not to have a lot of logic delays within it. While both AND and OR might work individually, they would both require the inputs to change on both edges of the clock, since there would be no way to “undo” the output from one clock half once the next clock half arrived. In the end, I decided that `f2` above needed to be an exclusive OR (XOR) function, especially since an XOR can be accomplished simply with a small number of transistors. Not only that, it’s a basic standard cell element in most ASIC logic libraries.
That left me with something like Fig. 5, where `cp` and `cn` would be the outputs of flip-flops set on the positive and negative edges of the clock respectively. `cnp`, on the other hand, would be an intermediate result used to pre-calculate the logic for the negative edge.
But what should I use for the positive clock edge logic?
For that, I worked both clock phases out independently.
For the first phase of the clock, I’d be able to control the `cp` of the `cp ^ cn` function. In order to output the first data element on the positive edge of the clock, I only needed to annihilate the `cn` term by XORing it into `cp`.
Unfortunately, this made for a second and unnecessary clock domain crossing, from the negative edge of the clock to the positive edge. On the other hand, `cn` was just a copy of `cnp` from the prior positive edge of the clock, so I could just as easily reference `cnp` instead without incurring any additional edge-to-edge crossings.
Generating an equation to set `cnp` would’ve been just as easy, except I needed the new value of `cnp` to depend upon the result of `cp`, which was still being calculated on this same clock period. If I just place parentheses around the `cp` value, then the equation becomes almost identical to the one for `cp`.
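Working from the description above, the core of the design might be sketched as follows. The module and port names, and the choice of `i_data[1]` as the first bit out, are assumptions. Only `cp`, `cnp`, `cn` and the `cp ^ cn` output come from the text.

```verilog
// Sketch only: port names and bit ordering are assumptions.
module xor_oddr (
	input	wire		i_clk,
	input	wire	[1:0]	i_data,	// i_data[1] is sent first
	output	wire		o_pad
);
	reg	cp, cnp, cn;

	// First half: cn still holds the old cnp, so XORing the old
	// cnp into cp cancels it and leaves i_data[1] on the pad
	always @(posedge i_clk)
		cp <= i_data[1] ^ cnp;

	// Pre-calculate what cn must become to cancel the new cp and
	// leave i_data[0].  The parenthesized term is the new cp.
	always @(posedge i_clk)
		cnp <= i_data[0] ^ (i_data[1] ^ cnp);

	// Second half: a plain register copy onto the negative edge
	always @(negedge i_clk)
		cn <= cnp;

	// Only one of these two flip-flops changes on any given edge
	assign	o_pad = cp ^ cn;
endmodule
```

Checking by hand: after a rising edge, `o_pad = (i_data[1] ^ cnp_old) ^ cnp_old = i_data[1]`. After the falling edge, `cn = i_data[0] ^ cp`, so `o_pad = cp ^ i_data[0] ^ cp = i_data[0]`. No reset is included in this sketch, since the relationship between the registers appears to settle on its own once both edges have been seen.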
But would it work?
To double check my work, I fired up SymbiYosys.
No, SymbiYosys can’t handle analog logic–which is really what this attempt at glitch-free logic is. However, it can handle digital logic on both halves of a clock. That was what I wanted here for a quick check.
Formal methods check
Thankfully, the formal check of this logic is fairly easy. The first step is to use the `multiclock on` option within the SymbiYosys configuration file. Once done, you’ll need to assume the existence of a toggling clock, with two formal timesteps per clock cycle.
I then needed to assume the inputs were clock synchronous.
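These two assumptions might be written as in the sketch below, in the open-source Yosys/SymbiYosys style. The helper module name, `gbl_clk` and `f_last_clk` are hypothetical. The same statements could equally sit inside the design's own `ifdef FORMAL` section, and they rely on `multiclock on` being set in the `[options]` section of the `.sby` file.

```verilog
// Sketch only: module and signal names are assumptions.
module f_oddr_env (
	input	wire		i_clk,
	input	wire	[1:0]	i_data
);
`ifdef	FORMAL
	// The formal timestep, running at twice the clock rate
	(* gclk *) reg	gbl_clk;
	reg		f_last_clk;

	// Assume i_clk toggles on every formal timestep
	always @(posedge gbl_clk)
	begin
		f_last_clk <= i_clk;
		assume(i_clk != f_last_clk);
	end

	// Assume the inputs are clock synchronous: they may only
	// change on a timestep where i_clk has just risen
	always @(posedge gbl_clk)
	if (!$rose(i_clk))
		assume($stable(i_data));
`endif
endmodule
```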
Or, rather, I forgot to include this assumption at first and then got surprised when the result wasn’t what I wanted. When the design then failed, resulting in a trace where the inputs didn’t “look” right, I figured I should add the assumption above.
The last step to setting up the problem was to keep track of the bits I wanted to output. I used a quick two-bit register, `f_data`, for this purpose.
The last step was the assertion: I wanted to make certain that the result was correct on each half of the clock.
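A sketch of the tracking register and the assertion together might look like this. Only `f_data` is named in the text. The helper module, `gbl_clk`, the `f_steps` counter and the number of skipped timesteps are assumptions, and the skip count in particular would need confirming against the tool.

```verilog
// Sketch only: everything except f_data is an assumed name.
module f_oddr_check (
	input	wire		i_clk,
	input	wire	[1:0]	i_data,
	input	wire		o_pad	// output of the design under test
);
`ifdef	FORMAL
	(* gclk *) reg	gbl_clk;
	reg	[1:0]	f_data;
	reg	[2:0]	f_steps;

	// The two bits that should appear during this clock cycle
	always @(posedge i_clk)
		f_data <= i_data;

	// Saturating count of formal timesteps, used to skip the
	// first few while the registers sync up
	initial	f_steps = 3'd0;
	always @(posedge gbl_clk)
	if (f_steps != 3'd7)
		f_steps <= f_steps + 3'd1;

	// The pad must carry the first bit while the clock is high
	// and the second bit while it is low
	always @(posedge gbl_clk)
	if (f_steps >= 3'd4)
		assert(o_pad == ((i_clk) ? f_data[1] : f_data[0]));
`endif
endmodule
```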
Yes, this looks just like the logic I started with that I am trying to replace.
Indeed, the design did pass, although I initially needed to skip the first couple of time-steps until everything sync’d up. Still, the check worked quite nicely to help me figure out what I was doing right and wrong.
Polishing off the design
The last step was to implement the “off” function, where the ODDR module wasn’t enabled. My first thought was that I should keep the output from toggling when not enabled. A second thought was that I should just set the pins to the first bit of input.
f_data was the easiest way to at least describe what I wanted.
From here, I could design something that met my formal criteria and SymbiYosys could then tell me if I got it right or not.
My first draft for this logic was just to set everything to a constant if ever the enable input was low.
Only … this didn’t pass the formal check.
After a little bit of floundering, I realized I would have to build this based on `cnp`. That led me instead to the following logic.
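One way such logic could look is sketched below. It assumes the "off" behavior is the second option mentioned above, repeating the first bit on both halves of the clock, and it assumes an enable port named `i_en`. It is not confirmed to match the original listing.

```verilog
// Sketch only: i_en, w_second and the "off" behavior are assumptions.
module xor_oddr_en (
	input	wire		i_clk,
	input	wire		i_en,
	input	wire	[1:0]	i_data,	// i_data[1] is sent first
	output	wire		o_pad
);
	reg	cp, cnp, cn;
	wire	w_second;

	// When disabled, the second half repeats the first bit
	assign	w_second = (i_en) ? i_data[0] : i_data[1];

	// First half: cancel the old cnp, leaving i_data[1]
	always @(posedge i_clk)
		cp <= i_data[1] ^ cnp;

	// When disabled this reduces to cnp <= cnp, so cn will not
	// change and the pad holds i_data[1] for the whole cycle
	always @(posedge i_clk)
		cnp <= w_second ^ (i_data[1] ^ cnp);

	always @(negedge i_clk)
		cn <= cnp;

	assign	o_pad = cp ^ cn;
endmodule
```

Rather than forcing the registers to a constant, the enable only changes what gets folded into `cnp`, so `cp` and `cn` keep cancelling each other as before.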
Even better, the result is pretty simple logically. It only requires three XOR elements, three flip-flops, and a mux. What’s more, as designed above, the output is driven by an XOR of two flip-flop outputs.
This entire exercise was a lot of fun, and I learned a lot about glitchless logic in the process.
How practical is this design? Well, it’s not portable to any FPGAs. Indeed, I wouldn’t use it on an FPGA at all. There are better structures on FPGAs, and those structures are ideally placed on the FPGA to handle final timing properly.
See, that’s the big problem with this design: it’s highly susceptible to placement. Were the XOR placed on the opposite side of the chip from the pad it is driving, you might easily have multiple pins transitioning on apparently separate clocks. Even if the XOR were placed next to the pad, flip-flop placement will adjust both clock period and phase. That means that in order to make this work properly, you’ll need to make certain that both the flip-flops and the XOR are placed right next to the output pad. That places further requirements on the tools you use and what they need to support in order to make this happen.
So, is this doable? Absolutely! Is the task done? Far from it. Still, it was a fun distraction for an evening.
Are not two sparrows sold for a farthing? and one of them shall not fall on the ground without your Father. (Matthew 10:29)