I remember being taught during my time in military service that “The man with two watches never knows what time it is.” Wikipedia lists this as Segal’s law:

“A man with a watch knows what time it is. A man with two watches is never sure.”

Fig 1. Two independent clocks will never agree in real life

The concept is pretty simple: if your board has a 100MHz oscillator, then you know it runs at 100MHz. Without any other information, that’s often easy enough to work with. Is it really running at 100MHz? Exactly? Of course not, but without more information that’s all you have to work with. At least within a universe containing only a single clock, everything is self-consistent.

Once you have two clocks in your system, however, they will disagree with each other, and you will never be certain which of the two is correct.

Still, you will want to know if your Ethernet clock is truly at 125MHz, the video clock at 148.5MHz, or the audio clock at 49.152MHz. Indeed, just knowing these clocks are present might well be half the battle.

For now, let’s just pick a 100MHz clock as an arbitrary reference, and then see if we can measure the rates of these other clocks with respect to that reference. Of course, any estimate will only be as good as the reference clock.

Three methods of frequency measurement

Fig 2. Three methods of frequency estimation

In general, there are three methods of frequency estimation. You can count transitions, track the clock phase and frequency using a PLL, or use an FFT. Let’s quickly look at each in turn, starting with the FFT.

  • FFT-based frequency measurement

    The grand-daddy of all frequency measurement methods is clearly the FFT. It’s closely related to the maximum likelihood method of frequency estimation. It can also be robust against harmonics. Here’s how you’d do it:

    1. Collect a set of samples. These can come from an incoming clock, or even from an Analog to Digital Converter (ADC). The FFT isn’t particular.
    2. Calculate the FFT of the set.
    3. Calculate the squared magnitude (absolute value squared) of each FFT output.
    4. Pick the bin with the maximum magnitude, perhaps restricting your search to a known window where the frequency of interest is expected to lie. Such restrictions can greatly improve your estimate in the presence of a lot of noise. This is something I discussed in my Ph.D. dissertation.
    5. The bin number of that maximum, times the FFT’s bin spacing (the incoming sample rate divided by the FFT length), is now your estimate of the frequency of interest. This will get you to within about one FFT bin’s worth of precision.
    6. Want to do better? You can interpolate between the FFT bins near that maximum to find where the peak lies between bins. Be aware when doing this, however, that the absolute value operation will distort your result. You may therefore need to oversample the spectrum by 16x or so (for example, by zero-padding the FFT input) before picking the samples you want to interpolate from.
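To make the steps above concrete, here is a minimal Python sketch of FFT-based frequency estimation. It is only a model, not an FPGA design: it assumes a power-of-two record length, uses a small textbook radix-2 FFT, zero-pads by 16x to oversample the spectrum, and interpolates with a simple three-point parabola through the squared magnitudes. The names `fft` and `fft_frequency` are hypothetical, chosen for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]   # twiddle factor times odd half
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft_frequency(samples, fs, f_lo, f_hi, pad=16):
    """Estimate the strongest tone between f_lo and f_hi (same units as fs).
    len(samples) must be a power of two."""
    m = len(samples) * pad                                   # zero-padded FFT length
    spectrum = fft(list(samples) + [0.0] * (m - len(samples)))  # step 2
    power = [abs(v) ** 2 for v in spectrum]                  # step 3: squared magnitude
    bin_hz = fs / m                                          # width of one padded bin
    k_lo = max(1, int(f_lo / bin_hz))
    k_hi = min(m // 2 - 1, int(f_hi / bin_hz) + 1)
    k = max(range(k_lo, k_hi + 1), key=power.__getitem__)   # step 4: windowed peak
    # Step 6: parabola through the peak and its two neighbours.  This
    # division is the one that costs you in hardware.
    a, b, c = power[k - 1], power[k], power[k + 1]
    denom = a - 2 * b + c
    delta = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (k + delta) * bin_hz                              # step 5: bin times bin width


fs = 100.0                      # e.g. a 100MHz reference, in MHz
tone = [math.cos(2 * math.pi * 12.3 * i / fs) for i in range(256)]
mix = [t + 2 * math.cos(2 * math.pi * 30.0 * i / fs) for i, t in enumerate(tone)]
print(fft_frequency(tone, fs, 1.0, 49.0))   # close to 12.3
print(fft_frequency(mix, fs, 1.0, 49.0))    # close to 30.0, the stronger tone
print(fft_frequency(mix, fs, 10.0, 15.0))   # close to 12.3, thanks to the search window
```

The last two calls show why the search window in step 4 matters: with a stronger interfering tone present, only a narrowed window finds the weaker one.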

    Your first problem with implementing this algorithm on an FPGA, however, will be the complexity of the FFT. It’s not trivial. An FFT requires at least 3 multiplies per stage, and lots of block RAM to store intermediate results along the way. Whether or not this is a problem depends on how good you want your frequency estimate to be. Remember, it’ll never be better than your knowledge of your reference clock speed, so there’s a limit on how hard you need to work.

    Your second problem will be the division used by the interpolator. It’s doable, but not trivial.

    What makes FFT-based methods so valuable is their resistance to interference. If you have multiple frequencies coming into a design, and you want to measure only one of them or perhaps each one of them, the FFT should be able to get you much closer than any other approach.

  • Breaking the FFT (Guru discussion)

    Okay, I know I said there were three types of frequency estimation, but I got in an argument with a coworker a while ago and want to have a chance at saying I got things right one more time …

    Let’s say you have a set of samples from x[0] to x[N-1]. This coworker noticed that if you took two FFTs of this data set, one from x[0] to x[N-2] that we’ll call Y0[f] and the other from x[1] to x[N-1] that we’ll call Y1[f], then you could conjugate-multiply the two together. The phase of that product at the bin of interest, that is, the phase difference between the two FFTs, is then proportional to the frequency of interest. Since that phase estimate is a real number, rather than a quantized FFT bin, this coworker argued that the resulting frequency estimates were that much better.
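Here is a Python sketch of the pairwise method as described, assuming complex (analytic) samples so that a single bin carries only the positive-frequency tone. The names `dft_bin` and `pairwise_phase_frequency` are hypothetical. For a noiseless complex tone, Y1[k] is exactly e^{jω}Y0[k], so the angle of the conjugate product recovers ω with no bin quantization at all. That says nothing about how the estimator behaves in noise, which is what the argument is really about.

```python
import cmath
import math


def dft_bin(x, k):
    """Bin k of the len(x)-point DFT of x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def pairwise_phase_frequency(x, fs, k):
    """Conjugate-multiply bin k of the DFTs of x[0..N-2] (Y0) and x[1..N-1] (Y1).
    The product's angle is the phase advance per sample, i.e. 2*pi*f/fs."""
    y0 = dft_bin(x[:-1], k)
    y1 = dft_bin(x[1:], k)
    return cmath.phase(y1 * y0.conjugate()) * fs / (2 * math.pi)


fs = 100.0
x = [cmath.exp(2j * math.pi * 12.3 * i / fs) for i in range(65)]  # N = 65 samples
k = round(12.3 * 64 / fs)      # bin nearest the tone in each 64-point DFT
print(pairwise_phase_frequency(x, fs, k))   # 12.3, up to rounding
```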

    So, I ran some tests. The full FFT method outperformed this pairwise phase difference method by several dB.

    If you go further and expand the summation behind an FFT result’s squared magnitude, you’ll discover that this pairwise method is a subset of the original FFT-based magnitude method, one you can get to by throwing terms away. It only makes sense that you’d do better if you didn’t throw any terms away in the first place.

    Incidentally, I was never able to convince this coworker.

  • Using a PLL

    By this, I mean using a logic PLL, such as the one we’ve already discussed on this blog. PLLs have the advantage over an FFT in that they are much simpler to build. Like the FFT, they also have some amount of out-of-band noise immunity.

Fig 3. Transition counting

Unlike an FFT, a PLL can only capture one frequency at a time. Further, an FFT will find a tone in a deterministic amount of time, whereas it’s hard to predict how much data a PLL will need before it locks. Worse, it is quite possible for a PLL to lock to the wrong frequency, even one way out of bounds if it is strong enough. Still, with some conditioning circuitry, a PLL should be able to handle most conditions just fine.
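As a rough illustration of PLL-based frequency tracking, here is a Python sketch of a textbook second-order (proportional plus integral) loop. It is not the logic PLL discussed on this blog, which works on one-bit inputs: the complex input samples, the gains `kp` and `ki`, and the starting guess `f_start` are all assumptions. Starting the NCO near the expected rate stands in for the conditioning circuitry mentioned above.

```python
import cmath
import math


def pll_track(samples, fs, f_start, kp=0.1, ki=0.005):
    """Track the frequency of complex samples with a proportional-plus-integral
    PLL. Returns the loop's frequency estimate (same units as fs) per sample."""
    phase = 0.0                                  # NCO phase, in radians
    step = 2 * math.pi * f_start / fs            # NCO phase advance per sample
    estimates = []
    for x in samples:
        # Phase detector: angle between the input and the NCO, in (-pi, pi]
        err = cmath.phase(x * cmath.exp(-1j * phase))
        step += ki * err                         # integral path tunes the frequency
        phase = (phase + step + kp * err) % (2 * math.pi)   # proportional path
        estimates.append(step * fs / (2 * math.pi))
    return estimates


fs = 100.0                                       # e.g. a 100MHz reference, in MHz
tone = [cmath.exp(2j * math.pi * 12.31 * n / fs) for n in range(3000)]
print(pll_track(tone, fs, f_start=12.0)[-1])     # settles close to 12.31
```

With these gains the loop settles within a few hundred samples; smaller gains reject more noise but take longer to lock, which is exactly the unpredictability described above.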

  • Counting transitions

    Counting clock transitions is easy. So easy, in fact, that we’ll take another look at it in depth in the next section.

Counting Transitions

Calculating a clock frequency by counting transitions is really easy. I mean, really easy. There are two parts to it. The first part tells us when to start and stop counting, and the second just counts transitions.

We’ve looked at that first part before. In general, there are two ways to generate a once-per-second signal, something I call a pulse-per-second (PPS) signal after my background with GPS. Your two options are to use either an integer or a fractional divider.

Here’s what the integer divider method would look like:

always @(posedge i_clk)
if (one_second_countdown <= 1)
begin
	o_pps <= 1'b1;
	one_second_countdown <= CLOCK_RATE_HZ;
end else begin
	o_pps <= 1'b0;
	one_second_countdown <= one_second_countdown - 1;
end

It basically consists of a counter counting down from CLOCK_RATE_HZ to one (not zero) and then restarting. When the counter would otherwise reach zero, o_pps gets set to indicate we’re going around again.
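A quick Python model confirms the off-by-one reasoning. Reloading at one gives a pulse every CLOCK_RATE_HZ clocks, whereas counting all the way down to zero would give one every CLOCK_RATE_HZ+1 clocks. The function name is hypothetical, the counter is assumed to start fully loaded, and a tiny rate keeps the simulation short.

```python
def countdown_pps_cycles(clock_rate_hz, n_cycles, stop_at=1):
    """Model of the integer divider: return the clocks on which the
    'countdown <= stop_at' test fires (o_pps goes high one clock later)."""
    countdown = clock_rate_hz        # assume the counter starts fully loaded
    fired = []
    for cycle in range(n_cycles):
        if countdown <= stop_at:
            fired.append(cycle)
            countdown = clock_rate_hz
        else:
            countdown -= 1
    return fired


print(countdown_pps_cycles(5, 15))             # [4, 9, 14]: every 5 clocks
print(countdown_pps_cycles(5, 15, stop_at=0))  # [5, 11]: every 6 clocks
```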

The fractional clock divider approach looks completely different, but fundamentally does the exact same thing.

parameter [31:0] CLOCK_STEP = { 1'b1, 32'h0 } / CLOCK_RATE_HZ;

always @(posedge i_clk)
	{ o_pps, fraction_of_second } <= fraction_of_second + CLOCK_STEP;

Watch out for overflow in your implementation! My favorite way to express this is as (1<<32) / CLOCK_RATE_HZ, but this can cause overflow within any tool that’s not ready for a 33-bit number.
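It’s also worth checking what the truncating division does to the PPS rate. The Python model below (hypothetical names) mimics the 33-bit add, with the carry acting as o_pps. At 100MHz, a 32-bit CLOCK_STEP truncates from about 42.95 down to 42, so the pulses end up about 2.3% more than one second apart. A wider accumulator, such as the 48-bit fractional counter mentioned later, shrinks that error to a fraction of a part per million.

```python
ACC_BITS = 32


def clock_step(clock_rate_hz, acc_bits=ACC_BITS):
    """{ 1'b1, 32'h0 } / CLOCK_RATE_HZ: integer division truncates."""
    return (1 << acc_bits) // clock_rate_hz


def pps_cycles(clock_rate_hz, n_cycles, acc_bits=ACC_BITS):
    """Clocks on which the accumulator carries out (the new o_pps value)."""
    step = clock_step(clock_rate_hz, acc_bits)
    mask = (1 << acc_bits) - 1
    fraction, carries = 0, []
    for cycle in range(n_cycles):
        total = fraction + step
        if total >> acc_bits:        # the carry bit is o_pps
            carries.append(cycle)
        fraction = total & mask      # fraction_of_second keeps the low bits
    return carries


def mean_pps_period(clock_rate_hz, acc_bits=ACC_BITS):
    """Average clocks between pulses: 2**acc_bits / CLOCK_STEP."""
    return (1 << acc_bits) / clock_step(clock_rate_hz, acc_bits)


print(pps_cycles(1000, 5000))                            # [1000, 2000, 3000, 4000]
print(clock_step(100_000_000))                           # 42, truncated from 42.95
print(round(mean_pps_period(100_000_000)))               # 102261126 clocks, ~2.3% long
print(round(mean_pps_period(100_000_000, acc_bits=48)))  # 100000027 clocks
```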

Whichever method you use, you now have an o_pps signal you can use to indicate when your clock counter should start and stop counting.

That leads us to the next step, actually counting.

initial	counter = 0;
always @(posedge i_clk)
if (o_pps)
	counter <= (transition) ? 1:0;
else if (transition)
	counter <= counter + 1;

initial	o_result = 0;
always @(posedge i_clk)
if (o_pps)
	o_result <= counter;

There are two things to note carefully here. The first is that I placed the result, o_result, in its own register. This will keep you from ever reading a partial count, once the clock counter has its first result. The second is that you need to be able to count on every clock tick, including the tick on which you reset the counter; otherwise you’ll risk dropping a count.
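The Python model below (hypothetical names) mirrors the two always blocks above and shows why the reset tick must count too. Edges arrive every third clock and the PPS every 30 clocks, so one edge lands on each PPS clock. The first reading covers a partial window starting at power-up. The `keep_reset_tick=False` variant, which clears the counter unconditionally, drops the edge on each PPS clock and reports 9 where 10 is correct.

```python
def count_per_pps(transition, pps, keep_reset_tick=True):
    """Model of the counter/o_result always blocks. transition[i] and pps[i]
    are the flags on clock i; returns each value latched into o_result."""
    counter, results = 0, []
    for t, p in zip(transition, pps):
        if p:
            results.append(counter)              # o_result <= counter
            if keep_reset_tick:
                counter = 1 if t else 0          # counter <= (transition) ? 1:0
            else:
                counter = 0                      # flawed: drops an edge on this clock
        elif t:
            counter += 1
    return results


edges = [c % 3 == 2 for c in range(90)]          # an edge every third clock
pps = [c % 30 == 29 for c in range(90)]          # a PPS every 30 clocks
print(count_per_pps(edges, pps))                          # [9, 10, 10]
print(count_per_pps(edges, pps, keep_reset_tick=False))   # [9, 9, 9]
```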

But where does the transition flag above come from? Ideally, we’d want it to be true any time the clock we are counting rises, so that we can count that rise.

always @(posedge i_clk)
	last_test_clock <= test_clock;

// This could also be done combinatorially, depending upon the timing
// requirements within your design
always @(posedge i_clk)
	transition <= (test_clock && !last_test_clock);

Okay, that’s great if test_clock is already synchronous with your system clock, but in the general case of clock counting it’s not likely to be. That means we’re going to need to go through a 2FF clock domain crossing.

reg	test_clock_cdc;
always @(posedge i_clk)
	{ test_clock, test_clock_cdc } <= { test_clock_cdc, raw_test_clock };

Only, what happens if the test clock is faster than half your system clock rate? That is, what happens if the test clock comes in so fast that you never see a high sample, followed by a low sample, followed by another high sample? This method would then miss transitions.

This is why, when building my own design, I first divided down the test clock before counting it, as in:

reg	[LGNAVGS-1:0]	avgs;
always @(posedge i_test_clk)
	avgs <= avgs + 1;

always @(posedge i_clk)
	{ test_clock, test_clock_cdc } <= { test_clock_cdc, avgs[LGNAVGS-1] };

Much to my surprise, this technique worked for clocks even as fast as 400MHz+! Perhaps this isn’t as momentous as you might think. If LGNAVGS=4, then this avgs logic takes no more than 4 LUTs and 4 FFs. It’s not a lot of logic. For reference, though, I tend to get surprised when things work above 100MHz, and more surprised when they work above 200MHz, but that’s just an indication of the problem space I tend to work within.
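To see both the problem and the fix, here is an idealized Python model (hypothetical function name) that samples a 50% duty-cycle square wave at the system clock rate and counts rising edges. Rates are small integers standing in for MHz, and metastability is ignored. A 130-unit clock sampled at 100 looks exactly like a 30-unit clock, while dividing by 16 first, as avgs does, recovers the right count.

```python
def sampled_edge_count(f_test, f_sys, n_samples, divide=1):
    """Rising edges seen when a square wave at f_test / divide is sampled at
    f_sys, over samples 1..n_samples, multiplied back up by divide.
    Exact integer arithmetic: the phase of sample n is (n * f_test) mod
    (divide * f_sys), and the wave is high during the first half-period."""
    period = divide * f_sys

    def high(n):
        return 2 * ((n * f_test) % period) < period

    edges = sum(1 for n in range(1, n_samples + 1) if high(n) and not high(n - 1))
    return edges * divide


print(sampled_edge_count(30, 100, 100))               # 30: correct
print(sampled_edge_count(130, 100, 100))              # 30: aliased, should be 130
print(sampled_edge_count(130, 100, 1600, divide=16))  # 2080 over 16 "seconds", i.e. 130 each
```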

Shall we take a look at the whole design from top to bottom?

The Clock Counting Design

The design below is quite generic, and should work for just about any clock speed. Just a couple of notes, though, before starting.

First, this design uses a PPS signal generated by my real-time clock core. This PPS signal is typically generated internally from a 48-bit fractional counter. Even better, I have a version of that real-time clock core that integrates nicely with an external PLL-based tracking loop applied to the PPS signal from a GPS receiver, so let’s just assume it’s generated externally.

Second, I set the number of bits in the counter to 32, simply because that’s the common bus width that I normally use.

Third, I’ve named the two clocks, i_sys_clk for the system clock and i_tst_clk for the clock under test. Both signals are assumed to be true clocks, and so available on the clock routing network of the FPGA.

That said, let’s dig in.

module	clkcounter(i_sys_clk, i_sys_pps, i_tst_clk, o_sys_counts);
	parameter	LGNAVGS = 4, BUSW=32;
	input	wire			i_sys_clk, i_sys_pps, i_tst_clk;
	output	wire	[(BUSW-1):0]	o_sys_counts;

	reg	[(LGNAVGS-1):0]	avgs;
	reg	[2:0]		tst_clock_cdc;
	reg			tst_posedge;
	reg	[(BUSW-LGNAVGS-1):0]	counter;
	reg	[(BUSW-LGNAVGS-1):0]	r_sys_counts;

The first step is to divide the incoming clock frequency down to something we can sample. I’ve chosen to divide by 16 (LGNAVGS=4), but this decision should really be determined by the needs you have within your design. Still, 16 has been plenty for my own uses.

	// Start from zero so that simulation doesn't propagate X's forever
	initial	avgs = 0;
	always @(posedge i_tst_clk)
		avgs <= avgs + 1'b1;

This incoming clock divider is the only part of this design where I transition on the test clock. (Remember my rule about using only one clock domain if possible?)

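The next step is to take the top bit of that divider counter and run it through a two flip-flop synchronizer. The third bit of tst_clock_cdc holds the previous synchronized value, so we can detect edges.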

	always @(posedge i_sys_clk)
		tst_clock_cdc <= { tst_clock_cdc[1:0], avgs[(LGNAVGS-1)] };

From here, I can check for positive edges. tst_posedge will be true one clock after a positive edge. That one clock delay doesn’t matter here, so there’s no real cost in registering the edge detector.

	always @(posedge i_sys_clk)
		tst_posedge <= (tst_clock_cdc[2:1] == 2'b01);

That’s all the preliminary work. Now, we can finally do some clock counting.

Just like you saw above, we’ll count every detected edge, including one that arrives on the same clock as the PPS.

	always @(posedge i_sys_clk)
	if (i_sys_pps)
		counter <= (tst_posedge) ? 1:0;
	else if (tst_posedge)
		counter <= counter + 1'b1;

The last step is to report the results back on one of our ports. We’ll make one change to this logic here: because we divided the clock initially, we’ll multiply our result by the amount we divided the clock by (a left shift by LGNAVGS bits), just so that we have a value that’s at least close to the right answer.

	always @(posedge i_sys_clk)
	if (i_sys_pps)
		r_sys_counts <= counter;

	assign	o_sys_counts = { r_sys_counts, {(LGNAVGS){1'b0}} };

endmodule

That’s all there is to it. The operation is fairly commonplace, but still quite simple.
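Two quick consequences of the widths chosen above, worked out in Python (helper names hypothetical, and assuming the PPS really is one second apart): with LGNAVGS=4 and BUSW=32, each count is worth 16Hz, and the 28-bit counter only wraps for clocks above about 4.29GHz, well beyond anything this logic will see. Ignoring jitter, an exact clock should read either the multiple of 16 just below its rate or the one just above.

```python
def clkcounter_range(lgnavgs=4, busw=32):
    """(Hz per count, largest rate before the counter wraps in one PPS period)."""
    divide = 1 << lgnavgs                       # avgs divides the test clock by this
    counter_max = (1 << (busw - lgnavgs)) - 1   # counter is BUSW-LGNAVGS bits wide
    return divide, counter_max * divide


def possible_readings(f_test_hz, lgnavgs=4):
    """o_sys_counts values an exact f_test_hz clock can produce, ignoring jitter:
    a one-second window holds floor or ceil of f_test_hz / 2**LGNAVGS edges."""
    divide = 1 << lgnavgs
    low = (f_test_hz // divide) * divide
    return {low} if f_test_hz % divide == 0 else {low, low + divide}


print(clkcounter_range())                         # (16, 4294967280)
print(sorted(possible_readings(148_500_000)))     # [148500000]
print(sorted(possible_readings(100_000_001)))     # [100000000, 100000016]
```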

Conclusion

Feel free to create a simple peripheral containing this core, either AXI-lite or Wishbone. Indeed, I’m doing exactly that in the video ingestion logic I’m currently working on. You might also connect it to a serial port controller via an FSM such as the one I presented in the tutorial. I think you’ll be pleased with the result.

One of the neat things about this approach is that it can also detect the absence of a clock. If there are no clock pulses, the counts per second will be zero. (You will still need your system clock.) I’ve found this to be a great advantage when working with video, since it tells you whether the video signal is even present, as a first step towards debugging the rest of the design.
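On the software side, interpreting a reading might look like the Python sketch below. The helper, its default tolerance, and its labels are hypothetical. The point is simply that a zero reading means no edges were seen during the last PPS interval, while a nonzero reading can still be checked against the nominal rate, allowing at least 2^LGNAVGS Hz of slack for the divider’s quantization.

```python
def classify_clock(reading_hz, nominal_hz, tolerance_ppm=100, lgnavgs=4):
    """Hypothetical host-side check of one o_sys_counts reading."""
    if reading_hz == 0:
        return "absent"                  # no edges during the last PPS interval
    # Slack: the ppm tolerance plus one divided edge (2**LGNAVGS counts)
    slack = nominal_hz * tolerance_ppm / 1e6 + (1 << lgnavgs)
    return "ok" if abs(reading_hz - nominal_hz) <= slack else "wrong rate"


print(classify_clock(0, 148_500_000))             # absent
print(classify_clock(148_500_016, 148_500_000))   # ok
print(classify_clock(74_250_000, 148_500_000))    # wrong rate
```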