Controlling Timing within an FPGA

Within an FPGA, nearly everything depends upon carefully timed events. SPI controllers require a logic-generated clock, I2C controllers have a maximum rate they can communicate at, UART controllers run at a user-defined baud rate … everything wants to communicate at a carefully controlled speed.

Here we’ll discuss a couple ways to create the timing you need.

The Power of Two Clock Divider

The first approach I reach for when timing events is usually a clock divider. It’s just too simple and too easy to build to ignore.

reg	[(N-1):0]	counter;
always @(posedge i_clk)
	counter <= counter + 1'b1;

Using this approach, your clock will be nicely divided by an even 2^N. Hence, if you attach an LED to counter[(N-1)] you’ll have the slower clock you need.
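
If you’d like to see the whole thing in one place, here is a minimal sketch of this divider wrapped into a module. The module name pow2_blinky, the port name o_led, and the default N=27 are my own assumptions for illustration. The initial statement also assumes your FPGA and tools honor initial register values.

```verilog
// A sketch only: pow2_blinky, o_led, and N=27 are hypothetical choices
module pow2_blinky #(
		parameter	N = 27	// Counter width: divides i_clk by 2^N
	) (
		input	wire	i_clk,
		output	wire	o_led
	);

	reg	[(N-1):0]	counter;

	// Assumes the tools honor initial values for registers
	initial	counter = 0;
	always @(posedge i_clk)
		counter <= counter + 1'b1;

	// The top bit completes one full cycle every 2^N input clocks
	assign	o_led = counter[N-1];
endmodule
```

With a 100MHz i_clk and N=27, the LED would blink at 100MHz / 2^27, or roughly 0.75Hz: slow enough to watch.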

The Simple Clock Divider

A very common beginner’s task is to create a 1kHz, 100Hz, or even a 10Hz clock from your input clock. Since these are not the result of dividing your clock by 2^N, a different approach is necessary.

Suppose for example that your system clock were at 100MHz. You’d then need to divide it by 10M if you wanted to get a 10Hz clock. This is easily done with a more generic clock divider circuit.

reg	[(N-1):0]	counter;
always @(posedge i_clk)
	if (counter < THRESHOLD-1'b1)
		counter <= counter + 1'b1;
	else
		counter <= 0;

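As before, you can then use counter[N-1] as an LED driver, and you will have divided your clock by whatever value you set THRESHOLD to be. Two caveats apply. First, N must be large enough to hold THRESHOLD-1, and the top bit will only ever toggle if THRESHOLD is greater than 2^(N-1). Second, the top bit won’t have a 50% duty cycle unless THRESHOLD is exactly 2^N.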
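
Picking N by hand is easy to get wrong, so the following sketch derives both THRESHOLD and N from the two clock rates. The names simple_divider, CLOCK_RATE_HZ, OUTPUT_RATE_HZ, and o_led are hypothetical. It also assumes that your tools support $clog2 (Verilog-2005), and that THRESHOLD comes out to be at least two.

```verilog
// A sketch only: the module, parameter, and port names are hypothetical
module simple_divider #(
		parameter	CLOCK_RATE_HZ  = 100_000_000,
		parameter	OUTPUT_RATE_HZ = 10
	) (
		input	wire	i_clk,
		output	wire	o_led
	);

	// Number of input clocks per output cycle (assumed to be >= 2)
	localparam	THRESHOLD = CLOCK_RATE_HZ / OUTPUT_RATE_HZ;
	// Just enough bits to hold THRESHOLD-1
	localparam	N = $clog2(THRESHOLD);

	reg	[(N-1):0]	counter;

	initial	counter = 0;
	always @(posedge i_clk)
	if (counter < THRESHOLD-1)
		counter <= counter + 1'b1;
	else
		counter <= 0;

	// High only while counter >= 2^(N-1), so not a 50% duty cycle
	assign	o_led = counter[N-1];
endmodule
```

For a 100MHz clock and a 10Hz output, THRESHOLD works out to 10,000,000 and N to 24.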

The Strobe Signal

The problem with the simple clock dividers above is that the counter is N bits wide, and the top bit may be one for many clocks and zero for many clocks. How do you make your logic act only once in all those clocks?

As a first rule, do not drive your logic like this:

always @(posedge counter[N-1])
	begin
		// DON'T DO THIS
	end

This will cause you all kinds of grief, either leading you to an unreliable design, or forcing you to deal with multiple clock domains, clock domain transfers, and worse. Unless you really know what you are doing … don’t use this approach.

When I first started building FPGA designs, I would check for zero within whatever state machine logic I had that was going to rely upon my new clock. As a result, I tended to use something like this instead:

always @(posedge i_clk)
	if (counter == 0)
	begin
		// Don't do this
	end else if (some_other_condition)
	begin
		// Other logic goes here
	end else if ...

My problem was that I then needed to come back later and rebuild all this logic. While it worked, it required more LUTs than was actually necessary, and it couldn’t be clocked at any high speed.

So … don’t do it this way either.

One way to understand the problem with this approach is to count the cost of your logic. This cost may be estimated by the number of inputs necessary to create any of your logic registers. The larger the number of inputs, the more LUTs will be required to implement it, and the slower the logic will be. Having an N-bit wide counter driving a lot of logic just adds N-1 unnecessary bits to complicate things. As a result, while this approach will work (and did for me for many years), it’ll only work for FPGA logic with a slow i_clk frequency.

The better alternative is even simpler, and there’s no reason not to use it.

Instead of testing for (counter == 0) within your logic, create a strobe signal. We’ll call our strobe signal ck_stb:

always @(posedge i_clk)
	ck_stb <= (counter == THRESHOLD-1'b1);

What makes this signal so useful is that it will only ever be on for one clock period at a time, and that one clock period will be the period that you need to do something. As a result, you will then only need to check whether or not ck_stb is true whenever you need to do something, rather than all N bits of counter.

The next step is to build your logic so that it transitions on this strobe:

always @(posedge i_clk)
	if (ck_stb)
	begin
		// Build your logic this way instead
	end else if (some_other_condition)
	begin
	end else if ...

You can find an example of this within my WBUART cores. Look for the variables baud_counter and zero_baud_counter within either the transmitter or receiver modules.
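
To show how the pieces fit together, here is a minimal sketch that combines the counter, the strobe, and some strobed logic in one module. The module name strobe_blinky, the port o_led, and the parameter defaults are my own assumptions, not anything taken from the WBUART cores. N is assumed to be wide enough to hold THRESHOLD-1.

```verilog
// A sketch only: strobe_blinky, o_led, and the defaults are hypothetical
module strobe_blinky #(
		parameter	THRESHOLD = 10_000_000,
		parameter	N = 24	// Must be able to hold THRESHOLD-1
	) (
		input	wire	i_clk,
		output	reg	o_led
	);

	reg	[(N-1):0]	counter;
	reg			ck_stb;

	// The divider itself: counts from 0 to THRESHOLD-1, then wraps
	initial	counter = 0;
	always @(posedge i_clk)
	if (counter < THRESHOLD-1)
		counter <= counter + 1'b1;
	else
		counter <= 0;

	// One-bit strobe: true for one clock out of every THRESHOLD
	initial	ck_stb = 1'b0;
	always @(posedge i_clk)
		ck_stb <= (counter == THRESHOLD-1);

	// Everything still runs on i_clk, but only acts on the strobe
	initial	o_led = 1'b0;
	always @(posedge i_clk)
	if (ck_stb)
		o_led <= !o_led;
endmodule
```

Since the LED toggles once per strobe, it completes a full on/off cycle every 2*THRESHOLD clocks. With the defaults above and a 100MHz clock, that would be a 5Hz blink.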

The Fractional Clock Divider

What if you need to divide your clock by 3.1415926535…? Not a problem. You can accomplish this using a fractional clock divider. The result will look something like:

reg	[15:0]	counter;
always @(posedge i_clk)
	{ ck_stb, counter } <= counter + 16'h517d;

Ok, so … there’s a couple pieces to doing this that are worth discussing in order to understand it.

How does this work? Well, consider what happens after 2^16 clocks … the counter will have rolled over 16'h517d times, so you’ll have seen 16'h517d strobes. Hence, you’ve divided your clock by 2^16 / 16'h517d, or about pi.
The ck_stb signal will be set anytime this counter rolls over. Because ck_stb isn’t used to calculate the next counter, but only used as the output of this equation, ck_stb becomes a logic signal you can use to drive your logic at the rate you want.
Notice that this counter is 16 bits wide rather than N bits wide. Because the counter width is coupled to the fractional step value, I had to make this width a constant instead of a generic. You can still change it to whatever you need it to be, as long as you recalculate the step to match.
As for the 16’h517d, this number is given by 2^16 divided by PI. Where does the 16 come from? It’s the width of your counter. Does it need to be 16? The more bits you have, the closer you’ll get to the actual frequency you wish to create. I’ve often used 48-bits within my Real-Time Clock Core, but what you choose should be a matter of your design needs and choices.
You can also use the ck_stb signal within your code to do things once every pi clocks (on average), just like we used the ck_stb before.
What if you need to generate an actual clock signal, and not just a clocked strobe? You can use the top bit of this counter as a clock signal that you can send to peripherals if you need to. Just … don’t use it as a clock signal within your own logic unless you really know what you are doing.
What is the actual clock period of this clock? Well, because we are dividing by PI, you will find either three or four ticks between ck_stb signals, never more, never less. This is going to create some phase noise in your clock. It can cause problems with some systems, so make sure you check the spec of whatever system you might be working with in order to know what is acceptable.
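
The same arithmetic works for any ratio: the step is 2^W times the rate you want, divided by the rate you have. Here is a minimal sketch that lets the tools calculate that step for you. The names fractional_divider, CLOCK_RATE_HZ, STROBE_RATE_HZ, and W are hypothetical. The sketch assumes W is no more than 32, so that the 64-bit intermediate value cannot overflow, and that the strobe rate is less than the clock rate.

```verilog
// A sketch only: the module and parameter names are hypothetical
module fractional_divider #(
		parameter	CLOCK_RATE_HZ  = 100_000_000,
		parameter	STROBE_RATE_HZ = 1_000,
		parameter	W = 32	// Counter width, assumed <= 32
	) (
		input	wire	i_clk,
		output	reg	ck_stb
	);

	// STEP = 2^W * STROBE_RATE_HZ / CLOCK_RATE_HZ, rounded to nearest.
	// The 64-bit constant forces the arithmetic to be done in 64 bits.
	localparam [(W-1):0]	STEP = ((64'd1 << W) * STROBE_RATE_HZ
					+ CLOCK_RATE_HZ/2) / CLOCK_RATE_HZ;

	reg	[(W-1):0]	counter;

	// ck_stb captures the carry out of the W-bit addition
	initial	{ ck_stb, counter } = 0;
	always @(posedge i_clk)
		{ ck_stb, counter } <= counter + STEP;
endmodule
```

Because STEP has to be rounded to an integer, the average strobe rate you actually get is CLOCK_RATE_HZ * STEP / 2^W. A wider counter makes that rounding error smaller.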

You can find a simple example of this fractional divider in this version of blinky for the ICO board. I used this code to test and measure the speed of the input clock to my ICO board. By using this approach, I was able to prove that the incoming clock was 100MHz, as opposed to the 25MHz oscillator listed in the schematic (Oops!).

As a fun example, I used this same fractional clock generator approach to create a single bit FM signal that I then “transmitted” out of my GPIO ports. Sure, it was an ugly signal, but it was enough to lock an FM receiver to it and listen to Queen on “the radio”.

The Divided Counter

If your counter is so long that you can’t meet timing, there’s usually no cost to splitting the counter into a higher word and a lower word:

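reg	low_stb, ck_stb;
reg	[(N-1):0]	low_counter, hi_counter;
// The lower word counts every clock, and strobes when it rolls over
always @(posedge i_clk)
	{ low_stb, low_counter } <= low_counter + 1'b1;
// The upper word only counts on the clock after the lower word rolls over
always @(posedge i_clk)
	if (low_stb)
		{ ck_stb, hi_counter } <= hi_counter + 1'b1;
	else
		ck_stb <= 1'b0;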

Sure, the two words may not be synchronized (the upper word updates one clock after the lower word rolls over), but … this is still a very doable approach.

The Divided Fractional Clock Divider

What if you are using a fractional divider? If you can’t meet timing with your fractional division clock, you can also divide that one into two words, both upper and lower:

reg	low_pps, ck_stb;
reg	[(N-1):0]	low_counter, hi_counter;
always @(posedge i_clk)
	{ low_pps, low_counter } <= low_counter + LOW_STEP;
// low_pps carries the lower word's overflow into the upper word
always @(posedge i_clk)
	{ ck_stb, hi_counter } <= hi_counter + HIGH_STEP + low_pps;

As before, the two counters are not in lock step with each other: the upper word lags the lower word by one clock. If you want to actually have a synchronized timer, you might need to delay the lower counter until they line up:

reg	low_pps, ck_stb;
reg	[(N-1):0]	low_counter, hi_counter, dly_counter;
wire	[(2*N-1):0]	full_counter;
always @(posedge i_clk)
	{ low_pps, low_counter } <= low_counter + LOW_STEP;
// Delay the lower word one clock, to line it up with the upper word
always @(posedge i_clk)
	dly_counter <= low_counter;
always @(posedge i_clk)
	{ ck_stb, hi_counter } <= hi_counter + HIGH_STEP + low_pps;
assign	full_counter = { hi_counter, dly_counter };

Conclusion

As you can see from the many different examples above, dividing your input clock down to a rate that you can then use for your logic is fairly easy. Given the many ways of doing this wrong, we have now at least shown you several methods for doing this “right”.

Try it! Let me know how these techniques work for you.