Many signal processing applications require a sine wave at some point. If the phase or frequency of this sine wave is controlled within the design, then it is often called a Numerically Controlled Oscillator (NCO). Let’s spend some time today looking into how you might build one of these within an FPGA. We’ll also present a C++ implementation at the end as well, which may be used in embedded applications.
Since we’ve already studied how to generate a sine wave on an FPGA, most of our work is already done: We’ve discussed a table lookup method, a quarter wave table lookup, method, and even how to generate both a sine wave and a cosine wave using a CORDIC. The subtle difference to describe to day is really how to turn such a sine wave generator into a NCO.
Before we dive into the details, though, let’s spend a moment thinking about how you might use such an NCO. My own reason for presenting this today is twofold. First, I know of a student struggling to understand how to build something like this as part of a digital communications demodulator. There is a small trick involved–one that appears to be well known among those who do this sort of thing for a living, but not so well known among students and I’d like to share it here. Indeed, I think we might even build a better NCO below than this stackoverflow article recommends. My second reason for writing today is that I’d like to write about how to build a Phase Locked Loop (PLL) within an FPGA on this blog, and every PLL I’ve ever built has always included the basic NCO logic as part of its implementation.
But this hardly scratches the surface of what you might do with an NCO. Consider as an example that …
Any linear signal processing system is completely characterized by its frequency response. As a result, I’ve used an NCO in time past, together with a scope of some type, to evaluate whether the digital input to a signal processing algorithm was being handled properly.
We used an NCO earlier as part of our demonstration that the improved PWM generator worked better than a traditional PWM for audio signal generation. While we didn’t discuss the details of the tone generator at the time, we’ll explain many of those details today.
You can also use an NCO to move a signal around in frequency–either bringing it down from some intermediate frequency (IF) to a baseband frequency where you can process it in a receiver, or the same in the other direction as part of a transmission algorithm.
Another common use is to generate musical notes via an additive synthesizer. Indeed, the phase accumulator portion of the NCO we’ll develop below can even be used in subtractive synthesis–it’s quite generic.
Finally, because you as the designer have complete control over the frequency output of the NCO, you can even do some more exotic things–such as building a frequency hopping spread spectrum signal–should this be what you wish to do.
What is an NCO?
For our purpose today, a Numerically Controlled Oscillator is simply an oscillator created from digital logic that you have complete control over digitally. Nominally, such an oscillator will receive as an input the frequency you wish to produce and it will produce a digitally sampled sine wave at that frequency. Should you choose to use an NCO within a PLL, then you will also be adjusting the phase of this sine wave generator. For now, however, consider an NCO to be a simple digital logic circuit that takes a frequency input and produces a sampled sine wave as an output.
Let’s walk through a little bit of trigonometry, to see how this works.
and is given by the equation,
Since digital implementations can only work on
we’ll need to sample this sine wave
Ts seconds. To keep our notation straight,
we’ll now index this sine wave
output by sample
n, rather than by time,
I personally find it easier to work with the sample rate of the
rather than the time between samples,
Ts. These are reciprocals of each
fs = 1/Ts. We can then
express this same equation as,
and plot the sampled function in Fig 4.
In this figure, the samples are shown in circles. They are each separated by
a phase of
Our entire focus in this algorithm, though, is going to be on the
of this expression–the argument of the
sine wave. In the expression
is given by
n times the frequency
2pi. We’ll call this changing
Specifically, this phase value is defined by,
It represents the number of times our
has gone around the
unit circle–the number of
rotations if you will. For example, a
phi[n] of 1.0, applied internally
to our sine wave,
would lead to a
phase argument to our
sine function of
2pi–suggesting we had gone around the
phi[n] of 2.0 would yield the same value, but represent
instead that the phase
had traveled around the
twice. Fractions then will indicate partial angles from the x-axis around
the unit circle,
0.5 would represent going halfway round
0.25 would represent a quarter of the way around the
This simple modification accomplishes two purposes. First, it allows us to
avoid a multiply by
n, turning the calculation of the next
from the past
into an operation requiring an addition alone. Second, because this newer
version is no longer tied to the distance from
n=0, this subtle change
allows us to maintain any accumulated
offsets in our
as well–not just
with zero phase at zero time.
This sounds fairly straight-forward so far. What’s the trick?
The “Trick” to building an NCO
The “trick” in building an
lies in the units of
phi[n]. The units of
phi[n], presented above, are
a number of cycles (or rotations) around the unit
circle. As a result, a
phi[n] of 1.0 represents once around the
phi[n] of 2.0 represents twice around the
unit circle, etc.
However, within most signal processing logic (not quite all), you don’t care how many integer times a sine wave goes around the unit circle, you only care about the angular fraction. Let’s therefore examine this number in terms of both it’s integer and fractional portions.
Specifically, let’s separate it into an integer portion, and the first
W bits past the decimal point as shown below.
Pictorially, dropping the integer portion might look like Fig 6 below.
Notice how the phase jumps back to zero in Fig 6 at the same point as where the
starts to repeat. Sure, you didn’t need to bring this value back down
to zero, but doing so creates a limited range in
y that we can then
split among a fixed number of bits,
Since we didn’t care about the integer number of times we’ve gone around the
we can subtract from
phi[n] its integer portion to recover the
fraction alone–just like we did in Fig 6 above. We can then multiply the
2^W, so as to get a
representation that will fit within a word of
W bits long.
Just to finish this off, we’ll only keep the integer portion of this
value, representing the top
W bits of our fractional
and we’ll ignore any further bits beyond the decimal point.
PHI[n] is now a number between
2^W-1 representing a fractional
rotation around the unit circle,
a value between
That’s the first part of our “trick.”
For the second part of this “trick”, let’s use the top
P bits of our
PHI[n], as the input to our
generator, whether a
sine wave generator, or even the
input phase value of a
These can be used for one of two purposes. First, they can be used as a fractional table index, accumulating over time to adjust our table index. An example of this is shown below in Fig 7.
If you look carefully at this figure, you can see how the phase pointer moves a little bit more than one table position at a time. Eventually this extra accumulated fraction causes the table index to skip a table position entirely (position 6).
This will allow you to represent and create
sine waves at
not formed by integer steps through your table. Such
may involve jumping across table entries, as shown above in Fig 7, or even
repeating entries if necessary. Yes, jumping entries will likely cause a
distortion in your output, but it will also help you maintain better
resolution than the first
P bits alone would allow.
A second use for the bottom
W-P bits would be as part of an
scheme to reduce the phase noise
associated with any table representation.
This is such an important possibility that we may have
to come back to it and write about it more in a later article.
But, what about overflow?
This is an important question, so let’s walk through an example and see what
happens. Consider what would happen if we were keeping track of
in an 8-bit word, and one of our additions overflowed. For example,
suppose you wanted to take four steps to go around a circle, starting at
PHI[n]=8'h20 (45 degrees). You’d then add to it
8'h40 (90 degrees) on
each clock. The resulting sequence would then be,
8'h20 (45 degrees),
8'h60 (135 degrees),
8'ha0 (225 degrees),
8'he0 (315 degrees),
8'h20 (45 degrees).
Did you catch what just happened? The phase accumulator just overflowed between 315 degrees and 45 degrees, and yet the phase representation just “did the right thing”! What that means is that you can ignore any overflow in your phase–it’s just going to wrap around the unit circle anyway, and the sine wave generator is only interested in the phase fraction anyway.
Example Source Code
So, how would this look in practice? Let’s look at an example of what this would look like in both C++ and Verilog.
We’ll start with a C++ example. We’ll make a C++ NCO class incorporating these principles. There are three basic parts to this class. The first is the class declaration and table generation.
We’ll build the table itself with the sine wave values at the left-edge of any interval. This isn’t optimal, since it will force the error to zero on the left edge and likely make it a maximum on the right side of the interval, but it will yield us a decent capability quickly.
We may come back to this in a later post to minimize the maximum error in this lookup.
The last part of this initialization is to provide initial values for our
PHI[n]) and the
step necessary to create a known
In this case, we’ll initialize this step so that it will produce a zero
very exciting, but fixing that will be our next step.
The second part of this implementation is the function that sets the
This is captured within the
by the phase
m_dphase. As discussed above, this is the difference between
PHI[n-1]. Further, if you create a
SAMPLE_RATE value in
f can be given to this routine in
Hertz. Otherwise, you can keep
SAMPLE_RATE set to
1.0 and then the
f value accepted by this routine will
be set in terms of a normalized
frequency ranging from
0.0 to the
You may find that the most confusing part of the logic above is the
ONE_ROTATION value. This is the
value representing once around the
unit circle. It is given by
2^W. We have to go through a bit of a hoop to set this value, though, since
ONE_ROTATION value doesn’t fit within the
unsigned integer we are using
m_phase). Alternatively, we might’ve set
pow(2,sizeof(unsigned)*8), but the approach above makes it easier for the
compiler to recognize that this value is a constant, rather than needing to
pos() math library function.
The final part of this class steps the index into the table forward by one
step, and then returns the value of the table at the index given by the top
P bits in the
You may notice that I’ve chosen to use single precision
double precision floating point numbers. I did this for
First, I wanted to encourage you to ask the question of just how much precision do you actually need?
Second, I wanted to point out that the single precision
only has a 24-bit mantissa. Most
today allow integers of 32-bits. As a result, the integer
accumulator has more precision than a
accumulator. You can see this pictorially in Fig 8.
If you are not familiar with single precision IEEE floats, the first bit,
S, is a sign bit, and the next seven,
E, are exponent bits. The final
M, are mantissa bits. Put together, these items represent a
number sort of like,
(-1)^S * 2^E * M. (Yes, I’m skipping some details
Unlike IEEE floats, our fixed point
representation is simply
32 mantissa bits,
having a value ranging from
Which one do you think will have more precision?
In a similar fashion, if you were to use an
unsigned long for your
accumulator instead of an
unsigned value, then the
accumulator would have more precision than a
double precision float would
In both cases the reason why this phase accumulator out-performs a floating point phase accumulator is simple because our representation has a fixed decimal location, rather than a floating decimal point. That allows more bits per word to be allocated to the mantissa, since the floating point representation needed to allocate extra bits for both a sign bit and the exponent.
Were you to build an NCO implementation within Verilog, the code is almost identical. The biggest differences are first the fact that we require the frequency given to be already be converted into the appropriate units, and second that the bit select is simpler than before.
Finally, the top
P bits of
r_phase are used in our table
You may notice that both the C++ and Verilog implementations are quite similar. They are both low logic implementations showing the basics of what is required to create an NCO.
We’ve just presented the logic behind building a basic NCO. While the sine wave generated by this approach isn’t perfect, it may well be good enough for your project. Should you need a higher quality sine wave, you may wish to know that there are other/better sine wave generators out there beyond the simple table lookup method we’ve discussed before.
As one example of doing better, you may wish to note that we’ve done nothing to minimize the maximum error for any given table index.
In a similar manner, we’ve done nothing with the rest of the
bits in our
accumulator. These bits can be used to
between table entries, either
quadratically, or more as desired for better performance.
These, however, will need to be left for discussions on some other day.
And said, Hitherto shalt thou come, but no further: and here shall thy proud waves be stayed? Job 38:11