Some time ago, I presented a CORDIC algorithm on this blog. This CORDIC algorithm makes it possible to generate sine and cosines using normalized integers as phase angles and only additions and shifts. The CORDIC algorithm is unique because of the simple fact that it does not require any multiplies to calculate these trigonometric values.
This is only the beginnings of what a CORDIC algorithm can be used for. You can also modulate baseband signals to intermediate/carrier frequencies, and downconvert them back to base band, using the same approach. It is really quite versatile.
CORDICs can also be used for generating test functions for FPGA based filters. This is particularly where I’d like to go with the technology on this blog. This means that our filtering test bench will depend upon this CORDIC algorithm, and hence any failure in the CORDIC will have a ripple affect into our future articles.
It also turns out that we made our problem more difficult by creating a core generator approach to our CORDIC algorithm. Because of the core generation approach, our CORDIC test bench will need to run across all manner of parameters: number of stages, input data width, phase bit-width, extra internal bits, and output bit width. So that we could handle this extra variability, we took a pause in our development in order to present some probabilistic quantities related to quantization which we can now use to predict the performance we expect so as to measure how well we do in comparison to it.
Building this test bench is going to take several steps. The first several of those will estimate how close we can expect the CORDIC algorithm to get to the right result. This is important, because it will then form the basis for any performance thresholds we might create to know if our implementation works. These results can also be used as an estimate of how well the CORDIC algorithm will perform for arbitrary input parameters.
This applies to all random variables.
These will form the basis of the error estimation work that follows. Indeed, we’ll use several of these properties at every step.
From Core Generator to Test Bench
One of the hassles of any test bench for a core generated algorithm is communicating information from the core generator to other parts of the design, such as the test bench. The problem has a simple solution.
The approach we’ll take for dealing with this problem is to have the
core generator generate
not only the CORDIC
code itself, but also a
C header file
describing the relevant choices that were made in the
The core generator
will create this C header
any time the command line parameter
-c is given to it.
The basic information found within this
includes the bit widths of the input (
IW), output (
OW), and phase (
It also includes the number of extra bits,
used in the CORDIC
computations, as well as the number of
NSTAGES. Finally, this
will contain information regarding whether or not the reset wire
or the traveling
aux) were included within the design.
We’ll add to this basic information some probabilistic prediction information we’ll develop below.
Now let’s start our run through the various errors or noise sources within the CORDIC algorithm.
The first noise source is the
of the input samples. We can assume that both
i_yval come with
an input quantization
1/12th–as we discussed in our last
post. Likewise, the
variance of the phase is
the lowest phase unit squared.
though, our first step was to multiply these input values by a programmable
number of extra bits. The number of these extra bits was used to create a
WW, used below,
To express this mathematically, we’ll let
y_i represent the
inputs to the algorithm, and
y_0 represent the inputs to the first
rotation. Since the precision of these values is not affected by the
pre-rotation step, no additional error is inserted there. Hence, we have the
following representation for the inputs to our first rotation:
In this equation,
e_yi are any errors associated with the
y_i. Once the multiply has been accomplished, we’ll
express the output as a sum of both the desired output
together with the differences from perfection,
e_y0. These latter
two random values
are the errors in our precision following this step.
Internal Truncation Error
As you may recall, each
rotates the previous
x,y values using a transform matrix using the equations:
There was also a corollary transform for when we wished to rotate in the other direction. To represent both, we used the plus-or-minus character above. Further, the minus-or-plus character above captures the fact that two separate signs need to change together, but that they need to be opposites of each other.
we truncated the values of
y_k after shifting them to the right. This is equivalent to adding
d_yk to each of the
values in addition to the error terms brought to this stage from the prior
stage. We can separate the desired result,
y_(k+1) from the
accumulated errors in the result,
e_y(k+1) and re-represent
At this point, we know
V[e_k] from the previous stage, and with the
V[d_k] we can calculate
V[e_(k+1)] for the next stage.
But … what is
error we calculated earlier, truncation error is neither zero
nor does it have a
However, if we assume that the choice of direction is made randomly,
with a probability of one half for each rotation direction, then the truncation
error becomes zero
(Half the time the
is negative by one half, half the time it is positive by one half, etc.)
though, is given by
1/3 since we need to integrate from
1 instead of from
Well, okay, that’s close but not quite right. In reality, the truncation
error associated with truncating an already
number requires an analysis of a discrete probability
If we suppose that a finite number of bits,
k, are dropped in the
truncation then the variance is not
The proof of this doesn’t fit on a single line, however, so we’ll leave the details as an exercise for the student.
After the CORDIC rotation stages, we rounded the result and produced an ouput. That step, therefore, is next.
As you may recall,
WW was the working width we were using for the
transform stages, and
OW is the output width. If
WW is greater than
then this stage drops bits and rounds the result. This rounding step also
adds some more
Mathematically, we might write what is taking place with,
At this point, we know the
of just about all of the components.
y_k are fixed values (not random), so their
variance is zero.
e_yk were both calculated in the last step.
scale and addition rules,
we can express the final
d_xo is the only thing left to discuss. This is the
associated with our rounding step. If we treat the value as continuous, then
d_xo would be
1/12th. This isn’t quite the case, since
d_xo is quantized, but it makes a decent approximation. (The actual number
1/8 for dropping one bit, and asymptotes at
1/12 for dropping
an infinite number of bits.)
We can now use this value to calculate the variance both at the end of each CORDIC stage, as well as at the end of the entire algorithm. We’ll place the code to calculate this value within our generic CORDIC library:
This will then inform our test bench code the average sum of square errors that it can expect.
At this point, we’re almost done with our pre-work. We have only one step left, and that is looking into any variance associated with the phase accumulator.
There’s one last component to the
of our result, and that’s the
associated with the phase accumulator. You may recall that, at each
stage of our
CORDIC algorithm, we added
or subtracted an amount of phase to a phase accumulator. We calculated
these phase values within
and placed these values into the
cordic_angle array within our generated
This means that though we intended to rotate by one angle,
we ended up rotating by an approximation of that angle. If we let
gamma reference the
R(theta) represent the rotation we
wanted to accomplish, and
e_oy be the algorithm errors we’ve
been discussing so far, then our result should be:
The problem with this representation is that it doesn’t acknowledge any of these phase errors. In other words, because we’ve only approximated the phase errors with integers, we ended up rotating by a different phase–one that was close to what we wanted to rotate by:
If we separate the rotation into two rotations, one by the rotation we want,
R(theta), and the other by the rotation we didn’t want,
R(eta), we can
rewrite this as:
Finally, if we use a small angle approximation for
eta, then this can be
rewritten as something a touch more useful:
At this point, our results are separated between the results we want, together
with an additive “noise” term that we didn’t want. From here, then, we can use
our scale and addition
to determine the variance
of the result. This is given by first the
we worked out in the last session,
V[e_o], as well as a new term
associated with the rotation error. This new term consists of a couple
of scalar values, both the known
gamma and the outputs of our algorithm,
y_k. It also
consists of the
of our random phase variable,
That’s how random phase errors are going to affect our output.
But, what is this last phase
To understand that, let’s go back and look through the logic for the CORDIC stages again. At each stage, we approximated a phase with an integer (quantized) value. Unlike before, where the difference between the true value and the quantized value was unknown, in this case we know the difference between the original phase value and its truncated representation. Hence, with a probability of one half that this difference should be positive, and one half that it is negative, we can calculate the variance of the accumulated phase.
We’ll calculate and place this rotation phase error into its own value within our core-generated C header file:
So, at this point we’ve worked out the variance associated with our CORDIC rotations, and now we’ve worked out the variance associated with truncated our phase accumulator values. With these values, we can now begin to put our test bench together.
Building the Test Bench
This has taken a lot of background, but it’s now time to build our test bench. Our test bench will be based around the idea of rotating a fixed, maximal valued, input through all of the possible phase rotations.
To start, we’ll define some helper values, both the log (based two) of the
number of test samples as well as the number of test samples, that the
rest of our code here can reference. Since we’re going to go through one
wavelength of a sine wave, these are determined by the phase width parameter,
We’ll also set our sine wave’s amplitude to the maximum value that will not overflow,
This value will be necessary to maximize our Carrier-to-noise ratio in the face of a fixed amount of background noise.
We’re also going to insist that this algorithm uses the travelling CE form of pipeline management.
This means that we set
i_aux on any input
with a valid value, and anytime
o_aux is true on the output we’ll have a
valid output value. So that we can work on all these values at the same
time, we’ll place the output values into an array as they become valid.
You may notice how
idx is used to separate the actual loop variable,
from the array index associated with the return values. This allows us to
compensate for whatever delay the algorithm might have.
takes a couple clocks to process its results, our last output will likely
not be available to us as soon as we provide our last input. Hence, we’ll need
to flush these last results out. To do this, we set the input traveling
i_aux, to zero and wait till the output CE bit,
o_aux goes to zero
Finally, let’s pull all these results together and examine them.
The first step in evaluating these results is going to be determine the
answers that the CORDIC
algorithm was supposed to produce. In other words,
what angle was given to it, what rectangular coordinates were given to it,
and therefore what values,
dyval, should’ve been returned.
The CORDIC algorithm applied a gain to our inputs, as we discussed earlier. Hence, the output is not just the rotated input, but that rotated result needs to be multiplied by the CORDIC gain before we can tell if it was done right or not.
As a last step in trying to figure out what our answer should be, we’ll need to compensate our perfect answer for any change in word width.
Now that we know the value we wanted the
let’s compare it to the value that was produced. We’ll estimate the
squared differences between what we think the algorithm should produce
and what it actually produced in the variable,
We can even calculate an average of this squared error, by accumulating these
square values into
As a final value to examine, let’s keep track of the maximum difference between our expected and returned values.
To turn the sum of the squared errors into an estimated standard deviation, we’ll need to divide by the number of samples and take the square root of the result.
We can check whether or not this average squared error is within bounds, and fail if not.
The magic number
in this comparison,
1.5, is simply a heuristic that has worked for me so far.
While a threshold could be calculated from a proper statistical hypothesis
test basis, this
number has worked well enough for me.
In a similar fashion, the maximum error should lie within a fixed number
of standard deviations
from zero. The
5.2 below is a heuristic, though, since we haven’t done the proper
statistical analysis to make this exact.
Perhaps a more useful result is the
that can be achieved. This should be given by the energy in the sine wave
input, times the
gain, squared and then divided by the estimated
variance. Since we already have the
sine wave magnitude times the
gain captured by our
scale variable, we can just square this to get our
The ratio of these two values is the SFDR, another indication of the quality of how well this algorithm works.
Unlike the average error, though, which we could calculate from the expected variance above, predicting the SFDR isn’t as simple. Therefore, we’ll only calculate it as part of our test bench and leave you to decide whether it is good enough.
We can now measure both if our CORDIC algorithm works, as well as how well it works.
So … how well does it work?
Just to find out, I ran the CORDIC core generator software, and tested the CORDIC algorithm for a variety of bit widths and phase bit widths. Shown below, in Table 1, are the Carrier-to-noise ratio (CNR) and Spurious Free Dynamic Range measurements the test bench recorded.
|IW, OW Bits
As you can see from the table, the CORDIC algorithm’s performance improves, for a given bit width, as the number of extra internal bits increases until these get to about four. At this point, the improvement settles out, and little further gain can be had.
One common estimate of quantization noise is that you can achieve 6-dB of performance per bit of sample width. By this measure, we might expect a 48 dB CNR for an 8-bit output, a 72 dB CNR for a 12-bit output, 96 dB CNR for a 16-bit output, and 144 dB CNR for a 24-bit output. Our measured results are roughly consistent with this rule, so they appear to be reasonably accurate.
Likewise the SFDR measurements showed a clear increase in performance as the bit widths involved increased.
The test bench, however, failed when it got to the larger phase widths: 29 and above. In particular, the test bench failed for all of the phase widths appropriate for the 24-bit samples. The error given, that of a memory allocation error, illustrates a weakness in a test methodology that works by first recording all of the answers into memory and then processes those results.
Because the test bench failed for the 24-bit case, the expected CNR values are shown (in parentheses) rather than any measured CNR values. Since these expected values were consistent with the predicted CNR values for the rest of the test (comparison not shown), they are at least an indication of what one might expect when using this routine for larger bit-widths.
It’s taken us a lot of work to get to this point, where we can estimate the CNR performance that a CORDIC algorithm will achieve, and then verify that it has that performance. Indeed, the math wasn’t pretty, and I’d be surprised if a large number of folks made it this far.
Please accept my apologies. I would like to keep this blog as simple as the RTL code I’m sharing, and this was not that simple.
The problem that lead us to this point, though, is simple: how do you know when an algorithm’s results are “good enough.” It was this particular question that sent us this deep into the (hard-to-follow) mathematics above and in our prior post.
Further, despite the complexity of this development today, we still didn’t solve all of the mathematical problems associated with a test bench of this type. In particular, we didn’t apply any proper statistical bases to the hypothesis tests within our test bench. As a result, we cannot be certain that this CORDIC implementation is as good as it can be.
Still, this should be good enough to move forward.
Using these results, we now have some confidence that not only does the CORDIC algorithm work, but we also have the confidence that given the right parameters it will work well enough for our purposes.
What purposes might that be? That question, my friend, we’ll have to answer in another post. This one has been too long already.
In the beginning was the Word, and the Word was with God, and the Word was God. (John 1:1)