Bit growth in FPGA arithmetic
We just started a discussion of linear upsampling. In many ways the linear upsampling algorithm is much like any other DSP algorithm: samples come in, multiplies, adds, and subtracts get applied, and samples go out. Like any other algorithm using these operations, every add or multiply will increase the number of bits required to represent the result. So it seems only fitting to look at bit growth, so we can know how many bits to allocate for our internal results.
Since dropping bits, once grown, is even more complicated than just tracking bit growth, we’ll save that discussion for its own post later.
If you’ve never dealt with, or heard of, bit growth before, it can be confusing. It doesn’t need to be. As we’ll show here, there are only two basic rules you need to keep track of. Follow those two rules, and you should be okay. That said, we’ll present three rules today, since long strings of additions deserve a refinement of their own.
My basic approach for understanding bit growth is to look at the extremes of any N bit arithmetic operation, and see how many bits it takes to represent that extreme. Indeed, if you ever get confused, just come back to this approach and examine the extremes of the values you are calculating within your application.
Looking at the extremes in the representation, an N bit number, a, can be any one of a range of signed or unsigned values. In the case of signed, two’s complement values, the range of an N bit number is given by:

-2^(N-1) <= a <= 2^(N-1) - 1

Hence, a four bit number can represent values between -8 and +7 inclusive.
Likewise the range of an unsigned value N bits wide is:

0 <= a <= 2^N - 1

and a four bit unsigned number can represent values from 0 to 15.
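As a quick illustration in Verilog, four bit signed and unsigned nets cover exactly these two ranges:

```verilog
wire signed [3:0] s;  // two's complement: holds -8 through +7
wire        [3:0] u;  // unsigned: holds 0 through 15
```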
If you add, subtract, or multiply numbers at the ends of these ranges,
by how much must N
be increased to capture the new result without overflow?
There are a couple of cases you might not consider along the way, and so I’ll note these below and keep this page updated as a reminder of what the corner cases are.
Add one bit on every add
The rule for addition is that adding two N bit numbers together requires an extra bit to represent the result. Let’s test this to see if this is true:

(2^(N-1) - 1) + (2^(N-1) - 1) = 2^N - 2 <= 2^N - 1

That certainly works for the positive end of the range, since 2^N - 2 fits within 2^N - 1, the maximum of an (N+1) bit signed number. What about the negative end?

(-2^(N-1)) + (-2^(N-1)) = -2^N

So, at the negative end we are good with adding an additional bit as well, since -2^N is exactly the most negative value an (N+1) bit signed number can hold.
If you know that the numbers you will be adding will have different signs, then no bit growth is needed, since the absolute value of the result will tend towards zero rather than increasing. I’ve just never found an application that can use this nuance.
Unsigned numbers are roughly the same, with the exception that you need to be careful when subtracting two unsigned numbers. The problem here is that the result might be negative. This means that, unless you can guarantee the result will not underflow (i.e. go below zero), you will need to add a sign bit. Hence, again, one bit of growth, but this time it comes with an alternate representation (two’s complement) as well.
So the rule of adding one bit per addition (or subtraction) has a special case we’ll need to be careful of, where the data representation changes as well, but otherwise it holds.
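One common way to handle this unsigned-subtraction case in Verilog is to zero-extend both operands by one bit and declare the difference signed. A minimal sketch, assuming a clock named i_clk and a width parameter W:

```verilog
parameter W = 12;   // operand width (an assumed value)

wire [(W-1):0]    u, v;   // unsigned inputs
reg signed [W:0]  diff;   // one extra bit, now a sign bit

// Zero-extend both operands so the subtraction produces a
// valid two's complement result, even when v > u.
always @(posedge i_clk)
    diff <= { 1'b0, u } - { 1'b0, v };
```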
What if you are adding values with different bit widths? Suppose you wish to add an N bit number to an M bit number? As before, the answer can be found by examining the extremes. If you add the two positive extremes:

(2^(N-1) - 1) + (2^(M-1) - 1) <= 2^N - 2

as long as M <= N. The same is true for the negative extreme. Here, then, the rule is that the output must have one more bit than the wider of the two input numbers.
Now, let’s apply this to bits within an FPGA. When building DSP algorithms, you’ll want to be able to adjust them for the appropriate width later. That way you can write one routine that works for multiple bit widths, and may even work across multiple projects. Hence, you’ll want to declare your various bit widths as parameters. (We’ll use this approach for our linear interpolator, as well.) For example, if you want to add two numbers, a and b, together, you might do something like:
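A minimal sketch of such a parameterized addition (the parameter name IW and the clock i_clk are assumptions):

```verilog
parameter IW = 16;  // input word width: override per project

wire signed [(IW-1):0]  a, b;
reg  signed [IW:0]      sum;   // one extra bit for the growth

always @(posedge i_clk)
    sum <= a + b;
```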
This captures all of the logic we just discussed above: adding one bit on any addition or subtraction.
Long strings of addition
Since the fundamental DSP operation is a convolution, and since convolutions require adding lots of numbers together, this creates a special case of bit growth that needs to be discussed.
Imagine you had K words, N bits each, that all needed to be added together. If you applied the rule above one addition at a time, you might allocate N+K-1 bits to the result. While that would work, it’s also more bits than you need.
As before, we’ll look at the extremes. What happens when you add K extreme values together? We’ll consider only the signed case, since the unsigned case is roughly the same:

K * (-2^(N-1)) = -K * 2^(N-1)
Before taking the next step, let’s imagine a number of bits, M, that can hold the value K, such that:

K <= 2^M
Further, we want M to be the smallest number possible so that this inequality holds. Mathematically, we might say that:

M = ceil(log2(K))

or equivalently that M is the ceiling (smallest integer above or equal to) of the base two logarithm of K. In simpler terms, it’s just the log (base two) of K, rounded up to the nearest integer.
Now, using this number, we know that:

K * 2^(N-1) <= 2^M * 2^(N-1) = 2^(N+M-1)

and so that:

-2^(N+M-1) <= K * (-2^(N-1)) <= K * (2^(N-1) - 1) < 2^(N+M-1)

a range that fits within N+M two’s complement bits.
And that’s our defining case. When adding a large number of values together, all having the same bit width, the number of bits required to hold the result is given by the original number of bits plus the log, base two, of the number of elements added together.
We’ll save the Verilog example for when we build an FIR filter on this blog.
Add the number of bits on every multiply
Let’s now look at multiplying two numbers. We’ll start by looking at signed multiplication, but then look at unsigned multiplication before finishing.
Our first step will be to examine two’s complement multiplication. Suppose we multiply a signed N bit number with a signed M bit number. If you multiply the two largest of these numbers together, the result will fit in N+M-1 bits:

(2^(N-1) - 1) * (2^(M-1) - 1) = 2^(N+M-2) - 2^(N-1) - 2^(M-1) + 1 < 2^(N+M-2)
Don’t forget to check the other extreme! Multiplying the two most negative numbers together yields a positive result, so this is an important check:

(-2^(N-1)) * (-2^(M-1)) = 2^(N+M-2)
You may note that 2^(N+M-2) just barely doesn’t fit in N+M-1 two’s complement bits, so this becomes the defining case: N+M bits are required to hold the output, although fitting within N+M-1 bits will be more common.
In Verilog, multiplying two signed numbers together would look like:
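A hedged sketch, with assumed parameter names NA and NB for the two input widths and an assumed clock i_clk:

```verilog
parameter NA = 16, NB = 12;  // multiplicand widths (assumed)

wire signed [(NA-1):0]     a;
wire signed [(NB-1):0]     b;
reg  signed [(NA+NB-1):0]  prod;  // NA+NB bits cover every case

always @(posedge i_clk)
    prod <= a * b;
```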
Notice again that, just like the add, we allow the bit width to be specified (and overruled) by a parameter. This allows us to adjust the bit width later.
Now let’s consider unsigned multiplication. In this case, we only need to check the positive edge of the interval:

(2^N - 1) * (2^M - 1) = 2^(N+M) - 2^N - 2^M + 1 < 2^(N+M)
As before, N+M
bits are required to hold the product of an N
bit number
with an M
bit number. Indeed, the Verilog code for doing this is the same
as before, with the only exception that the signed
tag needs to be
removed to do unsigned multiplication.
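For completeness, here is what that unsigned version might look like, using the same assumed parameter names:

```verilog
parameter NA = 16, NB = 12;  // multiplicand widths (assumed)

wire [(NA-1):0]     a;
wire [(NB-1):0]     b;
reg  [(NA+NB-1):0]  prod;   // still NA+NB bits, now unsigned

always @(posedge i_clk)
    prod <= a * b;
```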
Rules to Live By
Let’s summarize our rules here:

1. When adding two numbers, increase the number of bits in the result by one. Be careful when subtracting unsigned numbers: the new bit may change your representation from unsigned to signed two’s complement.

2. When adding long strings of numbers, each with an identical number of bits, such as you would do when implementing a convolution, the bit growth goes as the log of the length of the string, rather than one bit per addition.

3. When multiplying two numbers, store the output in a register having as many bits within it as the two multiplicands had combined.
We’re not quite ready to finish our lesson on linear interpolation, though. We still need to discuss:

1. How to drop bits when processing signals, and

2. How to debug DSP algorithms.
The approach you will need to debug a DSP-based algorithm is distinctly different from the one you’ll use to debug the rest of your logic.
Then we can come back and finish building our linear interpolator.
If you see any mistakes in the rules above, or any rules regarding bit growth that I’ve missed, please feel free to comment and note them below.
And the child grew, and waxed strong in spirit, and was in the deserts till the day of his shewing unto Israel. (Luke 1:80)