In my case it happened this last week. I needed to do some channel estimation, and I reasoned that a pseudorandom sample stream would make a nice input to the channel. Specifically, my ultimamte plan is to transmit pseudorandom bits out of an FPGA output pin at the fastest speed I can: 950 Mbps on my Artix-7 Arty FPGA board. I can then receive the bits at the other end of the channel, and examine them to get an estimate of the channel distortion. If all goes well, I should even be able to apply Shannon’s Capacity theorem to determine the maximum speed of the channel.
All I needed to get started was a source of pseudorandom bits.
One common source for pseudorandom bits in digital logic is a Linear Feedback Shift Register (LFSR). LFSRs are simple to build and so they are commonly used for this purpose. They have some wonderful mathematical properties associated with them, guaranteeing a certain amount of pseudorandomness. Be forwarned, however, the one thing LFSRs are not is cryptographically random/secure. Do not use LFSRs in place of a proper cryptographically secure sequence. That said, they are worth learning how to create and use.
Much of my System Identification project will need to wait for a later post. Until that time, I’ll be accepting gentlemen’s wagers (no money involved) regarding what this ultimate maximum speed will be.
Today’s topic though is just a simple discussion of how to implement an LFSR in Verilog. We’ll skip the worst of the mathematics, although I would recommend you look them up.
Basically, the idea behind an LFSR is that given a current register state, I’ll call it a “fill”, you can compute the next state via a linear combination of the bits in the current state. I’ll use the term “taps” or “polynomial” interchangeably to describe this formula–although you might need to dig into more of the mathematics to understand why.
In this post, we’ll discuss these registers and how to create them in Verilog.
If you are not familiar with what a field is in this context, it’s simply a mathematical abstraction of a number system based upon a set of values, together with the definitions of addition and multiplication defined on those values. The neat thing about fields is that much of the math you are already likely to be familiar with is based upon fields. For example, I’m going to guess that you are probably familiar with linaer algebra. Linaer algebra, though, is all based upon fields. Although I learned linaer algebra using integers, real numbers, and complex numbers, linaer algebra is actually based upon fields: arbitrary algebraic systems with a defined set of values, as well as two primary operations defined upon that set.
What we haven’t mentioned are the two two mathematical operators that define this field: addition and multiplication. Both are used in the implementation of an LFSR.
The difference between the addition and multiplication operators you might be familiar with, and the operators defined by GF(2), is that the GF(2) operators are followed by a modulo two. Hence, the result will be a ‘1’ if the traditional addition (or multiplication) result was odd, or ‘0’ if it was even.
If you look at the addition operator under
you’ll find the first couple values to be what you expect:
1+0=1. Where things get a little interesting is when
1+1 together. In the traditional integer arithmetic you’re likely
familiar with, you’d get a
2. In arithmetic over
you need to take the result
1+1 results in zero, as shown in Fig 1.
Think about that operation again for a moment:
That describes an
exclusive or operation, or XOR.
We’ll do the same thing for multiplication. In this case, multiplying numbers from the set of zero and one will only yield the result zero and one, so you might not notice the modulo two. Then, when you look at the results as shown in Fig 2, you’ll quickly see that multiplication over GF(2) is nothing more than a bitwise-and.
Given these two operators, let’s define a bit-vector,
sreg, whose elements
are either zero or one—elements from
GF(2). We’ll also state that
this bit-vector has an initial value given by
INITIAL_FILL. In a similar
we can construct a linear operator,
T(x), that can be
sreg to yield a new or updated value for
sreg at the next
time-step. This operator will be defined by another vector,
we’ll come back to this in a moment. For now, just remember that any
linear operation on a finite set can be represented by a matrix–another
fact we’ll come back to.
In pseudocode notation, an
just applies the linear operator to the current
sreg value over and over
A key thing to point out, though, is that any linear operation on the all
zeros vector will always return the all zeros vector, hence
we go along, we’ll need to be careful to keep
sreg from becoming zero.
LFSRs so special
is that as you apply this linear operator, the bottom bit will appear
to be pseudorandom. If you
choose your linear system well, an
N bit shift register will walk through
2^N-1 combinations before repeating–generating
2^(N-1)-1 zeros along the way. You do need to be aware, though,
not all linear systems have this property. Those that do are said to
create Maximal Length
These maximal length
tend to be well known, and tables even
containing examples of
that will generate anything from short
all the way up to very long ones.
Ok, I promised not to get deep into the math, but it is important to understand that LFSRs implement a linear operation on a bit-vector. These linear operations can be represented as a matrix operation–but only if the linear system described by the matrix follows the rules of GF(2).
As the wikipedia article on LFSRs explains, there are two forms of expressing
LFSRs. The first
is a Galois form, the second is known as the Fibonacci form. Both will yield
the same sequences, although their initial parameters,
TAPS values will be difference from one form to the other.
(Don’t worry, we’ll define these more formally in a moment.) Pictorially, a
Galois shift register
looks like Fig 3.
Let’s let the number of stages in this register be
LN, and the initial
value of all of the stages given by
INITIAL_FILL. The last item needed
to implement a general purpose Galois shift register
are the coefficients of the multiplies shown in the figure above. We’ll
call these the
TAPS–mostly because they “tap-into” the shift register
sequence when one, and ignore it when zero.
Looking at Fig 3 again, you can see an input going through
of processing to create an output. That output is then
fed-back to affect the stages along the way. Notice also the adds and the
multiplies. These are the adds
we discussed in our last section.
The multiplication values are given by the
TAPS of the
LFSR. They do
not change during sequence generation.
The mathematicians will describe the
TAPS as the coefficients of a
polynomial in GF(2).
These coefficients can either be a one or a zero–as with everything in
As we mentioned above, specific
will yield maximal length
If you would rather, though, I tend to think of
TAPS as just a particular
N bit binary vector.
One of the reasons why I wanted to point out, in the last section, that an LFSR is nothing but a linear operator over GF(2), is that it allows us to write out this formula as a series of linear equations.
In this form, I’ve used
g_(N-1) to represent the taps. The
N bits to it ranging from
x[MSB]. This equation provides the formula for calculating the next
x, from the last one. It’s worth noting that the entire
matrix is nearly upper right triangular, save for the
g coefficients in the
Okay, so that’s the operation we want to perform. Now, let’s finish building it.
In pseudocode, I’m going to represent the
x’s in a register I’ll call
g’s will be our
TAPS bit-vector parameter.
The other value of interest is the
INITIAL_FILL. For a maximal length
sequence, the only
restriction on the
INITIAL_FILL is that it cannot be zero. Any other
value is allowed. This
INITIAL_FILL will determine your starting point in
the random sequence created by the
You may also have noticed the “input” in Fig 3. We’ll set this to zero for today’s task. Setting it to another value has the effect of creating a feed-through randomizer, rather than the pseudorandom number generator we are building today.
The code necessary to implement a Galois shift register can be drawn directly
from Fig 3. We’ll use
sreg to describe the values in the register, so that
o_bit, is just the LSB of
On a reset, we’ll initialize
sreg to our
INITIAL_FILL. Likewise, we’ll use
the “global-CE” pipeline
so nothing is allowed to change unless
i_ce is also true. This will make our
circuit useful for
tasks that need to run synchronously at data rates other than our clock rate.
With that aside, our logic is straight-forward. On every clock, we move all
the bits forward by one step. If the LSB is a one, we add
TAPS vector to our state vector as the bits step forward.
Note how we checked for a
1 bit in the bottom bit of
sreg. This is how to
implement the multiply. If the bottom bit is a zero, then the
zero will be the zero vector which will not affect anything when added. On
the other hand, if
sreg is a
1, we’ll add the taps to our result.
The astute observer may note here that
TAPS[LN-1] must be a one, or the
most significant bit (MSB) will be trivially zero.
Incidentally, this version of an LFSR is really easy to calculate in software, and its C++ equivalent is given by:
If you look hard through the Linux kernel sources, you’ll even find an implementation of this algorithm within them. . There, though it’s used to implement a Cyclic Redundancy Check (CRC) and so their input isn’t zero.
LFSR form is
the Fibonacci form. The Fibonacci form of an
is mathematically equivalent to the Galois
save that the outputs are calculated in a different manner, and the
need to be “reversed”. The diagram in Fig 4 shows the basic form of the
In many ways, this makes an ideal form for implementing an
The feedback bit is usually calculated from just a small
number of taps (2-4) into the shift register, making it fit within a single
LUT quite easily.
Another unique feature to this form is that the values in the shift
register aren’t modified between when they are originally calculated and
the output–making it possible to see then next
LN output bits by
just examining the shift register state. (We’ll use this in our next
showing how to generate more than one bit at a time.)
As with the Galois LFSR, the Fibonacci version also implements a linear system. This time, though, it has a different structure.
MSB is the feedback bit above.
When it comes to implementation, the initial implementation steps are the same as with the Galois implementation.
The difference is what you do to determine the next output bit. When using
the Fibonacci form, we take an inner
product between or
sreg, and our TAPS vector. This is nothing more than a point by
point multiply (AND),
followed by summing
all of the results of the multiplies together to create the new high order
bit. As before, we’ll hold the “input” to zero, and so we get the following
It’s not often that you get a chance to use an
reduction operator, however
this is one of those times. Note the expression
^(sreg[..] & TAPS). The
^ in the front of this specifies that all of the values in
sreg[..] & TAPS
are to be added together, in a
modulo two sum–just
what we need to do here.
This isn’t the only way to build a Fibonacci LFSR within an FPGA. For example, this Xilinx app note discusses how to implement an LFSR using the shift register logic blocks within a Xilinx FPGA. Xilinx’s development is different from mine primarily because the shift register logic block that Xilinx depended upon does not support a reset signal.
I’ve also placed a test bench for each of these two shift register implementations here and here. Since the two are nearly identical, I’ll walk through the salient portions of one of them only. For a more detailed description of how to build a Verilator based-test bench in general, please take a look at this article. The one thing we’ll do different here from many of my other projects is that we’ll call Verilator directly rather than using a subclass–just because our simulation needs for this component are so simple.
We’re also going to declare three more variables:
We’ll use the
clocks variable to count the number of
clocks we use, to keep
from overloading the user’s screen with numbers.
nout will be used to
help us place spaces in the output, and
ones will be used to count the
number of ones in our output.
This test bench also
assumes that our result will be a maximal length
sequence. In this
case, there should be exactly
2^(LN)-1 unique outputs (clock ticks before
sreg equals one again), and
2^(LN-1) ones in those outputs. This will
be our evidence of success.
Our next step will be to start our core with a reset pulse.
TRACE_NEGEDGE are macros used to write the
simulation state to a
Once the reset is complete, we’ll assert that the shift register initial state
1. While this doesn’t necessarily need to be the case, setting
INITIAL_FILL to anything else within the
will necessitate coming back and updating it here.
Finally, we’ll move on to the simulation itself. We’ll start out with
16384 clocks of simulation.
On each of these clocks, we’ll print an output bit to the screen. This should allow you to visually verify that the bits look random. We’ll place spaces between every eight bits for easier reading as well.
Finally, if the shift register value ever returns to our
INITIAL_FILL of one,
then we know we’ve exhausted the sequence. We can therefore break.
In case you want to test for a really long sequence, the
won’t be enough. Rather than continue to fill your screen, the
just quietly continues crunching here.
As a final step, let’s compare the number of clocks we used in our output to determine whether or not the test bench was ultimately successful.
Of course, since I’m posting this, I’ve already proven it …
There you go! That’s all there is to building an LFSR and demonstrating that it works.
Sadly, though, neither of our implementations today was sufficient for my channel estimation problem at 950MHz. Even if I could run this logic that fast, the rest of my logic wouldn’t be able to keep up. As a result, we’ll need to come back to this topic and see if we can’t build an LFSR that produces multiple outputs in parallel.
I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all. (Eccl 9:11)