# Generating more than one bit at a time with an LFSR

This is now our third post on Linear Feedback Shift Registers (LFSRs). Our first post examined how to generate a Linear Feedback Shift Register (LFSR) in Verilog, and our second post walked through an example of a 5-bit LFSR. However, neither of these developments have solved the problem I had initially.

As you may recall, I wanted to use an LFSR to do channel estimation. My intention was to use a setup like Fig 1 to the right. My plan is to transmit pseudorandom bits out of an FPGA output pin at the fastest speed I can: 950 Mbps on my Artix-7 Arty FPGA board. I’ll then receive the bits at the other end of a 12” pmod cable representing my channel, and examine the waveform at the other end to get an estimate of the channel throughput and distortion.

Indeed, if all goes well I should be able to apply Shannon’s Capacity theorem to determine the maximum speed of the channel.

To do this, though, I need a source of pseudorandom bits. Worse, our last attempt at generating pseudorandom bits only generated one bit per clock, and I will need several bits per clock in order to drive an output serializer at high speed.

So let’s return to our Fibonacci LFSR
generator
and see if we can modify it to produce more than one output per clock
period.
We’ll keep the setup identical to
our first post,
so the only thing that needs to change today is the number of outputs
bits we need to generate. We’ll start with describing how we’ll go about
getting these extra bits, and then discuss the code that implements this.
We’ll also do one more: let’s *formally prove* at the end of our development,
that our resulting implementation actually works.

## Getting that next bit

Our task is create an
LFSR
that produces `WS`

bits at a time–rather than just one. (`WS`

= Word Size)
The question, though, is how shall we do this?

We’ll use the
example
LFSR
we presented earlier
for discussion and as an example along the way. You can see
this example
in Fig 2 at the right. It’s a simple five stage
LFSR,
with a feedback equation defined by `TAPS=5'b00101`

.

Let’s begin our development by imagining an infinite stream of (constant)
bits in our shift register, `sreg`

. Each of these bits satisfies the
relationship given by the Fibonacci
GF(2) equation
we started with. When we presented this equation, we had bits `MSB:0`

defined
and we just needed to calculate the next bit, `MSB+1`

,

You may also remember, from the discussion of an
example
LFSR, that these
`MSB:0`

bits have no required relationship between them–save that they cannot
all be zero.

Now let’s see what it will take to calculate bit `MSB+2`

. We can start with
our equation for the next bit, and apply it to bits `[MSB+1:1]`

to get bit
`MSB+2`

,

We’ll have to get rid of the reference to `sreg[MSB+1]`

, though, before this
equation will be useful for us.

To get there, let’s split this new equation into two parts.

The first part will be due to the bits in `sreg[MSB:1]`

. We can represent
this part as `sreg[MSB:0]&(TAPS<<1)`

. This represents the bits that we already
know. You can see this how this would affect our
example
LFSR
in Fig 3. If you compare Fig 3 with Fig 2, you can see how the taps to
generate the next bit are the same as the ones shown in Fig 2, save only that
they’ve been moved one stage to the left.

The LFSR in our example is simple enough that we can apply this same technique to even get a second bit from the taps we were given, as Fig 4 below shows.

Where the example falls apart is when you need to reference a bit that isn’t present in the original shift register. For that, we need to move to the second part of our equation.

This second part, `sreg[MSB+1]&TAPS[MSB]`

will
need some work. In this case, though, we already have an equation for
`sreg[MSB+1]`

(above). Therefore we can substitute that equation for
`sreg[MSB+1]`

into our equation for `sreg[MSB+2]`

in order to get a new
expression for `sreg[MSB+2]`

that depends only upon `sreg[MSB:0]`

,

Perhaps a picture will explain this better. Consider Fig 5 below.

In this figure, you can see the remains of the taps that were being shifted
to the left, and a broken reference to a bit that isn’t in our set from
`MSB:0`

. However, instead of adding the bit we don’t have, we instead
add the equation for that bit. The result is that our new bit, in this
example
LFSR
depends upon three bits from our shift register, instead of just the two.

Let’s simplify this expression a touch further, though.
If we take another look at the equation above, we can now use the distributive
property
to collect our terms. Specifically, we’ll factor out the `sreg`

term to the
left, and the values multiplied by the `sreg`

vector together into a new
term on the right,

In this equation, the portion of our expression to the right of
the `&`

looks very much like an expression similar to our expression for
`sreg[MSB+1]`

. Specifically, this right half expression looks like a vector
that, when taken as an inner product with `sreg`

, produces our result–just
like the `TAPS`

vector did for `sreg[MSB+1]`

. Even better, this vector
is a constant–depending only upon the `TAPS`

parameter. Let’s use this.
We’ll define `tapv[0]`

to be our `TAPS`

, and then `tapv[1]`

to be this value.

We can then repeat this derivation. Doing so will reveal a formula for
`tapv[k+1]`

based upon `tapv[k]`

. In particular,

This `tapv`

array is the key we need to build our code below.

## A Multi-Step LFSR Implementation

Now that we have an equation for future output bits, it’s time to build our multi-step shift register. Feel free to follow along in the code we’re creating here, as we’ll only discuss the basic highlights below.

Our goal is to extend our origianal
Fibonacci
LFSR
code
to output `WS`

bits per clock, where `WS`

is given by a parameter.
This means we’ll need to calculate `WS-1`

more bits than we did the
last time. It also means
that our shift register, which only used to be `LN`

bits long, will now need
to have `LN+(WS-1)`

elements in it. Of these, we’ll use the bottom `WS`

bits
as our output bits.

But I’m getting ahead of myself. Before we can get to `o_word`

, there’s a
lot of other work to be done first.

We’ll start with the equations we just worked out in the last section. We’ll
place these `tapv[]`

values into an array of `WS`

elements, each as
wide as our underlying
LFSR,
or `LN`

bits long.

The first of these equations is the one we’ve used before, `tapv[0]`

,

From here we can use the recursive equation derived above to get the rest,

If you didn’t manage to follow the development of these equations above, don’t worry. We’ll “prove” these equations work below in the next section.

The next trick is the reset value.

You may remember before, when we had one new bit per time step, that we set
our state space to an `INITIAL_FILL`

value.

This reset value then dictated the first `LN`

bits out of our device.

This won’t quite work for our multi-step core. First, the `INITIAL_FILL`

vector needs to have `WS-1`

additional elements to it. Then, to make matters
worse, all of those bits need to maintain the
LFSR
relationship between them. So while the statement above works for the first
`LN`

bits, we’ll still need to determine the next `WS-1`

bits.

We can get these next several bits of the `reset_value`

by running the first
several bits of the `reset_value`

through our
Fibonacci
LFSR
equation. This is a zero cost operation: every part of this equation
is known and constant. Therefore the synthesizer can simplify the code
before our core ever gets mapped to logic.

While this works well to develop `reset_value`

, it doesn’t work as well
as an initial value for our `sreg`

. We’d like to say,

However, this only works with some synthesizers, such as
yosys,
and not with others, like
Verilator.
For those others, we’ll set `sreg`

initially to
`INITIAL_FILL << (WS-1)`

, and then
use `reset_value`

for any subsequent resets. This will still give us the
same sequence, with the only problem being that the first `WS-1`

values
will be zero instead of those associated with the fill.

Now that we have our tap equations, `tapv[]`

, and our `reset_value`

, we can
now move on to the state register itself. In the
typical
LFSR,
all but one bit of the new state register is known. These known bits are
given by shifting the register one step to the right, whereas the last bit
is generated by the `TAPS`

equation.

Let’s do the same thing here and shift these known bits first,

Those are the easy bits.

After those easy bits, our first next bit is determined by the original Fibonacci LFSR equation. That’s the one we built into our first Fibonacci LFSR module.

The rest of the bits are calculated in the same fashion, with the exception
that the equation for them, in particular the `tapv[k]`

coefficients,
is different from one bit to the next. So, our next step is then to
walk through those extra bits applying the `tapv[k]`

equations as appropriate
to generate each new bit.

Did you notice how we also cycled through the various `tapv[k]`

expressions?
This is just implementing the equations we calculated above.

That’s it. It’s still simple, but there is just a little more simple involved in this version over the last version.

Put together, we just calculated `LN+WS-1`

bits. `LN-1`

of these are
calculated the same as before–by shifting the shift register. The next bit
is still calculated by our original `TAPS`

equation, just like we did with
our initial Fibonacci implementation.
The last `WS-1`

bits were then calculated in the
exact same way with the only exception being that the equation, `tapv[k]`

,
changed on a bit-by-bit basis.

But, let’s come back to our earlier question, will this really “work”? Will all these equations really produce the sequence we want?

To answer that question, let’s see if we can get the computer to “prove” that this new form works.

## Applying Formal Methods

I discussed the basics of using
formal methods
using yosys-smtbmc in a
previous post.
The basic idea behind
formal methods
is to define a state space, and within it the space of invalid states.
The state space is initially defined by all of the register values within
your design. It is restricted further, made smaller that is, with `assume`

statements as necessary. Invalid states are defined as well, but this time
using the `assert`

statement. We’ll use these two statements, `assume`

and
`assert`

with the ultimate goal of
proving
that the logic above will *never* enter into an illegal state.

These formal methods are particularly appropriate for this multi-bit LFSR design, since LFSRs are so mathematically based. In particular, the output of the LFSR needs to strictly satisfy a mathematical equation–one that the formal equation solver can verify for us.

As before,
we’ll use yosys
to generate a list of properties in a
format that
yices can understand. When processing a file in
this manner, yosys will define the `FORMAL`

pre-processor directive for us. That allows us to begin our `FORMAL`

verification section with an `ifdef`

,

Our plan will be to then use yosys-smtbmc to then drive the yices theorem prover to prove that our multi-step LFSR actually works.

The first step is to restrict the search state space.

We’ll use the approach
we presented before
to create a clock, and an `f_past_valid`

flag. This latter flag will tell
us whenever the `$past()`

directive will yield valid results.

Now using that assumed clock, our first
LFSR
specific step will restrict the space of all possibilities by
simply assuming that `i_reset`

is true on startup.

Next, we’ll assert that the bottom `LN`

bits are equal to the `INITIAL_FILL`

on the clock following any reset.

Now that we’ve dealt with the `i_reset`

case, we can move on to the regular
register bits.

The first test is whether or not the first of the new register bits,
bit `LN-1`

, is valid. This bit is the one produced via the
original Fibonacci
configuration
we presented before.
Here we’ll just apply it to the values left in our
shift register, plus the one we just shifted out–the one still found in
`$past(sreg[WS-1])`

. Note that this wouldve been `$past(sreg[0])`

in our
previous
version, but
since we are now generating another `WS-1`

bits, that last bit shifted out
is no longer the `0`

bit from before.

Our new implementation also defines another `WS-1`

bits that we want to
validate. Because this is a very mathematically defined
LFSR,
there’s a mathematical relationship between all of these bits. All we
need to do to
formally validate
this code
is to check our current state bits against the equation that was supposed
to generate them one at a time.

This is the majority of the proof. If this works, all of our `tapv[]`

work
will be verified.

There is one more vital part to our proof: we need to prove that `sreg`

will never be equal to zero. You may recall when we
last discussed
LFSRs
that we highlighted the problem with a
zero register: once the register `sreg`

becomes zero (if ever) then it will
cease to produce
pseudorandom
numbers. We’ll allow the theorem prover
to verify that this never happens.

That’s the last of the file, and the last of the proof.

All that’s left is to run the theorem prover and see how we did!

In case you haven’t been following along, you can find the example file we just created here. You can also find the Makefile and other files associated with driving the formal theorem prover here.

## Conclusion

Now that I have an LFSR implementation that I can step 8-14 times per clock, I should be able to create a high speed data stream from one FPGA and send it to another. The difference between what the second FPGA observes and what was actually sent will be any channel effects. In other words, I’m now ready to move my channel identification problem forward one step to measure just how much information can be reasonably stuffed through this FPGA I/O channel.

*For man also knoweth not his time: as the fishes that are taken in an evil net, and as the birds that are caught in the snare; so are the sons of men snared in an evil time, when it falleth suddenly upon them. (Eccl 9:12)*