This is now our third post on Linear Feedback Shift Registers (LFSRs). Our first post examined how to generate a Linear Feedback Shift Register (LFSR) in Verilog, and our second post walked through an example of a 5-bit LFSR. However, neither of these developments have solved the problem I had initially. As you may recall, I wanted to use an LFSR to do channel estimation. My intention was to use a setup like Fig 1 to the right. My plan is to transmit pseudorandom bits out of an FPGA output pin at the fastest speed I can: 950 Mbps on my Artix-7 Arty FPGA board. I’ll then receive the bits at the other end of a 12” pmod cable representing my channel, and examine the waveform at the other end to get an estimate of the channel throughput and distortion.

Indeed, if all goes well I should be able to apply Shannon’s Capacity theorem to determine the maximum speed of the channel.

To do this, though, I need a source of pseudorandom bits. Worse, our last attempt at generating pseudorandom bits only generated one bit per clock, and I will need several bits per clock in order to drive an output serializer at high speed.

So let’s return to our Fibonacci LFSR generator and see if we can modify it to produce more than one output per clock period. We’ll keep the setup identical to our first post, so the only thing that needs to change today is the number of outputs bits we need to generate. We’ll start with describing how we’ll go about getting these extra bits, and then discuss the code that implements this. We’ll also do one more: let’s formally prove at the end of our development, that our resulting implementation actually works.

## Getting that next bit

Our task is create an LFSR that produces `WS` bits at a time–rather than just one. (`WS`= Word Size) The question, though, is how shall we do this? We’ll use the example LFSR we presented earlier for discussion and as an example along the way. You can see this example in Fig 2 at the right. It’s a simple five stage LFSR, with a feedback equation defined by `TAPS=5'b00101`.

Let’s begin our development by imagining an infinite stream of (constant) bits in our shift register, `sreg`. Each of these bits satisfies the relationship given by the Fibonacci GF(2) equation we started with. When we presented this equation, we had bits `MSB:0` defined and we just needed to calculate the next bit, `MSB+1`,

You may also remember, from the discussion of an example LFSR, that these `MSB:0` bits have no required relationship between them–save that they cannot all be zero.

Now let’s see what it will take to calculate bit `MSB+2`. We can start with our equation for the next bit, and apply it to bits `[MSB+1:1]` to get bit `MSB+2`,

We’ll have to get rid of the reference to `sreg[MSB+1]`, though, before this equation will be useful for us.

To get there, let’s split this new equation into two parts. The first part will be due to the bits in `sreg[MSB:1]`. We can represent this part as `sreg[MSB:0]&(TAPS<<1)`. This represents the bits that we already know. You can see this how this would affect our example LFSR in Fig 3. If you compare Fig 3 with Fig 2, you can see how the taps to generate the next bit are the same as the ones shown in Fig 2, save only that they’ve been moved one stage to the left.

The LFSR in our example is simple enough that we can apply this same technique to even get a second bit from the taps we were given, as Fig 4 below shows. Where the example falls apart is when you need to reference a bit that isn’t present in the original shift register. For that, we need to move to the second part of our equation.

This second part, `sreg[MSB+1]&TAPS[MSB]` will need some work. In this case, though, we already have an equation for `sreg[MSB+1]` (above). Therefore we can substitute that equation for `sreg[MSB+1]` into our equation for `sreg[MSB+2]` in order to get a new expression for `sreg[MSB+2]` that depends only upon `sreg[MSB:0]`,

Perhaps a picture will explain this better. Consider Fig 5 below. In this figure, you can see the remains of the taps that were being shifted to the left, and a broken reference to a bit that isn’t in our set from `MSB:0`. However, instead of adding the bit we don’t have, we instead add the equation for that bit. The result is that our new bit, in this example LFSR depends upon three bits from our shift register, instead of just the two.

Let’s simplify this expression a touch further, though. If we take another look at the equation above, we can now use the distributive property to collect our terms. Specifically, we’ll factor out the `sreg` term to the left, and the values multiplied by the `sreg` vector together into a new term on the right,

In this equation, the portion of our expression to the right of the `&` looks very much like an expression similar to our expression for `sreg[MSB+1]`. Specifically, this right half expression looks like a vector that, when taken as an inner product with `sreg`, produces our result–just like the `TAPS` vector did for `sreg[MSB+1]`. Even better, this vector is a constant–depending only upon the `TAPS` parameter. Let’s use this. We’ll define `tapv` to be our `TAPS`, and then `tapv` to be this value.

We can then repeat this derivation. Doing so will reveal a formula for `tapv[k+1]` based upon `tapv[k]`. In particular,

This `tapv` array is the key we need to build our code below.

## A Multi-Step LFSR Implementation

Now that we have an equation for future output bits, it’s time to build our multi-step shift register. Feel free to follow along in the code we’re creating here, as we’ll only discuss the basic highlights below.

Our goal is to extend our origianal Fibonacci LFSR code to output `WS` bits per clock, where `WS` is given by a parameter. This means we’ll need to calculate `WS-1` more bits than we did the last time. It also means that our shift register, which only used to be `LN` bits long, will now need to have `LN+(WS-1)` elements in it. Of these, we’ll use the bottom `WS` bits as our output bits.

But I’m getting ahead of myself. Before we can get to `o_word`, there’s a lot of other work to be done first.

We’ll start with the equations we just worked out in the last section. We’ll place these `tapv[]` values into an array of `WS` elements, each as wide as our underlying LFSR, or `LN` bits long.

The first of these equations is the one we’ve used before, `tapv`,

From here we can use the recursive equation derived above to get the rest,

If you didn’t manage to follow the development of these equations above, don’t worry. We’ll “prove” these equations work below in the next section.

The next trick is the reset value.

You may remember before, when we had one new bit per time step, that we set our state space to an `INITIAL_FILL` value.

This reset value then dictated the first `LN` bits out of our device.

This won’t quite work for our multi-step core. First, the `INITIAL_FILL` vector needs to have `WS-1` additional elements to it. Then, to make matters worse, all of those bits need to maintain the LFSR relationship between them. So while the statement above works for the first `LN` bits, we’ll still need to determine the next `WS-1` bits.

We can get these next several bits of the `reset_value` by running the first several bits of the `reset_value` through our Fibonacci LFSR equation. This is a zero cost operation: every part of this equation is known and constant. Therefore the synthesizer can simplify the code before our core ever gets mapped to logic.

While this works well to develop `reset_value`, it doesn’t work as well as an initial value for our `sreg`. We’d like to say,

However, this only works with some synthesizers, such as yosys, and not with others, like Verilator. For those others, we’ll set `sreg` initially to `INITIAL_FILL << (WS-1)`, and then use `reset_value` for any subsequent resets. This will still give us the same sequence, with the only problem being that the first `WS-1` values will be zero instead of those associated with the fill.

Now that we have our tap equations, `tapv[]`, and our `reset_value`, we can now move on to the state register itself. In the typical LFSR, all but one bit of the new state register is known. These known bits are given by shifting the register one step to the right, whereas the last bit is generated by the `TAPS` equation.

Let’s do the same thing here and shift these known bits first,

Those are the easy bits.

After those easy bits, our first next bit is determined by the original Fibonacci LFSR equation. That’s the one we built into our first Fibonacci LFSR module.

The rest of the bits are calculated in the same fashion, with the exception that the equation for them, in particular the `tapv[k]` coefficients, is different from one bit to the next. So, our next step is then to walk through those extra bits applying the `tapv[k]` equations as appropriate to generate each new bit.

Did you notice how we also cycled through the various `tapv[k]` expressions? This is just implementing the equations we calculated above.

That’s it. It’s still simple, but there is just a little more simple involved in this version over the last version.

Put together, we just calculated `LN+WS-1` bits. `LN-1` of these are calculated the same as before–by shifting the shift register. The next bit is still calculated by our original `TAPS` equation, just like we did with our initial Fibonacci implementation. The last `WS-1` bits were then calculated in the exact same way with the only exception being that the equation, `tapv[k]`, changed on a bit-by-bit basis.

But, let’s come back to our earlier question, will this really “work”? Will all these equations really produce the sequence we want?

To answer that question, let’s see if we can get the computer to “prove” that this new form works.

## Applying Formal Methods

I discussed the basics of using formal methods using yosys-smtbmc in a previous post. The basic idea behind formal methods is to define a state space, and within it the space of invalid states. The state space is initially defined by all of the register values within your design. It is restricted further, made smaller that is, with `assume` statements as necessary. Invalid states are defined as well, but this time using the `assert` statement. We’ll use these two statements, `assume` and `assert` with the ultimate goal of proving that the logic above will never enter into an illegal state.

These formal methods are particularly appropriate for this multi-bit LFSR design, since LFSRs are so mathematically based. In particular, the output of the LFSR needs to strictly satisfy a mathematical equation–one that the formal equation solver can verify for us.

As before, we’ll use yosys to generate a list of properties in a format that yices can understand. When processing a file in this manner, yosys will define the `FORMAL` pre-processor directive for us. That allows us to begin our `FORMAL` verification section with an `ifdef`,

Our plan will be to then use yosys-smtbmc to then drive the yices theorem prover to prove that our multi-step LFSR actually works.

The first step is to restrict the search state space.

We’ll use the approach we presented before to create a clock, and an `f_past_valid` flag. This latter flag will tell us whenever the `\$past()` directive will yield valid results.

Now using that assumed clock, our first LFSR specific step will restrict the space of all possibilities by simply assuming that `i_reset` is true on startup.

Next, we’ll assert that the bottom `LN` bits are equal to the `INITIAL_FILL` on the clock following any reset.

Now that we’ve dealt with the `i_reset` case, we can move on to the regular register bits.

The first test is whether or not the first of the new register bits, bit `LN-1`, is valid. This bit is the one produced via the original Fibonacci configuration we presented before. Here we’ll just apply it to the values left in our shift register, plus the one we just shifted out–the one still found in `\$past(sreg[WS-1])`. Note that this wouldve been `\$past(sreg)` in our previous version, but since we are now generating another `WS-1` bits, that last bit shifted out is no longer the `0` bit from before.

Our new implementation also defines another `WS-1` bits that we want to validate. Because this is a very mathematically defined LFSR, there’s a mathematical relationship between all of these bits. All we need to do to formally validate this code is to check our current state bits against the equation that was supposed to generate them one at a time.

This is the majority of the proof. If this works, all of our `tapv[]` work will be verified.

There is one more vital part to our proof: we need to prove that `sreg` will never be equal to zero. You may recall when we last discussed LFSRs that we highlighted the problem with a zero register: once the register `sreg` becomes zero (if ever) then it will cease to produce pseudorandom numbers. We’ll allow the theorem prover to verify that this never happens.

That’s the last of the file, and the last of the proof.

All that’s left is to run the theorem prover and see how we did!

In case you haven’t been following along, you can find the example file we just created here. You can also find the Makefile and other files associated with driving the formal theorem prover here.

## Conclusion

Now that I have an LFSR implementation that I can step 8-14 times per clock, I should be able to create a high speed data stream from one FPGA and send it to another. The difference between what the second FPGA observes and what was actually sent will be any channel effects. In other words, I’m now ready to move my channel identification problem forward one step to measure just how much information can be reasonably stuffed through this FPGA I/O channel.