Generating more than one bit at a time with an LFSR

This is now our third post on Linear Feedback Shift Registers (LFSRs). Our first post examined how to generate a Linear Feedback Shift Register (LFSR) in Verilog, and our second post walked through an example of a 5-bit LFSR. However, neither of these developments have solved the problem I had initially.

Fig 1: System Identification Setup

As you may recall, I wanted to use an LFSR to do channel estimation. My intention was to use a setup like Fig 1 to the right. My plan is to transmit pseudorandom bits out of an FPGA output pin at the fastest speed I can: 950 Mbps on my Artix-7 Arty FPGA board. I’ll then receive the bits at the other end of a 12” pmod cable representing my channel, and examine the waveform at the other end to get an estimate of the channel throughput and distortion.

Indeed, if all goes well I should be able to apply Shannon’s Capacity theorem to determine the maximum speed of the channel.

To do this, though, I need a source of pseudorandom bits. Worse, our last attempt at generating pseudorandom bits only generated one bit per clock, and I will need several bits per clock in order to drive an output serializer at high speed.

So let’s return to our Fibonacci LFSR generator and see if we can modify it to produce more than one output per clock period. We’ll keep the setup identical to our first post, so the only thing that needs to change today is the number of outputs bits we need to generate. We’ll start with describing how we’ll go about getting these extra bits, and then discuss the code that implements this. We’ll also do one more: let’s formally prove at the end of our development, that our resulting implementation actually works.

Getting that next bit

Our task is create an LFSR that produces WS bits at a time–rather than just one. (WS= Word Size) The question, though, is how shall we do this?

Fig 2: Example LFSR

We’ll use the example LFSR we presented earlier for discussion and as an example along the way. You can see this example in Fig 2 at the right. It’s a simple five stage LFSR, with a feedback equation defined by TAPS=5'b00101.

Let’s begin our development by imagining an infinite stream of (constant) bits in our shift register, sreg. Each of these bits satisfies the relationship given by the Fibonacci GF(2) equation we started with. When we presented this equation, we had bits MSB:0 defined and we just needed to calculate the next bit, MSB+1,

sreg[MSB+1] = ^(sreg[MSB:0] & TAPS);

You may also remember, from the discussion of an example LFSR, that these MSB:0 bits have no required relationship between them–save that they cannot all be zero.

Now let’s see what it will take to calculate bit MSB+2. We can start with our equation for the next bit, and apply it to bits [MSB+1:1] to get bit MSB+2,

sreg[MSB+2]     = ^(sreg[MSB+1:1] & TAPS);

We’ll have to get rid of the reference to sreg[MSB+1], though, before this equation will be useful for us.

To get there, let’s split this new equation into two parts.

Fig 3: One more bit

The first part will be due to the bits in sreg[MSB:1]. We can represent this part as sreg[MSB:0]&(TAPS<<1). This represents the bits that we already know. You can see this how this would affect our example LFSR in Fig 3. If you compare Fig 3 with Fig 2, you can see how the taps to generate the next bit are the same as the ones shown in Fig 2, save only that they’ve been moved one stage to the left.

The LFSR in our example is simple enough that we can apply this same technique to even get a second bit from the taps we were given, as Fig 4 below shows.

Fig 4: Two more bits

Where the example falls apart is when you need to reference a bit that isn’t present in the original shift register. For that, we need to move to the second part of our equation.

This second part, sreg[MSB+1]&TAPS[MSB] will need some work. In this case, though, we already have an equation for sreg[MSB+1] (above). Therefore we can substitute that equation for sreg[MSB+1] into our equation for sreg[MSB+2] in order to get a new expression for sreg[MSB+2] that depends only upon sreg[MSB:0],

sreg[MSB+2]     = ^(sreg & (TAPS<<1)) ^ (TAPS[MSB]&(sreg & TAPS));

Perhaps a picture will explain this better. Consider Fig 5 below.

Fig 5: Getting a third bit

In this figure, you can see the remains of the taps that were being shifted to the left, and a broken reference to a bit that isn’t in our set from MSB:0. However, instead of adding the bit we don’t have, we instead add the equation for that bit. The result is that our new bit, in this example LFSR depends upon three bits from our shift register, instead of just the two.

Let’s simplify this expression a touch further, though. If we take another look at the equation above, we can now use the distributive property to collect our terms. Specifically, we’ll factor out the sreg term to the left, and the values multiplied by the sreg vector together into a new term on the right,

sreg[MSB+2]     = ^(sreg & ( (TAPS<<1) ^ (TAPS[MSB] ? TAPS : 0)));

In this equation, the portion of our expression to the right of the & looks very much like an expression similar to our expression for sreg[MSB+1]. Specifically, this right half expression looks like a vector that, when taken as an inner product with sreg, produces our result–just like the TAPS vector did for sreg[MSB+1]. Even better, this vector is a constant–depending only upon the TAPS parameter. Let’s use this. We’ll define tapv[0] to be our TAPS, and then tapv[1] to be this value.

We can then repeat this derivation. Doing so will reveal a formula for tapv[k+1] based upon tapv[k]. In particular,

tapv[k+1] = (tapv[k] <<1) ^ (tapv[k][MSB] ? TAPS : 0)));

This tapv array is the key we need to build our code below.

A Multi-Step LFSR Implementation

Now that we have an equation for future output bits, it’s time to build our multi-step shift register. Feel free to follow along in the code we’re creating here, as we’ll only discuss the basic highlights below.

Our goal is to extend our origianal Fibonacci LFSR code to output WS bits per clock, where WS is given by a parameter. This means we’ll need to calculate WS-1 more bits than we did the last time. It also means that our shift register, which only used to be LN bits long, will now need to have LN+(WS-1) elements in it. Of these, we’ll use the bottom WS bits as our output bits.

	assign	o_word=sreg[WS-1:0];

But I’m getting ahead of myself. Before we can get to o_word, there’s a lot of other work to be done first.

We’ll start with the equations we just worked out in the last section. We’ll place these tapv[] values into an array of WS elements, each as wide as our underlying LFSR, or LN bits long.

The first of these equations is the one we’ve used before, tapv[0],

	assign	tapv[0] = TAPS;

From here we can use the recursive equation derived above to get the rest,

	genvar	k;
	generate for(k=1; k<WS; k=k+1)
	begin : PRECALCULATING_TAP_VALUE
		assign	tapv[k] = (tapv[k-1]<<1)^((tapv[k-1][(LN-1)])?TAPS:0);
	end endgenerate

If you didn’t manage to follow the development of these equations above, don’t worry. We’ll “prove” these equations work below in the next section.

The next trick is the reset value.

You may remember before, when we had one new bit per time step, that we set our state space to an INITIAL_FILL value.

	assign	reset_value[(LN-1):0] = INITIAL_FILL;

This reset value then dictated the first LN bits out of our device.

This won’t quite work for our multi-step core. First, the INITIAL_FILL vector needs to have WS-1 additional elements to it. Then, to make matters worse, all of those bits need to maintain the LFSR relationship between them. So while the statement above works for the first LN bits, we’ll still need to determine the next WS-1 bits.

We can get these next several bits of the reset_value by running the first several bits of the reset_value through our Fibonacci LFSR equation. This is a zero cost operation: every part of this equation is known and constant. Therefore the synthesizer can simplify the code before our core ever gets mapped to logic.

	generate
	for(k=0; k<WS-1; k=k+1)
	begin : CALC_RESET
		assign	reset_value[(LN+k)] = ^(reset_value[ k +: LN]&TAPS);
	end endgenerate

While this works well to develop reset_value, it doesn’t work as well as an initial value for our sreg. We’d like to say,

initial	sreg = reset_value;

However, this only works with some synthesizers, such as yosys, and not with others, like Verilator. For those others, we’ll set sreg initially to INITIAL_FILL << (WS-1), and then use reset_value for any subsequent resets. This will still give us the same sequence, with the only problem being that the first WS-1 values will be zero instead of those associated with the fill.

Now that we have our tap equations, tapv[], and our reset_value, we can now move on to the state register itself. In the typical LFSR, all but one bit of the new state register is known. These known bits are given by shifting the register one step to the right, whereas the last bit is generated by the TAPS equation.

Let’s do the same thing here and shift these known bits first,

	always @(posedge i_clk)
		if (i_reset)
			sreg[(LN-2):0] <= reset_value[(LN-2):0];
		else if (i_ce)
			sreg[(LN-2):0] <= sreg[(LN+WS-2):WS];

Those are the easy bits.

After those easy bits, our first next bit is determined by the original Fibonacci LFSR equation. That’s the one we built into our first Fibonacci LFSR module.

sreg[MSB] <= ^(sreg & TAPS);

The rest of the bits are calculated in the same fashion, with the exception that the equation for them, in particular the tapv[k] coefficients, is different from one bit to the next. So, our next step is then to walk through those extra bits applying the tapv[k] equations as appropriate to generate each new bit.

	generate
	for(k=0; k<WS; k = k+1)
	begin : RUN_LFSR
		always @(posedge i_clk)
			if (i_reset)
				sreg[LN+k-1] <= reset_value[LN+k-1];
			else if (i_ce)
				sreg[(LN+k-1)] <=
					^(sreg[(LN+WS-2):(WS-1)]&tapv[k]);
	end endgenerate

Did you notice how we also cycled through the various tapv[k] expressions? This is just implementing the equations we calculated above.

That’s it. It’s still simple, but there is just a little more simple involved in this version over the last version.

Put together, we just calculated LN+WS-1 bits. LN-1 of these are calculated the same as before–by shifting the shift register. The next bit is still calculated by our original TAPS equation, just like we did with our initial Fibonacci implementation. The last WS-1 bits were then calculated in the exact same way with the only exception being that the equation, tapv[k], changed on a bit-by-bit basis.

But, let’s come back to our earlier question, will this really “work”? Will all these equations really produce the sequence we want?

To answer that question, let’s see if we can get the computer to “prove” that this new form works.

Applying Formal Methods

I discussed the basics of using formal methods using yosys-smtbmc in a previous post. The basic idea behind formal methods is to define a state space, and within it the space of invalid states. The state space is initially defined by all of the register values within your design. It is restricted further, made smaller that is, with assume statements as necessary. Invalid states are defined as well, but this time using the assert statement. We’ll use these two statements, assume and assert with the ultimate goal of proving that the logic above will never enter into an illegal state.

These formal methods are particularly appropriate for this multi-bit LFSR design, since LFSRs are so mathematically based. In particular, the output of the LFSR needs to strictly satisfy a mathematical equation–one that the formal equation solver can verify for us.

As before, we’ll use yosys to generate a list of properties in a format that yices can understand. When processing a file in this manner, yosys will define the FORMAL pre-processor directive for us. That allows us to begin our FORMAL verification section with an ifdef,

`ifdef	FORMAL

Our plan will be to then use yosys-smtbmc to then drive the yices theorem prover to prove that our multi-step LFSR actually works.

The first step is to restrict the search state space.

We’ll use the approach we presented before to create a clock, and an f_past_valid flag. This latter flag will tell us whenever the $past() directive will yield valid results.

Now using that assumed clock, our first LFSR specific step will restrict the space of all possibilities by simply assuming that i_reset is true on startup.

	initial	assume(i_reset);

Next, we’ll assert that the bottom LN bits are equal to the INITIAL_FILL on the clock following any reset.

	always @(posedge i_clk)
		if ((f_past_valid)&&($past(i_reset)))
			assert(sreg[(LN-1):0] == INITIAL_FILL);

Now that we’ve dealt with the i_reset case, we can move on to the regular register bits.

The first test is whether or not the first of the new register bits, bit LN-1, is valid. This bit is the one produced via the original Fibonacci configuration we presented before. Here we’ll just apply it to the values left in our shift register, plus the one we just shifted out–the one still found in $past(sreg[WS-1]). Note that this wouldve been $past(sreg[0]) in our previous version, but since we are now generating another WS-1 bits, that last bit shifted out is no longer the 0 bit from before.

	always @(posedge i_clk)
		if ((f_past_valid)&&(!$past(i_reset))&&($past(i_ce)))
			assert(sreg[LN-1]
				== ^({sreg[(LN-2):0], $past(sreg[WS-1])}
					& TAPS));

Our new implementation also defines another WS-1 bits that we want to validate. Because this is a very mathematically defined LFSR, there’s a mathematical relationship between all of these bits. All we need to do to formally validate this code is to check our current state bits against the equation that was supposed to generate them one at a time.

generate
	for(k=0; k<WS-1; k=k+1)
		always @(posedge i_clk)
			if ((f_past_valid)&&(!$past(i_reset)))
			assert(sreg[LN+k] == ^(sreg[(LN-1+k):k]&TAPS));
	endgenerate

This is the majority of the proof. If this works, all of our tapv[] work will be verified.

There is one more vital part to our proof: we need to prove that sreg will never be equal to zero. You may recall when we last discussed LFSRs that we highlighted the problem with a zero register: once the register sreg becomes zero (if ever) then it will cease to produce pseudorandom numbers. We’ll allow the theorem prover to verify that this never happens.

	always @(*)
		assert(sreg[(LN+WS-2):(WS-1)] != 0);

That’s the last of the file, and the last of the proof.

`endif	// FORMAL
endmodule

All that’s left is to run the theorem prover and see how we did!

In case you haven’t been following along, you can find the example file we just created here. You can also find the Makefile and other files associated with driving the formal theorem prover here.

Conclusion

Now that I have an LFSR implementation that I can step 8-14 times per clock, I should be able to create a high speed data stream from one FPGA and send it to another. The difference between what the second FPGA observes and what was actually sent will be any channel effects. In other words, I’m now ready to move my channel identification problem forward one step to measure just how much information can be reasonably stuffed through this FPGA I/O channel.