A Configurable Signal Delay Element
It’s always fun to design something simple every now and then–something that doesn’t take too much thought, yet still fits a needed place in something you are building.
Today, let’s look at a delay element. This is a fundamental signal processing operation that takes a single stream and creates two streams–with the second stream delayed by some programmable number of samples from the first one.
This is actually a very common signal processing need. Imagine if you will that you had one piece of processing code that was applied to the input, took many samples (N) to accomplish, and that the result of this processing told you how to lock onto the signal that began N samples ago.
A classic example of this would be a burst preamble–a known sequence that occurs at the beginning of a burst transmission to help you to synchronize to that transmission. However, once synchronized, you then want to go back and process any samples immediately following that preamble. Should you have any delay in your preamble processing chain, then you’d need to go “back in time” to start processing your signal immediately following this preamble. This is one purpose of a delay element.
So, just for fun and to have a change-up from some more serious and complex topics, let’s examine a simple delay element.
Pseudocode
At first blush, the logic for a delay element seems quite simple: just delay the incoming samples by some variable amount. Indeed, you might wish to start coding the algorithm immediately (I did). You’d start with a delay of zero, and then build the logic for the delay of one.
Then you’d get stuck.
It’s right here at this point in the pseudocode that you need to transition to a block RAM delay, and so you need a memory value read from block RAM. We’ll call this value memval.
Ok, so we’ll need a memory. That means we are going to want to write our data into memory.
We are also going to want to read it back out.
And in order to make this all work, we’re going to need some memory address manipulation code. Most of this is straight boilerplate.
The read address, though, is not boilerplate. It needs to be related to the write address. Indeed, this is perhaps the only difficult part of building a signal delay element such as this.
So how should the read address relate to the write address?
The first answer in this case would be that the read address should be less than the write address by i_delay elements. When you then try this code within a test bench, you’ll find that this choice just doesn’t work.
So let’s think this through a touch more.
Scheduling the Memory Pipeline
To get the read and write address correct, let’s examine how our signals would move through this pipeline. We can build a pipeline schedule as we’ve done before on this blog. You can see the schedule for our delay logic shown in Fig 2.
The basic concept of this diagram is that variables that are valid at one time step lead to new variables that are valid on the next. So if i_data is valid on one time step, o_data will be valid on the next. Likewise if we write i_data to memory using the wraddr signal on one time step, then the memory element, mem[wraddr], will have that value on the next time step.
Let’s follow what happens to this memory a touch further. If after writing to memory we immediately read from it into memval, that will require a read address, rdaddr. We can then place this memval into our output delay element, o_delayed, and be done.
So how many clocks did that take? Two. Count the difference between when o_delayed was produced and when o_data was produced. This is then our minimum delay when using memory: two clocks.
If you’ve been following this blog, you may remember going through this same exercise when we built a moving average filter.
From here, we can work out how the read address corresponds to the write address. In particular, if rdaddr == wraddr-1, then we are delaying by two. So what we want, then, is to have rdaddr = wraddr+1-i_delay, and that’s all the missing logic required to make this work.
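To make that relationship concrete, here is a small C++ sketch of the address arithmetic. It is a software illustration rather than the Verilog from the core, and the names LGDLY, ADDR_MASK, and read_address(), as well as the 16-entry memory, are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: a delay memory of 2^LGDLY entries.
constexpr unsigned LGDLY = 4;
constexpr uint32_t ADDR_MASK = (1u << LGDLY) - 1;

// Read address that gives a delay of `delay` samples for the current
// write address.  Only meaningful for the memory path: 2 <= delay < 2^LGDLY.
uint32_t read_address(uint32_t wraddr, uint32_t delay) {
    assert(delay >= 2 && delay <= ADDR_MASK);
    // Unsigned arithmetic wraps modulo 2^32; masking reduces that to
    // modulo 2^LGDLY, just as an LGDLY-bit address register would wrap.
    return (wraddr + 1u - delay) & ADDR_MASK;
}
```

With a delay of two, read_address(5, 2) gives 4, which is the rdaddr == wraddr-1 case from above. Near the start of the memory the subtraction simply wraps, so read_address(0, 3) lands on address 14.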
Ok, I’ll admit … I didn’t put any time into figuring out how to schedule the pipeline. I just built it wrong, and then adjusted the relationship between wraddr and rdaddr in the test bench until I got things right. That should help illustrate for you, though, the power of building a test bench and simulating–rather than just implementing something and then wondering what went wrong later.
Building this
So let’s build our final delay element!
Much of this logic is the logic you might expect from our discussion above.
For example, we need to increment the write address on every sample.
You may notice that this write address doesn’t depend upon a reset signal. The reason is simply that it doesn’t need to. Only the distance between the read and write addresses matters, so as long as the write address increments by one on every sample from whatever address it’s at, it will work.
Likewise we are going to want to write our incoming samples into memory.
The difficult trick from above was that we need to make certain that the read address equals the write address plus one minus the delay. Making this happen in clocked logic is a touch more difficult–particularly because of the i_ce pipeline control signal.
So that we can keep the read address a fixed distance from the write address any time the delay changes, we’ll violate the rules of the global CE bit and set the read address on every clock. (The delay is herein called w_delay; you’ll see why in a bit.) If CE is high, we set the read address to the write address minus the delay plus two–not one. The two allows us to compensate for the fact that the write address is also changing on this clock. However, if the CE line is low, then the write address isn’t changing, so the offset goes back to plus one and the logic may appear more intuitive.
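Here is a rough C++ sketch of one clock edge of just this address logic, written so that both new values are computed from the values before the edge, the way clocked registers behave. The Addresses struct, step_addresses(), and the 16-entry memory size are names and choices assumed for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: a delay memory of 2^LGDLY entries.
constexpr unsigned LGDLY = 4;
constexpr uint32_t ADDR_MASK = (1u << LGDLY) - 1;

struct Addresses {
    uint32_t wraddr;
    uint32_t rdaddr;
};

// One clock edge of the address logic.  Both results depend only on the
// state *before* the edge.
Addresses step_addresses(Addresses now, bool i_ce, uint32_t w_delay) {
    assert(w_delay <= ADDR_MASK);
    Addresses next;
    // The write address only advances when a sample arrives.
    next.wraddr = i_ce ? ((now.wraddr + 1u) & ADDR_MASK) : now.wraddr;
    // The read address is set on every clock.  When i_ce is high the write
    // address is about to move too, hence +2 instead of +1.
    next.rdaddr = (now.wraddr + (i_ce ? 2u : 1u) - w_delay) & ADDR_MASK;
    return next;
}
```

Either way, after the edge the two addresses satisfy rdaddr = wraddr+1-w_delay again. Starting from a write address of 7 with a delay of 3, a clock with CE high yields a write address of 8 and a read address of 6, and a following clock with CE low leaves both where they are.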
Now that we have our read address, we can simply read from memory.
With all this information, we can now make our delay logic. You might recognize this from before–the logic for a delay of zero and for a delay of one sample is identical to what we started with.
Even the delay logic, which is implemented using memory, reads just about the same as it did before.
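If it helps to see all of the pieces working together, here is a cycle-level C++ model of the logic as described in this post. It is a sketch, not the core’s Verilog: the class name DelayModel, the helper delays_correctly(), and the 16-entry memory are assumptions, and I’ve also assumed that the memory read is gated by i_ce like everything else except the read address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each call to tick() models one rising clock edge.  Every new value is
// computed from the state before the edge, as with clocked registers.
class DelayModel {
public:
    explicit DelayModel(unsigned lgdly)
        : mask_((1u << lgdly) - 1), mem_(1u << lgdly, 0) {}

    void tick(bool i_ce, uint32_t i_data, uint32_t w_delay) {
        assert(w_delay <= mask_);
        // Read address: updated on every clock, +2 if wraddr also moves.
        uint32_t next_rdaddr = (wraddr_ + (i_ce ? 2u : 1u) - w_delay) & mask_;
        if (i_ce) {
            uint32_t next_memval = mem_[rdaddr_];   // read the old contents
            uint32_t next_delayed = (w_delay == 0) ? i_data
                                  : (w_delay == 1) ? o_data
                                                   : memval_;
            mem_[wraddr_] = i_data;                 // then write the sample
            wraddr_ = (wraddr_ + 1u) & mask_;
            memval_ = next_memval;
            o_delayed = next_delayed;
            o_data = i_data;
        }
        rdaddr_ = next_rdaddr;
    }

    uint32_t o_data = 0, o_delayed = 0;

private:
    uint32_t mask_;
    std::vector<uint32_t> mem_;
    uint32_t wraddr_ = 0, rdaddr_ = 0, memval_ = 0;
};

// Feed n samples at a constant delay, with `idle` clocks of i_ce low after
// each sample, and report whether every checked output matched.
bool delays_correctly(uint32_t delay, unsigned n, unsigned idle) {
    DelayModel m(4);                    // 16 entries: delays 0 through 15
    std::vector<uint32_t> x;
    for (unsigned k = 0; k < n; k++) {
        x.push_back(1000u + k);         // distinct, easy to trace values
        m.tick(true, x[k], delay);
        for (unsigned j = 0; j < idle; j++)
            m.tick(false, 0, delay);
        if (m.o_data != x[k]) return false;
        // The first `delay` outputs have no history to compare against.
        if (k >= delay && m.o_delayed != x[k - delay]) return false;
    }
    return true;
}
```

Calling delays_correctly(5, 64, 3), for example, pushes 64 samples through a delay of five with three idle clocks after each sample, and checks that o_delayed always trails o_data by five samples.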
Pretty simple, right?
Well, okay, so let’s get one touch fancier. Right now this delay element works off of a variable, user-selectable delay. Suppose that you wanted this delay element to use a fixed delay instead. You could just feed a constant value to i_delay and allow the optimizer within the synthesizer to handle everything that follows. We’ll take a separate approach here. We’ll capture this desired fixed delay with a FIXED_DELAY parameter, and then use this parameter to determine the delay any time FIXED_DELAY != 0.
Remember that w_delay item I said we’d touch on later? This value is set to i_delay when the parameter isn’t forcing the delay amount, and FIXED_DELAY when it is.
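The selection itself is tiny. As a C++ analogy (again, not the Verilog itself), a template parameter can stand in for the compile-time FIXED_DELAY parameter. The function name w_delay() below simply borrows the signal’s name for illustration.

```cpp
#include <cassert>
#include <cstdint>

// FIXED_DELAY plays the role of the compile-time parameter: zero means
// "use the run-time i_delay input", anything else overrides it.
template <uint32_t FIXED_DELAY>
constexpr uint32_t w_delay(uint32_t i_delay) {
    return (FIXED_DELAY != 0) ? FIXED_DELAY : i_delay;
}

// Both cases can be checked at compile time.
static_assert(w_delay<0>(7) == 7, "no override: follow i_delay");
static_assert(w_delay<12>(7) == 12, "override: i_delay is ignored");
```

One consequence of using zero to mean “no override” is that a fixed delay of zero can’t be requested this way. That case would still need to come in through i_delay.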
That’s a nice improvement to our delay component.
Still, the overall design isn’t all that different from the one we started out with–even with the details filled in.
Building a Test Bench
Since this is a fairly simple component, we can discuss the test bench before we finally conclude–rather than separating the test bench into a separate post. The test bench for this delay element follows from the same principles I laid out earlier, when we examined Verilator. Basically, when you are using Verilator your test bench is a C++ program that interacts with your design, and then compares the responses from the design to known responses that we might expect.
We’ll capture our parameters before starting, since our test will be dependent upon them.
Setting up the main program itself is fairly boilerplate. You need to make certain you call the commandArgs function to initialize Verilator.
We’ll then declare our test class–wrapping it within the TESTB class so that we can get clock ticks, resets, and VCD file generation code for free.
Our first task will be to open a VCD trace file so that we can debug any problems later.
Then we’ll reset our core, so that we can start this test in a known state.
You may recall from our first formal methods post the problem associated with testing a reset in a test bench: that there are more combinations of when a reset can happen with respect to this logic than I have the creativity to imagine. It’s a problem we’re going to ignore here, but a valid one and hence one worth remembering.
We’re going to need our own copy of the delay memory, so that we can also create our own delay here in C++ to compare the unit under test to.
Let’s run our test across every delay that this delay element may produce. We’ll loop through each possible delay, testing and validating the results along the way.
The first step, following any change in delay value, is to load up that many values in the memory without testing any of the output delays.
To do this, we’ll generate a random number, and write it to our core.
We’ll also record that number into our own memory copy at the same time.
After loading one element per delay, we can now come back and test whether or not the output was properly delayed. We’ll check NTESTS (512) of these for each possible delay.
As before, each test consists of creating a random value, writing that value to the core, and recording a copy of it for ourselves.
Now we can check whether or not the output from the core is the value from dly clocks ago.
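The C++ side of that comparison can be as simple as a circular buffer of everything we’ve sent to the core. Here’s a minimal sketch of such a reference. The class name DelayReference and its record() and ago() methods are my own inventions for this example, and don’t come from the actual test bench.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Software copy of the delay memory: remembers the last 2^lgdly samples
// that were written to the core.
class DelayReference {
public:
    explicit DelayReference(unsigned lgdly)
        : mask_((1u << lgdly) - 1), mem_(1u << lgdly, 0) {}

    // Record the sample that was just sent to the core.
    void record(uint32_t value) {
        mem_[ptr_ & mask_] = value;
        ptr_++;
    }

    // The sample recorded `dly` steps before the most recent one, so
    // ago(0) is the value that was just recorded.
    uint32_t ago(uint32_t dly) const {
        assert(dly <= mask_);
        return mem_[(ptr_ - 1u - dly) & mask_];
    }

private:
    uint32_t mask_;
    std::vector<uint32_t> mem_;
    uint32_t ptr_ = 0;
};
```

Each test step then becomes: record the random value, write it to the core, tick the clock, and assert that the core’s o_delayed output matches ago(dly). During the initial loading steps you’d call record() without checking anything, since the buffer doesn’t yet hold dly samples of history.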
At this point, the tests are complete and all we need to do is close nicely.
You may notice that, in the closing lines of the test bench, there’s no possibility for failure. The reason is simply that a failure to match will cause a failure above in the assert() statements, and so on any failure we’ll never reach this point.
That’s it! We’re all done with our test bench.
If you choose to look through the actual test bench, you will notice one more capability that we haven’t discussed here: a certain amount of fuzzing the i_ce line. Specifically, I ticked the clock once with i_ce valid, and then ticked it some (random) number of additional clocks with i_ce equal to zero–just to see if it affected the behavior of the core. (It didn’t.)
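One way to picture that fuzzing is as a per-clock pattern of i_ce values. The sketch below generates such a pattern in C++. The function name fuzzed_ce_pattern(), the max_idle limit, and the choice of std::mt19937 are all assumptions for illustration, since the actual test bench may well pick its random numbers differently.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Build a per-clock i_ce pattern: every sample gets one clock with i_ce
// high, followed by a random number (0..max_idle) of clocks with i_ce low.
std::vector<bool> fuzzed_ce_pattern(unsigned samples, unsigned max_idle,
                                    std::mt19937 &rng) {
    std::uniform_int_distribution<unsigned> idle(0, max_idle);
    std::vector<bool> pattern;
    for (unsigned s = 0; s < samples; s++) {
        pattern.push_back(true);                          // the sample clock
        pattern.insert(pattern.end(), idle(rng), false);  // the idle clocks
    }
    // Every sample contributes at least its own clock.
    assert(pattern.size() >= samples);
    return pattern;
}
```

A test loop would then tick the clock once per entry, presenting a new sample only on the entries where the pattern is true, and checking the delayed output only after those clocks.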
All of this put together gives us confidence that this delay element works as designed.
Conclusion
We’ve still got lots of other problems and examples to work through, but it’s always fun to pick a simple one to go over that everyone can understand.
For now, let’s think about what can be done with a delay element. We’ve already discussed one example above: synchronizing to a packet based upon a preamble. That wasn’t my purpose in building this element today, though. My own purpose is to allow me to measure the Power Spectral Density (PSD) of a waveform input, but we’ll leave that discussion for another day.
But and if that evil servant shall say in his heart, My lord delayeth his coming ... (Matt 24:48)