Handling multiple clocks with Verilator
For some reason, every time I’ve ever worked with video I’ve never managed to be fortunate enough to have the same clock rate for both the pixel clock and the memory. The closest I came was using a 25MHz pixel clock on the Basys3 board which I could create by dividing a 100MHz clock by four in logic. While that probably wasn’t the best way to do it, I did manage to successfully create a 640x480 image on my test display.
When I moved on to the more serious pixel clock of 148.5 MHz in my VideoZip project using the Nexys Video board, I could no longer manipulate my 100MHz system clock in logic to generate a 148.5MHz pixel clock. Xilinx’s DDR3 memory controller insisted on a clock of 100MHz, so I was stuck needing to deal with two dissimilar clocks.
Up until that project, I had never used more than one clock with Verilator. Many of my designs were based upon just a single clock. How was I going to handle multiple clocks? This turned into one of the biggest challenges I had when developing VideoZip. (VideoZip remains a work in progress.)
The pixel clock on the Nexys Video board isn’t the only problem for VideoZip. The Gb Ethernet port (RGMII) wants to run at 125 MHz, reasoning about 8 bits at a time. If this weren’t bad enough, the I2S audio interface wants an outgoing clock rate near 49.152 MHz. While logical and ugly kludges to this problem exist (which I may yet write about), the appropriate way to deal with this is to use a PLL or digital clock manager to generate these dissimilar clocks.
The unfortunate consequence was that I needed a multiple clock simulation capability. Ouch.
The solution I eventually chose crosses multiple project boundaries, but it is worthwhile enough that I’ll share it here. It involves not only modifying my prior Verilator test bench wrapper, but also a test-bench clock helper class. While the updated test bench wrapper can be created manually, I’ll show you in the end how to use AutoFPGA to tie each piece together into your design.
Reasoning about clocks
If you remember how I use Verilator, you’ll remember that I like to wrap a Verilated design in a test bench class I call TESTB. Among other things, this test bench class has a tick() method that I can call any time I want the clock within my design to tick once.
In my AutoFPGA enabled projects, this TESTB class is created via AutoFPGA. The class also has some nice capabilities for opening and closing VCD trace files, but those are not a part of today’s story.
tick() works by:
1. Leaving the clock at zero and dumping the design state to a VCD file (if so enabled).
2. Setting the clock to one, and dumping the design state to a VCD file again.
3. Setting the clock back to zero, and dumping the design state again. This time, though, the VCD trace is flushed to disk.
4. The module is then allowed to read any inputs that may have changed, and adjust any outputs that may need to be changed.
5. Return to step one and repeat until the simulation is done.
This works great for synchronous designs with only one clock. Using this method I can not only test my own design, but also incorporate co-simulation tests: Serial port, I2C, video, you name it, all of that can fit in this context.
The problem is that this tick() method works great for designs with only one clock, but it is entirely insufficient when dealing with multiple clocks. It’s not that Verilator is somehow insufficient. It’s not. Verilator can handle multiple clocks easily, as long as you can properly drive them. Verilator’s interface requires the caller to generate inputs at whatever rate they wish to do so. This was what I needed to do.
My first step was to create a class to describe a clock to my test bench. I call this class TBCLOCK, or “test bench clock”. Its purpose is primarily to help me reason about time, and about one specific clock. To understand the next step, let’s first take a moment to understand this class and its methods. We can then look at how TBCLOCK can help us adjust our TESTB with multi-clock aware information.
TBCLOCK
TBCLOCK has four basic methods: time_to_edge, which returns the number of picoseconds to the next clock edge; advance, which advances the clock by some number of picoseconds; rising_edge, which can be used to tell if the clock is currently on its rising edge; and falling_edge, which is identical to rising_edge but for the falling edge of the clock.
Put together, these methods work like this: the TESTB object queries the TBCLOCK objects to determine the amount of time to skip forward to get to the next clock edge. This looks sort of like Fig 4 below.
TBCLOCK compares the current time to when the next edge will take place, and returns that amount of time in picoseconds. (Why picoseconds? It was an arbitrary decision based upon the reality that nanoseconds weren’t fine enough for the application(s) shown above, and femtoseconds were overkill.)
The enhanced TESTB logic then advances all of the TBCLOCK objects to the time of this next edge, adjusts the clock input(s) and calls Verilator’s eval() function to update any logic dependent upon that clock.
When viewed across three separate clocks, the result might look like Fig 5.
You can see the resulting step sizes as events in the bottom trace in Fig 5. As a result, Verilator doesn’t step forward uniformly by the greatest common divisor of all the clock periods, but rather in a non-uniform fashion, so that it is only ever called to evaluate logic following a clock edge.
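To make the non-uniform stepping concrete, here is a small sketch that lists the step sizes for a set of clocks. It is an illustration only: edge_steps is a hypothetical helper (not part of TESTB or TBCLOCK), it assumes every clock has an edge at time zero, and the half periods used in the usage note below (5000 ps, 4000 ps, and roughly 3367 ps for 100 MHz, 125 MHz, and 148.5 MHz) are rounded to whole picoseconds.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only. Every clock is assumed to have
// an edge at time zero and to toggle once every half period after that.
// Half periods must be non-zero.
//
// Returns the first `count` step sizes (in ps) taken by a simulation that
// always jumps straight to the nearest upcoming edge of any clock.
std::vector<uint64_t> edge_steps(const std::vector<uint64_t> &half_period_ps,
                                 unsigned count) {
    std::vector<uint64_t> steps;
    uint64_t now_ps = 0;

    while (steps.size() < count) {
        uint64_t step = UINT64_MAX;
        for (uint64_t half : half_period_ps) {
            // Time remaining from now until this clock toggles again
            uint64_t to_edge = half - (now_ps % half);
            step = std::min(step, to_edge);
        }
        now_ps += step;          // jump to the nearest edge
        steps.push_back(step);   // and record how far we moved
    }
    return steps;
}
```

Calling edge_steps({5000, 4000, 3367}, 7) yields steps of 3367, 633, 1000, 1734, 1266, 2000, and 101 ps: no two consecutive steps are the same size, yet every step lands exactly on some clock's edge.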
Creating a TBCLOCK is fairly straightforward. Or, rather, it should be. I got it wrong many times over while just trying to get the basics below right. To create an object of this class, just declare one with the number of picoseconds per clock tick.
The initialization routine uses increment_ps to create an internal stepping interval, m_increment_ps, which is half of the original increment_ps. This allows the TBCLOCK object to reason about both the positive-going and the negative-going edges of the clock.
The next capability the test bench clock offers is the ability to return the number of picoseconds until the next clock edge. This was what Fig 4 was showing above. We’ll use this in the next section in our inner clock loop. The next clock edge will come m_increment_ps picoseconds after the last clock edge. If you subtract the current time from this future time, you’ll get the number of picoseconds remaining until the next clock edge.
Once the clock generator has been queried for the time to the next edge, the test-bench driver can then determine which clock edge comes next. From here, each clock can be advanced until that next edge. That’s the purpose of the advance() function: given a step size (in ps), advance the current time maintained within this test bench support clock.
Well, not quite. advance() has one other purpose. It also returns the value of the clock, either 1 or 0, at this new time instant. In the next section, we’ll use the result of advance() to set the clock input value to the main Verilog test bench function.
There are two other helper functions to determine if the current time is a rising or a falling edge, but that’s the basics of the first part.
The primary work in this class is done within the time_to_edge method. We’ll see how this helps in the next section.
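Pulling the pieces of this section together, a class with this interface might be sketched as below. This is not the actual TBCLOCK source: the class name ClockSketch, the members other than m_increment_ps, and the choice to start with the clock low at time zero are all assumptions made for illustration. Only the four method names and the halving of increment_ps come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a test bench clock helper (assumed internals).
class ClockSketch {
    uint64_t m_increment_ps;  // half of the clock period: time between edges
    uint64_t m_now_ps;        // this clock's notion of the current time
    uint64_t m_next_edge_ps;  // time at which the clock will next toggle
    bool     m_level;         // current clock value
    bool     m_on_edge;       // did the last advance() land on an edge?

public:
    // increment_ps is the full clock period in picoseconds
    explicit ClockSketch(uint64_t increment_ps)
        : m_increment_ps(increment_ps / 2), m_now_ps(0),
          m_next_edge_ps(increment_ps / 2), m_level(false), m_on_edge(false) {
        assert(m_increment_ps > 0);
    }

    // Picoseconds remaining until the next edge: future time minus now
    uint64_t time_to_edge() const { return m_next_edge_ps - m_now_ps; }

    // Move time forward by step_ps and return the clock value (1 or 0)
    // at the new time. Callers must never step past an edge.
    int advance(uint64_t step_ps) {
        assert(step_ps <= time_to_edge());
        m_now_ps += step_ps;
        m_on_edge = (m_now_ps == m_next_edge_ps);
        if (m_on_edge) {
            m_level = !m_level;                // toggle on the edge
            m_next_edge_ps += m_increment_ps;  // schedule the following edge
        }
        return m_level ? 1 : 0;
    }

    bool rising_edge() const  { return m_on_edge && m_level; }
    bool falling_edge() const { return m_on_edge && !m_level; }
};
```

With a 10000 ps period, for example, the first rising edge arrives 5000 ps after construction: time_to_edge() returns 5000, advance(5000) returns 1, and rising_edge() is then true until the next call to advance().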
Updating the inner testbench class, TESTB
The TBCLOCK class we discussed above is only a helper in the scheme of things. Most of the actual logic takes place within the updated tick() function found within the test bench object, TESTB, used to drive the Verilator inputs.
As you may recall, I started creating a test bench class wrapper once I noticed that I kept using the same code for every Verilator based test bench. The code to open a trace file was the same. The code to capture data to that trace file was the same. The code to toggle the clock was the same. I found myself copying these pieces of code from one simulation wrapper to another. Rather than just duplicate the same code, I created the test bench wrapper class, TESTB.
One of the primary functions of the test bench wrapper object is to advance the clock. Verilator requires that the clock toggle from low to high in order to call the positive edge logic within your design. The clock needs to then return low, and all of these transitions require calls to the Verilator tracing methods if you want a VCD file when you are done.
I found this cumbersome, so I wrapped all of that logic with a tick() method. This is the same tick() method I discussed above. The tick() method of TESTB would capture inputs to the core in a trace, toggle the clock high, capture the results in a trace, then toggle the clock low and capture the results in the trace again, this time flushing the trace file. (Flushing is important: I’ve had too many designs fail some C-assertion in their associated logic, and without the flush you may not get the state of your variables at that last clock.)
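The sequence just described can be sketched as follows. This is an illustration rather than the actual TESTB source: CounterModel and TraceStub are hypothetical stand-ins for a Verilated design and for a VCD trace object, so that the example compiles without Verilator, and the trace time stamps are arbitrary.

```cpp
#include <cstdint>

// Stand-in for a Verilated design: counts rising edges of i_clk.
// A real test bench would use the Verilator-generated class here.
struct CounterModel {
    uint8_t  i_clk = 0;
    uint32_t o_count = 0;
    void eval() {
        if (i_clk && !m_last_clk) o_count++;  // positive edge logic
        m_last_clk = i_clk;
    }
private:
    uint8_t m_last_clk = 0;
};

// Stand-in for a VCD trace. With Verilator, a VerilatedVcdC object would
// take this role; here we only count the calls.
struct TraceStub {
    unsigned dumps = 0, flushes = 0;
    void dump(uint64_t /*time*/) { dumps++; }
    void flush() { flushes++; }
};

template <class Model>
class SingleClockTB {
public:
    Model     m_core;
    TraceStub m_trace;
    uint64_t  m_tickcount = 0;

    void tick() {
        m_tickcount++;

        // 1. Clock still low: let combinational logic settle on any
        //    inputs changed since the last tick, then record the state.
        m_core.i_clk = 0;
        m_core.eval();
        m_trace.dump(10 * m_tickcount - 2);

        // 2. Rising edge: evaluate the positive edge logic.
        m_core.i_clk = 1;
        m_core.eval();
        m_trace.dump(10 * m_tickcount);

        // 3. Falling edge: evaluate, record, and flush so the trace on
        //    disk is complete even if an assertion fails right after.
        m_core.i_clk = 0;
        m_core.eval();
        m_trace.dump(10 * m_tickcount + 5);
        m_trace.flush();
    }
};
```

Each call to tick() therefore produces three eval() calls, three trace entries, and one flush.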
Before moving on, let me foot-stomp here that all three calls to eval() are essential! While it may look like the last step and the first step are identical since they both leave the clock at zero, they are not the same. Between these two steps, co-simulation logic might change inputs to the design. Unless you call eval() following any co-simulation updates to design inputs, combinational logic depending upon these inputs may not settle. This is a painful bug to search for, so I recommend you learn the lesson here.
In this single clock paradigm outlined above, I could read any outputs and adjust any inputs after calling this one tick() method. I could also call the C assert function if something had gone wrong, since the flush() command above guaranteed that the relevant portion of the trace was in the file. This approach was simple enough, and I’ve used this pattern for many of my designs. (You can read more about it here.)
Sadly, this initial approach didn’t work when dealing with multiple clocks. Instead, let’s walk through how this tick() method can be updated to deal with multiple clocks. In the example below, drawn from the VideoZip project, I have four clocks: hdmi_out, hdmi_in, net_rx_clk, and my default clk.
The first step when calling tick() is to check the number of picoseconds till the next clock edge. This is the minimum time to the next edge among all clocks.
Once we know this amount of time, we’ll call eval() once out of an abundance of caution. This makes sure, before any clock edges change, that all of the combinational logic associated with any potentially changed input wires has settled.
Once done, each of the various clock objects may be advanced by this amount of time, and our global estimate of the current time can advance as well.
Finally, using these new clock values, we can call Verilator to evaluate our design in this new interval–adjusting any edge triggered logic.
If we are recording a trace at this time, we’ll then call Verilator to dump the current state of the design to a trace file.
Don’t forget to flush it! There’s been more than one time when I’ve checked the outputs of a core after ticking the clock, decided there was a problem and aborted, only to find the relevant signals hadn’t ended up in the trace file.
Finally, we’ll call any external simulation logic depending on clock edges. In my single clock designs, I do this about mid-way through the low period of the clock, so you can “see” the transformation. I also did it between calls to tick(). This doesn’t work with multiple clocks, since peripherals are often defined by the clock the logic is associated with. For this reason, we’ll have to call separate functions for each clock to allow these co-simulations to update. We’ll do this on the falling edges of their respective clocks. This includes possibly updating the video simulation, checking for simulated network packets, and more.
For example, in my spectrogram demo project, the sim_clk_tick() function advances the A/D simulation and so updates i_adc_miso, and the sim_pixclk_tick() function advances the simulated video on the screen using the outgoing pixel and the various outgoing sync signals. (Ref)
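Put together, the steps of the multi-clock tick() can be sketched as below for a design with two clocks. This is a simplified illustration, not the AutoFPGA-generated TESTB: MiniClock, TwoClockModel, MultiClockTB and the i_pixclk input are hypothetical names, the clock rates (100 MHz and 40 MHz) are chosen just for the example, and the trace calls are left as a comment so the code builds without Verilator. Only the sim_clk_tick() and sim_pixclk_tick() hook names are taken from the text above.

```cpp
#include <algorithm>
#include <cstdint>

// Compact stand-in for the clock helper (assumed internals).
struct MiniClock {
    uint64_t half_ps, now_ps = 0, next_edge_ps;
    bool     level = false, on_edge = false;

    explicit MiniClock(uint64_t period_ps)
        : half_ps(period_ps / 2), next_edge_ps(period_ps / 2) {}

    uint64_t time_to_edge() const { return next_edge_ps - now_ps; }
    int advance(uint64_t step_ps) {
        now_ps += step_ps;
        on_edge = (now_ps == next_edge_ps);
        if (on_edge) { level = !level; next_edge_ps += half_ps; }
        return level ? 1 : 0;
    }
    bool falling_edge() const { return on_edge && !level; }
};

// Stand-in for a Verilated design with two clock inputs. It just counts
// the rising edges it sees on each.
struct TwoClockModel {
    uint8_t  i_clk = 0, i_pixclk = 0;
    uint32_t clk_edges = 0, pix_edges = 0;
    void eval() {
        if (i_clk && !m_last_clk)    clk_edges++;
        if (i_pixclk && !m_last_pix) pix_edges++;
        m_last_clk = i_clk;
        m_last_pix = i_pixclk;
    }
private:
    uint8_t m_last_clk = 0, m_last_pix = 0;
};

class MultiClockTB {
public:
    TwoClockModel m_core;
    MiniClock     m_clk{10000};     // 100 MHz system clock
    MiniClock     m_pixclk{25000};  // 40 MHz pixel clock
    uint64_t      m_time_ps = 0;
    unsigned      m_clk_sim_calls = 0, m_pix_sim_calls = 0;

    virtual ~MultiClockTB() {}
    // Co-simulation hooks; a real test bench would override these
    virtual void sim_clk_tick()    { m_clk_sim_calls++; }
    virtual void sim_pixclk_tick() { m_pix_sim_calls++; }

    void tick() {
        // 1. Find the time to the nearest edge among all clocks
        uint64_t mintime = std::min(m_clk.time_to_edge(),
                                    m_pixclk.time_to_edge());
        // 2. Settle combinational logic before any clock changes
        m_core.eval();
        // 3. Advance every clock, and global time, by that amount
        m_core.i_clk    = static_cast<uint8_t>(m_clk.advance(mintime));
        m_core.i_pixclk = static_cast<uint8_t>(m_pixclk.advance(mintime));
        m_time_ps += mintime;
        // 4. Evaluate the edge triggered logic
        m_core.eval();
        // 5. A trace dump at m_time_ps, followed by a flush, would go here
        // 6. Let each co-simulation run on its own clock's falling edge
        if (m_clk.falling_edge())    sim_clk_tick();
        if (m_pixclk.falling_edge()) sim_pixclk_tick();
    }
};
```

Note that each call to tick() now advances to the next edge of any clock, not by one full period of a particular clock, so a loop that wants to run for a given length of simulated time should watch m_time_ps rather than count calls.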
The conclusion here is that if you want to use this technique, you’ll want to copy the TBCLOCK class (or build your own), and then create a test bench wrapper that references your TBCLOCK objects and gets all the pieces right.
Alternatively, you could use AutoFPGA to handle all of this busy work for you.
Using AutoFPGA to build the testbench
If you are not familiar with AutoFPGA, then in quick summary: it is a Verilog-based code generator based upon a copy and paste concept with minimal substitution capability. You specify the code snippets associated with each design component or peripheral in an AutoFPGA configuration file, and then when you call AutoFPGA specifying that configuration file (among many others), AutoFPGA will create your top level (device dependent) design, your main design (device independent) file, and several other bus related files associated with the peripherals you are making or using.
If you are interested in this, consider reading about AutoFPGA’s design goals, or the primer on how to connect simple register-based components to a debugging bus using AutoFPGA.
The neat thing about using AutoFPGA for a purpose like this one is that when you no longer need the extra clock or the logic that uses it, you can just remove the reference to the configuration file describing those components of your design from the AutoFPGA command line. If you want to see how this works, consider examining a project that uses AutoFPGA, and then looking in the AutoFPGA configuration file directory for the Makefile. In there, you’ll find some lines similar to:
This captures, in the $(DATA) variable, a list of configuration files that are given to AutoFPGA.
Then in the main project Makefile the created code files will be copied to their various parts of the project tree if running AutoFPGA had changed them–but not otherwise. As an example from Zbasic, these Makefile lines would look like:
and a little later, you’ll see the definition of this copyif-changed function.
Basically, if files $(1) and $(2) differ, then $(1) is copied on top of $(2). This keeps make from rebuilding things that depend upon files that haven’t changed.
But that’s not my point here and now.
What I want to share right now is how easy it is to teach AutoFPGA about your multiple clocks.
First, you’ll want to define each of your clocks. A clock, in terms of AutoFPGA, has three components: a name, the name of the wire that contains this clock, and the frequency of the clock in Hz. For example, you might have a clock clk contained in the wire i_clk, that runs at 100MHz. You’d then define this as:
This alone is all that is needed to create the clock in the AutoFPGA generated TESTB file.
What about simulating a component requiring this clock?
Let’s consider simulating a video display. You can find a video display simulator here. Let’s assume your design has outputs o_vga_vsync, o_vga_hsync, o_vga_red, o_vga_grn, and o_vga_blu, such as this one does. Then, you’d want to declare a VGA simulator in your Verilog design component,
You’d then want to initialize this component. Here, we’ll set it up for an 800x600 display mode.
We can then call this co-simulation component on every clock tick, with,
Don’t forget to define the clock! For an 800x600 display mode, you’ll need a 40MHz clock.
Ideally, you could just add this updated configuration file to your design to add this component, or remove it from your design to remove the component.
At this point, this would work for a Verilator simulation. If you wanted to go beyond simulation, you’d need to actually add and configure the PLL in the toplevel design component. You’d use the TOP.INSERT AutoFPGA tag for that purpose. AutoFPGA would then copy the contents of that tag into your toplevel.v design file. No, AutoFPGA doesn’t configure the PLL itself (yet): you still have to give it the code for that (with the TOP.INSERT tag). Still, AutoFPGA will put that code in place for you, making reconfiguration simpler.
Conclusion
Perhaps that seems like a lot of work. It’s not really. We’re primarily talking about 20-40 lines of code in total. It’s just a different way of thinking. The only sad and complicated part is that all of these lines of code take place over many design files. Having AutoFPGA manage this for me has helped to keep all of the changes to support multiple clocks within one or two files only.
In the end, we now have a Verilator-based design that runs using multiple clocks. Not only that, you can generate a VCD file showing all of these various clocks and their respective traces.
While this capability does not (yet) allow the generation of multiple clocks with a known phase relationship, such as one might use with an ISERDES or an OSERDES, upgrading the tools to do so would be fairly trivial. I’m sure I’ll get around to that when I have a need for it.
Perhaps some of you are wondering to yourselves, “Verilog offers a capability to generate multiple clocks already. Why aren’t you using Verilog’s test bench capability to do this?”
My answer to that is simple: I know how to interface a C++ module with my computer’s Windowing system using GTKMM. I don’t know the Verilog system call to do that.
What can you use this for? I’ve already mentioned video, Ethernet, and audio applications. There’s no reason why you can’t use this for custom applications as well. For example, I’m still looking forward to completing the differential pmod challenge … but that’s really another topic for another day.
And Jesus answered and said unto them, Elias truly shall first come, and restore all things. (Matt 17:11)