I’d like to discuss a very profound simulation capability, one that I have found to be amazingly useful when building and understanding my own software. In particular, I’d like to discuss Verilator. No, not just Verilator. I’d like to discuss how Verilator can be incorporated into your designs to provide you with a simulation tool that, to my knowledge, is not matched elsewhere.

If you are a Verilog programmer and you are not familiar with Verilator, then let this be my opportunity to call this wonderful tool to your attention.

If you are a VHDL programmer, you may be disappointed to learn that Verilator only works with Verilog, not with VHDL. Perhaps I can use this opportunity to show you some of what you are missing.

According to the Verilator manual, Verilator is used to “Convert Verilog code to C++/System C”. For the purpose of this article, we’ll consider only the Verilog to C++ converter part. Further, we’re going to discuss what can be done with such a capability, and why it is so amazingly valuable.

Basic Verilator Testbench

If you’ve never used Verilator before, and you wish to try it out, the first step is the same as with any other program you’d like to try on your computer: you’ll need to download Verilator and install it. For me, this was as simple as “sudo apt-get install verilator”, but other approaches exist as well.

Once you have it installed, let’s walk through how you might use it. Specifically, we’ll walk through an example of what it takes to set your project up to use it.

This first discussion of how to use Verilator closely follows the quick example code found in the Verilator Manual. It will show us what we need to know in order to use Verilator with any project.

You will start by running the Verilator command line program on your top level Verilog file. For the purposes of this discussion, let’s assume that top level file is a very generic “module.v”.

verilator -Wall -cc module.v

Assuming your design has no syntax errors, this will create a directory, obj_dir, and a C++ class definition for Vmodule in the files obj_dir/Vmodule.h and obj_dir/Vmodule.cpp. Another file in that directory, obj_dir/Vmodule.mk, can be used to “make” these files into the library that you will need to link your C++ driver program to.

cd obj_dir; make -f Vmodule.mk ; cd ..

We’re going to spend most of our time in this post looking at how to drive this C++ code to simulate your design.

If you look through the Verilator manual, you’ll find an example test bench driver that looks similar to the following:

#include <stdlib.h>
#include "Vmodule.h"
#include "verilated.h"

int main(int argc, char **argv) {
	// Initialize Verilator's variables
	Verilated::commandArgs(argc, argv);

	// Create an instance of our module under test
	Vmodule *tb = new Vmodule;

	// Tick the clock until we are done
	while(!Verilated::gotFinish()) {
		tb->i_clk = 1;
		tb->eval();
		tb->i_clk = 0;
		tb->eval();
	} exit(EXIT_SUCCESS);
}

We’ll walk through this example from the top.

The main() program that uses Verilator must initialize Verilator with any arguments (argc and argv), and then it needs to create a new object of the class Vmodule–where module.v is the name of the design we applied the Verilator program to above.

Before we get to the loop itself, you need to know that the Vmodule class that Verilator created for you exposes all of your module’s inputs and outputs as member variables. Hence you can set the i_clk input to your module directly. Changing this value from a 0 to a 1 and then calling eval() will cause any @(posedge i_clk) logic to trip as well.

Hence, our job when simulating is just to run a tight loop where we set the clock, evaluate everything, set the clock again, and evaluate everything again. The loop stops when Verilator comes across a $finish statement within the Verilog code, or when you type a Ctrl-C. (I hardly ever use the $finish statement, so Ctrl-C is one of the main ways I terminate a simulation.)
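If Ctrl-C isn't convenient (inside a script, say), one option is to bound the loop by a clock count as well. The following is only a minimal sketch of that idea. So that it compiles without Verilator installed, FakeModule is a hypothetical stand-in for the generated Vmodule class, and its finished flag stands in for Verilated::gotFinish(). Neither name is part of Verilator.

```cpp
#include <cassert>

// Hypothetical stand-in for a Verilator-generated class: a counter that
// increments on every rising edge of i_clk and "finishes" at a limit.
struct FakeModule {
	unsigned char	i_clk = 0;
	unsigned	o_count = 0;
	bool		finished = false;	// stands in for Verilated::gotFinish()
	unsigned char	m_last_clk = 0;

	void eval(void) {
		// Only a 0 -> 1 transition triggers the "posedge" logic
		if (i_clk && !m_last_clk) {
			o_count++;
			if (o_count >= 5)
				finished = true;
		}
		m_last_clk = i_clk;
	}
};

// Run until the design finishes or max_ticks clocks have passed,
// whichever comes first.  Returns the number of clocks simulated.
unsigned long run_bounded(FakeModule &tb, unsigned long max_ticks) {
	unsigned long	ticks = 0;
	while (!tb.finished && ticks < max_ticks) {
		tb.i_clk = 1;
		tb.eval();
		tb.i_clk = 0;
		tb.eval();
		ticks++;
	}
	return ticks;
}
```

With a real design, the same loop shape would apply with Vmodule in place of FakeModule and Verilated::gotFinish() in place of the flag.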

This sounds simple enough, but let’s see if we can’t simplify it a touch more.

I’d like to create, from this outline, a test bench module that incorporates all of this information. We’ll call it TESTBENCH, and you can see an example of how I’ve used a very similar capability within the ZipCPU project here. Our TESTBENCH class will provide two functions: reset() and tick(), and we want to be able to check any time Verilator’s code has encountered a $finish statement. To do this, we can wrap the above code in a class, as in:

#include "verilated.h"

template<class MODULE>	class TESTBENCH {
public:
	// Everything is public so that main() and any subclass can
	// reach m_core and m_tickcount
	unsigned long	m_tickcount;
	MODULE	*m_core;

	TESTBENCH(void) {
		m_core = new MODULE;
		m_tickcount = 0ul;
	}

	virtual ~TESTBENCH(void) {
		delete m_core;
		m_core = NULL;
	}

	virtual void	reset(void) {
		m_core->i_reset = 1;
		// Make sure any inheritance gets applied
		this->tick();
		m_core->i_reset = 0;
	}

	virtual void	tick(void) {
		// Increment our own internal time reference
		m_tickcount++;

		// Make sure any combinatorial logic depending upon
		// inputs that may have changed before we called tick()
		// has settled before the rising edge of the clock.
		m_core->i_clk = 0;
		m_core->eval();

		// Toggle the clock

		// Rising edge
		m_core->i_clk = 1;
		m_core->eval();

		// Falling edge
		m_core->i_clk = 0;
		m_core->eval();
	}

	virtual bool	done(void) { return (Verilated::gotFinish()); }
};

The main program likewise needs to change, but it primarily just gets simplified:

#include <stdlib.h>
#include "Vmodule.h"
#include "verilated.h"
#include "testbench.h"

int main(int argc, char **argv) {
	Verilated::commandArgs(argc, argv);
	TESTBENCH<Vmodule> *tb = new TESTBENCH<Vmodule>();

	while(!tb->done()) {
		tb->tick();
	} exit(EXIT_SUCCESS);
}

At this point, we’ve just reworded things. We haven’t really created any new features. We have, however, insisted that any module.v using this test harness must have inputs i_clk and i_reset.
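To make that requirement concrete, here is a small self-contained sketch. FakeCounter is a hypothetical stand-in for a Verilated design (it is not generated by Verilator), and MINI_TB is a trimmed-down copy of the wrapper above. The point is only that the template compiles against any class offering i_clk, i_reset and eval().

```cpp
#include <cassert>

// Hypothetical stand-in for a Verilated design: a counter with a
// synchronous reset.  The only thing the harness cares about is that
// the members i_clk, i_reset and eval() exist.
struct FakeCounter {
	unsigned char	i_clk = 0, i_reset = 0;
	unsigned	o_count = 7;	// deliberately starts at a junk value
	unsigned char	m_last_clk = 0;

	void eval(void) {
		if (i_clk && !m_last_clk)
			o_count = (i_reset) ? 0 : o_count + 1;
		m_last_clk = i_clk;
	}
};

// A trimmed-down version of the TESTBENCH wrapper
template<class MODULE> class MINI_TB {
public:
	unsigned long	m_tickcount;
	MODULE		*m_core;

	MINI_TB(void) : m_tickcount(0), m_core(new MODULE) {}
	virtual ~MINI_TB(void) { delete m_core; }

	virtual void tick(void) {
		m_tickcount++;
		m_core->i_clk = 0; m_core->eval();
		m_core->i_clk = 1; m_core->eval();
		m_core->i_clk = 0; m_core->eval();
	}

	virtual void reset(void) {
		m_core->i_reset = 1;
		this->tick();
		m_core->i_reset = 0;
	}
};

// Reset the counter, run n clocks, and report the final count
unsigned count_after(unsigned n) {
	MINI_TB<FakeCounter>	tb;
	tb.reset();
	for (unsigned k = 0; k < n; k++)
		tb.tick();
	return tb.m_core->o_count;
}
```

A module lacking either input would simply fail to compile when the template is instantiated, which is a reasonably early place to catch the mistake.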

One more step is required before we can use this program. We need to compile our program, and we need to point the compiler to any include files that were created in our obj_dir directory, as well as adding some Verilator specific components:

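g++ -I obj_dir -I/usr/share/verilator/include module.cpp /usr/share/verilator/include/verilated.cpp obj_dir/Vmodule__ALL.a -o module

The last file on that line is the library we built with make above. Verilator’s generated makefile typically names this archive Vmodule__ALL.a; check your obj_dir if yours differs. Likewise, your actual Verilator components may not be in the /usr/share/verilator directory, depending upon how your distribution installed Verilator. You can find out the correct directory by looking for $VERILATOR_ROOT in the output of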

verilator -V

At this point we have a program, module, that we can run that will simulate our Verilog module.

You can see an example of one such test bench I use here. If you look a bit harder, you may even find several versions of this module wandering around other projects found in ZipCPU’s github.

Ok, so now we can simulate our design–much like any other simulator. You might argue at this point that Verilator is less than the other simulation design programs. While it can simulate our design, it’s not as simple to set up.

Don’t worry, we’re not done yet.

Let’s go a step further. Let’s start using Verilator to perform some simple magic for us.

Debugging Verilog via Printf

The first thing we are going to do with Verilator is to describe how you can print out values from within your simulation. This will turn Verilog simulation debugging into something very akin to software debugging.

When I debug software, I tend to use two approaches: printf and gdb. By using the two of these, I tend to stay out of the realm of “Voodoo computing” (my personal term for the software equivalent of FPGA Hell–when you have no idea what is going wrong with your software). If a piece of software I am working with creates a core dump, my first approach to finding the bug is usually to use gdb. In all other cases, such as when I am just trying to understand what is going on, I’ll use printf and I’ll just dump any piece of relevant data to the standard output stream. As a last resort, I’ll pull up ddd and step through my code and any library code I’m using.

One of the big problems with debugging FPGA’s, though, is that there are no printf capabilities within FPGA’s. Hence, software engineers trying to program FPGA’s often find themselves stuck when things don’t work. Their favorite tool is missing.

While the official Verilog answer to this is the $monitor command within Verilog, I’d like to demonstrate a different approach using Verilator.

The first thing we are going to do to create this debug by printf capability is to subclass our TESTBENCH class. We’ll create a new class, called MODULE_TB, that inherits from the TESTBENCH. This will allow us to maintain a generic TESTBENCH across many projects, but yet still add functionality to it that is specific to the module we are testing.

The next thing we are going to do is to override the method we just created above for requesting a clock tick. We’re going to replace that method with another method that first calls the original tick method, and then outputs whatever logic we’d like to know about.

#include <stdio.h>

class	MODULE_TB : public TESTBENCH<Vmodule> {
public:
	virtual void	tick(void) {
		// Request that the testbench toggle the clock within
		// Verilator
		TESTBENCH<Vmodule>::tick();

		// Now we'll debug by printf's and examine the
		// internals of m_core
		printf("%8lu: %s %s ...\n", m_tickcount,
			(m_core->v__DOT__wb_cyc)?"CYC":"   ",
			(m_core->v__DOT__wb_stb)?"STB":"   ",
			... );
	}
};

As a result, we just added a printf capability to Verilog via Verilator.

Did you notice the values “v__DOT__wb_cyc” and “v__DOT__wb_stb”? These are local registers within the Vmodule construct that Verilator created for us. They are being used to hold the values of the wb_cyc and wb_stb registers used to run an internal wishbone bus. Indeed, all of your registers should be visible to you from the m_core structure we have created. While the naming tends to be roughly consistent, if you struggle finding a variable you would like to print then just look through the obj_dir/Vmodule.h file. If you are still struggling, take a look at how a known variable in obj_dir/Vmodule.cpp is being set, and you should be able to figure out the internal variable name.

I should also point out: Verilator’s variable naming convention has changed over time. While my version of Verilator works with a “v__DOT__” preceding any top level register name, and the “__DOT__” substrings indicate hierarchical transitions between components, you may have to look within your own module to find how your variables are named. It’s for this reason that, in any new designs, I start the design with a series of #define’s indicating the Verilator to C++ mapping, so I know how to get access to any variables of interest.

As an example, this file begins with a set of #define’s that I use to simplify access to various ZipCPU internal variable names.
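A minimal sketch of that #define idea might look like the following. FakeModule here is a hypothetical stand-in for the generated class, with member names that merely mimic the “v__DOT__” style, and the macro names are my own choice for illustration rather than anything Verilator provides.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated class.  The member names
// mimic the "v__DOT__" style; check your own obj_dir/Vmodule.h for
// the names your Verilator version actually produces.
struct FakeModule {
	unsigned char	v__DOT__wb_cyc = 0;
	unsigned char	v__DOT__wb_stb = 0;
	unsigned char	v__DOT__wb_stall = 0;
};

// One place to fix if the naming convention ever changes
#define	wb_cyc		v__DOT__wb_cyc
#define	wb_stb		v__DOT__wb_stb
#define	wb_stall	v__DOT__wb_stall

// True when a bus request is accepted on this clock
bool request_accepted(const FakeModule *core) {
	return (core->wb_cyc) && (core->wb_stb) && (!core->wb_stall);
}
```

The rest of the test bench then only ever refers to wb_cyc and friends, so a Verilator upgrade that renames things touches three lines instead of thirty.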

Remember, this is for debugging–so a little cheating is allowed.

We’re not done yet, though.

That printf capability can print an enormous amount of information to the screen–Gigabytes even! It can print information that we are interested in, but it can also print so much irrelevant information that it can be a struggle to find the relevant information we are interested in.

For this reason, I often add a boolean “writeout” variable prior to any printf’s, which I then use to gate whether or not I printf any values. I can then use this value to look for whatever conditions I find useful for debugging that day. For example, the code below will write out the relevant portions of any internal wishbone transaction.

class	MODULE_TB : public TESTBENCH<Vmodule> {
public:
	virtual void	tick(void) {
		// Request that the testbench toggle the clock within
		// Verilator
		TESTBENCH<Vmodule>::tick();

		bool	writeout = false;
		// Check for debugging conditions
		//
		// For example:
		//
		//   1. We might be interested any time a wishbone master
		//	command is accepted
		//
		if ((m_core->v__DOT__wb_stb)&&(!m_core->v__DOT__wb_stall))
			writeout = true;
		//
		//   2. as well as when the slave finally responds
		//
		if (m_core->v__DOT__wb_ack)
			writeout = true;

		if (writeout) {
			// Now we'll debug by printf's and examine the
			// internals of m_core
			printf("%8lu: %s %s ...\n", m_tickcount,
				(m_core->v__DOT__wb_cyc)?"CYC":"   ",
				(m_core->v__DOT__wb_stb)?"STB":"   ",
				... );
		}
	}
};

Others who have looked at this approach consider it very “textual”. It is.

If you find “textual” to be old fashioned, then please consider that I maintain a very old fashioned hammer in my garage for when I need such a tool. Like my hammer, this old fashioned approach to debugging is still very valuable. When I was debugging the Hello World program in the OpenArty project, as part of both the first time the ZipCPU supported the C library as well as the first time it supported 8-bit bytes, this was the approach I used when things weren’t working. When I traced the problem down to a value in memory that had the wrong value within it, I was then able to look backwards through the massive textual output to find the exact memory operation that had set the value erroneously. I was then able to quickly find the broken logic and fix the bug.

This debug by printf approach should satisfy most of the software programmers out there who are used to finding bugs in this manner.

Many hardware designers are going to look for something more. Hence, let’s look at how to use a graphical waveform viewer such as GTKWave.

Debugging Verilog via GTKWave

If you’ve never used a waveform viewer such as GTKWave before, then you are in for a treat. Such graphical viewers are really an essential part of any HDL designer’s toolkit. Using GTKWave, you can view and inspect every variable within your design on every clock.

Let’s rebuild our module so that it creates a VCD output file that can then be read by GTKWave. This will allow us to examine our design using it.

The first change we are going to need to make is to change how we call Verilator. Specifically, we’ll add the “--trace” option to the command line.

verilator -Wall --trace -cc module.v

This will turn our module into a C++ class, found within obj_dir/Vmodule.h and .cpp as before, but this time it can support tracing. We can turn this C++ class definition into a library in the same fashion as before.

The next step in this process is going to be transforming the TESTBENCH wrapper of our Verilated module. If you want, you can follow along on how to do this from the Verilator FAQ, although I think you’ll find we go into a touch more detail here.

#include <verilated_vcd_c.h>

template<class MODULE> class TESTBENCH {
public:
	// Need to add a new class variable
	VerilatedVcdC	*m_trace;
	...

	TESTBENCH(void) {
		// According to the Verilator spec, you *must* call
		// traceEverOn before calling any of the tracing functions
		// within Verilator.
		Verilated::traceEverOn(true);
		// No trace file is open until opentrace() gets called
		m_trace = NULL;
		... // Everything else can stay like it was before
	}

	// Open/create a trace file
	virtual	void	opentrace(const char *vcdname) {
		if (!m_trace) {
			m_trace = new VerilatedVcdC;
			m_core->trace(m_trace, 99);
			m_trace->open(vcdname);
		}
	}

	// Close a trace file
	virtual void	close(void) {
		if (m_trace) {
			m_trace->close();
			delete m_trace;
			m_trace = NULL;
		}
	}

	virtual void	tick(void) {
		// Make sure the tickcount is greater than zero before
		// we do this
		m_tickcount++;

		// Allow any combinatorial logic to settle before we tick
		// the clock.  This becomes necessary in the case where
		// we may have modified or adjusted the inputs prior to
		// coming into here, since we need all combinatorial logic
		// to be settled before we call for a clock tick.
		//
		m_core->i_clk = 0;
		m_core->eval();

		//
		// Here's the new item:
		//
		//	Dump values to our trace file
		//
		if (m_trace) m_trace->dump(10*m_tickcount-2);

		// Repeat for the positive edge of the clock
		m_core->i_clk = 1;
		m_core->eval();
		if (m_trace) m_trace->dump(10*m_tickcount);

		// Now the negative edge
		m_core->i_clk = 0;
		m_core->eval();
		if (m_trace) {
			// This portion, though, is a touch different.
			// After dumping our values as they exist on the
			// negative clock edge ...
			m_trace->dump(10*m_tickcount+5);
			//
			// We'll also need to make sure we flush any I/O to
			// the trace file, so that we can use the assert()
			// function between now and the next tick if we want to.
			m_trace->flush();
		}
	}
};

Further, within our main() function, we’ll add a line,

tb->opentrace("trace.vcd");

any time we want to create a VCD output file.
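The three time stamps passed to dump() in the tick() method deserve a second look. A waveform viewer expects time to move forward from one dump to the next, and the comment about the tick count being greater than zero is there because 10*m_tickcount-2 would wrap around on an unsigned zero. The sketch below simply checks that arithmetic; the function names are my own and nothing in it touches Verilator.

```cpp
#include <cassert>
#include <vector>

// The three time stamps handed to dump() during tick number `tick`,
// following the 10*tick-2 / 10*tick / 10*tick+5 scheme used in tick().
// The caller must pass tick >= 1, or the first value wraps around.
std::vector<unsigned long> dump_times(unsigned long tick) {
	return { 10*tick-2, 10*tick, 10*tick+5 };
}

// Walk the time stamps for the first `nticks` ticks and check that
// each one is strictly later than the one before it.
bool strictly_increasing(unsigned long nticks) {
	unsigned long	last = 0;
	for (unsigned long t = 1; t <= nticks; t++) {
		for (unsigned long ts : dump_times(t)) {
			if (ts <= last)
				return false;
			last = ts;
		}
	}
	return true;
}
```

Tick 1 produces 8, 10 and 15, tick 2 starts at 18, and so on, so each clock occupies ten time units in the resulting trace.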

Building this new program is a touch more difficult than it was before. You are going to need to include the verilated_vcd_c.cpp file, but in all other respects the build is the same. As before, the final file named is the library built by make in obj_dir, which Verilator’s generated makefile typically names Vmodule__ALL.a.

g++ -I obj_dir -I/usr/share/verilator/include module.cpp /usr/share/verilator/include/verilated.cpp /usr/share/verilator/include/verilated_vcd_c.cpp obj_dir/Vmodule__ALL.a -o module

After you run your “module” file, you should now find a “trace.vcd” file in your current directory. Running “gtkwave trace.vcd” should give you a chance to view all of the variables within your trace. In other words, after you finish a simulation, you can now view all of your internal register values, at every simulation clock tick.

Simulating Peripherals with Verilator

At this point, we haven’t really done anything any other simulator cannot do. We’ve simulated our program, and we’ve created both a printf based dump of the program as well as a trace.

Where Verilator really shines, though, is in the ability to add simulated peripherals to the design. Let’s consider three peripherals as part of an example:

  1. A UART simulator that allows you to connect the UART to either a file, or a TCP/IP stream.

    The importance of this type of simulator cannot be overstated, as it allows you to create a debugging interface to your board–an interface that you can also access when running your simulator.

    Many students have struggled to get their first UART up and running. Having a UART simulator, which would check their outputs and verify their validity while acting like a true UART, would be valuable for them.

  2. A QSPI Flash simulator. I’ve used this simulator many, many times to not only verify that the core I am testing properly interacts with the on-board flash, but also to know and understand how that flash interaction impacts the rest of my design.

    For example, using the QSPI Flash simulator together with the UART simulator, I can start the ZipCPU from a hard reset and run all the way through Hello World without leaving the simulation.

    The other day, I had the chance to then compare how long it takes to do this when using a DMA based bootloader approach, given that the simulations are cycle accurate. (The DMA approach is almost 2x as fast–but feel free to try it yourself to find out.)

  3. A VGA simulator I used to process the VGA signal on my Basys3 board. Using this simulator, I was not only able to verify the functionality of my VGA outputs, but I was also able to display the pixels that would be displayed on any VGA screen within an X-window on my own development workstation.

    Who knows? I may have the opportunity to present this simulator as part of this blog.

    Compare this simulator, though, to an alternative that works with your more commercial simulation products. Unlike that approach, if you had a simulator integrated into Verilator, you could then interact with your VGA display while the simulation runs, even if it runs slower than the actual hardware would’ve.
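To give a feel for what the first of these peripherals involves, here is a hypothetical, receive-only UART model. It is not the UARTSIM class mentioned above, just a minimal sketch assuming 8 data bits, no parity, one stop bit, and a fixed number of system clocks per baud interval. It gets called once per clock with the level of the design's serial output, and it collects whatever bytes it decodes.

```cpp
#include <cassert>
#include <string>

// Hypothetical receive-only UART model (8N1), for illustration only
class MiniUartRx {
	unsigned	m_clocks_per_bit;	// system clocks per baud interval
	unsigned	m_counter = 0;		// clocks until the next sample
	int		m_bit = -1;		// -1: idle, 0..7: data, 8: stop
	unsigned	m_shift = 0;
public:
	std::string	m_received;		// every byte decoded so far
	unsigned	m_framing_errors = 0;

	explicit MiniUartRx(unsigned clocks_per_bit)
		: m_clocks_per_bit(clocks_per_bit) {}

	// Call once per clock with the current level of the serial line
	void operator()(int line) {
		if (m_bit < 0) {
			if (!line) {
				// Start bit seen: the middle of the first
				// data bit is 1.5 baud intervals away
				m_counter = m_clocks_per_bit + m_clocks_per_bit/2;
				m_bit = 0;
				m_shift = 0;
			}
			return;
		}
		if (--m_counter > 0)
			return;
		m_counter = m_clocks_per_bit;
		if (m_bit < 8) {
			// Data arrives least significant bit first
			if (line) m_shift |= (1u << m_bit);
			m_bit++;
		} else {
			// The stop bit must be high
			if (line) m_received.push_back((char)m_shift);
			else	m_framing_errors++;
			m_bit = -1;
		}
	}
};

// Drive one 8N1 frame into the receiver, as a design's o_uart would
void send_byte(MiniUartRx &rx, unsigned char ch, unsigned clocks_per_bit) {
	// Bit 0 is the start bit (0), bits 1-8 the data, bit 9 the stop (1)
	unsigned frame = (1u << 9) | ((unsigned)ch << 1);
	for (int b = 0; b < 10; b++)
		for (unsigned k = 0; k < clocks_per_bit; k++)
			rx((frame >> b) & 1);
}
```

In a test bench, the send_byte() helper would go away and the design's own output wire would be passed to the receiver on every tick. A fuller model would also transmit, and might forward the bytes to a file or socket as described in the first item of the list.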

To incorporate these changes into our module’s simulation, we only need to make some minor adjustments to our overridden tick() method.

// Include the files defining our various simulation components
#include <uartsim.h>
#include <qspiflashsim.h>
#include <vgasim.h>

class MODULE_TB : public TESTBENCH<Vmodule> {
public:
	// Add the simulation components to the variables used by our module's
	// test bench
	UARTSIM		*m_uart;
	QSPIFLASHSIM	*m_flash;
	VGASIM		*m_vga;

	MODULE_TB(void) {
		// Initialize as before
		...
		// Then make sure we create our simulation components
		m_uart  = new UARTSIM();
		m_flash = new QSPIFLASHSIM();
		m_vga   = new VGASIM();
	}

	virtual void	tick(void) {
		// Simulation instructions
		//
		// Prior to actually ticking the clock, call any simulation
		// functions

		// Simulate a UART
		m_core->i_uart = (*m_uart)(m_core->o_uart);

		// Simulate a QSPI Flash
		m_core->i_qspi_data = (*m_flash)(m_core->o_qspi_clk,
			m_core->o_qspi_cs_n, m_core->o_qspi_data,
			m_core->o_qspi_output_mode);

		// Simulate a VGA
		(*m_vga)(m_core->o_vga_vsync,
			m_core->o_vga_hsync,
			m_core->o_vga_red,
			m_core->o_vga_grn,
			m_core->o_vga_blue);

		//
		// Now that we have our simulation inputs/outputs handled,
		// let's actually toggle our clock
		//
		TESTBENCH<Vmodule>::tick();

		// This is the code we had before
		bool	writeout;
		...
	}
};

Here, you can see how our overridden tick() method first calls the update methods of any simulators we have introduced. This guarantees that such simulators can handle data on a clock by clock basis, verifying the outputs of your design, and providing whatever capability is necessary on every clock tick. Once any simulation inputs or outputs have been handled, we then call the test bench’s tick method, so as to handle toggling the clock, keeping track of simulation time, and writing anything to the trace file.

Let’s pause for a moment at this point and let what we’ve just done sink in.

Perhaps some examples will help.

  • I’ve used this approach to debug my QSPIFLASH controller. Yes, I did this using the debug by printf method to find problems. I also placed assert statements within the QSPI flash decoding logic of the simulator. This would cause the QSPI flash controller to halt the simulation any time it detected an error in the protocol. For example, I could detect whether or not I waited enough clocks on startup before issuing the first command.

  • I’ve used this approach to debug the SDRAM controller that I built for the XuLA2-LX25 board. Because the SDRAM simulation I built was cycle accurate, I could then run performance and bench testing programs on the board later, and know just how well the board would perform.

  • The VGA simulator on my Basys3 board might be worth discussing for a moment.

    Because the Basys3 didn’t have enough RAM for a frame buffer, I had to place the video data into the QSPI flash memory. To make matters worse, I needed to compress the flash (run length encoding) to get the data so that I could load it fast enough to support the VGA frame rate I was working with.

    When I tried the algorithm out on the real hardware, the initial result was quite jumbled–as one might expect from the first time you try to build and test any decompression algorithm. However, by simulating the video hardware, I could find and pinpoint where the run-length decoding went bad, and therefore fix my algorithm.

    Further, because I was using a cycle accurate QSPI flash simulator (from above), I could make certain that I was meeting the timing I needed to meet to make this VGA run without data loss.

    I expect to need to build an HDMI simulator in the near future as well–both for input as well as output.

  • I also used this approach to debug an SD-card controller that controls an SD card via a SPI interface.

    Given the opportunity, I’ll likely use the same method to build and debug a similar controller using the SDIO interface instead of the SPI interface in the future.

  • Other things I’ve simulated include a PS/2 mouse connected to my Basys3 board, the ethernet controller on my Arty board, the GPS PPS signal from a PMod GPS, and even a PMod OLEDrgb board.
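The startup check mentioned in the first example above is easy to sketch. The class below is hypothetical and is not my QSPI flash simulator: it only watches an active low chip select and complains if the line goes active before a required number of clocks have passed since power up. It counts violations instead of calling assert() so that the result can be inspected afterwards.

```cpp
#include <cassert>

// Hypothetical protocol checker, for illustration only
class StartupChecker {
	unsigned long	m_startup_clocks;
	unsigned long	m_clocks = 0;
public:
	unsigned long	m_violations = 0;

	explicit StartupChecker(unsigned long startup_clocks)
		: m_startup_clocks(startup_clocks) {}

	// Call once per clock with the (active low) chip select line
	void operator()(int cs_n) {
		if (!cs_n && m_clocks < m_startup_clocks)
			m_violations++;	// a stricter model might assert(0) here
		m_clocks++;
	}
};

// Hold CS# high for `wait` clocks, then low for 4 more, and report
// how many times the checker complained
unsigned long violations_for(unsigned long wait, unsigned long required) {
	StartupChecker	chk(required);
	for (unsigned long k = 0; k < wait; k++)
		chk(1);
	for (unsigned long k = 0; k < 4; k++)
		chk(0);
	return chk.m_violations;
}
```

Because a check like this runs on every clock, the failure gets reported at the clock where the protocol was broken, not thousands of clocks later when the wrong data finally shows up somewhere.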

Here’s another neat fact in all of this: because these simulations are built from independent source code modules, we can link them together to create integrated simulations as well. As an example, I can now simulate the ZipCPU in an environment where the CPU has access to not only a UART, but also an SD-Card.

Only one clock

While this approach is very powerful, what I’ve demonstrated above only demonstrates applying Verilator to a design with a single clock.

I’m currently working on a design of a Video driver that will read HDMI signals at one rate, write HDMI signals back out at (perhaps) another rate, interact with the CPU at a third rate, and interact with DDR3 SDRAM memory at a fourth rate. While AutoFPGA can currently build a test harness for such a simulation, the work to handle all of these clocks and to display the resulting video is far from complete. This may even include Verilator support for having one of those clocks have an adjustable period–but we’ll see what is required for actually making all of that work.
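For what it's worth, one plausible way to drive several clocks is to keep track of when each one toggles next and always advance simulation time to the earliest pending edge. The sketch below shows only that bookkeeping. It is an assumption about how such a harness could be organized, not a description of what AutoFPGA generates, and the names and the two clock rates are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// One simulated clock: toggles every half_period_ps picoseconds
struct SimClock {
	unsigned long	half_period_ps;
	unsigned long	next_edge_ps;
	int		level;
	unsigned long	rising_edges;

	explicit SimClock(unsigned long period_ps)
		: half_period_ps(period_ps/2), next_edge_ps(period_ps/2),
		  level(0), rising_edges(0) {}
};

// Advance simulation time to the earliest pending edge, toggle every
// clock that has an edge at that instant, and return the new time.
// A real test bench would copy each level into the matching clock
// input and call eval() once before returning.
unsigned long step(std::vector<SimClock> &clocks) {
	unsigned long	now = clocks[0].next_edge_ps;
	for (const SimClock &c : clocks)
		if (c.next_edge_ps < now)
			now = c.next_edge_ps;
	for (SimClock &c : clocks) {
		if (c.next_edge_ps == now) {
			c.level = !c.level;
			if (c.level)
				c.rising_edges++;
			c.next_edge_ps += c.half_period_ps;
		}
	}
	return now;
}

// Run a 100 MHz and a 25 MHz clock until simulation time reaches
// until_ps (edges at that instant included), then report the number
// of rising edges each clock saw
unsigned long fast_edges(unsigned long until_ps, unsigned long *slow_edges) {
	std::vector<SimClock>	clocks = { SimClock(10000), SimClock(40000) };
	while (step(clocks) < until_ps)
		;
	*slow_edges = clocks[1].rising_edges;
	return clocks[0].rising_edges;
}
```

An adjustable clock would then amount to changing one clock's half_period_ps between steps.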

In other words, should you find the amazing Verilator capabilities outlined above valuable when using a single clock, then please consider yourself invited to contact the author of Verilator and support his work.

If you would like to help fund my own work of creating a test harness that will support multiple clocks, and even HDMI video, please feel free to support this blog on Patreon.

Other Benefits

Before I leave off discussing the value of Verilator to discuss its limitations, let me add one more item to my list of its benefits: It’s a lot faster to compile a Verilog module using Verilator than it is to compile and build it using a more traditional synthesis tool and design flow.

What I mean by this is, if I start a Vivado design build and a Verilator design build at the same time, then I will have found any synthesis bugs with Verilator long before Vivado completes.

Further, Verilator finds many bugs that Vivado either does not find, or that it buries in a long list of useless warnings.

Put together, these facts provide me with the motivation I need to build my projects within Verilator before trying to build it with a commercial synthesis tool. Finding and fixing bugs is just so much faster and easier with Verilator.

On the other hand, Verilator tends to run my 100MHz clock at about 300kHz (without optimization), so there does come a transition point where using the real hardware becomes faster/better/cheaper than using Verilator’s simulation capability.
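To put those numbers in perspective, 100MHz divided by 300kHz is a slowdown of roughly 333x, so one second of hardware time costs about five and a half minutes at the simulator. The helper below just does that division; the function name is my own.

```cpp
#include <cassert>

// Wall-clock seconds needed to simulate `hw_seconds` of hardware time,
// given the design's clock rate and the clock rate the simulator achieves
double wall_seconds(double hw_seconds, double hw_clock_hz, double sim_clock_hz) {
	return hw_seconds * (hw_clock_hz / sim_clock_hz);
}
```

Plug in your own measured simulation rate to decide when a test belongs on the real hardware instead.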

Verilator’s Limitations

While I have very much grown to love using the Verilator approach I just outlined above, there are a couple of limitations I’ve come across when using it:

  1. You can only simulate components for which you have all of the Verilog logic in hand.

    For example, while there exist proprietary vendor supplied FFT’s, you will struggle to use them within Verilator.

    This will force you to either abandon Verilator, to look for an open source alternative to their FFTs, or to simulate every portion of your design except for the proprietary component.

    As a second example, while both Xilinx and Altera will provide you with a soft core CPU and toolchain, you will struggle to test such a CPU within Verilator.

    On the other hand, you could still verify your design using Verilator if you were using a ZipCPU. Gosh, even the ZipCPU debugger can run within Verilator, using the same serial port interface it would have were it running on an FPGA. Hence, as with the FFT, open source alternatives exist. (I’ve also managed to run the OpenRISC CPU within Verilator as well, so the ZipCPU is by no means unique–although I did need to add a UART simulation to the OpenRISC CPU to do it.)

  2. Verilator only works with Verilog, not VHDL. Perhaps you would like to write or contribute to a VHDL extension to Verilator? That’s the beauty of open source, right?

  3. Verilator has only limited support for tri-state busses, and absolutely no support for vendor specific things such as clock modules or SERDES capabilities.

    I get around this by placing all of my logic within a vendor-independent module. This is my “main” module beneath the top level file containing any vendor dependent capabilities.

  4. Verilator has no support for handling ‘x’ (unknown) values or ‘z’ (high impedance) values. Neither does Verilator recognize or comment on any potential clock transition conflicts, or metastability issues.

  5. Synthesizing a project with Verilator will provide you no feedback regarding whether or not your design will fit within a particular device, nor whether or not your design would meet the timing requirements of a particular device. You still need a full-blown FPGA tool chain to find out these things.

    While I recommend Verilator as an approach for young engineers just learning to program in Verilog for the first time, this is the biggest limitation. The reality is that, at some time, you will need to convert your design into a physical reality, and at that time you’re going to want to know if it will work on your target device.

Still, with all its limitations, Verilator has been a fundamental component of every one of my designs.

My Personal Experience

If you look across the various projects I’ve posted on GitHub, you may notice that almost all of them use Verilator for simulation. Even the 4x4x4 Tic-tac-toe program that I posted to test the C-library on my devices can run on the ZipCPU using Verilator. Perhaps the best recommendation I can give for Verilator, therefore, is simply the facts that 1) these projects work, 2) some of them are both quite capable and complex, and 3) they have all been debugged with Verilator.

How about your experience? I’d love to hear some of your comments below.