Technology Debt and AutoFPGA, the bill just came due

I’m currently working on a fun SONAR project where I need a data collector. The project involves transmitting SONAR data through the thick hull of a deeply submerged underwater object, without drilling holes in the hull to do it.

Understanding the transmission path through the hull will be a challenge, so it becomes important to store the incoming signal to memory, download it to Octave, and study it there before building the downstream processing logic.

Fig 1. Sonar Signal Processing Chain

My plan is to collect this information at high speed (800Msps), to dump it to memory, and then to an SD-Card.

If you’ve used Xilinx cores before, you may remember that they offer an AXI Stream to Memory Mapped DataMover core to handle this sort of data-to-memory processing. However, I’ve always liked the Wishbone bus and Verilator, the fastest simulator on the market and one that’s easy to integrate an SD-Card simulator into. Creating a similar Wishbone core took me only a couple of hours one morning to both build and verify. Having the formal properties for the Wishbone bus on hand definitely helped.

That was the easy part. Indeed, I’d like to blog about this new core soon as well (once I decide where to put it).

The harder part was integrating this new core with AutoFPGA.

The problem is simple and basic: AutoFPGA, as currently designed, can only handle the logic necessary to connect a single Wishbone master to all of the slaves within a design.

This approach is light on logic, as desired. The necessary interconnect logic is cheap and easy to build.

Fig 2. ZipCPU and AutoFPGA bus structure

For the ZipCPU, this logic represents a bit of a speed bump. Internally, the ZipCPU has two memory ports, one for instructions and one for data, and after generating them I arbitrate them together into a single bus interface. While this slows down the CPU, I’ve accepted the consequences of this to date because it helps simplify the rest of the design. Sadly, the ZipCPU gets slowed down again when its bus control signals have to be arbitrated against the DMA peripheral, and then again when they are arbitrated against the debugging bus.

All of this costs time and capability. Indeed, in order to meet timing, each of the bus arbiters has required delaying bus accesses by a cycle.
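To make the arbitration step a bit more concrete, here is a minimal sketch of a two-master Wishbone (pipeline) arbiter. The module and port names (wbarb2, i_a_*, i_b_*) are hypothetical and not taken from the ZipCPU sources. In this sketch, ownership stays with the current master for as long as it holds CYC, and only passes to the other master once that CYC drops. The outputs are left combinational for clarity; registering them, as one might to meet timing, is one way the extra cycle of latency mentioned above arises.

```verilog
// Hypothetical two-master Wishbone (pipeline) arbiter sketch.
// Ownership only changes once the current owner drops CYC.
module wbarb2 #(
	parameter AW = 30, DW = 32
) (
	input	wire		i_clk, i_reset,
	// Master A
	input	wire		i_a_cyc, i_a_stb, i_a_we,
	input	wire	[AW-1:0] i_a_addr,
	input	wire	[DW-1:0] i_a_data,
	output	wire		o_a_ack, o_a_stall, o_a_err,
	// Master B
	input	wire		i_b_cyc, i_b_stb, i_b_we,
	input	wire	[AW-1:0] i_b_addr,
	input	wire	[DW-1:0] i_b_data,
	output	wire		o_b_ack, o_b_stall, o_b_err,
	// Shared (downstream) bus
	output	wire		o_cyc, o_stb, o_we,
	output	wire	[AW-1:0] o_addr,
	output	wire	[DW-1:0] o_data,
	input	wire		i_ack, i_stall, i_err
);
	reg	r_a_owner;

	initial	r_a_owner = 1'b1;
	always @(posedge i_clk)
	if (i_reset)
		r_a_owner <= 1'b1;
	else if (r_a_owner && !i_a_cyc && i_b_cyc)
		r_a_owner <= 1'b0;	// A is idle and B wants the bus
	else if (!r_a_owner && !i_b_cyc && i_a_cyc)
		r_a_owner <= 1'b1;	// B is idle and A wants the bus

	// Forward only the owner's request to the shared bus
	assign	o_cyc  = r_a_owner ? i_a_cyc : i_b_cyc;
	assign	o_stb  = r_a_owner ? (i_a_cyc && i_a_stb) : (i_b_cyc && i_b_stb);
	assign	o_we   = r_a_owner ? i_a_we   : i_b_we;
	assign	o_addr = r_a_owner ? i_a_addr : i_b_addr;
	assign	o_data = r_a_owner ? i_a_data : i_b_data;

	// Route responses to the owner only, and stall the other master
	assign	o_a_ack   =  r_a_owner && i_ack;
	assign	o_b_ack   = !r_a_owner && i_ack;
	assign	o_a_err   =  r_a_owner && i_err;
	assign	o_b_err   = !r_a_owner && i_err;
	assign	o_a_stall = !r_a_owner || i_stall;
	assign	o_b_stall =  r_a_owner || i_stall;
endmodule
```

Returned read data isn’t shown: it can simply be broadcast to both masters, since only the owner ever sees an ACK. Note also that CYC drops for at least one clock between owners, which cleanly ends any transaction the previous owner had in flight.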

I’ve dreamed of rewriting the CPU so that it has two independent bus ports, moving the DMA out of the CPU and into the main design, and then allowing all four bus masters (CPU instructions, CPU data, DMA, and debugging interface) to interact with the bus at the same time through a crossbar. AutoFPGA can’t handle this (yet). We’ll discuss this more in a moment.

Fig 3. A better bus design, this time using a crossbar interconnect

This also has consequences for anyone who would like to use AutoFPGA. For example, I’d like it to be able to handle interconnecting AXI, AXI-lite, and even Wishbone classic signals. However, as built today, AutoFPGA can only ever create the logic for Wishbone pipeline signaling.

To solve this, I’ve recently created a series of crossbar bus arbiters–AXI, AXI-lite, and Wishbone (pipeline), with the goal and intent that AutoFPGA should just be able to reference such an arbiter and not need to know much more about the bus.

… and now I want to add a new bus master, a stream to Wishbone bus master.

It’s not quite that simple either. A second SONAR project I’m working on will require a transmit controller that will want to read instructions from the bus. Indeed, this is why I like and use AutoFPGA. It allows me to easily and rapidly reconfigure a master base design, such as this one for the Nexys Video board, from one configuration and application to another.

Fig 4. Two different AutoFPGA Configurations, built from the same base design

While the new SONAR transmit controller component is not a CPU, I found the idea of re-using my instruction fetch code just too tempting. Indeed, should the Lord be willing, I’m hoping to discuss how to build something like this in my (to be written) intermediate tutorial, with a music box as an application, but we’ll have to come back to that on another day.

What I’d like to do today is to use the same basic FPGA design for both applications, as shown in Fig. 4 above: the stream to memory controller, as well as the scripted SONAR transmit controller. Ideally, I’d just make a small change or two and the design would suddenly go from working on one project to working on a second project.

Again, that’s the purpose of AutoFPGA in the first place.

Sadly, this leaves me with a choice: I can either upgrade my interconnect logic generator within AutoFPGA to handle multiple bus masters, or I can slow the bus down (again) by manually adding in one more arbiter to transform the problem back to a known solution–the single bus master.

This time (yes, there were others), I chose to update AutoFPGA.

Updates in Progress

The updates to AutoFPGA are still a work in progress, or I’d share them on github. (No one really wants to try to build code that will just segfault, and that’s where I spent most of my day yesterday.) Here’s some of what’s coming, though:

First and foremost, my immediate goal is to create multiple bus master support, through a crossbar interconnect, so that adding (or removing) a bus master is as easy as adjusting the line in the Makefile identifying which masters are to be included in the design.

Fig 5. A Crossbar can support multiple masters

One of the advantages of using AutoFPGA over a proprietary solution like Vivado or Quartus is that all of the project files are user-supplied text files, and so they can easily be examined and fixed (if necessary). Even better, you’ll never need to “rebuild” your project from the ground up after updating your vendor tool set, although you might need to make some adjustments when updating AutoFPGA. I’ll discuss why below.

The crossbar interconnect logic is not currently a part of AutoFPGA. This creates both opportunities and problems.

The opportunity: You can easily replace my arbiter logic with yours by just matching the interface and then swapping the arbiter logic.

The problem comes from licensing. While AutoFPGA is licensed under the GPL, it asserts no license over the code it creates. I treat it sort of like GCC: the code GCC produces remains under the license its source started with. I’ve done this to try to make AutoFPGA usable by all in any context.

If I want to keep AutoFPGA usable in this new context, I may need to release any bus-logic sub-cores under a very permissive license. Given the amount of time that went into creating them, I am reluctant to do so, but needs may require this.

As mentioned above, this new AutoFPGA upgrade has multiple bus protocol support. This currently includes AXI-lite support as well as Wishbone support. Even better, the AXI-lite support will be high speed straight from the arbiter, rather than crippled like Xilinx’s support was.

My eventual goal will be to automatically insert crossbars and bus protocol (and clock) bridges as needed by the design. While other tools already exist to do this, not all of them are open–making it difficult to use the fastest simulator on the market. My current goal is just to be able to handle different bus protocols–bus bridges and clock crossings can be added manually for the time being.

I’ve also been burned by the previous AutoFPGA approach to integrating bus components into a design. Specifically, in order to integrate a bus component before, AutoFPGA would create wires based upon the bus name (not type): wb_cyc, wb_stb, wb_we, and so on. To connect a slave, you’d need to create a @MAIN.INSERT tag to outline code that would simply be inserted into the main project design. This code would then pass wb_cyc and wb_we directly to the peripheral design, and the design would return something like flash_ack, flash_stall, and flash_data (assuming it was a flash controller). Further, rather than passing wb_stb to the slave, the design would decode addresses to determine which slave was being addressed, and so you’d then pass wb_stb & flash_sel. That is, you’d pass the Wishbone strobe (i.e. transaction request) ANDed with the slave selection drawn from the bus address.

While I like this design approach in general, since it allows you to connect to the bus any way you want, there’s been more than one time I’ve connected a formally verified core to the bus and gotten this logic wrong.

As an example, I once got careless and just passed wb_stb directly to the core rather than wb_stb & slave_sel. The result wasn’t pretty: it caused multiple returns from the bus (I wasn’t filtering returns based upon the active slave), and so crashed Intel’s AXI interface (it had gone through an AXI to Avalon bridge …).
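Both halves of that logic, gating the strobe with the slave select and filtering the returns by the active slave, can be sketched for a single master and two slaves. Everything here is hypothetical and illustrative rather than AutoFPGA’s generated code: the module name wbdecode2, the "flash" and "ctrl" slave names, and the address map, where the top address bit is assumed to pick the slave. The sketch also assumes the master never switches slaves while requests are still outstanding.

```verilog
// Hypothetical single-master, two-slave address decode sketch.
// Assumed address map: i_wb_addr[AW-1] == 0 -> flash, 1 -> ctrl.
module wbdecode2 #(
	parameter AW = 24, DW = 32
) (
	input	wire		i_clk, i_reset,
	// From the (single) bus master
	input	wire		i_wb_cyc, i_wb_stb,
	input	wire	[AW-1:0] i_wb_addr,
	output	wire		o_wb_stall,
	output	reg		o_wb_ack,
	output	reg	[DW-1:0] o_wb_data,
	// To the two slaves: each strobe is wb_stb ANDed with its select
	output	wire		o_flash_stb, o_ctrl_stb,
	input	wire		i_flash_stall, i_ctrl_stall,
	input	wire		i_flash_ack, i_ctrl_ack,
	input	wire	[DW-1:0] i_flash_data, i_ctrl_data
);
	wire	flash_sel = !i_wb_addr[AW-1];
	wire	ctrl_sel  =  i_wb_addr[AW-1];
	reg	r_flash_active;

	// Passing plain wb_stb here, without the select, is the bug above
	assign	o_flash_stb = i_wb_stb && flash_sel;
	assign	o_ctrl_stb  = i_wb_stb && ctrl_sel;
	assign	o_wb_stall  = flash_sel ? i_flash_stall : i_ctrl_stall;

	// Remember which slave the most recently accepted request went to ...
	initial	r_flash_active = 1'b0;
	always @(posedge i_clk)
	if (i_wb_stb && !o_wb_stall)
		r_flash_active <= flash_sel;

	// ... and accept ACKs (and data) from that slave only
	initial	o_wb_ack = 1'b0;
	always @(posedge i_clk)
	if (i_reset || !i_wb_cyc)
		o_wb_ack <= 1'b0;
	else
		o_wb_ack <= r_flash_active ? i_flash_ack : i_ctrl_ack;

	always @(posedge i_clk)
		o_wb_data <= r_flash_active ? i_flash_data : i_ctrl_data;
endmodule
```

A stray acknowledgment from the slave that wasn’t addressed never reaches the master in this sketch, which is exactly the filtering that was missing in the example above.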

Worse, when you try to integrate with something like AXI, you end up needing to mention and connect every single I/O wire just to get it right.

The easy answer would be to define a tag in AutoFPGA, we’ll call it @SLAVE.PORTLIST, containing a string with all the logic (i.e. a list of I/O port connections) that can be used to connect your slave to the bus. A similar tag, @SLAVE.ANSIPORTLIST, would reference a string containing all the logic necessary to connect a peripheral to a bus using ANSI notation. Similar tags, @MASTER.PORTLIST and @MASTER.ANSIPORTLIST, would define this logic for bus masters. (There are already other @*PORTLIST tags used to define external I/O connections; the SLAVE and MASTER prefixes would designate these port lists as specific to the bus at hand.)

Creating these strings will simplify my design efforts, and help to standardize things as well.

One unintended consequence of this is that moving a core from one interface type to another would only require adjusting the @SLAVE.BUS.TYPE tag. Of course, you’ll still need to adjust the core itself.

Peripheral classes

To keep the logic light, I’ve defined two subsets of the Wishbone protocol, subsets I call SINGLE and DOUBLE. I created them when I noticed that I had a lot of peripherals with nearly the same logic, and it just made sense to aggregate the control logic together across peripherals.

Neither of these two subclasses, SINGLE nor DOUBLE, is allowed to stall the bus or to return bus errors. SINGLE peripherals create their acknowledgments on the same clock cycle they are accessed, and DOUBLE peripherals create their acknowledgments one cycle later.

– A SINGLE peripheral is one that contains a single register only, and it’s useful for your basic control register.

– A DOUBLE peripheral is one that contains multiple registers. It uses one clock in a case statement to select among multiple values to return.
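As an illustration of the DOUBLE class, here is a minimal sketch of a Wishbone slave with four registers. The module name wb_double_regs and its register set are hypothetical. It never stalls and never returns an error; one clock is spent in the case statement selecting a register, and the acknowledgment arrives two clocks after the strobe, matching the $past(...,2) rule given for DOUBLE peripherals in the AXI-lite assumptions below. A SINGLE peripheral would drop the selection stage entirely, returning its one register with o_wb_ack <= i_wb_stb.

```verilog
// Hypothetical DOUBLE-class Wishbone slave: four 32-bit registers.
// Never stalls, never errors, and ACKs two clocks after the strobe.
module wb_double_regs (
	input	wire		i_clk, i_reset,
	input	wire		i_wb_cyc, i_wb_stb, i_wb_we,
	input	wire	[1:0]	i_wb_addr,
	input	wire	[31:0]	i_wb_data,
	output	wire		o_wb_stall, o_wb_err,
	output	reg		o_wb_ack,
	output	reg	[31:0]	o_wb_data
);
	reg	[31:0]	r0, r1, r2, r3, sel_data;
	reg		ack_pipe;

	assign	o_wb_stall = 1'b0;	// DOUBLE slaves never stall
	assign	o_wb_err   = 1'b0;	// ... and never return bus errors

	// Register writes
	always @(posedge i_clk)
	if (i_reset)
		{ r0, r1, r2, r3 } <= 128'h0;
	else if (i_wb_stb && i_wb_we)
		case (i_wb_addr)
		2'b00: r0 <= i_wb_data;
		2'b01: r1 <= i_wb_data;
		2'b10: r2 <= i_wb_data;
		2'b11: r3 <= i_wb_data;
		endcase

	// First clock: the case statement selects one register to return
	always @(posedge i_clk)
	case (i_wb_addr)
	2'b00: sel_data <= r0;
	2'b01: sel_data <= r1;
	2'b10: sel_data <= r2;
	2'b11: sel_data <= r3;
	endcase

	// Second clock: present the selected value alongside the ACK
	always @(posedge i_clk)
		o_wb_data <= sel_data;

	initial	{ o_wb_ack, ack_pipe } = 2'b00;
	always @(posedge i_clk)
	if (i_reset || !i_wb_cyc)
		{ o_wb_ack, ack_pipe } <= 2'b00;	// abort on dropped CYC
	else
		{ o_wb_ack, ack_pipe } <= { ack_pipe, i_wb_stb };
endmodule
```

Because the ACK timing is fixed and identical for every peripheral of this class, an interconnect could generate it once for the whole group instead of checking each slave, which is what keeps the logic light.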

I’ve found these bus classes to be quite common across many design elements. Because their logic is simple, I’ve had no problem creating designs with 30+ peripherals and then adding or removing those peripherals via AutoFPGA as projects have required.

Fig 6. Special slave classes: SINGLE and DOUBLE

Here’s the good news: When I started creating the AXI-lite support, I realized that I needed to continue to support these two subclasses. The need for them wasn’t specific to Wishbone peripherals. Therefore, the AXI-lite support will include these two subclasses as well. To give you an idea of how these might work, here are the assumptions required for these simplified peripherals.

– SINGLE requires that C_S_AXI_ADDR_WIDTH == 0, or a slave having one address only. That allows the address lines to be dropped. The DOUBLE class will allow a peripheral to support multiple addresses, and so different address widths as well.

– Write interface
1. The slave must guarantee that AWREADY == WREADY == 1.
  
  This will allow the interconnect to ignore these inputs.
2. The slave must also guarantee that BVALID == $past(AWVALID) for SINGLE peripherals, and that BVALID == $past(AWVALID,2) for DOUBLE peripherals.
  
  This will allow the interconnect to automatically generate a common BVALID for all of the peripherals in the set without needing the logic to check every peripheral for this condition individually.
3. The controller (i.e., the interconnect) will guarantee that AWVALID == WVALID.
  
  This means that you can connect AWVALID to WVALID when connecting your core, and also that you don’t need to handle synchronizing these two channels together within your core.
4. The controller will also guarantee that BREADY == 1.
  
  That is also required for the interconnect to ignore BVALID.

– Read interface

These rules pretty much follow the write interface above.
1. The slave must guarantee that ARREADY == 1
2. The slave must also guarantee that RVALID == $past(ARVALID) for SINGLE peripherals, and that RVALID == $past(ARVALID,2) for DOUBLE peripherals.
3. The controller will guarantee that RREADY == 1.

Together, these assumptions will greatly simplify creating AXI-lite slaves. The control logic to support this is pretty easy to build and verify as well, so it’s likely I’ll do something similar when I get to building the full AXI support.
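To see how much these rules strip away, here is a minimal sketch of a SINGLE-class AXI-lite slave holding one 32-bit register. The module name axil_single_reg and the parameter DW are hypothetical. Following the SINGLE rule, there are no address lines at all. The ready signals are tied high, BVALID and RVALID are simply AWVALID and ARVALID delayed by one clock, and since the interconnect is assumed to guarantee AWVALID == WVALID and BREADY == RREADY == 1, the slave never has to hold a response or align its two write channels.

```verilog
// Hypothetical SINGLE-class AXI-lite slave: one register, so no address
// lines (C_S_AXI_ADDR_WIDTH == 0). Relies on the interconnect rules above.
module axil_single_reg #(
	parameter DW = 32
) (
	input	wire			S_AXI_ACLK, S_AXI_ARESETN,
	// Write address and data channels
	input	wire			S_AXI_AWVALID,
	output	wire			S_AXI_AWREADY,
	input	wire			S_AXI_WVALID,	// assumed == AWVALID
	output	wire			S_AXI_WREADY,
	input	wire	[DW-1:0]	S_AXI_WDATA,
	input	wire	[DW/8-1:0]	S_AXI_WSTRB,
	// Write response channel
	output	reg			S_AXI_BVALID,
	input	wire			S_AXI_BREADY,	// assumed == 1
	output	wire	[1:0]		S_AXI_BRESP,
	// Read address and data channels
	input	wire			S_AXI_ARVALID,
	output	wire			S_AXI_ARREADY,
	output	reg			S_AXI_RVALID,
	input	wire			S_AXI_RREADY,	// assumed == 1
	output	wire	[DW-1:0]	S_AXI_RDATA,
	output	wire	[1:0]		S_AXI_RRESP
);
	reg	[DW-1:0]	r_value;
	integer			k;

	// The slave never stalls
	assign	S_AXI_AWREADY = 1'b1;
	assign	S_AXI_WREADY  = 1'b1;
	assign	S_AXI_ARREADY = 1'b1;

	// No bus errors: always respond OKAY
	assign	S_AXI_BRESP = 2'b00;
	assign	S_AXI_RRESP = 2'b00;

	// BVALID == $past(AWVALID) and RVALID == $past(ARVALID)
	initial	{ S_AXI_BVALID, S_AXI_RVALID } = 2'b00;
	always @(posedge S_AXI_ACLK)
	if (!S_AXI_ARESETN)
		{ S_AXI_BVALID, S_AXI_RVALID } <= 2'b00;
	else
		{ S_AXI_BVALID, S_AXI_RVALID } <= { S_AXI_AWVALID, S_AXI_ARVALID };

	// Since AWVALID == WVALID, AWVALID alone qualifies the write data
	always @(posedge S_AXI_ACLK)
	if (!S_AXI_ARESETN)
		r_value <= 0;
	else if (S_AXI_AWVALID)
		for (k = 0; k < DW/8; k = k + 1)
			if (S_AXI_WSTRB[k])
				r_value[k*8 +: 8] <= S_AXI_WDATA[k*8 +: 8];

	// With a single register, read data needs no address decoding
	assign	S_AXI_RDATA = r_value;
endmodule
```

Everything a full AXI-lite slave would need to handle, such as skid buffers, backpressure on B and R, and write address/data synchronization, has been pushed onto the interconnect, where it can be built and verified once for every peripheral in the set.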

I’m sure we’ll discuss more about this on the blog as time goes along.

All of these are wonderful, great, and (insert your favorite superlative here) ideas.

There’s one problem I discovered when integrating these changes into my design: I had used the knowledge of how the interconnect worked when building some of my cores. This is now causing these otherwise “working” cores to break.

Technology Debt: The bill comes due

Wikipedia (today) defines technology debt as,

Technical debt is a concept in software development that reflects the implied cost of additional rework caused by choosing an easy or limited solution now instead of using a better approach that would take longer.

Two of my cores that are now suffering from this debt are my RMII/RGMII network cores, and my wonderful new “Universal” QSPI flash controller.

Fig 7. Abusing the bus protocol

Each of these cores has both a memory interface and a control interface, as shown in Fig. 7 on the left. For the network core, the memory interface is to either the to-be-transmitted or the already-received packet memory contained within the core. For the flash controller, the memory interface reads the flash memory itself, while the control interface accesses the control register.

In both cases, I abused the bus protocol knowing how the interconnect would handle things.

You can see how this affects the port list for the flash controller below,

module	qflexpress(i_clk, i_reset,
		i_wb_cyc, i_wb_stb, i_cfg_stb, i_wb_we, i_wb_addr, i_wb_data,
			o_wb_ack, o_wb_stall, o_wb_data,
		o_qspi_sck, o_qspi_cs_n, o_qspi_mod, o_qspi_dat, i_qspi_dat);

Rather than defining two (properly separate) interfaces, I just created a single interface with two strobe signals: i_wb_stb for reading from memory, and i_cfg_stb for reading from the control port. Results were returned through a common return port of o_wb_ack, o_wb_stall, and o_wb_data.

I first hit a problem with this interface when I tried to handle acknowledgments. Since the simple interconnect I was using just OR’d all of the acknowledgment signals together in order to generate an ACK signal to return to the bus master,

always @(posedge i_clk)
	wb_ack <= |{ flash_memory_ack, memory_ack, etc_ack };

there was never any problem with OR’ing two acknowledgment signals together within the slave. Indeed, it spared bus logic in the return. Similarly, since the interconnect selected the data to be returned based upon which slave set its acknowledgment signal,

always @(posedge i_clk)
casez({flash_memory_ack, memory_ack, other_ack, etc_ack })
	// No line to accept flash_config_data
4'b1???: wb_data <= flash_memory_data;
4'b01??: wb_data <= memory_data;
// etc.
endcase

a slave interface that hadn’t been referenced could validly set its acknowledgment signal and then return data via the other slave interface port.

Both of these are an abuse of the Wishbone protocol.

As you might expect, I then ran into problems when I wanted to update my interconnect to drop this poorly designed return logic. Instead, I wanted to create an index register for a multiplexer that would identify which core’s data should be returned. (If that’s confusing, I explain the concept here.)

always @(posedge i_clk)
if (wb_stb)
	wb_index <= wb_addr[LOW_BITS-1:0];

always @(posedge i_clk)
case(wb_index)
0: wb_data <= flash_memory_data;
// This port will never return valid data, since there was no defined
// flash configuration port defining a flash_config_data value
1: wb_data <= flash_config_data;
2: wb_data <= memory_data;
// etc.
endcase

Once the configuration port stopped returning the correct value under this new logic, I started to learn the error of my ways.

Note the key word “started”. Rather than fixing the problem properly by creating two separate bus interfaces, I cheated. I returned the same data on both channels. This would work because the returned acknowledgment was still the OR of all the acknowledgments.

assign flash_config_data = flash_memory_data;

Now the bill is coming due again. In order to support multiple masters, it is now possible that two masters will each try to access one of the two peripheral interfaces at the same time, and so combining values in the return port is no longer possible by any stretch.

That means I’ll need to change the “Universal” QSPI flash controller port list to be something closer to,

module	qflexpress(i_clk, i_reset,
		i_wb_cyc, i_wb_stb, i_wb_we, i_wb_addr, i_wb_data,
			o_wb_ack, o_wb_stall, o_wb_data,
		i_cfg_cyc, i_cfg_stb, i_cfg_we, i_cfg_addr, i_cfg_data,
			o_cfg_ack, o_cfg_stall, o_cfg_data,
		o_qspi_sck, o_qspi_cs_n, o_qspi_mod, o_qspi_dat, i_qspi_dat);

like it probably should’ve been in the beginning.

There’s another problem that I’m likely to struggle with as well: all of the bus wire names are changing. Creating a bus structure where every wire is prefixed by wb_, as in wb_cyc, wb_stb, wb_we, etc., is great when only one master will ever control the bus. Creating multi-master support is going to require changing all of these wire names so that each peripheral can be interacted with separately. This will result in an annoying incompatibility between AutoFPGA versions. While I think the benefits outweigh the problems, it will take some time to upgrade all of my separate projects to work again with the new version.

Conclusion

My conclusion from this whole affair is that I’m learning some hard lessons about design. In particular, be careful not to use the knowledge of how the other end of an interface is working to violate the rules of that interface. Sure, the result might work for your first project, but by doing so you are incurring a debt–one that will need to be paid eventually when you use the core later in a different environment.

Some time ago, I remember consulting with a particular technology company about this issue. They shared with me their own struggles, which sounded very similar to this one: they had all kinds of cores written in house, but each of them had abused the bus protocol in some fashion or other. The result was that drawing a core out of their library to use in a new project incurred an update cost any time the new environment was different. Worse, because of the tyranny of the urgent, they didn’t fix the issue properly. Instead, they had chosen the quick and easy solution of modifying the library core to fit the new need. As a result, their core IP library was filled with many similar cores, all having subtly different (abused) interfaces.

It’s fun for me to consult and discuss the “way out” of a problem like that. I’m sure you, like me, enjoy telling other people how to live their lives. It becomes quite a different matter when you find yourself stuck in the same mire.

Behold, thou … makest the boast of God, And knowest his will, and approvest the things that are more excellent, being instructed out of the law; And art confident that thou thyself art a guide of the blind, a light of them which are in darkness, An instructor of the foolish, a teacher of babes, which hast the form of knowledge and of the truth in the law. Thou therefore which teachest another, teachest thou not thyself? thou that preachest a man should not steal, dost thou steal? (From Romans 2:17-21)

Ouch. That hurts. So true though.

So I’m going to try to start paying off this debt today, together with whatever interest may have accrued. I’d still like to come back later, Lord willing, and discuss that stream to Wishbone converter–but that’ll have to wait for another day.