What do you do when you need TCP, but don’t have it?

Let me back up and set the stage a bit more. I’m working with what will be an underwater design, as shown in Fig. 1.

Fig 1. Controlling an Underwater FPGA

For those who haven’t figured it out, this will (eventually) form the basis of a SONAR device.

At first, this problem seems simple enough: there will be an FPGA device controlled via a network port. Easy, got it. Better yet, what’s the easiest network protocol to use? TCP. TCP software stacks aren’t all that hard to write, and I know there are several open source stacks that are easy enough to use. (No, I’ve never written one myself …)

But, let’s dig a little deeper: I want to be able to control and debug the CPU via the same network port. That means I’ll want to be able to stop the CPU, read its registers, adjust the contents in RAM, and then restart it again, all over the network. Put another way, if the CPU software won’t be running at all times, then I can’t implement TCP in software and still debug the CPU using the same TCP stack. Worse, what if the FPGA firmware can’t be trusted? Now things are getting a bit more challenging. How shall a piece of broken FPGA firmware be updated without using software on the FPGA board?

This, then, is where this problem starts.

At the most basic level, any FPGA design can be halted and updated via the JTAG port. Most vendor designs will allow for that. However, in this design, that JTAG port will be quite literally underwater. The only way to access it will be to bring the entire unit back out of the water, to dry off the chassis the FPGA board sits within, and then to uncap the JTAG port and access it. This is the insurance policy for the project–guaranteeing that the chassis the hardware sits within will not need to be opened except in extreme circumstances.

Fig 2. Reconfiguring a running FPGA

That’s not, however, going to be the normal mode of operation. Normally, upon application of power, this design will need to automatically start up and then do something to allow itself to be updated. This can be done via an Internal Configuration Access Port (ICAP) found within the FPGA itself. If a second design configuration can be written to the flash, the ICAP can then be used to tell the FPGA to reconfigure itself from this second configuration location.

Fig 3. Basic outline of a UART protocol to control a Wishbone Bus

Normally, I handle this sort of problem within my designs using a serial port together with what I call a “debugging bus”: a protocol, running over that serial port, that allows me to read from or write to any address on the bus within the FPGA. We’ve discussed this protocol before, and for reference you can see the basic components of this protocol in Fig. 3: Commands are grouped into words, decompressed, placed into a FIFO, and then those commands are used to control the bus. Bus returns then come back. Reset acknowledgments and interrupts get added to the return stream, which is then compressed and split back into bytes before being returned across the serial port. In general, this works great, and I’ve used this approach for years. Even better, having full bus access makes it really easy to debug the FPGA–as long as you can guarantee that neither the debugging bus protocol nor the bus itself will fail on you.

That’s great for a serial port protocol, where characters aren’t (generally) lost and where messages don’t accidentally get repeated.

But what about a network protocol?

Today’s discussion, then, will cover a network protocol which can be used to do this same thing, but with an interface simple enough that it can be implemented in an FPGA.

Understanding the Problem

Let’s back up, therefore, and take a moment to understand the goal and purpose of this protocol together with some of what makes the networking environment unique. This will lead us to an understanding of some of the problems involved that will need to be handled by this protocol.

All network traffic takes place in packets

The first rule of networks is that all communication takes place in packets. Even better, since we’ll be using Ethernet, every Ethernet packet ends in a four byte Frame Check Sequence based upon a Cyclic Redundancy Check (CRC). These four bytes are produced by a function applied to the contents of the packet. Then, on reception, the receiving end can also calculate the same function. If the result matches the four-byte frame check sequence found in the packet, then you can have some strong assurance that the packet was received across the interface without error.
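To make the frame check idea concrete, here is a minimal software sketch of the CRC-32 used for the Ethernet frame check sequence. The function names are my own for illustration, and the byte order used when comparing against the stored FCS (least significant byte first) is an assumption based on how the four bytes appear in a captured frame. An FPGA would compute this a bit or a byte at a time in logic rather than in a loop like this.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 as used by the Ethernet frame check sequence:
// reflected polynomial 0xEDB88320, all-ones initial value, and
// a final inversion of the result.
uint32_t ethernet_crc32(const uint8_t *data, size_t len) {
	uint32_t crc = 0xFFFFFFFFu;
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
	}
	return ~crc;
}

// Receiver-side check (hypothetical helper): recompute the CRC over
// everything but the last four bytes, then compare it against the
// FCS stored in those four bytes.
// ASSUMPTION: the FCS is stored least significant byte first.
bool frame_is_good(const uint8_t *frame, size_t len) {
	if (len < 4)
		return false;
	uint32_t fcs = uint32_t(frame[len-4])
		| (uint32_t(frame[len-3]) <<  8)
		| (uint32_t(frame[len-2]) << 16)
		| (uint32_t(frame[len-1]) << 24);
	return ethernet_crc32(frame, len-4) == fcs;
}
```

A frame that fails this check is simply discarded, which is one of the ways the lost packets discussed next come about.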

Packets may be lost

Here’s our first problem: if something goes wrong on the network (perhaps there’s too much congestion somewhere, or perhaps the FPGA is still responding to some other packet), then a packet may be dropped. Indeed, we discussed this sort of idea quite recently. In network implementations, dropped packets are considered a “normal” phenomenon, and any protocol working across the network needs to be able to recover from a lost packet.

Think this through a bit, since this can be a real problem. What happens if the bus within the FPGA locks up because some peripheral isn’t responding? In the worst case, all following packets will be lost–to include any packets telling the FPGA to reset itself. This is a real possibility that we’ll need to consider as things go on.

Packets may be repeated

The next problem is that packets may be repeated.

At first, I simply pooh-poohed this idea. This would never happen in any of my implementations, I told myself, because nothing in the network stack will ever repeat a packet.

Then I got to thinking some more about this.

Fig 4. Request/Reply protocol

Imagine you have two computers talking to each other, and one computer (the client) makes a request of the second (the server/FPGA). If the request gets dropped, how shall the first computer (the client) know it was dropped except if it doesn’t get any response? Worse, if the client doesn’t get any response, does that mean 1) the server didn’t get the request, or 2) that the server did get and process the request and the client just didn’t get the reply? All the client can do at this point is to just re-send and re-send its packet until it eventually gets a reply.

Voila, repeated packets. We’ll need to handle these somehow.

Now let’s make matters even worse: some requests, such as commanding the flash device to erase a sector or to program a page, aren’t things you want to do twice. Any network based debugging protocol will therefore, of necessity, need to be able to properly handle duplicated packets.

Packets may arrive out of order

Just to make things worse, not only may packets get dropped or repeated, they might also arrive out of order.

UDP/IP is a fairly simple protocol

One of the easiest protocols to implement in an FPGA is UDP/IP. 1) It’s easy enough to program most computers to send UDP packets, and 2) receiving UDP packets is relatively easy as well. Even better, if implemented well, the internet (the IP part) has some remarkable capabilities to it: I might even be able to access this piece of underwater hardware from the other side of the world if necessary. On the FPGA side, there are some challenges involved in implementing UDP/IP, but the result is still fairly easy to accomplish. Some of these challenges include:

  1. The FPGA needs to know the Ethernet MAC address, IP address, and destination UDP port to send the UDP/IP packet to.

    These can often be copied from the source addresses found in the request, so that’s not a big problem.

  2. Both IP and UDP headers need to know the length of the rest of the packet–possibly even before the rest of the packet has been formed.

  3. If IP header and UDP payload checksums are implemented, these are also placed prior to the packet data.

This sort of necessitates that a packet be formed first and placed into a temporary buffer, before being forwarded downstream. Still, this is quite doable.
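As a rough software illustration of items 2 and 3, the sketch below patches the length and checksum fields into a packet buffer once the payload length is finally known. It assumes a plain 20-byte IPv4 header with no options followed by an 8-byte UDP header, and it leaves the UDP checksum at zero, which IPv4 permits as "no checksum computed". The function names are hypothetical, and this is not the logic used in the FPGA.

```cpp
#include <cstddef>
#include <cstdint>

// Ones-complement sum of 16-bit big-endian words, inverted: the
// IPv4 header checksum.  The checksum field itself must be zero
// when this is called to generate a checksum.
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t hdrlen) {
	uint32_t sum = 0;
	for (size_t i = 0; i + 1 < hdrlen; i += 2)
		sum += (uint32_t(hdr[i]) << 8) | hdr[i+1];
	while (sum >> 16)	// Fold any carries back into the sum
		sum = (sum & 0xFFFFu) + (sum >> 16);
	return uint16_t(~sum);
}

// Fill in the fields that depend upon the payload length.
// ASSUMPTION: pkt holds a 20-byte IPv4 header, then an 8-byte UDP
// header, then payload_len bytes of payload.
void finish_udp_ip_headers(uint8_t *pkt, size_t payload_len) {
	const size_t IPHDR = 20, UDPHDR = 8;
	const size_t udplen = UDPHDR + payload_len;
	const size_t iplen  = IPHDR + udplen;

	pkt[2] = uint8_t(iplen >> 8);		// IP total length
	pkt[3] = uint8_t(iplen);
	pkt[IPHDR+4] = uint8_t(udplen >> 8);	// UDP length
	pkt[IPHDR+5] = uint8_t(udplen);
	pkt[IPHDR+6] = pkt[IPHDR+7] = 0;	// UDP checksum unused

	pkt[10] = pkt[11] = 0;			// Clear, then compute
	const uint16_t ck = ipv4_header_checksum(pkt, IPHDR);
	pkt[10] = uint8_t(ck >> 8);
	pkt[11] = uint8_t(ck);
}
```

Notice that nothing here can run until `payload_len` is known, which is exactly why the packet has to sit in a buffer first.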

The problem with UDP/IP, however, is that it offers no protection against the problems listed above: packets may still be dropped, duplicated, or arrive out of order.

IP packets may be fragmented

If a packet is too big for some portion of the network, then whatever intermediate node recognizes this is supposed to be able to split the packet into multiple sub-packets (fragments). These packets will then be reassembled on the far end.
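A receiver that doesn’t reassemble fragments can at least recognize them, so that it can drop them rather than misinterpret them. The sketch below shows one way to do that from the IPv4 header alone. The function name is my own, and dropping fragments is only one possible policy.

```cpp
#include <cstdint>

// Bytes 6 and 7 of the IPv4 header hold three flag bits followed
// by a 13-bit fragment offset.  A packet is part of a fragmented
// datagram if either the "more fragments" flag is set or the
// fragment offset is nonzero.
bool ipv4_is_fragment(const uint8_t *iphdr) {
	const uint16_t MORE_FRAGMENTS = 0x2000, OFFSET_MASK = 0x1FFF;
	const uint16_t flags_offset
		= uint16_t((uint16_t(iphdr[6]) << 8) | iphdr[7]);
	return (flags_offset & (MORE_FRAGMENTS | OFFSET_MASK)) != 0;
}
```

A packet with only the "don’t fragment" flag set (0x4000) is not a fragment, and so passes this test untouched.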

TCP/IP requires memory

I’ve used TCP/IP in the past. In general, it’s always been one of my favorite protocols to work with: it’s easy and reliable. However, I’ve never had to implement it before, nor have I ever tried to implement it on an FPGA. When digging into what it would take to implement TCP/IP within an FPGA, it doesn’t take long to learn that each connection will require some amount of memory to work properly, perhaps as much as 64kB per connection. This memory holds a window of the data stream, kept so that lost data can be re-sent and out of order data can be put back in place:

  • The connection setup defines the maximum and required sizes of this window

  • Packet data includes where in the stream the data comes from

  • Acknowledgments include the latest received stream position

Further, TCP/IP removes packet boundaries in favor of transmitting stream positions. In many ways, though, TCP/IP might be the ideal way of encapsulating what was once a serial port stream.

Others have made this protocol work on an FPGA, so I know it can be done. In my review, however, I ended up balking at the idea of implementing my own TCP/IP protocol handling stack within FPGA logic. (I’ll probably still place an implementation of the TCP/IP stack within the CPU software before I’m done …)

GbE is fast

From my experience, and with the boards I have, a serial port can typically achieve speeds somewhere between about 100kB/s (1MBaud) and 400kB/s (4MBaud). Gigabit Ethernet, on the other hand, can transfer data at 125MB/s. That’s probably faster than I need for this part of the project. The biggest impact this is likely to have, though, is that it might keep my compression algorithm from working, since that algorithm only tries to compress a response value as long as the return pipeline is stalled. If the link is so fast that it never stalls, then it might make sense to remove the serial port compression from this network re-implementation.
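For anyone wanting to check those numbers, the arithmetic is short. The sketch below assumes the serial port uses 8N1 framing, so that every byte costs ten bit times on the wire (one start bit, eight data bits, one stop bit), and it treats the Ethernet figure as the raw line rate, ignoring preambles, inter-frame gaps, and headers.

```cpp
#include <cstdint>

// ASSUMPTION: 8N1 framing, so ten bit times per byte transferred
constexpr uint64_t uart_bytes_per_second(uint64_t baud) {
	return baud / 10;
}

// Raw line rate only: eight bits per byte, no framing overhead
constexpr uint64_t ethernet_bytes_per_second(uint64_t bits_per_second) {
	return bits_per_second / 8;
}

static_assert(uart_bytes_per_second(1000000) == 100000,
		"1 MBaud is about 100 kB/s");
static_assert(uart_bytes_per_second(4000000) == 400000,
		"4 MBaud is about 400 kB/s");
static_assert(ethernet_bytes_per_second(1000000000ull) == 125000000,
		"1 Gb/s is 125 MB/s");
```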

The Goal

Just to make it clear, my goal is to be able to control the bus within an FPGA design from a network interface. This means I want to be able to read from and write to peripheral registers as well as memory and flash. Using this capability, I want to be able to do things like:

  1. Reading, erasing, programming, and verifying flash memory contents

  2. Issuing a warm boot request, so that the FPGA will reload itself from a secondary location

  3. Configuring any application specific hardware from an external host.

    As one example, I’m hoping to control an ADAU1761 audio chip on a Nexys Video board. The network bus should be able to turn this chip on, adjust which channel(s) are selected, and set any gain associated with the individual channels.

    I expect to use a separate protocol to handle the return audio from such a controller. For now, I just want a dependable network protocol I can use to guarantee my ability to configure this controller across the network in the first place.

    This is a test-only capability, to allow an operator to “hear” any internal signals while the device is still on the bench. Once the device is sealed up, similar circuitry within the device will be used to route from among multiple potential signal sources and sinks to their ultimate destinations.

  4. Without access to JTAG, I won’t have any vendor tools available for internal logic analysis in order to diagnose any faults. Instead, I’ll be using my own Wishbone scope across this network protocol.

    This also means that I’ll want to assume only a bare minimum of design functionality here. What happens, for example, if I need to debug the DDR3 SDRAM protocol? In that case, I would need to be able to operate this bus without access to any but internal FPGA memory.

  5. Rebooting, pausing, stepping, and stopping or starting the CPU within the design.

  6. Examining CPU registers when the CPU is halted, and perhaps adjusting their contents if necessary.

That’s my goal, and you now know the problems associated with working with the network. Now, knowing this information, what can we do in order to generate some form of protocol that we can use?

Generating a Protocol

Let’s take a moment to walk through the design of this new protocol, and the various choices I made along the way to deal with the problems listed above.

Fig 5. Encapsulating the former serial port protocol

My first draft of this protocol, the draft I’ll be discussing today, simply involved packetizing the requests I would’ve sent over the serial port in the first place. That means these debugging bus packets simply include an additional header on top of the headers already existing, together with a set of bytes which would’ve normally been sent across a serial port. We’ll call this the “NetBus header” and “serial port debugging bus payload” or just “payload” for short.

My second (rather arbitrary) choice was to insist that all protocol interactions take place in two parts: 1) the support software sends a request, and 2) the FPGA returns a reply. Moreover, as I alluded to above, every request was to receive a reply. Should a reply not be returned, that would mean that either the FPGA didn’t receive the request or that the response hadn’t been received. This also meant that the FPGA would never initiate a transaction on its own; it would only ever respond to requests.

Incidentally, this also solves the problems associated with out of order packets: if only one request is ever outstanding at a time, then there will never be two or more packets to get reversed.

Fig 6. Eight GPIO bits

One of the neat features of my serial port protocol was that it had a special means of communicating interrupts to the host computer. At one time I used this feature with my original flash controller to notify any programming software of the end of a write transaction. This was cool, but … perhaps we could expand upon it? For example, it’d be nice to get some status bits from the controller. Such status bits might include answers to questions like: Has the buffer within the controller ever suffered from an overrun? Has the controller received any bus error responses? Is this the first packet following a reset? All of this together necessitated a set of general purpose I/O bits that could be controlled via this protocol. So far, I’ve settled upon four of the eight bits shown in Fig. 6. These would be sticky bits when set, and only cleared upon a write from the external host. Another four bits remain available for … whatever purpose.

I created this capability with two parts: a mask and a value. That way, any GPIO bit whose mask bit was set would be updated to the corresponding bit of the value, allowing independent control of each of these bits. Further, by setting the mask to zero, the FPGA would simply ignore these bits. Indeed, this I/O part of the protocol is very similar to one we’ve discussed before.
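In software terms, the mask and value update amounts to a single line of logic. Here’s a minimal sketch, with a function name of my own choosing:

```cpp
#include <cstdint>

// Bits selected by the mask take their new setting from value.
// All other bits keep their current setting.
uint8_t update_gpio(uint8_t current, uint8_t mask, uint8_t value) {
	return uint8_t((current & ~mask) | (value & mask));
}
```

With a mask of zero the register is returned unchanged, and a sticky status bit can be cleared by setting its mask bit while leaving its value bit at zero.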

My third choice was to place a stream packet ID number into the header of the packet. The FPGA can then use this ID to identify and handle repeated packets. This way, should the FPGA ever detect a second request with the same frame number, then it would simply repeat the response to the prior packet with that frame number.
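The following is a small software model of that duplicate handling, just to show the idea. It is not the FPGA logic, and the class and its names are hypothetical. The responder remembers the last ID it acted upon together with the reply it generated. A repeated ID gets the saved reply back without the request being executed a second time.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of the responder's repeated packet handling
class DuplicateFilter {
public:
	typedef std::vector<uint8_t> BYTES;
	typedef std::function<BYTES(const BYTES &)> HANDLER;

	explicit DuplicateFilter(HANDLER execute)
		: m_execute(std::move(execute)),
		m_last_id(0), m_have_reply(false) {}

	BYTES on_request(uint16_t id, const BYTES &payload) {
		if (m_have_reply && id == m_last_id)
			// A repeat: resend the old reply, and
			// do *not* run the request again
			return m_last_reply;

		// A new request: run it, and save the reply
		m_last_reply = m_execute(payload);
		m_last_id    = id;
		m_have_reply = true;
		return m_last_reply;
	}
private:
	HANDLER		m_execute;
	BYTES		m_last_reply;
	uint16_t	m_last_id;
	bool		m_have_reply;
};
```

This is what keeps a re-sent flash erase command from erasing the sector twice: the second copy of the request never reaches the bus.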

One stream ID was designated as special: ID=0. This indicated a “Sync” packet in this protocol. This would be the ID I would use when initiating a communication with the device. I could then use this packet to record and set the Ethernet MAC, IP address, and UDP port of the source, as well as to reset the state of the internal compression engine to something that could be known within the client program. This way, upon starting a client program, the client could quickly synchronize the two compression engines.

Further, the design should handle packets with an empty payload, as sort of a “keep-alive” packet.

Request Packet Format

These choices led me to the packet design for request packets shown in Fig. 7.

Fig 7. Host to FPGA Packets

In general, the host only added four bytes to any payload. The first two were the packet ID. This ID could be anything, subject to two requirements: zero had the special meaning discussed above, and IDs shouldn’t be repeated in succession. The remaining two bytes carried the GPIO mask and value discussed above.
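A sketch of how a host might fill in this four-byte header follows. The ID is written most significant byte first, matching the client listing later in this article. The two remaining bytes are assumed here to carry the GPIO mask and then the GPIO value. That ordering is my assumption for illustration, Fig. 7 is the authority, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

const size_t NETBUS_HDRLEN = 4;

// Write the request header into pkt, returning the header length
// so the caller knows where the payload begins.
size_t write_request_header(uint8_t *pkt, uint16_t id,
		uint8_t gpio_mask, uint8_t gpio_value) {
	pkt[0] = uint8_t(id >> 8);	// Packet ID, high byte first
	pkt[1] = uint8_t(id);
	pkt[2] = gpio_mask;		// ASSUMPTION: mask, then value
	pkt[3] = gpio_value;
	return NETBUS_HDRLEN;
}
```

Leaving both GPIO bytes at zero, as the client code below does, asks the FPGA to leave its GPIO outputs alone.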

Reply Packet Format

The reverse link is very similar, if not almost identical. There are only a couple of differences.

Fig 8. FPGA to Host Reply Packets

For example, all replies will include the host packet’s ID number so the host can know which request is being replied to. This will allow the host to remove duplicate packets from the return stream.

Further, all replies will include sixteen general purpose I/O bits. Eight of these, i_gpio, are input bits collected from somewhere in the system–such as the interrupt bit. Indeed, we discussed four of these bits above, while the other four remain uncommitted. The other eight bits, o_gpio, are simply a reflection of the current settings as generated by the last packet sent by the host controller.

The next word, however, is generated within the FPGA logic. The first 16-bit field of this word is the ID of the previous packet. This was chosen so that, in high speed situations, I might send two packets to the FPGA and then have some assurance the FPGA had received both in case the first response was dropped. The next sixteen bits are simply a one-up packet counter from the FPGA. This counter would be incremented on any send or re-send, allowing the host to identify where, if at all, packets get lost in this system: is it between the host and the FPGA, or on the return trip from the FPGA to the host?
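On the host side, pulling these fields back out of a reply might look something like the sketch below. The field order within the first word (ID, then i_gpio, then o_gpio) is my assumption for illustration, with Fig. 8 being the authority, and the structure and function names are hypothetical. The second helper shows how the one-up counter can reveal replies that the FPGA sent but that the host never saw.

```cpp
#include <cstddef>
#include <cstdint>

struct ReplyHeader {
	uint16_t	id;		// ID of the request being answered
	uint8_t		i_gpio, o_gpio;	// ASSUMPTION: inputs come first
	uint16_t	prior_id;	// ID of the request before this one
	uint16_t	tx_count;	// FPGA's one-up transmit counter
};

static uint16_t be16(const uint8_t *p) {
	return uint16_t((uint16_t(p[0]) << 8) | p[1]);
}

// Returns false if the packet is too short to hold a header
bool parse_reply_header(const uint8_t *buf, size_t len, ReplyHeader &hdr) {
	if (len < 8)
		return false;
	hdr.id       = be16(buf);
	hdr.i_gpio   = buf[2];
	hdr.o_gpio   = buf[3];
	hdr.prior_id = be16(buf + 4);
	hdr.tx_count = be16(buf + 6);
	return true;
}

// Number of FPGA transmissions between two replies that the host
// never received.  The 16-bit arithmetic handles counter rollover.
uint16_t missed_replies(uint16_t last_seen, uint16_t now) {
	return uint16_t(now - last_seen - 1);
}
```

If the host has to re-send a request and `missed_replies` comes back as zero, then the loss was on the way to the FPGA. If it comes back nonzero, the FPGA answered and the reply was lost on the return trip.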

Building the Client’s State Machine

Before concluding, let’s take a quick look at what the host control software would look like for this. We’ll base this look upon a UDPSOCKET implementation that hides the details of sending packets to, or receiving packets from, the O/S.

A couple of other fields will allow us to keep a copy of the last received packet, or the packet we are getting ready to send.

class	NETBUS {
	char		m_rdbuf[RDBUFLN], m_pkt[TXBUFLEN];
	UDPSOCKET	*m_udp;
	unsigned	m_rxlen, m_frameid;
	// ...

The first step, therefore, in communicating with this new FPGA protocol is to establish a connection. This is done by sending a sync packet.

void	NETBUS::sync(void) {
	unsigned	nrd = 0;

	// Make up to MAXTRIES attempts to synchronize
	for(unsigned tries=0; tries < MAXTRIES && nrd == 0; tries++) {
		// Try sending a sync packet

		// Clear the 4-byte header, then send a packet of 4bytes only
		for(int k=0; k<4; k++)
			m_pkt[k] = 0;

		// Turn this into a UDP packet and send it
		m_udp->write(4, m_pkt);

m_udp->write will call the system to actually send this across a UDP port we’ve connected ourselves to.

The purpose of the for loop is so that, in the case that we don’t get any response, we can try sending up to MAXTRIES of these packets.

The next step is to look for the return packet. We’ll wait PKT_TIMEOUT milliseconds for this return–using the poll() system call to implement this timeout.

		nrd = m_udp->read(RDBUFLN, m_rdbuf, PKT_TIMEOUT);
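The UDPSOCKET class itself isn’t shown in this article, so here is a minimal sketch of how a read with a timeout might be built on top of poll(). The function name and signature are hypothetical, and the sketch assumes a file descriptor for a socket that has already been connected, so that a plain read() returns one datagram.

```cpp
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Wait up to timeout_ms milliseconds for something to arrive on fd.
// Returns the number of bytes read, or zero on a timeout or error.
unsigned read_with_timeout(int fd, char *buf, size_t buflen,
		int timeout_ms) {
	struct pollfd	pfd;

	pfd.fd      = fd;
	pfd.events  = POLLIN;
	pfd.revents = 0;

	// poll() returns zero on timeout, negative on error
	if (poll(&pfd, 1, timeout_ms) <= 0)
		return 0;
	if (!(pfd.revents & POLLIN))
		return 0;

	ssize_t	nr = read(fd, buf, buflen);
	return (nr > 0) ? (unsigned)nr : 0;
}
```

Returning zero for both the timeout and the error case keeps the caller’s retry loop simple: anything other than a received packet just means "try again."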

Now that we have a return packet, let’s look for and assemble the packet ID. If this ID is not zero, then this isn’t a response to our sync request.

		unsigned rxframe = ((m_rdbuf[0] & 0x0ff)<<8) | (m_rdbuf[1] & 0x0ff);
		if (nrd > 0 && rxframe != 0) {
			// This is a return from another request
			// Ignore it.
			nrd = 0;
		}

If after MAXTRIES attempts we still don’t get a response, we’ll throw a bus error so that the system can deal with this further up in the chain.

	} if (nrd == 0)
		throw BUSERR(0);

Otherwise, if we do have a good packet, we can look through the i_gpio values and record or process them as necessary. For example, this is where we’d mark that we’d received an interrupt of some type.

	m_rxlen = nrd;	// Mark how long the packet is that's in our buffer

Sending data to the device is very similar. First, you’d generate a packet header. Here we choose to use a pseudorandom number algorithm, although you can use roughly any algorithm you want–as long as it doesn’t generate a zero packet ID.

char *NETBUS::begin_packet(const NETBUS::BUSW address) {
	if (m_frameid == 0)	// Last packet was a sync packet
		m_frameid = 1;	// Following a sync, the first ID == 1
	else if (m_frameid & 1)
		// Low bit set: shift, then apply the feedback taps
		m_frameid = (m_frameid >> 1) ^ RANDOMIZER_COEFFICIENTS;
	else
		m_frameid = (m_frameid >> 1);

	m_pkt[0] = (m_frameid >> 8) & 0x0ff;
	m_pkt[1] = (m_frameid     ) & 0x0ff;
	m_pkt[2] = m_pkt[3] = 0;

Following the header, we’ll encode the address of the subsequent transaction.

	// Return a pointer to the packet following the
	// address
	return encode_address(address);
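The ID update above has the shape of a Galois linear feedback shift register. Since the value of RANDOMIZER_COEFFICIENTS isn’t given here, the standalone sketch below substitutes a well-known 16-bit maximal length tap mask, 0xB400, purely as an example. With such a mask, the sequence steps through all 65535 nonzero IDs before repeating, never producing the zero reserved for sync packets and never repeating an ID twice in a row.

```cpp
#include <cstdint>

// ASSUMPTION: an example tap mask standing in for the article's
// RANDOMIZER_COEFFICIENTS, whose actual value isn't shown.
const uint16_t EXAMPLE_TAPS = 0xB400;

uint16_t next_packet_id(uint16_t id) {
	if (id == 0)		// Last packet was a sync packet
		return 1;
	if (id & 1)		// Shift, then apply the feedback taps
		return uint16_t((id >> 1) ^ EXAMPLE_TAPS);
	return uint16_t(id >> 1);
}

// Count the steps it takes for the ID sequence to return to start
unsigned id_period(uint16_t start) {
	unsigned	steps = 0;
	uint16_t	id = start;

	do {
		id = next_packet_id(id);
		steps++;
	} while (id != start && steps < 70000);
	return steps;
}
```

A simple counter that skips zero would meet the protocol’s requirements just as well. The only hard rules are that the ID be nonzero and that it differ from the one before it.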

At this point, we can fill our packet with data to be written to the device.

Once done, we’ll try writing this data to the device MAXTRIES times, or until we get a response.

	m_rxlen = 0;
	for(unsigned tries=0; tries < MAXTRIES; tries++) {
		// Send the request packet
		m_udp->write(pktlen, m_pkt);

		// Try reading a packet
		m_rxlen = m_udp->read(RDBUFLN, m_rdbuf, PKT_TIMEOUT);
		if (m_rxlen > 0)
			// We were successful!
			break;

		// Otherwise we repeat
	} if (m_rxlen == 0)
		throw BUSERR(address);

	// Search the returned packet for evidence of a bus error.  If we
	// get a bus error, we'll throw an exception as above.
	// ...

We’ll then repeat this process until all of the data we need sent has been sent.

As you can see, the protocol is pretty simple from a software standpoint to get working reliably.


It works.

That’s all that matters, right?

Well, not quite. In reality, this is just my first draft of a packet protocol of this type. For example, I haven’t implemented IP defragmentation. (Nor am I really planning on doing so in the FPGA hardware.) Neither have I implemented IP support beyond version 4, or implemented any header option support. Similarly, I haven’t implemented any support for zero length packets beyond the original sync packets.

You can see a list of potential improvements I’ve been considering in Fig. 9.

Fig 9. Potential upgrades to this protocol

For perspective, I rewrote my original serial port bus protocol about three times before I finally arrived at something I liked. Each version was better than the previous one. Indeed, even now I have a fourth version of that serial port protocol that I’m slowly testing. However, since this fourth version can only get about a 10% speed improvement over the current version for the same baud rate, it hasn’t gotten a lot of priority. Put simply, the speed of the serial port isn’t really slowing me down significantly.

Fig 10. Learning and rebuilding is to be expected

That in itself is a lesson in any endeavor, and one I learned when working on my Ph.D. I like to sum it up with this advice to students, “Fail early. Fail often. Plan for failure.” Or, alternatively, “Success is measured by the number of failures that it takes to achieve it.” (These quotes are my own …) Given that the best design is never the first one, you should plan on rebuilding any design once or twice before it will become the best design that you want to use and reuse over and over again.

From a more business-oriented perspective, I might argue the advice would be to put lots of energy into anything you intend to use more than once.

From all these perspectives, this is only a first design and draft of such a protocol. I expect it to get better over time.