If you’ve never heard of “blinky” before, it’s the name given to a piece of software or even an FPGA design that simply blinks an LED. We’ve discussed building blinky before, as well as the more advanced (but no less fun) project of moving an active LED back and forth across a set of LEDs like one of my favorite TV shows as a kid, “Knight Rider”.

Even better, we’ve also discussed how to create a general purpose I/O controller, which could then be used to run a blinky program from within a CPU. In that particular article, we also measured how fast a CPU could toggle an I/O pin as part of such a blinky program. The resulting toggle rates, between 1 and 47MHz, are fairly impressive for a soft-core running within an FPGA with a 100MHz system clock.

Today, let’s return to blinky again, but this time let’s compare and contrast several approaches to the problem of toggling four separate LEDs: one at 1Hz, one at 2Hz, one at 3Hz, and a fourth one at 5Hz.

FPGA blinky

The easiest way to toggle four separate LEDs at once on an FPGA is to create four separate blinky modules. Each module would have a counter of some number of bits, say MSB+1 bits, so it might be defined as reg [MSB:0] counter;. We could then add a different step in each module, where the step size is 2^(MSB+1)/CLOCK_FREQUENCY * BLINK_FREQUENCY, and so create blinking LEDs at any frequency we want.

parameter STEP = BLINK_FREQUENCY *(1<<30) / (CLOCK_FREQUENCY/4);
always @(posedge i_clk)
	counter <= counter + STEP;
assign	o_led = counter[MSB];
Fig 1. Four counters, each driving LED's

Internally, this might look something like Fig. 1 on the right. Fig. 1 shows four separate logic blocks, each similar to the block above, and each toggling an LED at its own rate.
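To get a feel for the numbers involved, here is a small host-side C sketch (not part of the FPGA design) that evaluates the step-size formula for a 32-bit counter. The function name `blinky_step` and the 100MHz clock used in the note below are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

// Phase step for a 32-bit counter whose top bit should toggle at
// blink_hz when the counter is clocked at clock_hz:
//	step = 2^32 * blink_hz / clock_hz
// A 64-bit intermediate keeps the shifted value from overflowing.
uint32_t blinky_step(uint32_t blink_hz, uint32_t clock_hz) {
	assert(clock_hz > 0 && blink_hz < clock_hz);
	return (uint32_t)((((uint64_t)blink_hz) << 32) / clock_hz);
}
```

With an assumed 100MHz clock, `blinky_step(1, 100000000)` evaluates to 42. Since the exact ratio is about 42.95, the truncated step makes the LED blink slightly slower than 1Hz, which may or may not matter for a blinky.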

Of course, this doesn’t tie our LEDs together in phase. What if we wanted all of them to have the same phase, so that they all turned on together at the top of a second?

In that case, we’d need to multiply a common counter, set to step at 2^(MSB+1)/CLOCK_FREQUENCY, by our blink frequency to get the result.

parameter STEP = BLINK_FREQUENCY *(1<<30) / (CLOCK_FREQUENCY/4);

always @(posedge i_clk)
	counter <= counter + STEP;

always @(posedge i_clk)
begin
	counter_1hz <= counter;
	counter_2hz <= counter * 2;
	counter_3hz <= counter * 3;
	counter_5hz <= counter * 5;
end

assign o_led_1hz = !counter_1hz[MSB];
assign o_led_2hz = !counter_2hz[MSB];
assign o_led_3hz = !counter_3hz[MSB];
assign o_led_5hz = !counter_5hz[MSB];
Fig 2. Generating synchronized blinks from a counter

If you’ve never built a design like this before, then I would encourage you to try this. Remember to formally verify it first, and then run it in simulation. The tutorial should help you there if you have any questions.

For a next level challenge, consider removing the multiplies and replacing them with shifts and adds. Since we’re only multiplying by 2, 3, or 5 above, this should be fairly easy.
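One way to approach that challenge is to check the arithmetic on a host machine before touching the Verilog. The C sketch below models the multiplies with shifts and adds on 32-bit unsigned values, which wrap the same way a 32-bit register would. The helper names are made up for this illustration.

```c
#include <stdint.h>

// Unsigned 32-bit arithmetic wraps modulo 2^32, just like a 32-bit
// Verilog register, so these match counter*2, counter*3, and counter*5.
uint32_t times2(uint32_t c) { return c << 1; }
uint32_t times3(uint32_t c) { return (c << 1) + c; }
uint32_t times5(uint32_t c) { return (c << 2) + c; }

// Returns 1 if the shift-and-add forms agree with a true multiply
// for the given counter value, 0 otherwise
int shifts_match(uint32_t c) {
	return (times2(c) == c * 2u)
		&& (times3(c) == c * 3u)
		&& (times5(c) == c * 5u);
}
```

The same structure should carry over to Verilog directly, with each shift becoming a simple rewiring of bits and each add a single adder.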

If you work with FPGAs at all, this test should be fairly basic–perhaps even too easy. If you haven’t, this is a fun place to start.

Today, though, let’s take it up a notch.

CPU Blinky, polled

Fig 3. Placing a CPU within an FPGA

Moving from simple logic to a CPU is a big step within an FPGA. Even if we use a resource minimized CPU, such as the ZipCPU, you’ll still need a lot of additional infrastructure. At a minimum, you’ll need a ROM to store startup instructions (I like using a flash memory device), a RAM to hold any local variables, a timer to determine what 1Hz is and a GPIO controller to actually set any LED values. We’ll also use a Programmable Interrupt Controller (PIC) as part of our solution. Tying all of these together will require some type of memory and peripheral bus (I like Wishbone), such as is shown in Fig. 3 on the right.

When we last discussed blinking an LED from a CPU, we discussed how to turn the LED on by writing to a particular register from a C program,

	*_gpio = LED_ON;

and then writing again to turn it off.

	*_gpio = LED_OFF;
Fig 4. Blinking with a counter

If you wanted to toggle an LED across some time period, you could wait in a for loop, and then toggle your LED–as shown below and in Fig. 4 on the left.

while(1) {
	// Busy-wait.  k is volatile so the compiler keeps the empty loop
	for(volatile int k=0; k<WAITTIME; k++)
		;
	*_gpio = LED_ON;

	for(volatile int k=0; k<WAITTIME; k++)
		;
	*_gpio = LED_OFF;
}

Getting the value for WAITTIME just right might take some work. Worse, caches are notorious for providing fast but unpredictable wait times for both instructions and data.

Fig 5. Triggering off of a 1ms hardware counter

But how might we handle four LEDs, each toggling at a different rate?

We could use a timer! Remember our work building an interval timer some time ago? Let’s use it now. We’ll suppose the address of this timer is kept in the constant pointer _timer, and we’ll have it create a repeating interval of one millisecond.

	// Top bit: reload automatically.  Low bits: clocks per millisecond
	*_timer = 0x80000000 | (CPU_FREQUENCY_HZ / 1000);

We can then use our interrupt controller, with a register address at _pic, to determine if this timer has tripped:

	// Turn off all interrupt sources, and clear
	// all active interrupt indications
	*_pic = 0x7fff7fff;
	while(1) {
		int	picv = *_pic;
		// Reset any tripped interrupts
		*_pic = (0x0ffff & picv);

If the interrupt has tripped, we know a millisecond has passed, so we can increment our millisecond counter, count_ms, and then toggle each of our LEDs.

		// Poll the interrupt timer to check
		// if the interrupt has tripped
		if (picv & INT_TIMER) {
			// if it has, adjust our LEDs
			// based upon the millisecond
			// counter, count_ms
			count_ms ++;
			if (count_ms >= 1000)
				count_ms = 0;
			toggle_leds_one_ms(count_ms, 1, 1);
			toggle_leds_one_ms(count_ms, 2, 2);
			toggle_leds_one_ms(count_ms, 3, 4);
			toggle_leds_one_ms(count_ms, 5, 8);
		}
	}

Of course, polling the interrupt controller like this is really the wrong way to do this, but let’s come back to this thought in the next section when we discuss how to do this with interrupts.

One of the neat things about this approach compared to the uncalibrated for loop above is that we now know exactly how many clocks take place between interrupts. We also know that, should the CPU be late in processing an LED update, the sequence will at least maintain its frequency rather than drifting later and later (and so slower), since the timer restarts itself every millisecond.

For now, let’s look at this toggle_leds_one_ms() function. How should this function work? The same way as before! We’ll multiply our counter by the toggle rate, divide by the number of counts per second, and then grab the resulting bit of interest from the number of times the counter wraps.

void	toggle_leds_one_ms(int count_ms, int rate_hz, int which_led) {
	int	mpy;

	// count_ms is the number of milliseconds from the top of our
	// second, ranging from 0 to 999.
	//
	// count_ms times our rate will give us a counter that overflows
	// (rate) times per second.  Twice that creates a counter that
	// overflows (2*rate) times per second--enough for us to turn on
	// and then off the LED.
	mpy = count_ms * (2*rate_hz);
	mpy = mpy / 1000;

	// mpy & 1 now contains what we want our counter value to be
	//
	if (mpy&1) {
		// Turn this LED on
		*_gpio = (which_led<<16) | (which_led);
	} else {
		// Turn this LED off
		*_gpio = (which_led<<16);
	}
}
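As a quick sanity check of the arithmetic in this function, the host-side sketch below recomputes the same phase bit and counts how often it changes over one second. The names `led_phase` and `edges_per_second` are hypothetical helpers, not part of the design.

```c
// The phase bit computed by the function above: 0 during the first
// half of each blink period, 1 during the second half.
int led_phase(int ms, int rate_hz) {
	return ((ms * (rate_hz * 2)) / 1000) & 1;
}

// Count how many times the phase bit changes over one full second,
// including the wrap from ms=999 back to ms=0.
int edges_per_second(int rate_hz) {
	int edges = 0;

	for (int ms = 0; ms < 1000; ms++) {
		int next = (ms + 1) % 1000;
		if (led_phase(ms, rate_hz) != led_phase(next, rate_hz))
			edges++;
	}
	return edges;
}
```

A rate of N Hz should produce 2N edges per second, one to turn the LED on and one to turn it off again for each blink. Every rate also starts from phase 0 at ms=0, which is what keeps the LEDs aligned at the top of the second.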

Remember the way we constructed our GPIO controller: the top 16 bits on read are any inputs to our design, while the bottom 16 bits are outputs. To be able to set particular output bits and not others, we write a mask of the outputs we wish to adjust to the top 16 bits of the register–hence the (which_led << 16) logic from above. That way we can leave the other I/O registers alone, while just adjusting only the ones we want to change.
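The masked write can be modeled on a host machine in a few lines of C. This is only a sketch of the write behavior as described in this article, not the controller’s actual logic, and the name `gpio_write` is an assumption.

```c
#include <stdint.h>

// Host-side model of a masked GPIO write: the upper 16 bits of the
// written word select which outputs may change, and the lower 16
// bits supply their new values.  Returns the new output state.
uint16_t gpio_write(uint16_t outputs, uint32_t wdata) {
	uint16_t mask  = (uint16_t)(wdata >> 16);
	uint16_t value = (uint16_t)(wdata & 0xffff);

	return (uint16_t)((outputs & ~mask) | (value & mask));
}
```

For example, starting from outputs of 0x0005, writing `(4 << 16)` clears only bit 2 and leaves bit 0 alone, while a write with an all-zero mask changes nothing at all.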

Interrupt driven CPU Blinky

What if we could shut the CPU down, though, when nothing was changing?

Fig 6. Using the ZipCPU WAIT instruction

This is the purpose of the ZipCPU’s WAIT instruction. We can get access to it from C without needing any assembly by calling the built-in zip_wait() function. This instruction sets the SLEEP bit in the ZipCPU’s control register. Further, if the ZipCPU isn’t already in user mode (interrupts are enabled only in user mode), it puts the ZipCPU into user mode. Then, when an interrupt trips, the ZipCPU will return to supervisor mode (where interrupts are disabled) so that we can process the interrupt.

The ZipCPU specification discusses how to do this using a wait_for_interrupt function. This function primarily deals with setting up the interrupt controller, but once done it issues a zip_wait() instruction for exactly this purpose.

void	wait_for_interrupt(int interruptmask) {
	// Turn off all interrupts, while acknowledging none of them
	*_pic = 0x7fff0000;
	// Turn on only this interrupt
	*_pic = (interruptmask << 16)|0x80008000;

	// Be careful not to adjust any interrupts that have
	// already tripped (the bottom 16 bits of the
	// _pic), lest we miss the interrupt we are
	// looking for

	// Wait for the interrupt, forcing the CPU to sleep
	// until the next interrupt
	zip_wait();

	// We won't clear the interrupt here,
	// lest we destroy some information the
	// calling function needs.
}

Now we can rewrite our code from above to wait for an interrupt. Once done, the CPU will sleep between its top-of-the-millisecond computations. The biggest difference in the code below is that we now issue a wait_for_interrupt() call at the top of every loop. Following that call, things should look about the same as before.

	*_pic = 0xffff7fff; // Clear all interrupts
	while(1) {
		wait_for_interrupt(INT_TIMER);

		// Now check for any tripped interrupts
		int	picv = *_pic;

		// Reset any tripped interrupts
		*_pic = (0x0ffff & picv);

		// Check if the timer interrupt has tripped
		if (picv & INT_TIMER) {
			// if it has, adjust our LEDs
			// based upon the millisecond
			// counter, count_ms
			count_ms ++;
			if (count_ms >= 1000)
				count_ms = 0;
			toggle_leds_one_ms(count_ms, 1, 1);
			toggle_leds_one_ms(count_ms, 2, 2);
			toggle_leds_one_ms(count_ms, 3, 4);
			toggle_leds_one_ms(count_ms, 5, 8);
		}
	}

Not bad, huh?

This program now functions exactly the same as the last one, save that the ZipCPU is inactive while waiting for the interrupt. This can have two advantages. First, the ZipCPU will stop using the bus, allowing any non-CPU logic to transfer data without contention from the ZipCPU. Second, since the ZipCPU will stop issuing instructions, it can be placed into a lower power state. Stopping the CPU clock at this point might even be an option to lower power–as long as any interrupt source kept clocking, and as long as the interrupt controller could restart the CPU’s clock. Still, it is doable, although it does depend upon how much work you want to do to keep your power down.

What if we wanted to get really fancy, though, and create a multitasking blinky? One where we had a separate program to toggle each of several various LEDs?

Multi Blinky

To build the multi-tasking blinky, let’s step back and just look at the question of how to run multiple software programs on a piece of hardware.

Fig 7. Blinking an LED with four separate CPUs

The easiest way to do this, software-wise, might be to build multiple CPUs, as shown in Fig. 7 on the right. Easy, that is, until the two programs need to communicate with each other … but that’s a story for another day. Each CPU might then run a separate program toggling an LED.

Let’s build such a program now. We’ll start by removing the counter from our toggle_leds_one_ms() function. Instead, we’ll make the total number of milliseconds that have passed into a global variable.

// volatile, since this changes outside of the code that reads it
volatile int	milliseconds = 0;

We’ll then adjust our toggle_leds_one_ms function so that it just reads and references this global milliseconds counter. We’ll call this new function toggleled.

void    toggleled(int led_bits, int rate) {
	while(1) {
		int     ms = milliseconds, led;
		ms = ms * (rate * 2);
		led = (ms / 1000) & 1;

		if (led)
			// Second half of the blink period: LED off
			*_spio = (led_bits<<8);
		else
			// First half of the blink period: LED on
			*_spio = (led_bits << 8) | (led_bits);
	}
}

Notice our care to only read the global milliseconds value once. Since reads are atomic and since we copied the milliseconds value, we don’t have to worry about it changing mid-routine as a result of any routine that might interrupt this processing.

A second thing to notice is that the design we are basing this off of has a Special Purpose I/O register, _spio. Setting LED’s in this _spio register is just like using the _gpio register above, save that the LED area is now the lower 8-bits, and the “adjust-these-bits” area is the 8-bits above that. Hence, our reference above to _spio and (led_bits<<8).

Finally, as long as something exists to count and keep the milliseconds counter for us, we can write a program to toggle our first LED at 1 Hz as simple as,

void    one_hz(void) {
	toggleled(1, 1);
}

Now if each CPU on our circuit board had such a program, they could all toggle their LEDs together.

void    two_hz(void) {
        toggleled(4, 2);
}

void    three_hz(void) {
        toggleled(16, 3);
}

void    five_hz(void) {
        toggleled(64, 5);
}

There are a couple of problems with this approach. First and foremost, the ZipCPU project has been built around small FPGAs and low logic. (I never really had the budget for much more.) Four CPUs, whether on four circuit boards, four chips on the same circuit board, or four CPUs within the same FPGA, have never been within my budget. I’d like to run each of these four separate programs on the same CPU instead.

The solution to this problem is to virtually switch CPUs over time, as shown in Fig. 8 below, through a process called time-sharing.

Fig 8. Blinking an LED with four separate CPUs

Here’s how it works: First, the CPU will start out running one program–we’ll call this context 1 or C1 for short. Then, after some period of time (we’ll call it a quantum and set it to 1ms), an interrupt will return the CPU to its supervisory task, S. The supervisory task will switch programs to the second context, C2, which will then run for the next quantum.

Fig 9. Contexts contain CPU Register sets

The key to the whole operation is that each “program” needs to believe that it owns the CPU. In order to support this, CPUs are typically designed so that their entire state is captured within a set of registers. This register set, defining what the CPU is up to, is called a context. Each context contains its own set of registers, R0-R15 on the ZipCPU, to include the stack pointer SP (also known as R13), condition codes CC (also known as R14), and the address of the next instruction, often called the program counter PC or equivalently R15 on the ZipCPU. Therefore, in order to switch from running one program to another, all the CPU needs to do is to write the current context (i.e. register set) to memory, and then to read the stored context for the next program from memory. This is often called a context switch.

To facilitate context switching, the ZipCPU maintains two copies of its register set: one for the supervisor and one for user programs. Of these two, the supervisor context is never swapped. Indeed, it is the supervisor context that swaps user contexts.

Fig 10. Each context needs its own stack area

Outside of the CPU, each context will also need an area of memory for its local variables. This is called the stack, and the stack pointer will point to a location within this memory–starting at the end of the memory. As memory is allocated, the stack pointer will grow towards lower memory as shown by the upwards arrow in Fig. 10.

We can use a simple array of integers to capture the information contained in a ZipCPU register set.

typedef	struct	{
	int	r[16];
} task_context;

To swap between one program and the next, we’ll just swap register sets. The two programs will never know what happened, because when the next program is activated, all of its data will be right there in its registers just as it was when it left off.
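To make the idea concrete, here is a host-side model of a context switch. It is only a sketch: the array `user_regs` stands in for the CPU’s user register set, and the `model_` functions are hypothetical stand-ins, not the ZipCPU’s built-in context functions.

```c
#include <string.h>

#define NREGS	16

typedef struct {
	int	r[NREGS];
} task_context;

// Stand-in for the CPU's user register set.  On real hardware
// these are registers, not memory.
static int	user_regs[NREGS];

// Copy the user registers out to a task's saved context
void	model_save_context(int *saved) {
	memcpy(saved, user_regs, sizeof(user_regs));
}

// Load the user registers from a task's saved context
void	model_restore_context(const int *saved) {
	memcpy(user_regs, saved, sizeof(user_regs));
}

// Switch tasks: save the outgoing registers, then load the
// incoming ones.
void	model_switch(task_context *from, const task_context *to) {
	model_save_context(from->r);
	model_restore_context(to->r);
}
```

Nothing in either task’s code has to cooperate with the switch. Everything the outgoing task needs to resume, including its stack pointer and program counter, sits in its saved register array.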

Let’s walk through how we might do this.

The first step will be to assign a stack to each task, as shown in Fig. 10 above. This is a place of memory designed to hold each task’s local variables.

Fig 11. The heap grows downwards

Normally, I’d use malloc() to do this or even the C++ new operator. Today, we’re going to try to do this without the C-library. As a result, we’ll need an alternative. I’m going to call that alternative ugly_malloc(). It will act the same as malloc(), except that there’s no free() call and so the memory will never return.

This ugly_malloc() function is built around a pointer to the end of our program’s fixed memory locations. The AutoFPGA linker scripts define this pointer as _top_of_heap. We can grab that here,

extern	char	_top_of_heap[0];
unsigned	_heap = (unsigned)_top_of_heap;

Just getting to the point where _heap even gets this value takes a lot of work, much of which we will skip here. That work starts in the AutoFPGA configuration script that describes where the ZipCPU’s memory is even located on a particular hardware architecture. That script defines the _top_of_heap pointer and makes certain it’s aligned. The next step takes place in the bootloader, since _heap is a variable whose value might change. Such variables need to be set initially. So, this bootloader copies the initial value into _heap before starting our program.

Finally, we can now write this ugly_malloc() function. This function works by just grabbing the next nbytes from the heap, and then incrementing our _heap pointer to the next unallocated section of memory. Since the ZipCPU cannot (yet) read from unaligned memory, we’ll also need to make certain this pointer remains aligned.

void    *ugly_malloc(unsigned nbytes) {
	// Get the pointer to the next unallocated piece of memory
	// This is where our returned memory will be located..
        void    *r = (void *)_heap;

	// Advance our _heap pointer to the next available piece
	// of memory, while also guaranteeing that the pointer
	// remains aligned.
        _heap = _heap + ((nbytes+3) & ~3u);

	// And return the result--the value of _heap when we entered
        return  r;
}
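For anyone who wants to experiment with this kind of allocator away from the hardware, here is a host-side sketch of the same idea. It assumes a fixed array, `heap_area`, standing in for the memory above _top_of_heap, and the name `bump_alloc` is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

// Host-side stand-in for the memory above _top_of_heap
static uint32_t	heap_area[1024];
static uintptr_t	heap_next = 0;	// Byte offset of the next free word

// Bump allocator: hand out the next nbytes, rounded up to a multiple
// of four bytes so every returned pointer stays word aligned.
// Nothing is ever freed.
void	*bump_alloc(unsigned nbytes) {
	unsigned	rounded = (nbytes + 3u) & ~3u;
	void		*r;

	// Refuse to run off the end of our pretend heap
	assert(heap_next + rounded <= sizeof(heap_area));
	r = (char *)heap_area + heap_next;
	heap_next += rounded;
	return r;
}
```

A request for 5 bytes consumes 8, so the allocation that follows it still starts on a word boundary.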

Using this ugly_malloc() function, we can now allocate some local variable space for each of our programs. This will be the “stack” space used by these programs, and so we’ll use this to set the SP register for each of the programs.

int     *new_stack_ptr(unsigned nbytes) {
	int	*ptr = (int *)ugly_malloc(nbytes);

	ptr = &ptr[((nbytes+3)>>2)-1];
	return ptr;
}

This function allocates a section of memory nbytes in length, and then returns a pointer to the end of it. Our tasks won’t actually write to this end value, but will instead back up this pointer by however much space they need as they need it. This is illustrated in Fig. 11 by an upwards arrow within each task’s stack memory area showing that the stack grows upwards towards low memory.
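The pointer arithmetic here is easy to get off by one, so it may be worth checking on a host machine. The sketch below isolates just that step. `top_of_block` is a hypothetical name, and it assumes a 32-bit int as on the ZipCPU.

```c
#include <assert.h>

// Given the base of a block of nbytes (rounded up to whole words),
// return a pointer to its last word.  A descending stack starts
// here and moves toward the base.
int	*top_of_block(int *base, unsigned nbytes) {
	assert(nbytes >= 4);
	return &base[((nbytes + 3) >> 2) - 1];
}
```

For a 256-byte block this lands on word 63 of 64, the last word that still belongs to the block.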

The astute observer will notice that the stack spaces, which are illustrated in Fig. 10, both for the supervisor and each of the user contexts, are not unlimited. If they grow too far, the stack will overflow into other memory regions causing … lots of problems. Picking how much space to allocate for the stack is therefore quite important, and a problem for which I don’t (yet) have a good solution.

Since we’ve chosen not to use any system libraries for this code, we’ll need to write our own memzero routine so that we can start our tasks off with a clean slate. This memzero() (should-be a) library function is also pretty basic,

void    memzero(void *p, int v, unsigned cnt) {
	char    *cp = p;
	for(unsigned i=0; i<cnt; i++)
		*cp++ = v;
}

Much as one might expect.

Incidentally, we could do this operation four times faster if we took advantage of the fact that our memory is both word aligned and an integer number of words. For now, we’ll leave this as an exercise for the student to try later.
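For those who would like to check their work afterwards, one possible answer is sketched below. It assumes the pointer is word aligned and the length is a multiple of four bytes, and the name `wordzero` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

// Zero nbytes of memory one 32-bit word at a time.  This assumes p
// is word aligned and nbytes is a multiple of four, as is the case
// for an array of register sets.
void	wordzero(void *p, unsigned nbytes) {
	uint32_t	*wp = (uint32_t *)p;

	assert(((uintptr_t)p & 3) == 0 && (nbytes & 3) == 0);
	for (unsigned i = 0; i < nbytes / 4; i++)
		wp[i] = 0;
}
```

This version issues one store per word instead of one per byte, which is where the savings would come from.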

Now that we have this background under our belt, it’s time to build our multi-tasking blinky.

Starting at the top, the CPU will begin by executing a bootloader. This function copies our program into memory (if necessary), and then initializes any global variables. Once complete, the bootloader will call our main() function–giving the appearance that this is where our program starts. Once in main(), we will first define all of our tasks (i.e. contexts), as well as a current task pointer to point to the task that is currently active. The tasks themselves in this simple example consist of nothing more than the task_context structure we defined above containing the register values of each running program.

int	main(int argc, char **argv) {
        unsigned        heartbeats = 0;
        task_context    tasks[NTASKS];
        task_context    *current;

        memzero(tasks, 0, sizeof(task_context)* NTASKS);

I’ve also declared a heartbeats variable as well. We’ll use this to debug our program and to determine if anything has gone wrong and we need to enter into the debugger. Specifically, if ever the heartbeats counter stops counting, we’ll know our program is dead.

There are actually a couple of ways of implementing this heartbeats idea.

  1. One way, shown above, is to create a variable on the stack to hold this value.

    The catch with this approach is that the compiler might move this variable into a register. When it’s in a register, we’ll need to use the CPU’s debugging interface to read it in order to know if the heartbeats have ever stopped, rather than reading this value from memory using the debugging bus.

    That leaves two big problems. The first is knowing whether heartbeats is in a register or in memory. The second is knowing which register heartbeats is kept within. Until the ZipCPU supports a source-level debugger, this will require examining some (dis)assembly to see how the compiler allocated it. You can use zip-objdump -D intdemo > intdemo.txt to examine this (dis)assembly. Since I find myself doing this so often, there’s a make option in the Makefile to make intdemo.txt which does almost exactly this. The make option puts some other information into intdemo.txt as well, so feel free to try it out yourself and see what you think.

  2. A second way to handle the heartbeats value would be to declare it as a volatile unsigned value. If we do that, the heartbeats value will be forced into local (stack) memory. We can then use our debugging bus to read it even while the CPU is running.

    The problem is, which memory address will heartbeats get placed into? Typically, as long as heartbeats is the first value declared in main(), it’ll always be at the same place on the stack, but finding this place the first time might take some work.

  3. A third option would be to declare heartbeats as a global variable.

unsigned	heartbeats = 0;

int	main(int argc, char **argv) {
	// ... code follows

Were we to do this instead, wbregs has an option where, if given a map file, you can read this value by name.

% wbregs -m obj-zip/intdemo.map heartbeats

But let’s get back to our program. We just created memory to hold NTASKS register sets (contexts). Now let’s give them some initial values. The most important values to provide are the stack pointer and the program counter.

// Define a void function pointer type
typedef void    (*voidfn)();

// ...
int	main(int argc, char **argv) {
	// ...
	// returning back to our main() function ...
	//

	// Give each task a stack, and a program counter
	for(int i=0; i<NTASKS; i++)
		// Allocate 256-bytes of local memory for each task
		tasks[i].r[13] = (unsigned)new_stack_ptr(256);

	// Set the "Program Counter" for each context, giving each
	// task a function to start processing on entry
	tasks[0].r[15] = (unsigned)((voidfn)one_hz);
	tasks[1].r[15] = (unsigned)((voidfn)two_hz);
	tasks[2].r[15] = (unsigned)((voidfn)three_hz);
	tasks[3].r[15] = (unsigned)((voidfn)five_hz);

Once we’ve done all that, we’ve almost finished our startup processing. There are only a couple steps left.

The first is to choose to start the first task, and then to load the user registers from that task pointer. The ZipCPU toolchain provides a zip_restore_context() function to make things easier. This function expands into some code to copy the register values from the address given, in this case a pointer to current->r[0], into the user register set, uR0 through uPC.

	current = &tasks[0];
	zip_restore_context(current->r);

Once done, we can then issue a “return-to-usermode” instruction, assembly RTU or zip_rtu() from C, to switch from supervisor to user mode.

Only, we can’t do that just yet. ZipCPU programs only exit user mode on interrupts, exceptions (i.e. faults), and traps (system calls). Since our program shouldn’t be creating any exceptions (if it works), and since we’re not issuing any system calls, we’ll need another way to grab control back from userspace: interrupts.

Let’s therefore set our timer to interrupt us every millisecond.

	*_bustimer = (0x80000000)|((CLKFREQHZ-1)/1000);
	// Enable interrupts, and our interrupt in particular
	*_buspic = 0x80008000 | BUSPIC_BUSTIMER | (BUSPIC_BUSTIMER<<16);

	while(1) {

It’s now time to write our main logic loop. We’ll start out by clearing every other LED, and incrementing our heartbeats counter. We can then run the user task.

               int     picv;

                *_spio = 0xaa00;
                heartbeats++;

		// Run the user task, via a "return-to-userspace" insn
                zip_rtu();

Once we issue the RTU instruction, the ZipCPU will switch to using the user-register set. Since everything is captured by this context, switching to this user-register set will feel like switching which program is running.

Unlike many other CPUs which have a single register set, the ZipCPU maintains the supervisor’s context while running in user mode. That means that the supervisor program counter, stack pointer, and indeed all of the supervisor registers are maintained until the CPU returns to supervisor mode from user mode. What that means is that, on an interrupt, the CPU will continue running this supervisor function where it left off.

The more traditional approach would be for the CPU to suddenly jump to an interrupt service routine. The address of such a routine would be kept in a special memory location that the CPU could look up and start from when the context switch needed to take place.

By just switching register sets, the ZipCPU is kind of unique in this way. I personally find it easier to write multi-tasking programs as a result.

On our return, the first thing we’ll do is set those LEDs we just cleared–every other LED is now set to indicate we are in supervisor mode. I’ve found this to be a really useful way of debugging what goes wrong when things don’t work: if these LEDs are on, the CPU is in supervisor mode, else it is in user mode.

		// Turn on LED's 1, 3, 5, 7, ...
                *_spio = 0xaaaa;

We’ll turn these off again before we leave supervisor mode.

Our next step is to check the user-mode CC register to see if we left user mode as a result of some form of exception. If so, we’ll call a panic() function–more on that later. It’s important to enter the panic() function as soon as possible once we detect an exception, so that we can debug anything that went wrong with as little change to the system as possible.

                if (zip_ucc() & (CC_EXCEPTION))
                        panic();

At long last, it’s now finally time to check the interrupt controller to see if an interrupt has taken place. If so, we’ll increment the milliseconds counter that all of the tasks are using.

		// Read the current state from the interrupt controller
                picv = *_buspic;

		// Check if the timer has triggered an interrupt
                if (picv & BUSPIC_BUSTIMER) {
                        // Timer interrupt triggered

			// Increment our millisecond counter
                        milliseconds = milliseconds+1;
                        if (milliseconds >= 1000)
                                milliseconds = 0;

We’ll then want to acknowledge this interrupt, so we don’t get interrupted again until the next millisecond interrupt.

                	// Reset the interrupt
                	*_buspic = BUSPIC_BUSTIMER;
                }

Our last step is to swap tasks. This is done in three parts. The first part copies the user mode registers into the task_context memory. In this case, since we’ve kept a pointer to our current task context this is fairly straightforward.

                // Swap tasks
                zip_save_context(current->r);

The second step is to decide which task to call next. This is often called “CPU scheduling” or “task scheduling”, and many articles have been written on this topic. We’ll just keep it simple here and move to the “next” task in our list. In hindsight, it might’ve been easier to maintain a current task_id index as well, but I’ll leave that to you to do.

                for(int i=0; i<NTASKS; i++) {
                        if (current == &tasks[i]) {
                                current = &tasks[(i+1 >= NTASKS)?0:i+1];
                                break;
                        }
                }
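If you’d like a starting point for the task_id variant, a minimal sketch might look like the following. The function `next_task` is a hypothetical helper, and NTASKS is assumed to be four to match the four blinky tasks.

```c
#include <assert.h>

#define NTASKS	4

// Round-robin scheduling with a task index instead of a pointer
// search: return the index of the task to run after task_id.
int	next_task(int task_id) {
	assert(task_id >= 0 && task_id < NTASKS);
	return (task_id + 1 >= NTASKS) ? 0 : task_id + 1;
}
```

In the main loop, the search would then collapse into something like `task_id = next_task(task_id); current = &tasks[task_id];`.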

The final step in a ZipCPU context switch is to restore the registers from the new “current” task back into the user register set.

                zip_restore_context(current->r);
        }
}

Task swapping like this is more often an assembly function, and so these two builtin-function calls simply implement what would be those assembly instructions. They are easily identifiable from the ZipCPU disassembly since these are the only instructions that will reference the uR register set. If you are really interested in actually seeing their definition, you can find it in the GCC ZipCPU patchset. Here it is for saving the context, and here again for restoring the context. The function basically works by reading four registers from either memory or the user register set, then writing them to the user register set or memory. Indeed, the function is little more than a memory copy.

That leaves us with only one loose end to return to: the panic() function.

The purpose of this function is just to tell us that something has gone wrong, and we need to do some debugging. It also helps us start that process by helping us identify which problem has taken place. I mean, we’re writing a blinky function therefore it should be obvious if there’s a problem–the LEDs won’t blink like they should. But how to start debugging next?

  1. We’ve chosen to set certain LEDs to indicate we are in supervisor mode (interrupts disabled), and others to indicate we are in user mode (interrupts enabled).

    These supervisor mode indicator LEDs should blink so fast that they appear to be lit dimly. If they ever turn off, or start shining brightly, then we can identify which mode we were in when the CPU stopped.

  2. The purpose of the panic() function is to help us diagnose what happened to a broken subtask. In this case, if we just stop the LEDs from blinking, we might not be able to tell the difference from a CPU freeze above.

    Therefore we’ll blink all of our LEDs, either all on or all off together, to indicate a user exception took place.

This particular implementation of panic() uses a system power-up counter. This counter increases on every system clock, starting at power up, until the top bit is set. Once the top bit is set, the power up counter keeps that bit set and becomes a rolling 31-bit counter with the other bits.

We can grab bit 28 of this counter as an indication that we need to change the LEDs.

void    panic(void) {
	while(1) {
		int     v;
		v = *_pwrcount >> 28;
		v &= 1;
		if (v)
			*_spio = 0x0ffff;
		else
			*_spio = 0x0ff00;
	}
}
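The behavior of the power-up counter can be modeled in a few lines on a host machine. The sketch below follows the description above rather than the peripheral’s actual logic, and both function names are assumptions.

```c
#include <stdint.h>

// One clock tick of the power-up counter as described above: it
// counts up from zero, and once bit 31 is set that bit stays set
// while the lower 31 bits keep rolling over.
uint32_t pwrcount_tick(uint32_t count) {
	uint32_t	sticky = count & 0x80000000u;

	return (count + 1u) | sticky;
}

// The blink phase panic() looks at: bit 28 of the counter
int	panic_phase(uint32_t count) {
	return (int)((count >> 28) & 1u);
}
```

Bit 28 changes once every 2^28 clocks. At an assumed 100MHz system clock that is a little under 2.7 seconds, slow enough that a panic should be easy to tell apart from normal blinking.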

While we could’ve used the timer for this, understand that we are in a panic() situation. We want to leave the CPU state as un-changed as possible so that we can diagnose whatever fault took place. In particular, we’ll want to leave the user register set untouched. We also don’t know if the fault was associated with interrupt processing or task swapping or something else. For all of these reasons, this code has been kept as simple as possible.

Another thing we could’ve done in this panic() routine would’ve been to issue a simulation NHALT (hardware NOOP) instruction to halt any simulator at the fault itself. By turning tracing on and then running the simulator like this, it’s fairly easy to figure out what went wrong. (Easy, perhaps, but still pretty intense–examining a trace is not trivial.) Alternatively, we could’ve triggered any CPU-focused Wishbone Scope. This is typically how I debug the CPU if a bug makes it to hardware and the user register set doesn’t tell me enough to know the cause of any fault.

Video

Want to see how we did? Check out the video below.

Fig 12. The LED's blink! Multitasking at work

In this video you’ll see a MAX-1000 board with 8-LED’s. Half of the LEDs are toggling at rates of 1Hz, 2Hz, 3Hz, and 5Hz. These are the far left LED, and every other LED to the right. The other four LEDs are shining dimly, indicating that the CPU truly is handling interrupts and swapping tasks as desired.

Conclusions

Since we’ve already covered both how to make an LED blink from Verilog, as well as from C, it only made sense that we’d discuss how to blink an LED in some more advanced fashions–such as by using an interrupt or from a multi-tasking program.

While blinking an LED might seem like an exceptionally trivial task, let’s consider what we’ve learned: We learned how easy it was to blink an LED from Verilog. Even synchronizing multiple blinking LEDs to within a clock period was fairly easy. Blinking an LED on a CPU has the advantage that it doesn’t typically take (much) more logic resources–but only after you’ve already paid for the CPU, its boot code, its bootloader, memory, the system bus and the interconnect/crossbar used to hold everything together.

Of course, this would be more valuable if we were doing something more than just blinking an LED. Still, we’ve demonstrated how to build an interrupt driven task, as well as how to split the CPU’s time across multiple independent task contexts, by using an interrupt to tell us when to switch contexts. Both of these capabilities are very powerful and can be used outside of a simple LED context.

Where the CPU starts to have an advantage over the FPGA fabric is where you need something to perform complex sequencing operations–such as performing a complicated startup script, or performing a complex script periodically. Depending on the complexity of the task, adding it to what a CPU is already doing might be cheaper than performing it in the fabric of the FPGA. At the same time, adding a CPU to an FPGA just to blink an LED is truly overkill.