AutoFPGA's linker script support gets an update
It’s been a while since I’ve discussed AutoFPGA. If you remember from my introduction to it, it’s a System on a Chip composition tool, designed to compose a design from multiple components. While most of the work it does consists of copying tags from a configuration file to one of several output files, it will also assign addresses to peripherals on a bus and create an interconnect to connect those peripherals together.
I’ve now used AutoFPGA on several of my projects. I used it first on my VideoZip project. I maintain a ZipCPU simulation test capability in ZBasic, another project that uses AutoFPGA. My iCE40 designs, ICOZip for the icoboard and TinyZip for the TinyFPGA BX, both use AutoFPGA. Even OpenArty is slowly getting an AutoFPGA upgrade.
Why? Because (when done right) it makes it easy to compose systems from other components. Components may be added to or removed from a design simply by adding or removing them from the AutoFPGA command line and rebuilding.
Why not other tools? Because I really didn’t want to learn another language besides the Verilog, C++, make, and so forth that I already work with. But aren’t AutoFPGA scripts a new language in themselves? To some extent this is true, although AutoFPGA remains a tool whose purpose is to do its job and get out of the way.
However, when I went to build a design for the TinyFPGA BX, I discovered a big hole in AutoFPGA’s capabilities. While it has always created linker scripts, the script it created hasn’t had the flexibility required to handle designs as diverse as the very spartan TinyFPGA BX and the more full-featured Nexys Video board from Digilent.
Understanding the problem
To understand the problem, we’ll need to take a look at how memory is used within an FPGA SoC design.
The CPU within an SoC needs access to memory for several purposes. It needs a place to store its instructions, another space for global data structures, another space for allocable data structures commonly called a heap, and finally a stack space to support local variables.
One common arrangement of the address space to support these various purposes contains a keep-out region near zero, followed by code instructions, data structures, the heap, and then a stack in that order, as shown in Fig 1. The stack typically starts at the end of memory and grows downwards with each subroutine call, whereas the heap typically starts at the end of global variable memory and grows upwards with each malloc() call.
However, FPGA systems tend not to have one monolithic type of memory. They typically have several memory types within any design. These basic memories include:
- Flash: This is a slow non-volatile memory. It is great for initial instructions to get a program off the ground. Since it is slow to access, it may not be ideal to execute programs from, although small designs may need to do just that.
The two big details you need to know are that flash is slow, and it is very difficult to write to as part of a program. This makes it really good as a ROM memory, but not so great for other purposes.
- Block RAM: This is the ideal type of RAM you’d want to use in any system. It is fast. It’s simple to use and create. The cost to access one part of block RAM is the same as the cost to access any other part of block RAM.
The big downside of block RAM? You only have a limited amount of it. For example, the iCE40HX8k FPGA typically has only about 8kB of usable block RAM. Yes, the data sheet will claim 16kB of block RAM. Realistically, some of that 16kB will be used elsewhere in the design, so the most you are likely to get is probably going to be about 8kB of block RAM.
- SRAM: This is similar to block RAM, but not quite as fast or as simple to work with. Like block RAM, it is volatile. However, it tends to be off chip, slower to access, fairly easy to build a controller for, and it is also cheap enough that you can have more memory than block RAM. The drawback is the technology within: SRAM tends to use more power and take more room than the dynamic types of RAM.
One cool feature of SRAM is that if you ignore it, and don’t cut the power, the memory doesn’t change. As a result, Digilent once had a Nexys board design that allowed you to load the SRAM with one FPGA configuration, and then swap FPGA configurations. Sadly, the board with this capability is no longer actively marketed and there may only be a small number of these boards left. As I understand the story, Digilent struggled to get the SRAM chips they needed to continue manufacturing the boards, and so they were forced to switch to SDRAM.
- Synchronous Dynamic Random Access Memory (SDRAM): The big granddaddy of all RAM devices tends to be the SDRAM. By this I’m going to include not only the simpler SDRAMs, but also the DDR, DDR2, and DDR3 SDRAMs. Since these RAM devices are built out of capacitors, the memory can be made compact, and so they are inexpensive to manufacture, and therefore some of the cheapest RAM devices to purchase. The biggest drawbacks to SDRAM are that the controllers tend to be complex, and the access latency tends to be high. How hard are the controllers? Well, let’s just say that I have yet to complete my first working DDR3 SDRAM controller. I know it’s possible, since LiteDRAM has built some awesome SDRAM controllers. Other than that, SDRAMs tend to be high-volume, low-cost devices.
- HyperRAM: A newcomer to the digital design space is the HyperRAM chip. These really belong in the SDRAM category above, since they tend to be built from SDRAMs internally. The big difference is that HyperRAMs have a simpler interface that is easy to build a controller for. Likewise, HyperRAMs tend to have lower latencies than many other DDR SDRAM solutions, since the complex SDRAM array control is handled within the HyperRAM chip itself.
Ok, so that’s what we have to play with. What might an FPGA address space look like with these various types of RAMs?
Block RAM Only
The simplest memory configuration we might build would be a block RAM only configuration. This configuration would be built as though there were no other memories in the system. It would typically consist of a keep-out address range near zero, addresses for the various peripherals, then the block RAM address itself.
Of course, the problem with this configuration is that block RAM is both limited and volatile: it won’t have the values we need within it when we power up our new design, or later when we reset our design. Still, this is a great memory model if you are first bringing up your CPU, and you haven’t yet debugged any other types of memory.
I’ll admit I’ve even thought about segmenting the block RAM into both a read only component, or block ROM if you will, and a volatile block RAM component.
Why would I do this? Because it seems like few processors measure their Dhrystone performance in the absence of their bus. Were I to build a system like this, I might be able to measure the speed of the ZipCPU’s instruction set independent of the bus implementation.
Of course, the problem with both of these designs is that block RAM is scarce. What else might we use?
Block RAM and Flash ROM together
Most FPGAs have a SPI flash of some type which can be used as a ROM. The flash itself exists for the purpose of storing the FPGA’s power-up configuration, but typically there’s 75% of the flash left over once that is done. Hence, you get this ROM memory for “free” with the price of your board.
When block RAM isn’t enough, or alternatively when you want your program to run from non-volatile memory, this flash is available to you. Indeed, some FPGA boards don’t really have much more than block RAM and flash devices to act as memories as discussed above. Examples of these boards include the TinyFPGA BX, the iCEBreaker board, and the CMod S6. This leads to a memory space such as Fig. 4 below.
The original linker script I used for my CMod S6 design placed all of the CPU instructions in flash following the FPGA’s configuration, and all the data memory into the block RAM. This configuration is shown in Fig. 5 below.
In this figure, the ‘D’ below the block RAM represents global data, ‘H’ represents the heap, and ‘S’ represents the stack memory. Likewise the CPU’s flash memory area would start following the FPGA’s configuration data, shown here as FPGA. This would then be followed by a bootloader ‘B’, traditional instructions Insns, and any constant program data Const. The purpose of the bootloader was to move any pre-initialized global data, shown here as D, to the beginning of the block RAM.

When the design failed to meet my real-time speed requirements, driven by the need for an audio output, I then came back and placed certain instructions, those in critical sections of my code that needed to run at high speed, into the block RAM, copying them from their original location in flash. This new configuration is shown in Fig 6 below, with the K section denoting these high-speed instructions that needed to be copied to block RAM by the bootloader B.
While I managed to solve this challenge, the solution I found won’t necessarily work for all designs. Imagine, for example, if I wanted to load the C-library into block RAM. It’s not going to fit no matter how you try to squeeze it. (It’s not a pair of Levi’s.) Therefore, given that flash is slow, you might wish to move up to a faster RAM type: SDRAM.
Flash and SDRAM
Some of my larger devices, such as my Arty A7 or my Nexys Video boards, have a DDR3 SDRAM as well. The XuLA2-LX25 SoC I have also works with an SDRAM, just not a DDR3 SDRAM. Either way, an SDRAM chip provides a lot of memory, allowing programs to copy themselves from the flash device to the SDRAM device. This could easily fit the model above, only we would now replace the block RAM with SDRAM. Not only that, for speed we could copy our instructions from the extremely slow flash onto the SDRAM.
But what about that block RAM? How might we use it now?
The classic answer would be to use all of the block RAM on your device as caches for the CPU. This would mitigate the latency found within the SDRAM.
Flash, Block RAM, and SDRAM
Alternatively, we could place certain memories, at our discretion, within the block RAM. I’ve often done this with the stack memory, but you could also do this with any kernel memory that needed to be low-latency as well.
Flash, Block RAM, and HyperRAM
Now, just when you think you have everything figured out, someone will give you an auxiliary memory chip, such as this HyperRAM from 1BitSquared, and you’ll wonder how to integrate it with the rest of your system. It may never be a permanent fixture in any given design, or it may be the SDRAM the iCEBreaker was lacking. Either way, you now need to quickly and easily reconfigure the design you once had working.
My whole point is that, in the realm of reconfigurable memory spaces, the place where you want to keep all the various parts of your software programs will likely keep changing.
AutoFPGA was recently given an upgrade to handle just that issue.
The basic Linker Script File
The linker scripts that I build tend to have four parts to them. First, the script describes a pointer to the first instruction the CPU will execute. The second block declares the various memories on board. The third part declares some fixed pointers that can then be referenced from within my code. Finally, the fourth part describes how the various components of my design will be laid out in memory. Let’s take a look at what this might look like.
The following is an AutoFPGA generated script to handle a block RAM only configuration on the Arty platform.
Binutils supports script comments delimited by /* and */. The generated script therefore begins with a block of legalese comments, followed by the entry point for your program.

The important part of this section is the ENTRY() command, which specifies that the CPU entry point will be _start. This label will be set by the linker to point to the entry point in your code. For the ZipCPU, this is always the first instruction in the instruction address space.
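As an illustration only, a minimal sketch of this first section might look like the following (the comment text is abbreviated here):

```ld
/* (Legalese comment block, copied in via the @LEGAL= tag) */
ENTRY(_start)
```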
As for the legalese, if you don’t like my legalese then feel free to replace it with your own. The legalese in the AutoFPGA output files is copied from a file I typically call legalgen.txt, and introduced through AutoFPGA via a @LEGAL= tag in the global.txt file. Further, as the owner of AutoFPGA, I assert no ownership rights over the designs you create with it, just over the AutoFPGA code itself, which is released under GPLv3.
The second section is the MEMORY section. This section lists the address location and length of every physical memory component within the system. The comment you see in this section below was added by AutoFPGA. It is one of many throughout the various AutoFPGA generated files to help guide you through the process of creating and updating AutoFPGA configuration files.
This MEMORY section contains a list of all peripherals that contained a @SLAVE.TYPE key with a MEMORY value. If you recall, AutoFPGA works off of configuration files containing @KEY=VALUE statements. The @SLAVE.TYPE key currently supports one of four types of peripherals: SINGLE, DOUBLE, OTHER, and MEMORY.

What makes MEMORY peripherals different is that they are included in the linker script MEMORY section above. You can read more about this in my AutoFPGA icd.txt file.
The ORIGIN value is assigned when AutoFPGA assigns addresses. The LENGTH value, indicating the size of the peripheral, is given by the @NADDR tag times the byte-width of the bus the peripheral is on. Hence an @NADDR of 0x8000 will create a LENGTH of 0x20000, as shown above for a 32-bit wide bus.

The names given above come from either the @LD.NAME tag within the peripheral, or the peripheral’s name itself as found within its @PREFIX tag.

The point is that as your design is composed, and the memories given addresses, AutoFPGA supports this reconfiguration by creating and populating the MEMORY section of the linker script.
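A sketch of such a MEMORY section follows. The bkram LENGTH of 0x20000 follows from the @NADDR example above; the ORIGIN values and the other lengths are made up for illustration, as is the comment:

```ld
MEMORY
{
	/* To add a memory to this list, add a component with both
	 * @SLAVE.TYPE=MEMORY and @LD.NAME tags to your design  */
	bkram (wx) : ORIGIN = 0x00a00000, LENGTH = 0x00020000
	flash (rx) : ORIGIN = 0x01000000, LENGTH = 0x01000000
	sdram (wx) : ORIGIN = 0x10000000, LENGTH = 0x10000000
}
```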
The next section contains a variety of symbol declarations and assignments. These symbol names, if defined and used within your C/C++ code, will be replaced with the values given below.
First, all of the MEMORY peripherals are given names and values pointing to the beginning of their memory regions. Second, if there is an @LD.DEFNS tag within the AutoFPGA script, its value will be copied into this section as well.
Together, the sections above tell the linker that we have three types of memories, block RAM, flash, and SDRAM. It identifies the origins of those memories, their lengths, and then creates symbols so that your code can access these values.
Next, the _kram, _ram, _rom, and _top_of_stack symbols are used by the ZipCPU’s bootloader to load items from ROM into a high-speed kernel RAM (i.e. block RAM, if used) or otherwise into regular RAM (i.e. an SDRAM). Finally, the top of the stack is set to be the end of the block RAM section in this design.
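A sketch of what these definitions might look like, using the memory names above (the symbol spellings follow the discussion here; treat the exact assignments as illustrative):

```ld
_flash        = ORIGIN(flash);  /* Start of each memory region     */
_bkram        = ORIGIN(bkram);
_sdram        = ORIGIN(sdram);
_rom          = ORIGIN(flash);  /* Where the bootloader loads from */
_kram         = ORIGIN(bkram);  /* High-speed kernel RAM, if used  */
_ram          = ORIGIN(sdram);  /* Regular RAM                     */
/* Place the top of the stack at the end of block RAM */
_top_of_stack = ORIGIN(bkram) + LENGTH(bkram);
```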
These are just symbols assigned to values. We haven’t described any real linking yet. Those instructions are found in the next section.
This last section describes where the various segments of your program need to be placed into memory. In this example, I define a new memory section starting at the origin of the block RAM, aligned on units of 4 octets, and filled with a series of segments.
There is also a series of assignments in this section. These define both values that will be used by the bootloader, such as _ram_image_start and _bss_image_end, as well as an ending value which will then be the pointer to the beginning of the heap, _top_of_heap.
A simple pair of lines within your C++ code will allow you to get the value of this _top_of_heap symbol, and to initialize the heap pointer with it.
But what about those sections? Here are some of their basic meanings:
- *(.start) *(.boot): These two segments are ZipCPU specific segments. The *(.start) segment is used by the ZipCPU to make certain the startup code is the first set of instructions following the reset address, which is typically the beginning of the SECTIONS area although not in this case. The most important part of this startup code is that it sets the stack pointer that everything else will depend upon, and then jumps to the bootloader. When the bootloader returns, it then jumps to your main() function. When main() returns, it halts the CPU. The *(.boot) code is another ZipCPU section where I place the bootloader instructions. Both of these need to come early in the code order, primarily for the times when I need to copy instructions from flash to RAM, although they aren’t necessarily used in this example.
- *(.kernel): I created this ZipCPU specific section to support my S6SoC project. Any code placed in this section will be copied to the fastest RAM in the project (block RAM), in case the CPU has code that must run at high speed. Both the *(.kernel) section as well as the *(.start) and *(.boot) sections are unknown to the binutils linker or GCC. The code to be placed in these sections must specifically be marked as such.
- *(.text*): These sections contain the instructions for the program in question. Now that we have all the nastiness above out of the way, we can actually place these sections, with the *(.text.startup) section among these placed into memory first.
- *(.rodata*) *(.strings) *(.data) *(COMMON): These sections contain the read-only (i.e. const) data used by my program, any strings within the program, and finally any global data structures with initial values. The bootloader needs to copy these sections into their places, but nothing else is required.
- *(.bss): The final section is the BSS segment. Unlike the other segments above, where the bootloader just needs to copy them into place, the BSS segment needs to be cleared to all zeros. This is where any uninitialized global variables within your program will be placed.
There’s one other thing you need to know about this section, the }> bkram notation. This means that the section just described should be allocated a place in the bkram device. Something else you might see is }> bkram AT>flash. This means that the section needs to be placed into bkram, and that your code needs to be linked as though the section were placed into bkram. However, it is first placed into the flash memory area, and left there for your bootloader to copy it into bkram.
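Putting the pieces together, the SECTIONS area for this block RAM only example might be sketched as follows. The output section name and the ordering follow the discussion above, but the details are illustrative and will vary between designs:

```ld
SECTIONS
{
	.ramcode ORIGIN(bkram) : ALIGN(4) {
		_ram_image_start = . ;
		*(.start) *(.boot)       /* ZipCPU startup and bootloader */
		*(.kernel)               /* High-speed code, if any       */
		*(.text.startup)
		*(.text*)                /* Ordinary instructions         */
		*(.rodata*) *(.strings)  /* Read-only data and strings    */
		*(.data) *(COMMON)       /* Initialized global data       */
		_ram_image_end = . ;
		*(.bss)                  /* Cleared to zero by the loader */
		_bss_image_end = . ;
		_top_of_heap = . ;       /* The heap begins here          */
	} > bkram
}
```

In a flash-based design, the }> bkram at the end would instead read }> bkram AT>flash for any section the bootloader must copy into place.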
Now that you know what the various sections of this file are, and how the segments within your program will be allocated among them, what happens if we want to do something else?
Multiple Linker Configurations
Originally, AutoFPGA created one linker script, called board.ld, and adjusted it based upon the peripherals available to it. For example, it could handle designs with flash and SDRAM, but couldn’t really do much with flash and block RAMs.

This worked great for some designs, such as those with a massive amount of RAM as shown in Fig. 7 or 8 above, but horribly for others, such as Fig. 2 through 6 above.
As an example, if I wanted a design to run from block RAM alone, such as to test the CPU itself apart from its memory peripherals with the form in Fig. 2 above, this one-size-fits-all linker script would be inadequate. Likewise, if I had a design that didn’t have enough room in RAM to copy the various program segments into (imagine the C-library here), the stock linker script wouldn’t work either. While I could create a script by hand for each of these scenarios, such as I was starting to do in my TinyZip design, that script would then need to be updated by hand every time the addresses in the MEMORY region changed.

This was getting annoying.
To deal with this, I just recently created some new AutoFPGA tags for creating linker scripts:
- @LD.FILE: If present in a given configuration file, AutoFPGA will create a linker script and write it out to the named file.
- @LD.DEFNS: If present, these definitions will be added to the definitions section of the new linker script. Well, sort of. What if a design has multiple linker script configuration files? In this case, the components that have no @LD.FILE tags will have their @LD.DEFNS tags copied to all linker scripts, while the components with an @LD.FILE tag will have their @LD.DEFNS tag copied into the linker script defined by that component only.
- @LD.SCRIPT: This tag, containing the SECTIONS component above, will be copied verbatim into the linker script associated with the @LD.FILE tag in the same component, although with variable substitution applied. So, for example, if our design creates a RESET_ADDRESS tag within the peripheral named zip (i.e. having a @PREFIX tag of zip), then we might reference @$(zip.RESET_ADDRESS) to get a copy of that address at this location.
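As an illustration of how these tags fit together, a hypothetical AutoFPGA component using them might read as follows. The component name, file name, and script body here are all made up for the sketch:

```text
@PREFIX=mem_bkram_only
@LD.FILE=bkram.ld
@LD.DEFNS=
_top_of_stack = ORIGIN(bkram) + LENGTH(bkram);
@LD.SCRIPT=
ENTRY(_start)
SECTIONS
{
	.ramcode @$(zip.RESET_ADDRESS) : ALIGN(4) {
		*(.start) *(.boot)
		*(.text*) *(.rodata*) *(.data) *(COMMON) *(.bss)
	} > bkram
}
```

AutoFPGA would write the @LD.SCRIPT body into bkram.ld, substituting the CPU’s reset address for @$(zip.RESET_ADDRESS) as it does so.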
Several former linker tags have kept their functionality, but now have new names.
- @LD.NAME: This is the name of the memory component, as found in the linker script. In the example above, we had names of bkram, flash, and sdram. This tag used to be called @LDSCRIPT.NAME.
- @LD.PERM: The MEMORY section of a linker script requires a permission string. The binutils documentation calls this a set of attributes. So far, I’ve only used rx and wx for executable ROM and executable RAM respectively. Other possible attributes can be found in the binutils documentation. AutoFPGA does nothing more than copy them from your design file to the MEMORY section of the linker script. Remember, AutoFPGA is primarily a copy-paste tool with the ability to compose bus interconnects, and a limited variable substitution and expression evaluation capability sprinkled within. Similarly, another of the goals of AutoFPGA was that when its work was done, the computer generated files would be comprehensible, rather than your more typical computerese.
- @LD.ENTRY: If present, this will define the entry symbol for a given linker script. If not specified, this will default to the _start symbol as above.
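For a memory component, these renamed tags sit alongside the bus tags discussed earlier. A hypothetical block RAM component might therefore contain something like the following (values illustrative, matching the LENGTH example above):

```text
@PREFIX=bkram
@SLAVE.TYPE=MEMORY
@NADDR=0x8000
@LD.PERM=wx
@LD.NAME=bkram
```

With a 32-bit bus, the @NADDR of 0x8000 words yields the 0x20000 byte LENGTH shown in the MEMORY section earlier.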
This updated method of generating custom linker scripts has now worked so well for me that I have several linker scripts defined for the AutoFPGA upgrade to my OpenArty project: one for block RAM only, another for flash plus block RAM, and I’ll be adding a third for flash, block RAM, and SDRAM support. Even better, using this approach, adding support for a HyperRAM controller should be just as simple as copying the controller components to my RTL directory (or a subdirectory of it) and adding the HyperRAM AutoFPGA linker script configuration to my design.
Conclusion
Working with one CPU design across many different hardware components and capabilities can be a challenge. It can be difficult to take a basic design and rapidly configure it for a new set of hardware, or to maintain support across several different hardware implementations. AutoFPGA can handle many of these needs, making it easier to move a design from one hardware configuration to another.
Even better, AutoFPGA’s linker script generation just got an upgrade to help it deal with the need for multiple different memory configurations–either between designs or even within the same design.
Of course, the unwritten reality of this article is that I don’t really want to spend my time writing linker scripts. I would rather be spending my time getting my new HyperRAM to work. This is just my way of trying to simplify the massive configuration challenges I have along the way.
Let him that stole steal no more: but rather let him labour, working with his hands the thing which is good, that he may have to give to him that needeth. (Eph 4:28)