If you are looking for raw computational speed, FPGAs are a well-known solution.
Unlike an FPGA, a CPU's speed is limited by the fact that the CPU must perform its instructions one at a time, even though many algorithms could be made to run faster. The CPU's general-purpose nature simply slows it down when compared to what could be done within an electronic circuit.
Not only can FPGAs run their algorithms in parallel, but their logic can also be tailored to just your algorithm. Modern CPUs support divides, floating point operations, and SIMD instructions, many of which may have nothing to do with what you are trying to accomplish. The FPGA, on the other hand, has a majority of its logic available for you to configure. So, just how many copies of your algorithm do you wish to configure onto your FPGA solution?
While you might consider using an ASIC to run your algorithm even faster, FPGAs tend to be a lot cheaper to debug and manufacture than ASICs are. For example, making a mistake on an FPGA usually only costs you the time to fix it, whereas a mistake within an ASIC design may cost you many millions of dollars.
Just don’t neglect the speed of your interface while you consider engineering an FPGA solution.
It’s the interfaces, Sir
Want to calculate a Haar wavelet transform on an image? FPGAs have enough logic to run the algorithm many times over within them! You can run the transform horizontally or vertically, no problem: the raw task is easy for an FPGA.
That’s not the hard part.
The hard part is feeding the algorithm with data, and getting the results back out fast enough to be competitive with the alternatives.
Perhaps an example will help explain what I’m talking about.
Years ago, I had the opportunity to work on a really neat GPS processing algorithm. If you are familiar with GPS processing, you’ll know that the success of a GPS processing algorithm is based upon how many correlations you can do and how fast you can do them.
In my case, I was starting with a special algorithm that had already proven itself in software on a general-purpose CPU. My problem was that the algorithm ran slower than pond water. My team needed speed. For other reasons, there was an FPGA connected to our CPU microcontroller. So we asked ourselves, why not use the FPGA to speed up the processing?
Specifically, I was interfacing an ARM-based PXA271 XScale CPU, built by Intel at the time, with a Spartan 3 FPGA, from Xilinx. The GPS algorithm's performance was dominated by the number of FFTs the CPU had to accomplish. Why not do those FFTs within the FPGA, and run them that much faster?
Xilinx offers a pipelined FFT IP core that can process one FFT sample per clock, just as it did when I was working on this problem. Hence, an N point FFT costs N clocks to ingest into the FFT, and after some (short) processing delay it takes N clocks to get the data back out. How much faster could you get? The ARM CPU took many more clocks than that to process the FFT, so this should be much faster, right?
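To put rough numbers on that reasoning, here is a minimal Python sketch of the clock count for a single FFT pass through a one-sample-per-clock core. The function names, the FFT size, the processing delay, and the clock rate are all assumptions chosen for illustration; none of them come from the actual design or from Xilinx's documentation.

```python
def pipelined_fft_clocks(n_points, latency_clocks):
    """Clocks to stream one N-point FFT through a one-sample-per-clock core."""
    ingest = n_points   # N clocks to feed the samples in
    drain = n_points    # N clocks to read the results back out
    return ingest + latency_clocks + drain


def clocks_to_seconds(clocks, clock_hz):
    """Convert a clock count into seconds at the given clock rate."""
    return clocks / clock_hz


if __name__ == "__main__":
    N = 1024           # hypothetical FFT size
    LATENCY = 100      # hypothetical processing delay, in clocks
    CLOCK_HZ = 50e6    # hypothetical FPGA clock rate
    clocks = pipelined_fft_clocks(N, LATENCY)
    micros = clocks_to_seconds(clocks, CLOCK_HZ) * 1e6
    print(f"{clocks} clocks, about {micros:.1f} us")
```

Note that this counts only what happens inside the FPGA. It says nothing yet about how the samples get there.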
So I built it.
With a little bit of debugging, I managed to get it all to work. It wasn’t all that hard technically to build, and speed was very important to our application. So, we ran the algorithm with a stopwatch, anxiously waiting to see how much faster it would run.
That’s when I learned the painful lesson that an algorithm’s speed is dependent upon the interface speed that feeds the algorithm. In our case, the interface was so slow that just transferring the data to the FPGA and reading the results back took more time to do than to perform the FFT in the first place.
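A simple model shows how this can happen. The Python sketch below assumes the CPU moves one sample per bus transaction, that each transaction takes a fixed time, and that the FPGA core can never accept or produce more than one sample per clock. Every name and number here is hypothetical, and the comparison against a software FFT time is my own framing of the trade-off rather than a measurement from the project.

```python
def offload_seconds(n_points, seconds_per_word, fpga_clock_hz, latency_clocks=0):
    """Estimate the time to send N samples to the FPGA and read N results back."""
    clock_period = 1.0 / fpga_clock_hz
    # Each sample moves no faster than the slower of the bus and the core.
    per_sample = max(seconds_per_word, clock_period)
    write_time = n_points * per_sample
    read_time = n_points * per_sample
    core_delay = latency_clocks * clock_period
    return write_time + core_delay + read_time


def offload_is_faster(cpu_fft_seconds, n_points, seconds_per_word,
                      fpga_clock_hz, latency_clocks=0):
    """True only if the round trip through the FPGA beats the software FFT."""
    total = offload_seconds(n_points, seconds_per_word,
                            fpga_clock_hz, latency_clocks)
    return total < cpu_fft_seconds


if __name__ == "__main__":
    # Hypothetical: 1024 points, 1 us per bus word, 50 MHz FPGA clock,
    # and a software FFT that takes 1 ms on the CPU.
    print(offload_seconds(1024, 1e-6, 50e6, 100))
    print(offload_is_faster(1e-3, 1024, 1e-6, 50e6, 100))
```

With a slow enough bus, the `write_time` and `read_time` terms swamp everything else, no matter how fast the core itself runs.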
Learn the interfaces!
This is one of the reasons why the study of FPGA design needs to include a study of interfaces and how to interact with them. Indeed, if you look at one of my favorite FPGA websites, fpga4fun.com, you’ll find a lot of discussions about how to build interfaces. They discuss serial ports, I2C, SPI, JTAG, simple video ports (play Pong!), HDMI, and more. All of these interfaces have their purpose, and the FPGA student is well served by studying how to interact with them.
So, for this reason, let me recommend that before you spend your whole dime on making your FPGA run super fast with multiple copies of your algorithm all running in parallel, you spend at least as much (or more) of that dime guaranteeing that the FPGA can read and write your data fast enough to keep it busy running that super-algorithm.
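One way to budget for this is to work out how much bandwidth the parallel copies would demand before building any of them. The Python sketch below assumes each copy consumes one sample per clock, like the pipelined FFT discussed earlier. The function names, sample width, clock rate, and interface rate are all made up for illustration.

```python
def required_bytes_per_second(copies, clock_hz, bytes_per_sample):
    """Input bandwidth needed to keep every copy fed with one sample per clock."""
    return copies * clock_hz * bytes_per_sample


def copies_sustainable(interface_bytes_per_second, clock_hz, bytes_per_sample):
    """How many copies the interface can keep busy at full rate."""
    per_copy = clock_hz * bytes_per_sample
    return interface_bytes_per_second // per_copy


if __name__ == "__main__":
    CLOCK_HZ = 100_000_000   # hypothetical 100 MHz FPGA clock
    SAMPLE_BYTES = 4         # hypothetical 32-bit samples
    LINK_BPS = 400_000_000   # hypothetical 400 MB/s interface
    print(required_bytes_per_second(4, CLOCK_HZ, SAMPLE_BYTES))
    print(copies_sustainable(LINK_BPS, CLOCK_HZ, SAMPLE_BYTES))
```

If the second number comes out smaller than the number of copies you planned to build, the extra copies will spend their time waiting on data. The same budget applies again in the output direction.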
But ye said, No; for we will flee upon horses; therefore shall ye flee: and, We will ride upon the swift; therefore shall they that pursue you be swift. (Is 30:16)