FPGA news roundup: Microsoft “Catapult”, Intel’s hybrid and Xilinx OpenCL
There has been some activity in the FPGA realm lately. First, Microsoft has published a paper at ISCA (a very well-known peer-reviewed computer architecture conference) about using FPGAs in datacenters to accelerate page ranking for Bing. In a test deployment, Microsoft reported up to 95% more throughput for only 10% more power. The added total cost of ownership (TCO) was less than 30%. Microsoft used Altera Stratix V FPGAs on PCIe boards, each with 8GB of DDR3 RAM. The FPGAs were connected to each other in a 6×8 torus configuration using a 10Gb SAS network. Microsoft mentioned that it programmed the FPGAs in Verilog and that this hand-coding was one of the challenging aspects of the work. However, Microsoft believes that high-level tools such as Scala (presumably a domain-specific subset), OpenCL or "C-to-gates" tools such as AutoESL or ImpulseC might also be suitable for such jobs in the future. Microsoft appears to be pretty happy with the results overall and hopes to deploy the system in production in 2015.
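To put the reported figures in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the numbers at face value ("up to 95%" as a best case, "less than 30%" as a ceiling), so the ratios are optimistic bounds rather than measured results. The torus helper assumes a 6-row by 8-column layout with wraparound links in both dimensions; the function names and the row/column orientation are illustrative assumptions, not something the paper specifies.

```python
# Figures quoted above; treated as best-case / ceiling values, not exact measurements.
THROUGHPUT_GAIN = 0.95  # "up to 95% more throughput"
POWER_INCREASE = 0.10   # "only 10% more power"
TCO_INCREASE = 0.30     # added TCO was "less than 30%", so 0.30 is a ceiling

# Assumed orientation of the 6x8 torus (hypothetical: 6 rows, 8 columns).
ROWS, COLS = 6, 8


def relative_gain(throughput_gain, cost_increase):
    """Throughput per unit of cost, relative to the baseline server (1.0 = unchanged)."""
    return (1.0 + throughput_gain) / (1.0 + cost_increase)


def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four neighbors of a node in a 2D torus (edges wrap around)."""
    return [
        ((row - 1) % rows, col),  # up
        ((row + 1) % rows, col),  # down
        (row, (col - 1) % cols),  # left
        (row, (col + 1) % cols),  # right
    ]


if __name__ == "__main__":
    perf_per_watt = relative_gain(THROUGHPUT_GAIN, POWER_INCREASE)
    perf_per_tco = relative_gain(THROUGHPUT_GAIN, TCO_INCREASE)
    print(f"Best-case throughput per watt vs. baseline: {perf_per_watt:.2f}x")
    print(f"Best-case throughput per TCO dollar vs. baseline: {perf_per_tco:.2f}x")
    print(f"FPGAs in one torus: {ROWS * COLS}")
    print(f"Neighbors of corner node (0, 0): {torus_neighbors(0, 0)}")
```

In the best case this works out to roughly 1.77x the throughput per watt and about 1.5x the throughput per TCO dollar. The wraparound also means that even a "corner" FPGA has four direct neighbors, which is the point of using a torus rather than a plain mesh.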
Intel revealed plans to manufacture a CPU-FPGA hybrid that combines Intel's traditional Xeon-line CPU cores with an FPGA on a single chip. The FPGA and CPU will have coherent access to memory. The exact details of the chip, such as the number of CPU cores, the amount of logic and other resources in the FPGA, or even who will supply the FPGA (likely Altera), have not been revealed. However, we do know that the chip will be package-compatible with the existing Xeon E5 line. Intel mentions that FPGAs can deliver "up to 10x" the performance on unspecified industry benchmarks. Intel further claims its implementation will deliver another 2x improvement (so 20x total) because of coherency and lower CPU-FPGA latency. We will have to wait for more information about this product to validate any of Intel's claims. It will also be interesting to see what software and development tools Intel provides for this chip.
Finally, you may remember our previous coverage of OpenCL on Altera's FPGAs, where we mentioned that Xilinx had plans for OpenCL as well. Recently (~2 months ago) Xilinx updated its Vivado design suite, which now includes "early access" support for OpenCL.
Overall, these announcements point to increased adoption of FPGAs in mainstream applications. Datacenters are very focused on performance per watt because they tend to be power limited while their performance needs keep growing. Progress on scaling performance through multicore CPUs has slowed, and relying on GPUs to increase overall performance per watt has an upper bound as well. In a power-constrained environment where both types of general-purpose processors are limited by progress on the process node side, we need to find another option to continue to scale performance. In an ideal world, one might design application-specific integrated circuits (ASICs) to get the highest performance per watt, but ASICs are hard to design and cannot be changed once deployed. That makes them a poor fit for datacenter applications, where the workloads (such as algorithms for search) are tweaked and changed over time. FPGAs offer a happy medium between CPUs and ASICs: the hardware is programmable and reconfigurable, yet it can still deliver a performance per watt advantage over CPUs because the logic is effectively customized to the algorithm. Making FPGAs accessible to more mainstream application programmers (i.e. those who are used to writing C, C++, Java etc. and not Verilog) will be one of the main challenges, and tools such as OpenCL (and more) are gaining steam in this space.