News


ISC 2014: NVIDIA Tesla Cards Add ARM64 Host Compatibility

Kicking off this week for the world of supercomputing is the 2014 International Supercomputing Conference in Leipzig, Germany. One of the major supercomputing conferences, ISC is Europe’s largest supercomputing conference and as one would exp…

QNAP TS-x51 NAS Series: Intel Quick Sync Gets its Killer App

At Computex 2014, we visited QNAP and came away with a lot of information (some of which had already been demonstrated at CES). After Computex, QNAP got in touch with me to better explain the various features of the newly introduced TS-x51 series (which was not at CES). And, boy, was I floored! Usually, you don’t see me getting very excited over a product announcement. However, I believe that QNAP’s TS-x51 family has the capability to revolutionize the NAS market for home users and media enthusiasts, particularly in the way it utilizes Intel Quick Sync technology. Read on for our analysis of where that market segment is headed, and why the TS-x51’s unique feature set may be the start of interesting things to come.

FPGA news roundup: Microsoft “Catapult”, Intel’s hybrid and Xilinx OpenCL

There has been some activity in the FPGA realm lately. First, Microsoft published a paper at ISCA (a well-known peer-reviewed computer architecture conference) about using FPGAs in datacenters to accelerate page ranking for Bing. In a test deployment, Microsoft reported up to 95% more throughput for only 10% more power, with an added total cost of ownership (TCO) of less than 30%. Microsoft used Altera Stratix V FPGAs in a PCIe form factor with 8GB of DDR3 RAM on each board. The FPGAs were connected to each other in a 6×8 torus configuration using a 10Gb SAS network. Microsoft mentioned that it programmed the FPGAs in Verilog and that this hand-coding was one of the more challenging aspects of the work. However, Microsoft believes high-level tools such as Scala (presumably a domain-specific subset), OpenCL, or “C-to-gates” tools such as AutoESL or ImpulseC might also be suitable for such jobs in the future. Microsoft appears to be quite happy with the results overall and hopes to deploy the system in production in 2015.
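
For readers unfamiliar with the topology, a 2D torus is simply a grid whose rows and columns wrap around, so every node, including those on the edges, has exactly four direct neighbors. Below is a minimal C sketch of torus addressing, assuming row-major node coordinates; Microsoft has not published Catapult’s actual routing scheme, so this is purely illustrative.

```c
/* Minimal sketch of 2D torus addressing for a 6x8 grid of FPGAs.
 * The coordinate scheme is assumed (Catapult's routing details are
 * not reproduced here); the point is the wraparound arithmetic. */
#include <stdio.h>

#define ROWS 6
#define COLS 8

/* Print the four torus neighbors (up, down, left, right) of node (r, c). */
static void torus_neighbors(int r, int c) {
    printf("node (%d,%d): up (%d,%d) down (%d,%d) left (%d,%d) right (%d,%d)\n",
           r, c,
           (r + ROWS - 1) % ROWS, c,  /* up wraps from row 0 to row 5 */
           (r + 1) % ROWS, c,         /* down wraps from row 5 to row 0 */
           r, (c + COLS - 1) % COLS,  /* left wraps from col 0 to col 7 */
           r, (c + 1) % COLS);        /* right wraps from col 7 to col 0 */
}

int main(void) {
    torus_neighbors(0, 0);  /* a corner node still has four neighbors */
    torus_neighbors(3, 4);  /* an interior node */
    return 0;
}
```

The wraparound links are what distinguish a torus from a plain mesh: they reduce the worst-case hop count between boards without requiring a central switch.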

Intel revealed plans to manufacture a CPU-FPGA hybrid chip that combines Intel’s traditional Xeon-line CPU cores with an FPGA on a single chip. The FPGA and CPU will have coherent access to memory. The exact details of the chip, such as the number of CPU cores, the amount of logic and other resources on the FPGA, or even the source of the FPGAs (likely Altera), have not been revealed. However, we do know that the chip will be package-compatible with the existing Xeon E5 line. Intel mentions that FPGAs can deliver “up to 10x” the performance on unspecified industry benchmarks, and further claims its implementation will deliver another 2x improvement (so 20x total) thanks to coherency and lower CPU-FPGA latency. We will have to wait for more information about this product to validate any of Intel’s claims. It will also be interesting to see what software and development tools Intel provides for this chip.
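
As a sanity check on those multipliers, it is worth remembering that a kernel-level speedup only translates into application-level gains in proportion to how much of the workload is actually offloaded. The C sketch below applies Amdahl’s law to Intel’s claimed figures; the offload fractions are our own hypothetical inputs, not anything Intel has stated.

```c
/* Back-of-the-envelope: end-to-end speedup if a fraction f of the work
 * runs on the FPGA at speedup s (Amdahl's law). The 20x figure is
 * Intel's 10x claim times its 2x coherency claim; fractions are ours. */
#include <stdio.h>

static double amdahl(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    const double s = 10.0 * 2.0;  /* claimed kernel-level speedup */
    const double fractions[] = {0.50, 0.75, 0.90, 0.99};
    for (int i = 0; i < 4; i++)
        printf("offloaded fraction %.2f -> end-to-end speedup %.1fx\n",
               fractions[i], amdahl(fractions[i], s));
    return 0;
}
```

Even under Intel’s best-case numbers, an application that offloads half its work would see less than 2x overall, which is why the software and tooling story matters as much as the hardware.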

Finally, you may remember our previous coverage of OpenCL on Altera’s FPGAs, where we mentioned that Xilinx had some plans for OpenCL as well. Recently (about two months ago) Xilinx updated its Vivado design suite, which now includes “early access” support for OpenCL.
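
To give a flavor of the programming model, here is a minimal vector-addition kernel in OpenCL C (a C dialect). This is a generic illustration rather than anything from Xilinx’s or Altera’s documentation, and the vendor-specific attributes and host-side setup code are omitted.

```c
/* A generic OpenCL C kernel: one work-item computes one output element.
 * FPGA toolchains (Altera's SDK, Xilinx's early-access Vivado flow)
 * compile kernels like this into fabric logic rather than GPU threads. */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const unsigned int n)
{
    unsigned int i = get_global_id(0);  /* this work-item's global index */
    if (i < n)                          /* guard against padded launch sizes */
        c[i] = a[i] + b[i];
}
```

Unlike on a GPU, where such a kernel maps onto thousands of hardware threads, an FPGA compiler typically turns the kernel body into a deep pipeline, which is where the performance-per-watt advantage comes from.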

Overall, these announcements point to increased adoption of FPGAs in mainstream applications. Datacenters are very focused on performance per watt because they tend to be power limited while facing ever-increasing performance needs. Progress on scaling performance through multicore CPUs has slowed, and relying on GPUs to increase overall performance per watt has an upper bound as well. In a power-constrained environment where two different types of general-purpose processors are limited by progress on the process node side, we need another option to continue scaling performance. In an ideal world, one might design application-specific integrated circuits (ASICs) to get the highest performance per watt, but ASICs are hard to design and cannot be changed once deployed. That makes them a poor fit for datacenter applications, where workloads (such as search algorithms) are tweaked and changed over time. FPGAs offer a happy medium between CPUs and ASICs: they are programmable and reconfigurable hardware, yet they can still offer a performance-per-watt advantage over CPUs because they effectively customize themselves to the algorithm. Making FPGAs accessible to more mainstream application programmers (i.e., those used to writing C, C++, Java, etc. rather than Verilog) will be one of the challenges, and tools such as OpenCL (among others) are gaining steam in this space.

NVIDIA Kepler Cards Get HDMI 4K@60Hz Support (Kind Of)

An interesting feature has turned up in NVIDIA’s latest drivers: the ability to drive certain displays over HDMI at 4K@60Hz. This is a feat that would normally require HDMI 2.0 – a feature not available in any GPU shipping thus far – so to say it’s unexpected is a bit of an understatement. However, as it turns out, the situation is not quite as cut and dried as it first appears, and there is a notable catch.

First discovered by users, including AT Forums user saeedkunna, the feature allows Kepler-based video cards running NVIDIA’s R340 drivers, when paired with very recent 4K TVs, to output to those displays at 4K@60Hz over HDMI 1.4. These setups were previously limited to 4K@30Hz by the available HDMI bandwidth, and while that limitation hasn’t gone anywhere, TV manufacturers and now NVIDIA have implemented an interesting workaround that teeters between clever and awful.

Lacking the bandwidth to fully support 4K@60Hz until the arrival of HDMI 2.0, the latest crop of 4K TVs, such as the Sony XBR 55X900A and Samsung UE40HU6900, have implemented what amounts to a lower image quality mode that allows a 4K@60Hz signal to fit within HDMI 1.4’s 8.16Gbps bandwidth limit. To accomplish this, manufacturers are making use of chroma subsampling to reduce the amount of chroma (color) data that needs to be transmitted, thereby freeing up enough bandwidth to increase the image resolution from 1080p to 4K.


An example of a current generation 4K TV: Sony’s XBR 55X900A
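
The bandwidth arithmetic works out neatly. The C sketch below counts active pixels only and ignores HDMI blanking overhead, so the figures are approximate: at 8 bits per component, regular 4:4:4 sampling carries 24 bits per pixel, while the 4:2:0 mode described below averages 12.

```c
/* Approximate video payloads for 4K@60Hz against HDMI 1.4's 8.16Gbps
 * data rate. Active pixels only; a real link also carries blanking. */
#include <stdio.h>

int main(void) {
    const double pixels_per_sec = 3840.0 * 2160.0 * 60.0;
    const double hdmi14_gbps = 8.16;                    /* HDMI 1.4 data rate */
    const double gbps_444 = pixels_per_sec * 24 / 1e9;  /* ~11.9 Gbps: too big */
    const double gbps_420 = pixels_per_sec * 12 / 1e9;  /* ~6.0 Gbps: fits */

    printf("4:4:4 needs %.2f Gbps vs %.2f Gbps available\n", gbps_444, hdmi14_gbps);
    printf("4:2:0 needs %.2f Gbps vs %.2f Gbps available\n", gbps_420, hdmi14_gbps);
    return 0;
}
```

Even with blanking overhead added back, halving the per-pixel payload is what brings a 4K@60Hz signal under the HDMI 1.4 ceiling.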

Specifically, manufacturers are making use of Y’CbCr 4:2:0 subsampling, a lower quality sampling mode that requires ¼ the color information of regular Y’CbCr 4:4:4 or RGB sampling. By using this mode, manufacturers can transmit an image with full resolution luma (brightness) but only a fraction of the chroma resolution, achieving the necessary bandwidth savings.


Wikipedia: diagram on chroma subsampling
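
To make the “¼ the color information” point concrete, the toy C sketch below decimates a full-resolution chroma plane the way 4:2:0 does, storing one chroma sample per 2×2 pixel block. Real encoders use filtered downsampling and carefully defined sample siting; simple averaging is used here only for illustration.

```c
/* Toy 4:2:0 chroma decimation: the luma plane is kept at full
 * resolution, while each 2x2 block of chroma samples collapses to one.
 * 16 samples in, 4 out: 1/4 the color information per chroma plane. */
#include <stdio.h>

#define W 4
#define H 4

int main(void) {
    const unsigned char cb[H][W] = {   /* full-resolution chroma (toy values) */
        { 10,  20,  30,  40},
        { 50,  60,  70,  80},
        { 90, 100, 110, 120},
        {130, 140, 150, 160},
    };
    unsigned char cb420[H / 2][W / 2];

    for (int y = 0; y < H; y += 2)
        for (int x = 0; x < W; x += 2)
            cb420[y / 2][x / 2] = (unsigned char)((cb[y][x] + cb[y][x + 1] +
                cb[y + 1][x] + cb[y + 1][x + 1]) / 4);  /* average the block */

    for (int y = 0; y < H / 2; y++) {   /* print the decimated plane */
        for (int x = 0; x < W / 2; x++)
            printf("%4d", cb420[y][x]);
        printf("\n");
    }
    return 0;
}
```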

The use of chroma subsampling is as old as color television itself; however, using it in this fashion is uncommon. Most HDMI PC-to-TV setups to date use RGB or 4:4:4 sampling, both of which are full resolution and functionally lossless. 4:2:0 sampling, on the other hand, is not normally used for the last stage of transmission between source and sink devices – in fact, HDMI didn’t even officially support it until recently – and is instead used in the storage of the source material itself, be it Blu-ray discs, TV broadcasts, or streaming videos.

Perceptually, 4:2:0 is an efficient way to throw out unnecessary data, making it a good way to pack video, but at the end of the day it still carries ¼ the color information of a full resolution image. Since video sources are already 4:2:0, this ends up being a clever way to transmit video to a TV, as at the most basic level a higher quality mode would be redundant (post-processing aside). But while this works well for video, it only works well for video: for desktop workloads it significantly degrades the image, as the color information needed to drive subpixel-accurate text and GUIs is lost.

In any case, with 4:2:0 4K TVs already on the market, NVIDIA has confirmed that it is enabling 4:2:0 4K output on Kepler cards with its R340 drivers. What this means is that Kepler cards can drive 4:2:0 4K TVs at 60Hz today, but they are doing so in a manner that’s only useful for video. For HTPCs this ends up being a good compromise, and as far as we can gather it is a clever move on NVIDIA’s part. But for anyone who sees the news of NVIDIA supporting 4K@60Hz over HDMI and hopes to use a TV as a desktop monitor, this will still come up short. Until the next generation of video cards and TVs hit the market with full HDMI 2.0 support (4:4:4 and/or RGB), DisplayPort 1.2 will remain the only way to transmit a full resolution 4K image.