Madgwick.xyz

July 22, 2018 (2018-08-22)
Updated: December 23, 2018 (2018-12-23)

OLD VERSION A quick look at Compute Languages

This is the old version. Please see the new version. This is just a quick look at the compute platforms available and the PC hardware they support, put into one place (mostly for my own reference). I use Tensorflow as an example of how this situation plays out for a piece of software that utilises compute. A variety of compute platforms exist, targeting different hardware with differing degrees of development, documentation and adoption.

The two most well-known and longest-developed platforms are CUDA and OpenCL. CUDA is developed by Nvidia exclusively for their devices, while OpenCL is developed by the Khronos group and is an open, royalty-free standard which can be implemented by any hardware manufacturer.

CUDA

Since its introduction in 2007, CUDA has been continuously developed by Nvidia. There are two different version numbers to pay attention to. Compute Capability refers to the instruction level that can be used on a particular device; with each new generation of devices comes a newer version supporting extra operations that don't exist on earlier devices. The CUDA development toolkit itself has a separate version number. Earlier versions of the toolkit supported all devices regardless of their Compute Capability. However, since toolkit version 7, support for the oldest devices has gradually been removed once those devices became legacy, which is when they stop receiving updated drivers.
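For reference, the Compute Capability of an installed card can be queried through the CUDA runtime API. A minimal sketch (plain host code, assuming the CUDA toolkit is installed; build with nvcc):

```cpp
// Query the Compute Capability of each CUDA device via the runtime API.
// Build with: nvcc query_cc.cpp -o query_cc
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major and prop.minor together form the Compute Capability,
        // e.g. 6.1 for a Pascal-generation card.
        std::printf("Device %d: %s (Compute Capability %d.%d)\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```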

CUDA Support
Codename | Compute Capability | GPU Number(s) | Latest Driver? | Latest CUDA Toolkit?
---------|--------------------|---------------|----------------|---------------------
Tesla | 1.0 | Early Geforce 8800 | No (Legacy Driver 342.01) | No (Toolkit version 6.5)
Tesla | 1.1, 1.2 | Older Geforce 8xxx and 9xxx, GT/GTX 2xx and 3xx except below | No (Legacy Driver 342.01) | No (Toolkit version 6.5)
Tesla | 1.3 | GTX 295, 285, 280, 275, 260 | No (Legacy Driver 342.01) | No (Toolkit version 6.5)
Fermi | 2.0 | GTX 480, 470, 465, 590, 580, 570 | Yes | No (Toolkit version 8.0)
Fermi | 2.1 | GTX 460, GTS 450, GT 4xx, GT 6xx | Yes | No (Toolkit version 8.0)
Kepler | 3.0 | GTX 690, 680, 670, 660, 650, 770, 760, GT 740 | Yes | Yes
Kepler | 3.5 | GTX Titan(s), 780 Ti, 780 | Yes | Yes
Maxwell | 5.0 | GTX 750 Ti, GTX 750 | Yes | Yes
Maxwell | 5.2 | GTX Titan X, 980 Ti, 980, 970, 960, 950 | Yes | Yes
Pascal | 6.1 | GTX Titan(s), 1080 Ti, 1080, 1070 (Ti), 1060, 1050, GT 1030 | Yes | Yes
Volta | 7.0 | GTX Titan V | Yes | Yes
Turing | 7.5 | RTX Titan, 2080 Ti, 2080, 2070, 2060 | Yes | Yes

OpenCL and SYCL

Released two years after CUDA, OpenCL aims to be an open compute standard that any vendor can implement rather than being tied to a specific one. Versioning is simpler in OpenCL because development is not as fast-paced as CUDA's: version 1.2 was finalised in 2011, and the latest version, 2.2, is still in development and not supported by many vendors. Because compute kernels are compiled at runtime by the driver, OpenCL, unlike CUDA, doesn't need its own offline compiler; only C header and library files are needed to start development, not a whole toolkit.
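The runtime-compilation model is easiest to see in code. Below is a minimal sketch of an OpenCL 1.2 host program in C++: the kernel is just a string handed to the driver, which compiles it at run time (most error checking omitted for brevity):

```cpp
// Minimal OpenCL host program: the kernel is plain source text that the
// driver compiles at runtime, so no vendor-specific offline compiler is needed.
// Build with something like: g++ vecscale.cpp -lOpenCL
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char* kSource =
    "__kernel void scale(__global float* data, float factor) {"
    "    size_t i = get_global_id(0);"
    "    data[i] *= factor;"
    "}";

int main() {
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, nullptr) != CL_SUCCESS) return 1;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr) != CL_SUCCESS) return 1;

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, nullptr);

    // Runtime compilation: hand the source string to the driver.
    cl_program program = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, nullptr);
    clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "scale", nullptr);

    std::vector<float> host(1024, 1.0f);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                host.size() * sizeof(float), host.data(), nullptr);
    float factor = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &factor);

    size_t global = host.size();
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, host.size() * sizeof(float),
                        host.data(), 0, nullptr, nullptr);
    std::printf("host[0] = %f\n", host[0]); // expect 2.0

    clReleaseMemObject(buf); clReleaseKernel(kernel); clReleaseProgram(program);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}
```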

SYCL1 is a higher-level abstraction layer for OpenCL, intended to make development more efficient than working with the lower-level OpenCL API directly. However, it doesn't appear to have seen as much adoption as OpenCL itself.
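For comparison, the same buffer-scaling operation written against the SYCL 1.2.1 interface looks roughly like this (a sketch only; exact headers and selector names depend on the SYCL implementation used):

```cpp
// The scale-a-buffer operation expressed in SYCL: buffers, accessors and a
// lambda kernel replace the manual context/queue/program management of OpenCL.
#include <CL/sycl.hpp>
#include <vector>
#include <cstdio>

int main() {
    std::vector<float> data(1024, 1.0f);
    {
        cl::sycl::queue q{cl::sycl::default_selector{}};
        cl::sycl::buffer<float, 1> buf{data.data(), cl::sycl::range<1>(data.size())};
        q.submit([&](cl::sycl::handler& cgh) {
            auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
            cgh.parallel_for<class scale_kernel>(
                cl::sycl::range<1>(data.size()),
                [=](cl::sycl::id<1> i) { acc[i] *= 2.0f; });
        });
    } // the buffer destructor copies results back into 'data'
    std::printf("data[0] = %f\n", data[0]); // expect 2.0
    return 0;
}
```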

OpenCL Support
OpenCL Version | Nvidia | AMD | Intel
---------------|--------|-----|------
1.0 | None | HD 48xx, HD 46xx | None
1.1 | Geforce 8xxx, 9xxx, GTX 2xx, 4xx, 5xx | HD 5xxx | None
1.2 | GTX 6xx, 7xx, 9xx, 10xx, Titan(s) | HD 6xxx, 7xxx | Core ix-3xxx (Ivy Bridge), ix-4xxx (Haswell)
2.0 | None (see notes) | HD 7790, HD 8xxx, R7/R9 2xx, R9 3xx, RX 4xx, 5xx, Fury(X), Vega 56/64, Instinct | None
2.1 | None | None (see notes) | Core ix-5xxx (Broadwell), ix-6xxx (Skylake), ix-7xxx (Kaby Lake), ix-8xxx (Coffee Lake)

Vulkan

The Vulkan API, also developed by the Khronos group, is an open standard used as a replacement for DirectX/OpenGL in 3D game engines. It can also be used for compute tasks9; the disadvantage is that it is a very low-level API. An advantage is that, because it is used in game engines, it is equally well supported by Nvidia and AMD.
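As a rough illustration, the sketch below only enumerates Vulkan devices and checks for a compute-capable queue family; an actual compute dispatch would additionally need a logical device, pipeline, descriptor sets and command buffers, which is where the low-level nature shows:

```cpp
// Enumerate Vulkan physical devices and report whether they expose a
// compute-capable queue family. Build with something like: g++ vk_query.cpp -lvulkan
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app{};
    app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_0;

    VkInstanceCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    info.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&info, nullptr, &instance) != VK_SUCCESS) {
        std::printf("No Vulkan driver available.\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);

        uint32_t qCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(dev, &qCount, nullptr);
        std::vector<VkQueueFamilyProperties> families(qCount);
        vkGetPhysicalDeviceQueueFamilyProperties(dev, &qCount, families.data());

        bool hasCompute = false;
        for (const auto& f : families)
            if (f.queueFlags & VK_QUEUE_COMPUTE_BIT) hasCompute = true;

        std::printf("%s: compute queue %s\n", props.deviceName,
                    hasCompute ? "available" : "not available");
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```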

A complex situation with AMD

Historically, AMD promoted OpenCL for compute applications on its GPUs. More recently this has evolved: AMD has released an entirely new open-source driver and software stack called ROCm. This includes two high-level languages, HC and HIP. HC (Heterogeneous Compute)10 is a C++ API similar to C++ AMP; it targets only AMD devices. HIP (Heterogeneous Computing Interface for Portability)11 is an API which is intentionally very similar to CUDA. It is intended to simplify the process of converting CUDA programs to run on AMD devices. AMD supply a tool to "hipify" existing CUDA code, converting it to HIP, which can then be compiled for and run on AMD devices. AMD's compiler also allows Nvidia devices to be targeted by HIP code. This is possible because HIP acts as an interfacing layer: when targeting AMD devices, the compiler (called hcc) compiles HIP directly for AMD devices, while when targeting Nvidia devices the HIP code is converted back to CUDA and passed to Nvidia's compiler (nvcc), which then compiles it as usual. The diagram below is a very simplified representation of the interactions between CUDA, HIP, HCC and NVCC.

[Diagram: simplified relationship between CUDA, HIP, HCC and NVCC]
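To give a feel for how closely HIP mirrors CUDA, here is a small sketch of a HIP program (assuming the ROCm/HIP toolchain is installed; the same code with hip* calls swapped for cuda* ones is essentially the original CUDA version):

```cpp
// A minimal HIP program: allocation, copy, kernel launch and copy back mirror
// the CUDA runtime API almost one-to-one (cudaMalloc -> hipMalloc, etc.).
// Build with: hipcc scale.cpp -o scale
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void scale(float* data, float factor, int n) {
    int i = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    std::vector<float> host(n, 1.0f);

    float* dev = nullptr;
    hipMalloc(reinterpret_cast<void**>(&dev), n * sizeof(float));
    hipMemcpy(dev, host.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Portable launch macro instead of the CUDA <<<grid, block>>> syntax.
    hipLaunchKernelGGL(scale, dim3(n / 256), dim3(256), 0, 0, dev, 2.0f, n);

    hipMemcpy(host.data(), dev, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(dev);

    std::printf("host[0] = %f\n", host[0]); // expect 2.0
    return 0;
}
```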
ROCm Driver Support
Name | Codename | Example Cards | HC Support | OpenCL Support | Notes
-----|----------|---------------|------------|----------------|------
gfx701 | Hawaii | R9 290(X), R9 390(X) | Yes | Yes | Experimental support, no CPU or PCIe requirements
gfx802 | Tonga | R9 285, R9 385, R9 380(X) | No | Yes | Supported by driver but NOT by the HCC compiler (see note)
gfx803 | Fiji | R9 Fury(X), R9 Nano | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics
gfx803 | Polaris 10/11 | RX 480–460, RX 580, 570, 560 (some) | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics
gfx901 | Vega 10 | Vega FE, RX Vega 64, RX Vega 56 | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics (can be configured for PCIe Gen 2 w/out Atomics)
gfx906 | Vega 20 | AMD Radeon Instinct MI60, MI50 | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics
gfx1010 | Navi 10 | TBD AMD GPUs | Unknown | Unknown | TBD

Tensorflow as an example

Tensorflow4 is a well-known platform for machine learning and deep learning. It works as a Python library and aims to be simple to use by abstracting low-level implementations away. It supports CPUs and GPUs as back ends. The downloadable binaries (installed through Python pip) support standard CPUs and Nvidia GPUs with at least Compute Capability 3.5 (older versions supported CC 3.0). Nvidia support requires the CUDA toolkit and the cuDNN library to be installed.

There is no out-of-the-box support for AMD graphics cards. AMD have created a separate port5 of Tensorflow which supports some of their latest GPUs. This port requires the ROCm runtime (explained above) to be installed and a supported GPU in order to function. Benchmarks show performance to be similar to Nvidia cards.6 The main disadvantages are that fewer cards are supported and that the port is a version behind the latest Tensorflow release.

There is also an implementation of Tensorflow which uses OpenCL and SYCL.7 It requires OpenCL 1.2 and an extension called cl_khr_spir, which means that in theory it is supported by most AMD cards. However, from what I can tell, its performance is inferior8 to AMD's port, which has an optimised implementation.
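Whether a given device meets that requirement can be checked from the OpenCL device info; a small sketch querying the reported OpenCL version and extension list of the first GPU device found:

```cpp
// Check whether the first OpenCL GPU reports OpenCL 1.2+ and the cl_khr_spir
// extension, which the OpenCL/SYCL Tensorflow port relies on.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, nullptr) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr) != CL_SUCCESS) {
        std::printf("No OpenCL GPU device found.\n");
        return 1;
    }

    char version[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(version), version, nullptr);

    size_t extSize = 0;
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, nullptr, &extSize);
    std::vector<char> extensions(extSize);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, extSize, extensions.data(), nullptr);

    bool hasSpir = std::strstr(extensions.data(), "cl_khr_spir") != nullptr;
    std::printf("%s, cl_khr_spir: %s\n", version, hasSpir ? "yes" : "no");
    return 0;
}
```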

Tensorflow and AMD ROCm Tensorflow GPUs
Vendor | Tensorflow Supported GPUs
-------|---------------------------
Nvidia | GTX 780 Ti, 780, 750 (Ti), 9xx, 10xx, 20xx, Titan(s)
AMD | R9 Fury(X), R9 Nano, RX 480–460, Vega FE, RX Vega 64, Vega 56, RX 580–560, Instinct
References