This is a quick look at the compute platforms available and the PC hardware they support, put into one place (mostly for my own reference). I use Tensorflow as an example of how this situation plays out for a piece of software that utilises compute. There are a variety of compute platforms available, targeting different hardware with differing degrees of development, documentation and adoption.
The two well-known and longest developed platforms are CUDA and OpenCL. CUDA is developed by Nvidia exclusively for their devices, while OpenCL is developed by the Khronos group and is an open, royalty-free standard which can be implemented by any hardware manufacturer.
Since its introduction in 2007, CUDA has been continuously developed by Nvidia. There are two different version numbers to pay attention to. Compute Capability refers to the instruction level which can be used on a particular device; with each new generation of devices comes a newer version supporting extra operations which don't exist on earlier devices. The CUDA development toolkit itself has a separate version numbering system. Earlier versions of the toolkit supported all devices regardless of their Compute Capability. However, since toolkit version 7, support for the oldest devices has gradually been removed as those devices become legacy, i.e. when they stop receiving updated drivers.
Codename | Compute Capability | GPU Number(s) | Latest Driver? | Latest CUDA Toolkit? |
---|---|---|---|---|
Tesla | 1.0 | Early Geforce 8800 | No (Legacy Driver 342.01) | No (Toolkit version 6.5)
Tesla | 1.1, 1.2 | Older Geforce 8xxx and 9xxx, GT/GTX 2xx and 3xx except below | No (Legacy Driver 342.01) | No (Toolkit version 6.5)
Tesla | 1.3 | GTX 295, 285, 280, 275, 260 | No (Legacy Driver 342.01) | No (Toolkit version 6.5)
Fermi | 2.0 | GTX 480, 470, 465, 590, 580, 570 | Yes | No (Toolkit version 8.0)
Fermi | 2.1 | GTX 460, GTS 450, GT 4xx, GT 6xx | Yes | No (Toolkit version 8.0)
Kepler | 3.0 | GTX 690, 680, 670, 660, 650, 770, 760, GT 740 | Yes | Yes
Kepler | 3.5 | GTX Titan(s), 780 Ti, 780 | Yes | Yes
Maxwell | 5.0 | GTX 750 Ti, GTX 750 | Yes | Yes
Maxwell | 5.2 | GTX Titan X, 980 Ti, 980, 970, 960, 950 | Yes | Yes
Pascal | 6.1 | GTX Titan(s), 1080 Ti, 1080, 1070 (Ti), 1060, 1050, GT 1030 | Yes | Yes
Volta | 7.0 | GTX Titan V | Yes | Yes
Turing | 7.5 | RTX Titan, 2080 Ti, 2080, 2070, 2060 | Yes | Yes
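To check what the cards in a given machine actually report, the Compute Capability can be read straight from the CUDA runtime API. Below is a minimal sketch, assuming the CUDA toolkit is installed and the file is compiled with nvcc (e.g. `nvcc query_cc.cu -o query_cc`; the file name is just an example).

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA-capable devices found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major / prop.minor are the Compute Capability version
        // listed in the table above (e.g. 7.5 for the Turing cards).
        std::printf("Device %d: %s (Compute Capability %d.%d)\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

The deviceQuery sample that ships with the toolkit prints the same information in much more detail.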
Released 2 years after CUDA, OpenCL aims to be an open compute standard for any device whose vendor implements it, rather than being vendor-specific. Versions are simple in OpenCL as development is not as fast-paced as in CUDA: Version 1.2 was finalised in 2011, while the latest version, 2.2, is still in development and not supported by many vendors. Unlike CUDA, OpenCL doesn't need its own compiler, because compute kernels are handed to the driver as source and compiled at run time; only the C header and library files are needed to start development, not a whole toolkit.
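As an illustration, the host program below uses nothing beyond the standard OpenCL header and library (e.g. linked with -lOpenCL); the kernel is an ordinary string which the driver compiles at run time with clBuildProgram. A minimal sketch with most error handling omitted; the kernel and variable names are just placeholders.

```cpp
#include <cstdio>
#include <CL/cl.h>

// OpenCL C kernel kept as a plain string; the driver compiles it at run time.
static const char* kSource = R"CLC(
__kernel void scale(__global float* data, const float factor) {
    data[get_global_id(0)] *= factor;
}
)CLC";

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);

    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_int err = CL_SUCCESS;
    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);

    // Build the kernel for this specific device, at run time, via the driver.
    cl_program program = clCreateProgramWithSource(context, 1, &kSource, nullptr, &err);
    if (clBuildProgram(program, 1, &device, "", nullptr, nullptr) != CL_SUCCESS) {
        std::printf("Kernel failed to build.\n");
        return 1;
    }

    cl_kernel kernel = clCreateKernel(program, "scale", &err);
    std::printf("Kernel compiled and ready to enqueue.\n");

    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseContext(context);
    return 0;
}
```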
SYCL1 is a higher-level, single-source C++ abstraction layer built on top of OpenCL, intended to make development more efficient than working with the lower-level OpenCL API directly. However, it doesn't appear to have seen as much adoption as OpenCL itself.
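For a flavour of what that abstraction looks like, here is a minimal single-source SYCL 1.2.1 sketch of a vector add, assuming an implementation such as ComputeCpp or triSYCL is installed; the kernel name vec_add is arbitrary.

```cpp
#include <CL/sycl.hpp>
#include <cstdio>
#include <vector>

namespace sycl = cl::sycl;

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    {   // Buffers hand the host data over to the SYCL runtime for this scope.
        sycl::buffer<float, 1> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> bufC(c.data(), sycl::range<1>(n));

        sycl::queue q;  // default device selection
        q.submit([&](sycl::handler& cgh) {
            auto A = bufA.get_access<sycl::access::mode::read>(cgh);
            auto B = bufB.get_access<sycl::access::mode::read>(cgh);
            auto C = bufC.get_access<sycl::access::mode::write>(cgh);
            // The kernel is ordinary C++; there is no separate kernel language
            // as there is with OpenCL C.
            cgh.parallel_for<class vec_add>(sycl::range<1>(n), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }   // buffers synchronise back to the host vectors here

    std::printf("c[0] = %f\n", c[0]);  // expect 3.000000
    return 0;
}
```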
OpenCL | Nvidia | AMD | Intel |
---|---|---|---|
1.0 | None | HD 48xx, HD 46xx | None |
1.1 | Geforce 8xxx, 9xxx, GTX 2xx, 4xx, 5xx | HD 5xxx | None |
1.2 | GTX 6xx, 7xx, 9xx, 10xx, Titan(s) | HD 6xxx, 7xxx | Core ix-3xxx (Ivy Bridge), ix-4xxx (Haswell)
2.0 | None (see notes) | HD 7790, HD 8xxx, R7/R9 2xx, R9 3xx, RX 4xx, 5xx, Fury(X), Vega 56/64, Instinct | None
2.1 | None | None (see notes) | Core ix-5xxx (Broadwell), ix-6xxx (Skylake), ix-7xxx (Kaby Lake), ix-8xxx (Coffee Lake)
The Vulkan API, also developed by the Khronos group, is an open standard used as a replacement for DirectX/OpenGL in 3D game engines. It can also be used for compute tasks9; the disadvantage is that it's a very low-level API. An advantage is that, because it is used in game engines, it is equally well supported by Nvidia and AMD.
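To give a sense of how low-level it is: even just listing the GPUs that Vulkan can see means filling in tagged structures and creating an instance first. A minimal sketch, assuming the Vulkan headers and loader are installed (link with -lvulkan):

```cpp
#include <cstdio>
#include <vector>
#include <vulkan/vulkan.h>

int main() {
    // Vulkan has no implicit global state; an instance has to be created
    // explicitly, with parameters passed through tagged structures.
    VkApplicationInfo appInfo{};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName = "device-list";
    appInfo.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;

    VkInstance instance;
    if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
        std::printf("Failed to create a Vulkan instance.\n");
        return 1;
    }

    // The usual Vulkan two-call pattern: query the count, then fill the array.
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        std::printf("%s\n", props.deviceName);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

An actual compute dispatch additionally needs a logical device, a queue, a SPIR-V shader module, descriptor sets, a pipeline and a command buffer before anything runs.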
Historically AMD promoted OpenCL for compute applications on its GPUs. More recently this has evolved: AMD has released an entirely new open-source driver and software stack called ROCm. This includes two high-level languages, HC and HIP. HC (Heterogeneous Compute)10 is a C++ API similar to C++ AMP; it targets only AMD devices. HIP (Heterogeneous Computing Interface for Portability)11 is an API which is intentionally very similar to CUDA. It is intended to simplify the process of converting CUDA programs to run on AMD devices. AMD supply a tool to “hipify” existing CUDA code, converting it to HIP, which can then be compiled for and run on AMD devices. AMD's toolchain also allows Nvidia devices to be targeted by HIP code. This is possible because HIP acts as a thin interfacing layer. When targeting AMD devices, the compiler (called hcc) compiles HIP directly for AMD devices. When targeting Nvidia devices, the HIP calls are mapped back to their CUDA equivalents and the code is passed to Nvidia's compiler (nvcc), which then compiles it as usual. The diagram below is a very simplified representation of the interactions between CUDA, HIP, HCC and NVCC.
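To show how close the mapping is in practice, here is a minimal HIP sketch of a vector add, compiled with hipcc. The hipified names line up one-to-one with their CUDA counterparts (hipMalloc for cudaMalloc, hipMemcpy for cudaMemcpy, and so on), which is essentially the substitution the hipify tool performs on existing CUDA source.

```cpp
#include <cstdio>
#include <vector>
#include <hip/hip_runtime.h>

// Same __global__ kernel syntax as CUDA.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *dA, *dB, *dC;
    hipMalloc((void**)&dA, bytes);  // cudaMalloc
    hipMalloc((void**)&dB, bytes);
    hipMalloc((void**)&dC, bytes);
    hipMemcpy(dA, a.data(), bytes, hipMemcpyHostToDevice);  // cudaMemcpy
    hipMemcpy(dB, b.data(), bytes, hipMemcpyHostToDevice);

    // hipLaunchKernelGGL takes the place of CUDA's <<<grid, block>>> syntax.
    const int block = 256;
    hipLaunchKernelGGL(vec_add, dim3((n + block - 1) / block), dim3(block), 0, 0,
                       dA, dB, dC, n);

    hipMemcpy(c.data(), dC, bytes, hipMemcpyDeviceToHost);
    std::printf("c[0] = %f\n", c[0]);  // expect 3.000000

    hipFree(dA);  // cudaFree
    hipFree(dB);
    hipFree(dC);
    return 0;
}
```

The same file can also be built for an Nvidia card via the nvcc path described above, without any source changes.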
Name | Codename | Example Cards | HC Support | OpenCL Support | Notes |
---|---|---|---|---|---|
gfx701 | Hawaii | R9 290(X), R9 390(X) | Yes | Yes | Experimental support, no CPU or PCIe requirements
gfx802 | Tonga | R9 285, R9 385, R9 380(X) | No | Supported by the driver but NOT by the HCC compiler (see note) |
gfx803 | Fiji | R9 Fury(X), R9 Nano | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics
gfx803 | Polaris 10/11 | RX 480–60, RX 580, 570, 560 (some) | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics
gfx901 | Vega 10 | Vega FE, RX Vega 64, RX Vega 56 | Yes | Yes | Requires CPU with PCIe Gen3 + PCIe Atomics (can be configured for PCIe Gen 2 w/out Atomics)
gfx906 | Vega 20 | AMD Radeon Instinct MI60, MI50 | Yes | Yes |
gfx1010 | Navi 10 | TBD AMD GPUs | Unknown/TBD | Unknown/TBD |
Tensorflow4 is a well-known platform for machine learning and deep learning. It works as a library for Python and aims to be simpler to use by abstracting the low-level implementations away. It supports CPUs and GPUs as its back end. The downloadable binaries (installed through Python's pip) support standard CPUs and Nvidia GPUs with at least Compute Capability 3.5 (older versions supported CC 3.0). Nvidia support requires the CUDA toolkit and the cuDNN library to be installed.
There is no out-of-the-box support for AMD graphics cards. AMD have created a separate port5 of Tensorflow which supports some of their latest GPUs. This port requires the ROCm runtime (explained above) and a supported GPU to function. Benchmarks show performance to be similar to Nvidia cards.6 The main disadvantages are that fewer cards are supported and that the port is a version behind the latest Tensorflow release.
There is also an implementation of Tensorflow which uses OpenCL and SYCL.7 It requires OpenCL 1.2 and an extension called cl_khr_spir, which means that in theory it's supported by most AMD cards. However, from what I can tell its performance is inferior8 to AMD's port, which has an optimised implementation.
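Whether a particular card actually exposes that extension can be checked by querying the OpenCL driver; a small sketch using only the standard OpenCL header and library:

```cpp
#include <cstdio>
#include <cstring>
#include <CL/cl.h>

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);

    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    // CL_DEVICE_EXTENSIONS returns a space-separated list of extension names.
    char extensions[8192] = {0};
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, nullptr);

    char version[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(version), version, nullptr);

    std::printf("%s\n", version);
    std::printf("cl_khr_spir supported: %s\n",
                std::strstr(extensions, "cl_khr_spir") ? "yes" : "no");
    return 0;
}
```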
Vendor | Tensorflow Supported GPUs |
---|---|
Nvidia | GTX 780 Ti, 780, 750 (Ti), 9xx, 10xx, 20xx, Titan(s) |
AMD | R9 Fury(X), R9 Nano, RX 480–60, Vega FE, RX Vega 64, Vega 56, RX 580–60, Instinct