OpenCL is a computational framework designed to take advantage of multicore platforms. Over time, a number of myths about OpenCL have taken hold, and this article sets out to get to the bottom of them.
OpenCL is an open, cross-platform, parallel computing standard of the Khronos working group. Applications are written in a variant of C and the latest OpenCL 2.2 brings a static subset of C++ 14 into the mix.
OpenCL has broad support on both the hardware and software side. It is often a checkbox item in hardware feature sets, and developers can use OpenCL-based software without knowing anything about the hardware or software stack it runs on. Those who are already familiar with OpenCL and how it works will find nothing new in this post, and the same goes for those who simply use OpenCL-based software without caring which OpenCL-capable hardware it runs on. The rest of you should definitely read on.
I just have to write C code to use OpenCL.
That’s right. OpenCL is a language specification, a runtime API, and a software framework that includes an OpenCL compiler and a matching runtime. The application code is written in the form of small kernels that are then executed by the runtime. OpenCL kernels are written in C or C++.
Part of the challenge of writing OpenCL code is taking advantage of its extensions and constructs to parallelize computation, because that is how OpenCL accelerates the execution of an application. This cannot be accomplished simply by running code through an OpenCL compiler; how the functions in a kernel are implemented also affects performance.
The runtime schedules kernels and their associated data to run on hardware that typically has multiple cores, potentially speeding up the application. A kernel runs to completion, and the resulting data can then be used in subsequent calculations.
The code for a kernel can be shared between the cores in a shared memory system, or it can be copied to a node as in a cluster system. Kernels are not portable to different hardware, but the source code is.
OpenCL only runs on GPUs.
That’s not true. OpenCL is a specification implemented by an OpenCL compiler. It can generate code for the target hardware, which can include CPUs, GPUs, or a mixture of both. The compiler determines which targets it should support. An OpenCL application can run on a single-core CPU, but usually multicore systems are the target to get more overall performance from a system.
OpenCL can also run on FPGAs. This approach is somewhat more static, because the kernels are implemented in the FPGA fabric and its configuration does not normally change over time. Altera’s SDK for OpenCL includes an OpenCL compiler that generates an FPGA configuration, including configuration support for the essential OpenCL runtime. There is also software support that moves data between a host and the FPGA and initiates kernel execution on the FPGA.
Essentially, data is stored in FPGA memory that a kernel has access to. The results can then be extracted the same way or made available to another kernel, just as OpenCL does on a CPU or GPU. FPGAs have the advantage of performing many operations in parallel.
CUDA is just Nvidia’s version of OpenCL.
Wrong. Nvidia’s CUDA is similar to OpenCL, but they are different. Both use the kernel approach to partition code, and both support C and C++ for kernel code.
CUDA is an Nvidia architecture designed specifically for Nvidia GPU hardware, although its toolchain can also generate code for some CPUs, such as x86 platforms, offering features similar to OpenCL. In addition, the CUDA toolkit includes libraries such as cuDNN for deep learning, cuFFT, cuBLAS, and NPP (Nvidia Performance Primitives for imaging). Nvidia’s device drivers support both CUDA and OpenCL on Nvidia hardware, so developers using Nvidia GPUs typically choose either CUDA or OpenCL for their projects.
OpenCL runs better on a GPU than on a CPU.
Yes and no. Typically, a GPU has many more cores than even a multicore CPU. Many cores can help in some applications, but any acceleration is usually application-specific. Some applications perform better on a multicore CPU, while others perform better on GPUs. Much depends on how different kernels are written and executed, and the type of operations to be performed. Operations that take advantage of GPU functionality typically perform better on a GPU. Sometimes a mix of hardware can be the best alternative. It is possible to have an OpenCL system that includes both.
I have to program in OpenCL to use it.
Wrong. It is possible to use applications written and compiled for an OpenCL platform. In this case, the OpenCL runtime must be installed on the platform (usually as a device driver) and the application must be compiled for that platform. This is similar to traditional native code applications written for an operating system. The applications require the appropriate operating system and hardware to run.
Another way to use OpenCL is to call functions in an OpenCL-based library from an application running on the CPU. Bindings to the OpenCL APIs exist for a variety of programming languages, including C, C++, Python, and Java.
You only need to write an OpenCL program if neither of the previous two scenarios is suitable.
OpenCL can only be used by C applications.
Incorrect. Kernel code is compiled from C or C++ source code, but an OpenCL application is not just kernel code. It is possible to use OpenCL-based libraries and applications in conjunction with CPU-side application code written in almost any programming language.
OpenCL specifies C and C++ APIs to interface with OpenCL applications. These APIs have been translated into other programming languages, including popular ones like Java and Python.
OpenCL is difficult to learn and program.
True and false. OpenCL uses C and C++, which makes it easy to get started if you have a background in C or C++. The trick is that OpenCL has its own conventions, techniques, and debugging methods that differ from standard C and C++ development.
There’s a wealth of information, documentation, and examples of OpenCL on the Internet, as well as numerous books on the subject. Nevertheless, there is still a lot to learn to build efficient and error-free OpenCL applications. Debuggers like AMD’s CodeXL have similar, but different, features from traditional CPU debuggers, and a variety of tracing and profiling tools are also available for OpenCL.
OpenCL cannot be used for stream programming.
Incorrect. Stream programming is usually associated with a data stream, such as an audio or video stream. Managing the data is a little more complex than a typical OpenCL application, but it’s possible.
OpenCL only runs on AMD and Nvidia GPUs.
Wrong. OpenCL runs on most GPGPUs, including GPUs from ARM, Imagination Technologies, Intel and others. However, it does not run on all GPUs and requires a suitable runtime/driver and an OpenCL compiler. The GPGPUs can be integrated into a CPU or they can be standalone GPUs connected to a host (usually via PCI Express).
OpenCL requires a lot of hardware to run.
Wrong, although more hardware tends to improve performance. OpenCL can run on single-core microcontrollers, but it makes more sense on devices with multiple cores or with GPUs that support OpenCL. OpenCL systems can also scale up to very large clusters containing multiple nodes with multiple CPUs or GPUs. Distributing code and data within a cluster can be a more complex task, but the runtime takes care of it, which allows developers to focus on the code for the kernels.
OpenCL is not good for embedded applications.
Wrong. OpenCL can be very useful in embedded applications. The embedded system needs support for OpenCL, but a number of microcontrollers and SoCs have sufficient resources, along with OpenCL support, to make it worthwhile.
Embedded designers actually have an advantage, because the scope and requirements of the application are known. The chips used can often be scaled up or down to meet those requirements. OpenCL can also help with performance-sensitive applications, because it often uses the available hardware more efficiently.
However, OpenCL is not free: the necessary hardware costs money and consumes power. On the other hand, it can enable features that would not be possible otherwise. Deciding when and how to use OpenCL is the key decision for developers.