Knowing how and when to use GPGPUs in HPEC computers will give embedded designers a better understanding of how to manage power consumption and load balancing with this powerful processing technology.

Myths of GPGPU computing

Two of the biggest challenges facing modern embedded systems can be summed up in a few words: lost processing power and increased power consumption. The main drivers include the influx of data sources, continuous technology upgrades, shrinking system sizes and increasing density within the system itself.

High-performance embedded computing (HPEC) systems have begun to take advantage of the massively parallel processing power of general-purpose graphics processing units (GPGPUs), enabling system designers to bring exceptional performance to rugged, durable small form factor (SFF) systems.

GPU-accelerated computing pairs a graphics processing unit (GPU) with a central processing unit (CPU) to accelerate applications, offloading some of the most computationally intensive parts of the workload from the CPU to the GPU.
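To make this offloading model concrete, here is a minimal CUDA sketch (the kernel, array size and values are illustrative assumptions, not taken from any particular product): the CPU prepares the data, copies it across to the GPU, launches a kernel that does the heavy arithmetic in parallel, then copies the result back.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Compute-intensive work offloaded to the GPU: one thread per element.
    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;                      // illustrative problem size
        size_t bytes = n * sizeof(float);

        // CPU side: prepare the input data.
        float *h_x = (float*)malloc(bytes), *h_y = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

        // Offload: move the data, then the computation, to the GPU.
        float *d_x, *d_y;
        cudaMalloc(&d_x, bytes); cudaMalloc(&d_y, bytes);
        cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);

        // Bring the result back for the rest of the application.
        cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", h_y[0]);              // expect 4.0

        cudaFree(d_x); cudaFree(d_y); free(h_x); free(h_y);
        return 0;
    }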

It is important to note that even as processing requirements continue to increase, the GPU will not replace the main processing unit (CPU). The GPU has, however, evolved into an extremely flexible and powerful processor, and thanks to improved programmability, precision and parallel processing, it can handle certain computing tasks better and faster than the CPU.

A deeper understanding of GPGPU computing, including its limitations and capabilities, can help you choose the products that deliver the best performance for your application.

GPGPUs are really only useful in the entertainment industry, such as rendering graphics in games.

This is not true. As recent years have shown, GPGPUs are redefining what is possible in data processing and deep learning networks and are shaping expectations in the field of artificial intelligence. A growing number of military and defense projects based on GPGPU technology have already been deployed, including systems that use these advanced processing capabilities for radar, image recognition, classification, motion detection, encoding and more.

Because they are “general purpose” GPUs, they are not designed for complex, high-density computing tasks.

Also not true. A typical high-performance RISC or CISC CPU has dozens of “sophisticated” cores. A GPU has thousands of “specialized” cores that are optimized for addressing and manipulating large data arrays, such as display output or imagery streaming in from optical cameras and other input devices. These GPU cores let applications spread algorithms across many cores, making parallel processing easier to design and execute. It is the ability to run many simultaneous “kernels” on one GPU – where each kernel is responsible for a subset of specific computations – that makes complex, high-density computing possible.

While multi-core CPUs enable enhanced processing, CUDA-based GPUs provide thousands of cores that work in parallel and process large amounts of data simultaneously.

The GPGPU pipeline uses parallel processing on GPUs to analyze data as if it were an image or other graphical data. While GPUs operate at lower clock frequencies, they typically have many times the number of cores. As a result, GPUs can process far more images and graphical data per second than a conventional CPU, and moving scanning and analysis into the parallel GPU pipeline delivers significant acceleration.
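A minimal sketch of this one-thread-per-pixel model, assuming an illustrative 1920 x 1080 grayscale frame: each GPU thread independently brightens one pixel, so the entire image is processed in parallel rather than pixel by pixel.

    #include <cuda_runtime.h>

    // One thread per pixel: the data-parallel pattern described above.
    // The scale factor and frame dimensions are illustrative assumptions.
    __global__ void brighten(unsigned char* img, int width, int height, float scale)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            int idx = y * width + x;
            float v = img[idx] * scale;
            img[idx] = v > 255.0f ? 255 : (unsigned char)v;   // clamp to 8 bits
        }
    }

    int main()
    {
        const int width = 1920, height = 1080;
        size_t bytes = (size_t)width * height;

        unsigned char* d_img;
        cudaMalloc(&d_img, bytes);
        cudaMemset(d_img, 100, bytes);               // stand-in for a captured frame

        dim3 block(16, 16);                          // 256 threads per block
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        brighten<<<grid, block>>>(d_img, width, height, 1.5f);
        cudaDeviceSynchronize();

        cudaFree(d_img);
        return 0;
    }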

GPGPUs are not rugged enough to withstand harsh environments such as borehole monitoring, mobile or military applications.

Wrong. Ruggedness is actually the responsibility of the board or system manufacturer. Many of the parts and components used in harsh-environment electronics are not manufactured rugged, and the same is true of GPGPUs. This is where design know-how comes in: applying the techniques that best mitigate environmental hazards and ensure that systems meet their specific application requirements.

Aitech, for example, offers GPGPU-based boards and SFF systems that are qualified for and survive a range of avionics, marine, ground and mobile applications, thanks to decades of experience applied to system development. Space is the next frontier the company is exploring for GPGPU technology.

If processing requirements exceed what the system can deliver, the alternatives (e.g., buying more powerful hardware or overclocking existing devices) require increased power consumption.

True. A user who tries to avoid GPGPU technology usually runs short of CPU power. To solve this dilemma, additional CPU boards are typically added or existing boards are overclocked, resulting in increased power consumption. In most cases, the rising device temperatures then force downclocking, which cuts back the very CPU frequency performance the upgrade was meant to gain.

Wouldn’t adding another processing engine increase the complexity and integration problems in my system?

In the short term, perhaps, because you need to allow for the learning curve of a new, cutting-edge technology. But in the long term, no. CUDA has become the de facto language for image processing and algorithm development.

Once you create a CUDA algorithm, you can “reuse” it on any other platform that supports an Nvidia GPGPU card. Porting from one platform to another is easy, making the approach less hardware-specific and therefore more “generic”.
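As a rough illustration of that portability (the file name and build flags below are assumptions about a typical build, not a prescribed toolchain), one CUDA source file can be compiled into a single fat binary that targets several Nvidia embedded GPU generations at once:

    // One kernel source, many platforms. nvcc can embed code for several GPU
    // generations in one binary; these flags target the Jetson TX1 (compute
    // capability 5.3), TX2 (6.2) and Xavier (7.2). A hypothetical build line:
    //
    //   nvcc algorithm.cu -o algorithm \
    //        -gencode arch=compute_53,code=sm_53 \
    //        -gencode arch=compute_62,code=sm_62 \
    //        -gencode arch=compute_72,code=sm_72

    #include <cuda_runtime.h>

    // The kernel itself needs no per-platform changes; the CUDA runtime
    // selects the matching embedded code at load time.
    __global__ void scale_kernel(float* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main()
    {
        float* d;
        cudaMalloc(&d, 256 * sizeof(float));
        scale_kernel<<<1, 256>>>(d, 256);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }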

Since these GPGPU-based systems process extremely large amounts of data, power consumption increases.

Not really. GPGPU cards are very energy efficient today; the power consumption of some GPGPU cards now matches that of CPU cards. However, GPGPU cards can handle far more data in parallel thanks to their thousands of CUDA cores, so the performance-per-watt ratio is very favorable: more processing for the same, and sometimes a little less, power.

There are still compromises between performance and power consumption.

That is right, and these compromises will always be there. Higher performance and faster throughput require higher power consumption; that is a fact. But those are the same trade-offs you will find when you use a CPU or any other processing unit.

As an example, consider the Nvidia Optimus technology used by Aitech. This is a GPU-switching technology in which the discrete GPU handles all rendering tasks, while the final image output to the display continues to come from the CPU’s integrated graphics processor (IGP). In effect, the IGP is used only as a simple display controller, resulting in a seamless, flicker-free, real-time experience without the GPGPU having to carry the full load of image rendering and generation, and without CPU resources being tied up in image recognition. This load distribution, or balancing, makes these systems even more powerful.

When less critical or less demanding applications are running, the discrete GPU can be turned off entirely. The Intel IGP then handles both rendering and display calls to save power and deliver the highest possible performance/power ratio.

A simple board upgrade to my CPU is enough to handle the data processing my system requires.

This is not true. The industry is definitely moving toward parallel processing and offloading work onto the GPU, and with good reason. Parallel processing of an image is the optimal task for a GPU – it is exactly what the GPU was designed for. As the number of data inputs and camera resolutions continue to increase, a parallel processing architecture is becoming the norm, not a luxury. This is especially true for business- and safety-critical industries that need to capture, compare, analyze and make decisions on several hundred images simultaneously.
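As a hypothetical sketch of how that many frames can be kept in flight at once (frame counts, sizes and the placeholder kernel are illustrative assumptions), the example below fans a batch of frames out across several CUDA streams so that transfers and kernels for different frames overlap:

    #include <cuda_runtime.h>

    // Illustrative per-frame kernel: one thread per pixel, as in the earlier sketch.
    __global__ void analyze(unsigned char* frame, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) frame[i] = 255 - frame[i];        // stand-in for real analysis
    }

    int main()
    {
        const int numFrames = 256, frameSize = 640 * 480;   // assumed sizes
        const int numStreams = 4;

        cudaStream_t streams[numStreams];
        for (int i = 0; i < numStreams; ++i) cudaStreamCreate(&streams[i]);

        // Pinned host memory allows asynchronous, overlapped copies.
        unsigned char *h_frames, *d_frames;
        cudaMallocHost(&h_frames, (size_t)numFrames * frameSize);
        cudaMalloc(&d_frames, (size_t)numFrames * frameSize);

        for (int f = 0; f < numFrames; ++f) {
            cudaStream_t s = streams[f % numStreams];
            unsigned char* h = h_frames + (size_t)f * frameSize;
            unsigned char* d = d_frames + (size_t)f * frameSize;

            // Copy-in, process and copy-out per frame; work queued on
            // different streams can execute concurrently.
            cudaMemcpyAsync(d, h, frameSize, cudaMemcpyHostToDevice, s);
            analyze<<<(frameSize + 255) / 256, 256, 0, s>>>(d, frameSize);
            cudaMemcpyAsync(h, d, frameSize, cudaMemcpyDeviceToHost, s);
        }
        cudaDeviceSynchronize();                     // wait for all frames

        cudaFree(d_frames); cudaFreeHost(h_frames);
        for (int i = 0; i < numStreams; ++i) cudaStreamDestroy(streams[i]);
        return 0;
    }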

Moore’s law also applies to GPGPUs.

Yes, but there is a solution. Nvidia is developing a multi-chip module GPU (MCM-GPU) architecture that will allow GPU performance to keep scaling despite the slowdown in transistor scaling and the limits of photoreticle size.

At GTC 2019, Nvidia’s discussions of the MCM-GPU chip revealed many technologies that could be applied to higher-level computer systems. These include mesh networking, low-latency signaling and a scalable deep learning architecture, as well as a very efficient die-to-die transfer technology on an organic substrate.

Learning a completely new programming language (e.g. CUDA) requires too much time and money.

Not really. CUDA is currently the de facto parallel computing language, and because many CUDA-based solutions are already in use, many algorithms have already been ported to CUDA. Nvidia maintains a large online forum with many examples, web-based training courses, user communities and more, and software companies stand ready to help you take your first steps in CUDA. The CUDA language is now also part of the programming curriculum at many universities.

Learning a new computing technology can seem daunting. However, given the resources available and the exponential prospects of GPGPU technology, CUDA is a programming language that will be a very worthwhile investment.

There are no “industrial grade” GPGPUs for the embedded market, especially for SFF, SWaP-optimized systems.

That’s not true. Nvidia has a complete “Jetson” product line targeted at the embedded market. It currently includes the following system-on-modules (SoMs), each in a small form factor and optimized for size, weight and power (SWaP):

  • TX1
  • TX2
  • TX2i – an “industrial” version for particularly harsh environments
  • Xavier

Designed for both industrial and military applications, GPGPUs are redefining the SWaP optimization and expected performance of SFF systems.

Nvidia has announced a longer life cycle for the TX2i module, which means that component obsolescence is less of a risk for longer-term programs such as aerospace and defense, and for several rugged industrial applications. Aitech has implemented many military and industrial projects and customer programs based on the embedded family mentioned above, with new applications coming to market every day.
