Hands-On GPU：Accelerated Computer Vision with OpenCV and CUDA

上QQ阅读APP看书，第一时间看更新

Parallel processing

In recent years, consumers have been demanding more and more functionalities on a single hand held device. So, there is a need for packaging more and more transistors on a small area that can work quickly and consume minimal power. We need a fast processor that can carry out multiple tasks with a high clock speed, a small area, and minimum power consumption. Over many decades, transistor sizing has seen a gradual decrease resulting in the possibility of more and more transistors being packed on a single chip. This has resulted in a constant rise of the clock speed. However, this situation has changed in the last few years with the clock speed being more or less constant. So, what is the reason for this? Have transistors stopped getting smaller? The answer is no. The main reason behind clock speed being constant is high power dissipation with high clock rate. Small transistors packed in a small area and working at high speed will dissipate large power, and hence it is very difficult to keep the processor cool. As clock speed is getting saturated in terms of development, we need a new computing paradigm to increase the performance of the processors. Let's understand this concept by taking a small real-life example.

Suppose you are told to dig a very big hole in a small amount of time. You will have the following three options to complete this work in time:

You can dig faster.
You can buy a better shovel.
You can hire more diggers, who can help you complete the work.

If we can draw a parallel between this example and a computing paradigm, then the first option is similar to having a faster clock. The second option is similar to having more transistors that can do more work per clock cycle. But, as we have discussed in the previous paragraph, power constraints have put limitations on these two steps. The third option is similar to having many smaller and simpler processors that can carry out tasks in parallel. A GPU follows this computing paradigm. Instead of having one big powerful processor that can perform complex tasks, it has many small and simple processors that can get work done in parallel. The details of GPU architecture are explained in the next section.

本周热推：

Java EE实用教程 Web应用程序设计：ASP 青少年Python编程入门 R语言数据分析从入门到实战 C++服务器开发精髓