OpenMP acceleration

xmubingo · Posted 2014-5-5 09:44:00

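OpenMP is similar to OpenACC for CUDA: adding a single line of code can be enough to get a speedup. OpenMP support has to be enabled first: with GCC, compile with -fopenmp; in Visual Studio, turn on OpenMP support in the project settings.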

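OpenMP supports C, C++, and Fortran. Compilers that support OpenMP include Sun Studio, Intel Compiler, Microsoft Visual Studio, and GCC. I am using Microsoft Visual Studio 2008 on a quad-core Intel i5, so I will first cover how to configure OpenMP in Microsoft Visual Studio 2008. It is very simple and takes just 2 steps: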

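(1) Create a new project. I won't go into detail on this.

(2) After creating the project, go to Project -> Properties in the menu bar. In the dialog that opens, go to Configuration Properties -> C/C++ -> Language -> OpenMP Support and select Yes from the drop-down menu.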

That completes the configuration. Next, a small example shows how easy OpenMP is to use. It has a simple test() function, and main() runs test() 8 times in a for loop.

#include <iostream>
#include <time.h>

// Busy loop that serves as the workload.
void test()
{
    int a = 0;
    for (int i = 0; i < 100000000; i++)
        a++;
}

int main()
{
    clock_t t1 = clock();
    for (int i = 0; i < 8; i++)
        test();
    clock_t t2 = clock();
    // clock() returns ticks; divide by CLOCKS_PER_SEC to get seconds.
    std::cout << "time: " << double(t2 - t1) / CLOCKS_PER_SEC << std::endl;
}

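After compiling and running, the printed elapsed time was 1.971 seconds. Next, a single added line turns the code above into a multi-core run.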

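#include <iostream>
#include <time.h>

// Busy loop that serves as the workload.
void test()
{
    int a = 0;
    for (int i = 0; i < 100000000; i++)
        a++;
}

int main()
{
    clock_t t1 = clock();
    // Split the 8 iterations of the loop across the available threads.
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        test();
    clock_t t2 = clock();
    // clock() returns ticks; divide by CLOCKS_PER_SEC to get seconds.
    std::cout << "time: " << double(t2 - t1) / CLOCKS_PER_SEC << std::endl;
}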

After compiling and running, the printed elapsed time was 0.546 seconds, close to 1/4 of the time above.

For reference, see this article: http://www.eyeler.com/article-8-1.html

guojiasheng · Posted 2015-12-7 15:49:58

Last edited by guojiasheng on 2015-12-7 15:58

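I happen to be using OpenMP right now, so here is a supplement to my senior labmate's post above.
GCC 4.2 and later support OpenMP. My test environment is gcc-4.8.2 on Linux, with 12 cores.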

Compile: /opt/compiler/gcc-4.8.2/bin/g++ main.c -fopenmp (remember to add -fopenmp)

/**
 * @file main.c
 * @author guojiasheng
 * @date 2015/12/04 19:27:51
 * @brief
 *
 **/
#include <iostream>
#include <omp.h>
#include <stdio.h>
using namespace std;

// Synchronization with critical
void critical()
{
    int sum = 0;
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; i++)
        {
            // Only one thread at a time may execute the next statement.
            #pragma omp critical
            sum += 1;
        }
    }
    cout << "sum: " << sum << endl;
}

// Parallel region: the block is executed once by every thread
void parallel()
{
    #pragma omp parallel
    {
        cout << "thread: " << omp_get_thread_num() << endl;
    }
}

// Parallel for: the loop iterations are divided among the threads
void parallelFor()
{
    // Set the number of threads to use
    omp_set_num_threads(4);
    #pragma omp parallel for
    for (int i = 0; i < 10; i++)
        printf("thread%d,value:%d\n", omp_get_thread_num(), i);
}

int main()
{
    critical();
    parallel();
    parallelFor();
}
/* vim: set expandtab ts=4 sw=4 sts=4 tw=100: */


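Notes on the results:
(1) parallel();
(2) parallelFor();
With parallel, every thread executes the print statement once.
With parallel for, the iterations of the for loop are automatically divided among the threads; in the output you can see that thread 0 printed 0, 1, 2.
(3) critical();
Because sum is a shared variable, the result can differ from what you expect if multiple threads update it without any locking. OpenMP has a built-in directive, critical, that guarantees only one thread accesses sum at a time. This does reduce efficiency, of course.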
