Machine Learning and Bioinformatics Lab Alliance

Title: Performance of the lab's CUDA development environment

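Author: xmubingo  Time: 2012-3-13 19:17
Title: Performance of the lab's CUDA development environment
Last edited by xmubingo on 2012-5-3 21:12

Thanks to @cwc for setting up the lab's CUDA programming environment.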
192.168.1.100
192.168.1.103

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce GTX 550 Ti"
  CUDA Driver Version / Runtime Version       4.2 / 4.1
  CUDA Capability Major/Minor version number: 2.1
  Total amount of global memory:                1024 MBytes (1073283072 bytes)
  ( 4) Multiprocessors x (48) CUDA Cores/MP:    192 CUDA Cores
  GPU Clock Speed:                            1.90 GHz
  Memory Clock rate:                            2050.00 Mhz
  Memory Bus Width:                            192-bit
  L2 Cache Size:                               393216 bytes
  Max Texture Dimension Size (x,y,z)          1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers       1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:             65536 bytes
  Total amount of shared memory per block:    49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                  32
  Maximum number of threads per block:          1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:    65535 x 65535 x 65535
  Maximum memory pitch:                         2147483647 bytes
  Texture alignment:                            512 bytes
  Concurrent copy and execution:                Yes with 1 copy engine(s)
  Run time limit on kernels:                   Yes
  Integrated GPU sharing Host Memory:          No
  Support host page-locked memory mapping:    Yes
  Concurrent kernel execution:                Yes
  Alignment requirement for Surfaces:          Yes
  Device has ECC support enabled:             No
  Device is using TCC driver mode:             No
  Device supports Unified Addressing (UVA):    Yes
  Device PCI Bus ID / PCI location ID:          3 / 0
  Compute Mode:
   < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.1, NumDevs = 1, Device = GeForce GTX 550 Ti
[deviceQuery] test results...
PASSED
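The "192 CUDA Cores" line is plain arithmetic: 4 multiprocessors x 48 cores per multiprocessor. The runtime's cudaDeviceProp reports the multiprocessor count (multiProcessorCount) and the compute capability (major, minor), but not the number of cores per multiprocessor, so a program has to map the capability to that number itself. Below is a minimal sketch; the helper cores_per_sm is hypothetical and its table only covers compute capabilities 1.x and 2.x, the ones that appear in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA cores per multiprocessor for compute capabilities
// 1.x and 2.x only. cudaDeviceProp does not report this value directly.
int cores_per_sm(int major, int minor) {
    if (major == 1) return 8;                  // e.g. Tesla T10, capability 1.3
    if (major == 2 && minor == 0) return 32;   // Fermi, capability 2.0
    if (major == 2 && minor == 1) return 48;   // GTX 550 Ti, GTS 450, GTX 460
    return -1;                                 // not covered by this sketch
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    int per_sm = cores_per_sm(prop.major, prop.minor);
    if (per_sm < 0) {
        std::printf("%s: compute capability %d.%d is not in the table\n",
                    prop.name, prop.major, prop.minor);
        return 0;
    }
    // Total cores = multiprocessors x cores per multiprocessor.
    std::printf("%s: %d MPs x %d cores/MP = %d CUDA cores\n",
                prop.name, prop.multiProcessorCount, per_sm,
                prop.multiProcessorCount * per_sm);
    return 0;
}
```

The same product explains the other cards in this thread: 7 x 48 = 336 for the GTX 460 and 30 x 8 = 240 for the Tesla T10.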

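Author: xmubingo  Time: 2012-3-14 21:39
Last edited by xmubingo on 2012-3-14 21:45

zouquan posted on 2012-3-14 21:20
Really? It's only a 1000-yuan card. The two on the desk are 550s.

The machine at the lower-left corner of the desk and the newly built 64 GB machine have 560s, with 300+ stream processors, also ...

Below is the performance of 236. Its compute capability is lower and its CUDA version is older, mainly because their boards are an early generation. However, 236 has 4 GB of video memory and more CUDA cores than ours.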

Device 0: "Tesla T10 Processor"
  CUDA Driver Version:                         4.0
  CUDA Runtime Version:                         4.0
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory:                4294770688 bytes
  (30) Multiprocessors x ( 8) CUDA Cores/MP:    240 CUDA Cores
  Total amount of constant memory:             65536 bytes
  Total amount of shared memory per block:    16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                  32
  Maximum number of threads per block:          512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid:    65535 x 65535 x 1
  Maximum memory pitch:                         2147483647 bytes
  Texture alignment:                            256 bytes
  Clock rate:                                  1.30 GHz
  Concurrent copy and execution:                Yes
  # of Asynchronous Copy Engines:             1
  Run time limit on kernels:                   No
  Integrated:                                  No
  Support host page-locked memory mapping:    Yes
  Compute mode:                               Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                No
  Device has ECC support enabled:             No
  Device is using TCC driver mode:             No

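Author: zouquan  Time: 2012-3-14 22:28
For CUDA computing, which matters more: the number of GPU processing units or the amount of video memory?

Isn't 236's 4 GB the total across several cards?

If needed, we can add some too :)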

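Author: xmubingo  Time: 2012-3-14 23:09
Last edited by xmubingo on 2012-3-14 23:10

zouquan posted on 2012-3-14 22:28
For CUDA computing, which matters more: the number of GPU processing units or the amount of video memory?

Isn't 236's 4 GB the total across several cards?

I just looked up the differences among NVIDIA's three lines of cards: GeForce, Quadro, and Tesla.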

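GeForce is the mainstream line for ordinary users, Quadro is aimed at workstations, and Tesla at servers. CUDA computing works the same way on all three. Ordinary users only need to consider parameters such as the number of stream processors and the memory bandwidth (GeForce and Quadro cards); Tesla cards have special requirements.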
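One way to put a number on the bandwidth parameter is to compute the theoretical peak memory bandwidth from two deviceQuery fields, Memory Clock rate and Memory Bus Width: peak (GB/s) = 2 x clock (Hz) x bus width (bytes) / 10^9, where the factor 2 assumes double-data-rate memory. A minimal sketch, assuming the cudaDeviceProp fields memoryClockRate (in kHz) and memoryBusWidth (in bits); the helper name peak_bandwidth_gbs is made up for illustration, and the result is a theoretical upper bound rather than a measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: theoretical peak memory bandwidth in GB/s.
// mem_clock_khz  : cudaDeviceProp::memoryClockRate (kHz)
// bus_width_bits : cudaDeviceProp::memoryBusWidth (bits)
// The factor 2.0 assumes double-data-rate transfers.
double peak_bandwidth_gbs(int mem_clock_khz, int bus_width_bits) {
    return 2.0 * mem_clock_khz * 1e3 * (bus_width_bits / 8.0) / 1e9;
}

int main() {
    // Clock and bus width copied from the deviceQuery output in this thread.
    std::printf("GTX 550 Ti: %.1f GB/s\n", peak_bandwidth_gbs(2050000, 192));
    std::printf("GTX 460:    %.1f GB/s\n", peak_bandwidth_gbs(1800000, 256));
    std::printf("GTS 450:    %.1f GB/s\n", peak_bandwidth_gbs(600000, 128));

    // The same calculation for device 0 on the current machine, if present.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess) {
        std::printf("%s: %.1f GB/s\n", prop.name,
                    peak_bandwidth_gbs(prop.memoryClockRate, prop.memoryBusWidth));
    }
    return 0;
}
```

With the values posted here this gives about 98.4 GB/s for the GTX 550 Ti, 115.2 GB/s for the GTX 460 and 19.2 GB/s for the GTS 450. For kernels limited by memory traffic, this figure can matter as much as the core count, while the amount of global memory mainly decides how much data fits on the card at once.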

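236's Tesla T10 is in fact a Tesla C1060 coprocessor, which you can think of as a super graphics card. The latest model is the T20-based Tesla C2070 coprocessor: http://www.nvidia.cn/object/workstation-solutions-tesla-cn.html

Of the three, Tesla is the most specialized line, designed specifically for enterprise computing. http://www.xasun.com/article/c9/1000.html

236 has five Tesla C1060 coprocessors, at roughly 10,000 yuan each. In 2008 they were even more expensive.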

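Author: zouquan  Time: 2012-3-14 23:59
Yeah, 235 and 236 are both good machines. They cost a lot when they were bought, and nobody uses them. If we don't put them to use on the bosses' behalf, all that money will only have gone toward GDP...
Too bad 235, the one with 96 GB of RAM, is down.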

Author: xmubingo  Time: 2012-5-3 21:10
192.168.1.104

Two GTS 450 cards

Device 0: "GeForce GTS 450"
  CUDA Driver Version / Runtime Version       4.2 / 4.1
  CUDA Capability Major/Minor version number: 2.1
  Total amount of global memory:                4096 MBytes (4294508544 bytes)
  ( 4) Multiprocessors x (48) CUDA Cores/MP:    192 CUDA Cores
  GPU Clock Speed:                            1.57 GHz
  Memory Clock rate:                            600.00 Mhz
  Memory Bus Width:                            128-bit
  L2 Cache Size:                               262144 bytes
  Max Texture Dimension Size (x,y,z)          1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers       1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:             65536 bytes
  Total amount of shared memory per block:    49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                  32
  Maximum number of threads per block:          1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:    65535 x 65535 x 65535
  Maximum memory pitch:                         2147483647 bytes
  Texture alignment:                            512 bytes
  Concurrent copy and execution:                Yes with 1 copy engine(s)
  Run time limit on kernels:                   Yes
  Integrated GPU sharing Host Memory:          No
  Support host page-locked memory mapping:    Yes
  Concurrent kernel execution:                Yes
  Alignment requirement for Surfaces:          Yes
  Device has ECC support enabled:             No
  Device is using TCC driver mode:             No
  Device supports Unified Addressing (UVA):    Yes
  Device PCI Bus ID / PCI location ID:          1 / 0
  Compute Mode:
   < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Author: xmubingo  Time: 2012-5-8 09:46
301
GTX 460

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce GTX 460"
  CUDA Driver Version / Runtime Version       4.2 / 4.1
  CUDA Capability Major/Minor version number: 2.1
  Total amount of global memory:             1024 MBytes (1073283072 bytes)
  ( 7) Multiprocessors x (48) CUDA Cores/MP:    336 CUDA Cores
  GPU Clock Speed:                            1.40 GHz
  Memory Clock rate:                            1800.00 Mhz
  Memory Bus Width:                            256-bit
  L2 Cache Size:                               524288 bytes
  Max Texture Dimension Size (x,y,z)          1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers       1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:             65536 bytes
  Total amount of shared memory per block:    49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                  32
  Maximum number of threads per block:          1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:    65535 x 65535 x 65535
  Maximum memory pitch:                         2147483647 bytes
  Texture alignment:                            512 bytes
  Concurrent copy and execution:                Yes with 1 copy engine(s)
  Run time limit on kernels:                   Yes
  Integrated GPU sharing Host Memory:          No
  Support host page-locked memory mapping:    Yes
  Concurrent kernel execution:                Yes
  Alignment requirement for Surfaces:          Yes
  Device has ECC support enabled:             No
  Device is using TCC driver mode:             No
  Device supports Unified Addressing (UVA):    Yes
  Device PCI Bus ID / PCI location ID:          1 / 0
  Compute Mode:
   < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.2, CUDA Runtime Version = 4.1, NumDevs = 1, Device = GeForce GTX 460
[deviceQuery] test results...
PASSED

Author: xmubingo  Time: 2012-5-24 14:20
Note: this post has been hidden by an administrator or moderator.

Author: zouquan  Time: 2013-6-12 08:14
How do you view this information? What command did you use?

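Author: xmubingo  Time: 2013-6-12 12:30

zouquan posted on 2013-6-12 08:14
How do you view this information? What command did you use?

Just run the deviceQuery program that comes with the CUDA SDK; it prints this output.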
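For a quick look without the SDK sample, a few runtime API calls are enough. Below is a minimal sketch that prints a subset of the fields deviceQuery shows, for every device, so a machine with two cards (such as the one with two GTS 450s) would list both Device 0 and Device 1. The file name query.cu is arbitrary (build with something like nvcc query.cu -o query), and the helper decode_version is hypothetical: it splits the integer returned by cudaDriverGetVersion and cudaRuntimeGetVersion, which encode the version as 1000 x major + 10 x minor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the runtime reports versions as 1000 * major + 10 * minor
// (for example 4020 for CUDA 4.2).
void decode_version(int v, int *major, int *minor) {
    *major = v / 1000;
    *minor = (v % 100) / 10;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Found %d CUDA Capable device(s)\n", count);

    int driver = 0, runtime = 0, ma = 0, mi = 0;
    cudaDriverGetVersion(&driver);
    cudaRuntimeGetVersion(&runtime);
    decode_version(driver, &ma, &mi);
    std::printf("CUDA Driver Version:  %d.%d\n", ma, mi);
    decode_version(runtime, &ma, &mi);
    std::printf("CUDA Runtime Version: %d.%d\n", ma, mi);

    // One block of output per device, in the spirit of deviceQuery.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        std::printf("\nDevice %d: \"%s\"\n", dev, prop.name);
        std::printf("  Compute capability:        %d.%d\n", prop.major, prop.minor);
        std::printf("  Global memory:             %zu bytes\n", prop.totalGlobalMem);
        std::printf("  Multiprocessors:           %d\n", prop.multiProcessorCount);
        std::printf("  Shared memory per block:   %zu bytes\n", prop.sharedMemPerBlock);
        std::printf("  Registers per block:       %d\n", prop.regsPerBlock);
        std::printf("  Max threads per block:     %d\n", prop.maxThreadsPerBlock);
        std::printf("  Run time limit on kernels: %s\n",
                    prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```

The SDK's deviceQuery prints many more fields; any other member of cudaDeviceProp can be added to the loop the same way.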

Welcome to the Machine Learning and Bioinformatics Lab Alliance (http://123.57.240.48/)