CUDA

cosmos 30th March 2017 at 3:42pm
GPU computing

GPU computing, Parallel computing

CUDA model:

  1. Move data from CPU to GPU memory with cudaMemcpy
  2. Compute on the GPU with Kernels
    1. Launch blocks of threads (forming a grid of blocks). Why blocks? The GPU assigns each block to a Streaming Multiprocessor (SM); an SM may run more than one block. CUDA makes few guarantees about where and when thread blocks will run
  3. Move data back from GPU to CPU memory
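The three steps above can be sketched as a minimal host program. The kernel here (squaring each element) is just a hypothetical example to make the flow concrete; the real API calls are cudaMalloc, cudaMemcpy, the `<<<blocks, threads>>>` launch, and cudaFree.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: each thread squares one element.
__global__ void square(float *d_out, const float *d_in) {
    int i = threadIdx.x;
    d_out[i] = d_in[i] * d_in[i];
}

int main(void) {
    const int N = 64;
    const size_t bytes = N * sizeof(float);
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    // 1. Move data from CPU to GPU memory
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // 2. Compute on the GPU: here, 1 block of N threads
    square<<<1, N>>>(d_out, d_in);

    // 3. Move data back from GPU to CPU memory
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    printf("%f\n", h_out[3]);  // 3 squared -> 9.0

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```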

Kernels look like serial code, but you specify the parallelism: the number of simultaneous threads, each of which executes a copy of the kernel.
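A sketch of what this looks like: each thread computes its own global index from the built-in variables blockIdx, blockDim, and threadIdx, and the launch configuration in the commented host code (256 threads per block, enough blocks to cover n) is an assumed example, not a required choice.

```cuda
// Each thread handles one array element, selected by its global index.
__global__ void add_one(float *d_x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)            // guard: the last block may overshoot n
        d_x[i] += 1.0f;
}

// Host-side launch (assumed configuration):
// int threads = 256;
// int blocks  = (n + threads - 1) / threads;  // ceiling division
// add_one<<<blocks, threads>>>(d_x, n);
```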

Memory model: full model

Need for synchronization!

Barrier
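Within a block, the barrier is `__syncthreads()`. A minimal sketch of why it is needed: reversing an array via shared memory, where every write to shared memory must be visible before any thread reads a neighbour's slot. The block size of 128 is an assumption for illustration.

```cuda
__global__ void reverse_block(float *d_arr) {
    __shared__ float s[128];        // assumes blockDim.x == 128
    int t = threadIdx.x;

    s[t] = d_arr[t];                // each thread writes one element
    __syncthreads();                // barrier: all writes done before any read
    d_arr[t] = s[blockDim.x - 1 - t];  // safe to read another thread's element
}
```

Without the barrier, a thread could read `s[blockDim.x - 1 - t]` before the thread responsible for that slot has written it.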

Introduction to parallel programming by nvidia in Udacity: https://classroom.udacity.com/courses/cs344/lessons/55120467/concepts/671181630923

What happens when many threads try to write to the same memory location? Atomic operations.
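A classic case is a histogram, where many threads increment the same bin. `atomicAdd` makes each read-modify-write indivisible, at the cost of serializing conflicting writes. The kernel below is an illustrative sketch, not from the course.

```cuda
__global__ void histogram(int *d_bins, const int *d_in, int n, int numBins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int bin = d_in[i] % numBins;
        atomicAdd(&d_bins[bin], 1);  // safe concurrent increment of a shared bin
    }
}
```

A plain `d_bins[bin] += 1` here would lose counts: two threads could read the same old value and both write old+1.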