Parallel computing

cosmos 4th April 2017 at 11:26am
High-performance computing

Nice video about parallel computing

Why can't we keep increasing CPU clock speed? Power has emerged as one of the primary limiting factors in processor design.

Often used in Computer cluster and GPU computing settings. The main application is High-performance computing (see more there)

Fundamental concepts: total time (step complexity) vs total work (work complexity)

We say that a parallel algorithm is work-efficient if its work complexity is asymptotically the same as that of the equivalent serial algorithm
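For example, a parallel tree reduction of n numbers performs Θ(n) additions, the same as a serial loop, so it is work-efficient; a Hillis–Steele scan performs Θ(n log n) additions and is not, whereas the Blelloch scan gets back to Θ(n) work.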

Analysis of parallel algorithms


Parallel programming

Parallel communication patterns

Tasks ↔ Memory (how threads map onto memory locations; a minimal CUDA sketch of a couple of these patterns follows the list)

  • Map. 1-to-1: one thread operates on one part of memory, independently.
  • Scatter. 1-to-many: one thread writes to one or more, possibly scattered, memory locations, independently.
  • Gather. Many-to-1: like scatter, but for reading instead of writing.
    • Stencil. Read from a fixed set of neighbours, and write to one part of memory.
  • Transpose. 1-to-1, but reads and writes follow a fixed reordering of memory locations (e.g. row-major to column-major).
  • Reduce. All-to-1.
  • Scan/sort. All-to-all.
  • More methods
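
A minimal CUDA sketch of the map and gather patterns, roughly in the spirit of the Udacity course (the kernel names, array size and contents are made up for illustration):

```cuda
#include <cstdio>

// Map: 1-to-1 -- thread i reads element i and writes element i, independently.
__global__ void map_square(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// Gather: many-to-1 -- thread i reads several locations and writes one.
// (A 3-point average; with a fixed neighbourhood like this it is a stencil.)
__global__ void gather_avg3(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}

int main() {
    const int n = 1 << 10, threads = 256, blocks = (n + threads - 1) / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));   // unified memory, for brevity
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    map_square<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("map:    out[3] = %.1f\n", out[3]);   // 9.0

    gather_avg3<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("gather: out[3] = %.1f\n", out[3]);   // (2+3+4)/3 = 3.0

    cudaFree(in); cudaFree(out);
    return 0;
}
```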

Thread divergence
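
Threads in the same warp execute in lockstep, so when they take different branches the branches are serialized and part of the warp sits idle. A minimal sketch (branching on the thread index is just an illustrative choice; launch it like the kernels above):

```cuda
// Threads in the same warp branch on their index, so the warp runs both
// branches one after the other, with half the threads idle in each (divergence).
__global__ void divergent(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)
        out[i] = sinf((float)i);   // even-indexed threads take this branch...
    else
        out[i] = cosf((float)i);   // ...odd-indexed threads take this one
}
```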


Introduction to parallel programming by nvidia in Udacity: https://classroom.udacity.com/courses/cs344/lessons/55120467/concepts/671181630923


Latency vs throughput tradeoff

Latency: time for a single unit operation to take place

Throughput: number of operations per second.

Latency has improved more slowly than throughput across hardware technologies: latency lags throughput
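
Illustrative (made-up) numbers: if a memory access takes 100 ns (latency) but accesses are pipelined so that a new result arrives every 1 ns, throughput is 10^9 accesses per second even though each individual access still takes 100 ns. GPUs exploit this by keeping many threads in flight, optimizing for throughput rather than latency.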

Types of parallel computing

  • High-throughput computing, aka embarrassingly parallel computing: lots of *independent* tasks.
  • High-performance computing often refers to one big task divided across many parallel compute nodes; the parts are not totally independent, so issues of communication between them need to be addressed.

Memory models

distributed and shared memory parallel computing models

  • Shared memory: all the cores can see the same memory. OpenMP. Limited to one node in a Computer cluster.
  • Distributed memory: each core has a separate memory it can access; data is exchanged via explicit messages. MPI. Scales to many thousands of cores across several nodes.

Often a combination of both is used; CUDA, for example, exposes fast shared memory within a block and global memory across blocks (see the sketch below).
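
A minimal sketch of that CUDA combination, assuming a power-of-two block size (array size and contents are made up): threads within a block cooperate through on-chip __shared__ memory, and each block publishes its partial result through global memory.

```cuda
#include <cstdio>

// Block-level sum reduction: within a block, threads cooperate via shared
// memory; each block then writes one partial sum to global memory.
__global__ void block_sum(const float *in, float *partial, int n) {
    __shared__ float buf[256];                  // visible to this block only
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory (all-to-1 within the block).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = buf[0]; // cross-block via global memory
}

int main() {
    const int n = 1 << 10, threads = 256, blocks = (n + threads - 1) / threads;
    float *in, *partial;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partial, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    block_sum<<<blocks, threads>>>(in, partial, n);
    cudaDeviceSynchronize();

    float total = 0.0f;                         // final combine on the host
    for (int b = 0; b < blocks; ++b) total += partial[b];
    printf("sum = %f (expected %d)\n", total, n);

    cudaFree(in); cudaFree(partial);
    return 0;
}
```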


  • Clusters and job managers
  • Jobs vs Tasks
    • Creating and submitting them
    • Getting the results
  • Code portability
  • Callback functions
  • Advanced parallelism
    • spmd mode, message passing
    • GPU computing

https://uk.mathworks.com/help/distcomp/how-parallel-computing-products-run-a-job.html