High-performance computing: Cosmos — All that is, or was, or ever will be

High-performance computing

cosmos 30th March 2017 at 1:49pm

Parallel computing, Distributed computing, GPU computing, Neuromorphic computing

Roofline model

Attainable performance of an algorithm (floating-point operations per second) as a function of its computational intensity, usually plotted in a 2D log-log graph. It shows how data bandwidth limits the achievable flops.

computational/arithmetic intensity (AI): arithmetic operations (ao) per byte of data moved to or from memory. See: AI in parallel comp

Low AI means we do few aos per byte of data, so data bandwidth is the limiting factor. Once we do enough aos per byte, the bottleneck becomes the processing unit itself, whether CPU or GPU. Where that crossover happens depends on the hardware/architecture.
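A minimal sketch of the roofline bound in Python: attainable performance is taken as the smaller of peak compute throughput and memory bandwidth times AI. The function names and the machine figures (1000 GFLOP/s peak, 100 GB/s bandwidth) are assumptions chosen for illustration, not measurements of any real hardware.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s for a kernel with the given AI (FLOP/byte)."""
    # Memory roof: bandwidth * AI. Compute roof: peak throughput.
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """AI (FLOP/byte) at which the bottleneck shifts from memory to compute."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine, numbers chosen only for illustration.
PEAK_GFLOPS = 1000.0   # peak compute throughput in GFLOP/s
BANDWIDTH_GBS = 100.0  # peak memory bandwidth in GB/s

ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
for ai in (0.25, 1.0, 10.0, 40.0):
    bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    regime = "memory-bound" if ai < ridge else "compute-bound"
    print(f"AI = {ai:5.2f} FLOP/byte -> at most {bound:7.1f} GFLOP/s ({regime})")
```

On this assumed machine the ridge point is 10 FLOP/byte: kernels below it are limited by data bandwidth, and kernels at or above it are limited by the processing unit.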

https://www.wikiwand.com/en/Roofline_model

High-performance computing with Matlab

Code profiling

Identify bottlenecks in the code.

tic, toc

profile

code analyzer

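Vector preallocation. Arrays that change size dynamically (for example, growing inside a loop) can be slow, because array memory has to be reallocated on each resize. Preallocate the full array up front, e.g. with zeros.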

Use backslash (\) to solve linear systems instead of computing an explicit inverse, and store sparse matrices in sparse format.

vectorization

Parallel computing

parfor: parallel for loops. Variable types in parfors

Multithreading. Several execution threads, Concurrent computing.

MEX functions

Parallel computing with Computer clusters

spmd mode: single program multiple data. Concurrent computing.

GPU computing

Minimize time spent on memory transfers between the CPU and the GPU.