Parallel computing, Distributed computing, GPU computing, Neuromorphic computing
Roofline model: attainable performance of an algorithm, in floating-point operations per second (FLOP/s), versus its computational intensity, usually plotted in a 2D graph. It shows how data bandwidth limits the attainable FLOP/s.
Computational/arithmetic intensity (AI): arithmetic operations (ao) performed per byte of data moved.
Low AI means we do few aos per byte of data, so data bandwidth is the limiting factor. Once we do enough aos per byte, the bottleneck becomes the processing unit itself (depending on hardware/architecture), whether CPU or GPU.
https://www.wikiwand.com/en/Roofline_model
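The trade-off between bandwidth and compute is commonly written as a single bound. The symbols here are notation assumed for illustration, not taken from the notes: P is attainable performance, \pi is peak compute performance, and \beta is peak memory bandwidth.

```latex
P = \min\left(\pi,\; \beta \cdot \mathrm{AI}\right)
```

As a made-up example, take \pi = 1000 GFLOP/s and \beta = 100 GB/s. The two limits meet at AI = \pi / \beta = 10 ao per byte. A kernel with AI = 0.25 ao per byte is then bandwidth-bound at about 100 × 0.25 = 25 GFLOP/s, while any kernel above 10 ao per byte is limited by the processing unit.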
Identify bottlenecks in the code.
tic, toc
profile
code analyzer
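A minimal sketch of how these tools might be used together. The matrix product and the size n are placeholders chosen only for illustration.

```matlab
% Time a block of code with tic/toc.
n = 2000;                  % assumed problem size, for illustration only
A = rand(n);

t = tic;                   % start a stopwatch and keep its handle
B = A * A;                 % code under test
elapsed = toc(t);          % seconds elapsed since the matching tic
fprintf('Matrix product took %.3f s\n', elapsed);

% Profile the same code to see where the time goes, function by function.
profile on
B = A * A;
profile off
stats = profile('info');   % structure holding the collected timing data
```

The code analyzer can also be run from the command line with checkcode, for example checkcode('myscript.m'), where myscript.m is a hypothetical file name.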
Vector preallocation. Arrays which dynamically change size can be slow, because array memory has to be reallocated.
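A small sketch comparing a growing array with a preallocated one. The length n is an assumption, and the measured times will depend on the machine and MATLAB release.

```matlab
n = 1e5;   % assumed length, for illustration only

% Growing the array inside the loop: memory may be reallocated as it grows.
tic
x = [];
for k = 1:n
    x(end+1) = k^2; %#ok<SAGROW>
end
tGrow = toc;

% Preallocating: the memory is reserved once, then filled in place.
tic
y = zeros(1, n);
for k = 1:n
    y(k) = k^2;
end
tPrealloc = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPrealloc);
```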
Use backslash (\) to solve linear systems, and store sparse matrices in sparse format.
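A sketch of both points on one example. The tridiagonal test matrix and its size are assumptions picked for illustration. The explicit inverse is included only as the approach to avoid.

```matlab
n = 1000;                                % assumed system size
e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);    % tridiagonal matrix stored in sparse format
b = rand(n, 1);

x = A \ b;                               % backslash chooses a solver suited to A

% Explicit inverse of the dense matrix, shown for comparison only:
% typically slower, and it discards the sparsity of A.
xInv = inv(full(A)) * b;

relDiff = norm(x - xInv) / norm(x);      % how far the two answers differ
fprintf('nonzeros stored: %d of %d entries\n', nnz(A), numel(A));
```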
vectorization
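A short illustration of what vectorization means in practice: the loop and the array expression compute the same result. The function and the sample count are placeholders.

```matlab
n = 1e6;                  % assumed number of samples
t = linspace(0, 1, n);

% Loop version: one scalar operation per iteration.
yLoop = zeros(1, n);
for k = 1:n
    yLoop(k) = sin(2*pi*t(k))^2;
end

% Vectorized version: the same computation applied to the whole array at once.
yVec = sin(2*pi*t).^2;
```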
parfor: parallel for loops. Variable types in parfor loops: loop, sliced, broadcast, reduction, temporary.
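A minimal sketch of how variables are classified inside a parfor loop. It assumes the Parallel Computing Toolbox is available for the loop to actually run in parallel. All variable names are illustrative.

```matlab
n = 1000;             % assumed number of iterations
scale = 0.5;          % broadcast variable: read-only, sent to every worker
out = zeros(1, n);    % sliced output variable: each iteration writes only out(k)
total = 0;            % reduction variable: accumulated across iterations

parfor k = 1:n        % k is the loop variable
    tmp = scale * k;  % temporary variable: created anew in each iteration
    out(k) = tmp^2;
    total = total + tmp;
end
```

Iterations must be independent of each other, since the order in which they run is not guaranteed.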
Multithreading. Several execution threads, Concurrent computing.
MEX functions
Parallel computing with Computer clusters
spmd mode: single program multiple data. Concurrent computing.
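A minimal sketch of an spmd block, assuming the Parallel Computing Toolbox and an open parallel pool. It uses spmdIndex, which is available in recent releases; older releases use labindex for the same purpose.

```matlab
% Every worker runs the same block, each on its own data.
spmd
    localData = spmdIndex * ones(1, 4);   % data differs per worker
    localSum = sum(localData);            % each worker computes its own sum
end

% Outside the block, localSum is a Composite: index it by worker number.
firstSum = localSum{1};
```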