Scientific and parallel computing

SuperComputer

“Computational science (also scientific computing or scientific computation (SC)) is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems. It is an area of science which spans many disciplines, but at its core it involves the development of models and simulations to understand natural systems.” - Wikipedia

What are the applications?

  • Computational finance,

  • Computational biology,

  • Simulation of complex systems,

  • Network analysis,

  • Multi-physics simulations,

  • Weather and climate models.

Why the need for parallelism?

“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.” - G. Moore, 1965

Moore’s Law: transistor counts

Computers should reach the physical limits of Moore’s Law at some point in the 2020s: an exponential trend must eventually saturate against physical constraints!

  • We are hitting the wall of single-processor transistor counts and computing capabilities,

  • Some applications need more memory than can be made available on a single machine,

  • Optimizing sequential algorithms can only take us so far.

Therefore, we need

  • Algorithms that can work in parallel,

  • A communication protocol for parallel computation, integrated with our programming languages,

  • Parallel machines that can actually run this code.

Flynn’s Taxonomy

Let us start from the bottom: the machines.

What is a parallel computer? … well, it can be any of a number of different “things”:

  • Multi-core computing

  • Symmetric multiprocessing

  • Distributed computing

  • Cluster computing

  • Massively parallel computing

  • Grid computing

  • General-purpose computing on graphics processing units (GPGPU)

  • Vector processors

Let us abstract away from the specific machine by describing Flynn’s taxonomy.

  • SISD: single instruction stream, single data stream,

  • SIMD: single instruction stream, multiple data streams,

  • MISD: multiple instruction streams, single data stream,

  • MIMD: multiple instruction streams, multiple data streams.
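As a loose, hedged illustration of the difference between the SISD and SIMD styles (the NumPy example below is my own addition, not part of the taxonomy itself): a vectorized operation applies one instruction stream to many data elements at once, whereas an explicit loop handles one element per instruction.

```python
# A loose illustration (not a formal definition): SIMD-style data parallelism
# via a single vectorized operation, versus an SISD-style element-by-element loop.
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.ones_like(a)

c_simd = a + b                      # one operation applied to all data elements

c_sisd = np.empty_like(a)
for k in range(a.size):             # one element handled per instruction
    c_sisd[k] = a[k] + b[k]

assert np.array_equal(c_simd, c_sisd)
```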

Parallel Computers: our computer model

For our task of introducing parallel computations we need to fix a specific multiprocessor model, i.e., a specific generalization of the sequential RAM model in which there is more than one processor.

Network Architecture

Since we want to stay in a SIMD/MIMD model, we focus on a local memory machine model, i.e., a set of \(M\) processors, each with its own local memory, attached to a common communication network.

  • To be more precise about the connections between processors, one can consider the network (a collection of switches connected by communication channels) and study its pattern of interconnection in detail, i.e., what is called the network topology.

  • An alternative is to summarize the network properties in terms of two parameters, latency and bandwidth:

    • Latency: the time it takes for a message to traverse the network;

    • Bandwidth: the rate at which a processor can inject data into the network.
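A common way to use these two parameters (often called the α-β cost model; this summary formula is a standard textbook convention rather than something stated in the original notes) is to estimate the time needed to transfer a message of \(m\) words as

\[\begin{equation*} T_{\text{msg}}(m) \approx \alpha + \frac{m}{\beta}, \end{equation*}\]

where \(\alpha\) is the latency and \(\beta\) the bandwidth.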

The TOP500 List

“… we have decided in 1993 to assemble and maintain a list of the 500 most powerful computer systems. Our list has been compiled twice a year since June 1993 with the help of high-performance computer experts, computational scientists, manufacturers, and the Internet community in general… In the present list (which we call the TOP500), we list computers ranked by their performance on the LINPACK Benchmark.” - top500.org

The LINPACK Benchmark.

Solution of a dense \(n\times n\) system of linear equations \(A\mathbf{x} = \mathbf{b}\), so that

  • \(\frac{\| A \mathbf{x} - \mathbf{b}\|}{\|A\|\|\mathbf{x}\| n \varepsilon} \leq O(1)\), for \(\varepsilon\) machine precision,

  • It uses a specialized right-looking LU factorization with look-ahead,

  • Measuring

    • \(R_\text{max}\) the performance in GFLOPS for the largest problem run on a machine,

    • \(N_\text{max}\) the size of the largest problem run on a machine,

    • \(N_{1/2}\) the size where half the \(R_\text{max}\) execution rate is achieved,

    • \(R_{\text{peak}}\) the theoretical peak performance in GFLOPS for the machine.
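As a hedged illustration (a small NumPy sketch, not the actual HPL code), the scaled residual above can be checked as follows; the problem size and random data are arbitrary choices:

```python
# Minimal sketch of the LINPACK-style correctness check: solve a random dense
# system via LU factorization (NumPy/LAPACK) and compute the scaled residual.
import numpy as np

n = 1000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)                     # LU with partial pivoting

eps = np.finfo(A.dtype).eps                   # machine precision
scaled_residual = (np.linalg.norm(A @ x - b, np.inf)
                   / (np.linalg.norm(A, np.inf) * np.linalg.norm(x, np.inf) * n * eps))

print(f"scaled residual = {scaled_residual:.2e}")   # should be O(1), i.e., not grow with n
```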

Parallel Algorithms

In a fairly general way we can say that a parallel algorithm is an algorithm which can perform multiple operations at the same time.

Example. The sum of two vectors \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^n\):

\[\begin{equation*} \begin{aligned} \mathbf{x} &= [\,x_1 \; x_2 \; \cdots \; x_i \mid x_{i+1} \; \cdots \; x_n\,] \\ \mathbf{y} &= [\,y_1 \; y_2 \; \cdots \; y_i \mid y_{i+1} \; \cdots \; y_n\,] \\ \mathbf{x} + \mathbf{y} &= [\,x_1+y_1 \; x_2+y_2 \; \cdots \; x_i+y_i \mid x_{i+1}+y_{i+1} \; \cdots \; x_n+y_n\,] \end{aligned} \end{equation*}\]
  • If we perform the operation sequentially, we do \(O(n)\) operations in time \(T_n\);

  • If we split the operation between \(2\) processors, one summing the entries \(1,\ldots,i\) and the other summing the entries \(i+1,\ldots,n\), the first part takes time \(T_i\) and the second \(T_{n-i}\); the overall time is therefore \(\max(T_{i},T_{n-i})\), while still performing \(O(n)\) operations in total.
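A toy sketch of this two-processor splitting (using Python's `concurrent.futures`; the split point and data are arbitrary, and for vectors this small the spawning overhead would dominate any gain):

```python
# Toy sketch: sum two vectors by splitting the index range between two workers.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(x, y, lo, hi):
    """Entrywise sum of x and y restricted to indices lo, ..., hi - 1."""
    return [x[k] + y[k] for k in range(lo, hi)]

if __name__ == "__main__":
    n, i = 10, 5                              # arbitrary size and split point
    x = list(range(n))
    y = list(range(n, 2 * n))

    with ProcessPoolExecutor(max_workers=2) as pool:
        left = pool.submit(partial_sum, x, y, 0, i)    # first block of entries
        right = pool.submit(partial_sum, x, y, i, n)   # second block of entries
        result = left.result() + right.result()

    print(result)    # [10, 12, 14, ..., 28]
```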

Parallel Algorithms: speedup

Let us again think in an abstract way and quantify the overall speed gain obtained from a given gain in a subset of a process.

  • We break some process into \(N\) distinct portions, with the \(i\)th portion occupying a fraction \(P_i\) of the overall completion time,

  • then we order these portions so that the \(N\)th portion subsumes all the parts of the overall process that have fixed costs.

  • The speedup of the \(i\)th portion can then be defined as

\[\begin{equation*} S_i \triangleq \frac{t_{\text{original}}}{t_{\text{optimized}}}, \quad i=1,\ldots,N-1 \end{equation*}\]

where the numerator and denominator are the original and the optimized completion times of that portion.

Amdahl’s Law

Then the overall speedup for \(\mathbf{P} = (P_1,\ldots,P_N)\), \(\mathbf{S} = (S_1,\ldots,S_{N-1})\) is:

\[\begin{equation*} S(\mathbf{P},\mathbf{S}) = \left(P_N + \sum_{i=1}^{N-1} \frac{P_i}{S_i}\right)^{-1}. \end{equation*}\]
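As a hedged illustration, here are a few lines of Python implementing the formula above (the numbers are invented for the example): if 90% of the completion time is sped up by a factor of 10 and the remaining 10% is fixed cost, the overall speedup is only about 5.3.

```python
# Overall speedup from Amdahl's Law in its general form.
def overall_speedup(P, S):
    """P = (P_1, ..., P_N), fractions summing to 1; S = (S_1, ..., S_{N-1})."""
    P_N = P[-1]
    return 1.0 / (P_N + sum(p / s for p, s in zip(P[:-1], S)))

print(overall_speedup((0.9, 0.1), (10,)))   # ~5.26, far below the 10x applied to 90% of the work
```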

Let us make some observations on Amdahl’s Law

  • We are not making any assumption about whether the original completion time already involves some optimization,

  • We are not making any assumption on what our optimization process is,

  • We are not even saying that the process in question involves a computer! Amdahl’s Law is a fairly general way of looking at how processes can be sped up by dividing them into sub-tasks with lower execution time.

Moreover, it fixes the theoretical maximum speedup in various scenarios.

  • If we allow all components \(S_i\) to grow without bound, then the upper bound in every scenario is \(S_{\text{max}} = 1/P_N\). Let us now specialize this to the potential utility of parallel hardware.

Parallel Algorithms: Amdahl’s Law for parallel hardware

Consider now a parallel machine that permits us to divide the execution of code across \(M\) hardware units; the problem-independent maximum speedup that such hardware can provide is then \(M\).

Parallel Efficiency.

We define the parallel efficiency \(E\) as

\[\begin{equation*} E \triangleq \frac{S_{\text{overall}}}{M}, \end{equation*}\]

where \(E = 100\%\) corresponds to the maximal use of the available hardware. When \(S_{\text{max}} < M\), it is impossible to take full advantage of all available execution units.

Goal: we require very large \(S_{\text{max}}\) and correspondingly tiny \(P_N\).
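As a sketch of what this means in practice (the 5% serial fraction below is an invented example), setting every \(S_i = M\) in Amdahl’s Law gives \(S_{\text{overall}} = \left(P_N + (1-P_N)/M\right)^{-1}\), and the corresponding efficiency drops quickly as \(M\) grows:

```python
# Parallel efficiency when the non-fixed portion is sped up by a factor M
# (the number of execution units); P_N is the fixed-cost (serial) fraction.
def efficiency(P_N, M):
    speedup = 1.0 / (P_N + (1.0 - P_N) / M)
    return speedup / M

for M in (2, 8, 64, 1024):
    print(f"M = {M:5d}  E = {efficiency(0.05, M):6.2%}")
# M =     2  E = 95.24%
# M =     8  E = 74.07%
# M =    64  E = 24.10%
# M =  1024  E =  1.92%
```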

Warning

Every dusty corner of a code must scale; any portion that doesn’t becomes the rate-limiting step!

What we are neglecting and what we are tacitly assuming

  • We are neglecting overhead costs, i.e., the costs associated with parallel execution, such as

    • initializing (spawning) and joining of different computation threads,

    • communication between processes, data movement and memory allocation.

  • We also considered the ideal case in which \(S_i \rightarrow +\infty\) for all \(i\); observe that with finite speedups on portions \(1\) through \(N-1\), \(S_{\text{overall}}\) might still continue to improve as the number of execution units increases.

  • We are assuming that the size of the problem remains fixed while the number of execution units increases; this is called strong scalability. In some contexts we need to turn instead to weak scalability, in which the problem size grows proportionally to the number of execution units.
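A tiny sketch of the difference between the two regimes (the baseline problem size is arbitrary): under strong scaling the total size stays fixed and the work per unit shrinks, while under weak scaling the total size grows with \(M\) so that the work per unit stays constant.

```python
# Strong vs. weak scaling: work per execution unit as M grows.
n0 = 1_000_000                      # arbitrary baseline problem size

for M in (1, 4, 16, 64):
    strong_per_unit = n0 // M       # fixed total size, shrinking share per unit
    weak_per_unit = (n0 * M) // M   # growing total size, constant share per unit
    print(f"M = {M:2d}  strong: {strong_per_unit:8d}  weak: {weak_per_unit:8d}")
```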