The next generation of supercomputers will likely consist of a hierarchy of parallel computers.
If we can define each node as a parameterized abstract machine, then it is possible to design
algorithms even if the actual hardware varies. Such an abstract machine is defined by the
OpenCL language to consist of a collection of vector (SIMD) processors, each with a small
shared memory, communicating via a larger global memory. This abstraction fits a variety of
hardware, such as Graphics Processing Units (GPUs) and multi-core processors with vector
extensions. To program such an abstract machine, one can use ideas familiar from the past:
vector algorithms from vector supercomputers, blocking or tiling algorithms from cache-based
machines, and domain decomposition from distributed-memory computers. Using the OpenCL
language itself is not necessary. Examples from our GPU Particle-in-Cell code will be shown
to illustrate this approach.