Algorithms have two costs: arithmetic and communication, i.e., moving data between levels of a memory hierarchy or between processors over a network. Communication costs (measured in time or energy per operation) already greatly exceed arithmetic costs, and technological trends are widening the gap over time. Our goal is therefore to design algorithms that minimize communication. We present new algorithms that communicate asymptotically less than their classical counterparts for a variety of linear algebra and machine learning problems, and we demonstrate large speedups on a range of architectures. Some of these algorithms attain provable lower bounds on communication. We describe a generalization of these lower bounds and optimal algorithms to arbitrary code that can be expressed as nested loops accessing arrays, assuming only that array subscripts are affine functions of the loop indices; convolutional neural nets are a special case.
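As a small illustration of the idea (not taken from the talk itself), the sketch below uses classical matrix multiplication, a nested loop whose array subscripts (i,k), (k,j) and (i,j) are affine functions of the loop indices. The function name `matmul_tiled`, the tile size `b`, and the cost model are assumptions made for this example: the model charges one word of traffic for every entry of an A or B tile loaded into fast memory and for every entry of a C tile written back. Under that model the traffic is 2n^3/b + n^2 words, so a larger tile that still fits in fast memory moves less data for the same arithmetic.

```python
def matmul_tiled(A, B, b):
    """Multiply square matrices (lists of lists) using b-by-b tiles.

    Returns (C, words_moved). words_moved counts words transferred between
    slow and fast memory under an assumed, simplified model: each tile of
    A and B is loaded once per tile-level multiply, and each tile of C is
    written back once when it is finished.
    """
    n = len(A)
    assert n % b == 0, "for simplicity, the tile size must divide n"
    C = [[0] * n for _ in range(n)]
    words_moved = 0
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                words_moved += 2 * b * b  # load one tile of A and one of B
                for i in range(i0, i0 + b):
                    for j in range(j0, j0 + b):
                        s = C[i][j]
                        for k in range(k0, k0 + b):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
            words_moved += b * b  # write the finished tile of C back
    return C, words_moved


if __name__ == "__main__":
    n = 4
    A = [[i + j for j in range(n)] for i in range(n)]
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for b in (1, 2, 4):
        C, words = matmul_tiled(A, identity, b)
        print(f"tile size {b}: {words} words moved, correct = {C == A}")
```

With b = 1 this is the untiled triple loop; every tile size performs the same n^3 multiply-adds, and only the modeled data movement changes. If the fast memory holds M words, choosing b near sqrt(M/3) lets three tiles fit at once and gives traffic proportional to n^3/sqrt(M), which matches the well-known lower bound for classical matrix multiplication up to a constant factor.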