Deep learning has become one of the most important workloads in high-performance computing. The classical formulation of neuron processing maps directly onto matrix algebra, in many cases through the general matrix multiply (GEMM) routines of BLAS libraries. The search for more efficient algorithms has been a focus of deep learning research for several years, and many efforts aim to reduce the dimensionality of models in order to remove redundancy and increase the information carried by each parameter. In this talk we examine current and emerging methods for improving compute efficiency in deep learning models, with a focus on GPU compute targets. Notable methods include compression, pruning (static and dynamic), imposed sparsity, and reduced precision. Metrics include not only compute and memory-bandwidth performance, but also training accuracy. The final part of the talk projects some of the future improvements that we can anticipate, based on further understanding and development of the theoretical basis of deep learning models.
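As a minimal illustration of the ideas named above (a sketch that is not taken from the talk), the NumPy snippet below expresses a fully connected layer as a single matrix multiply, applies static magnitude pruning to its weights, and repeats the multiply in half precision. The function names, layer sizes, and 50% sparsity level are assumptions chosen for illustration, and the code runs on the CPU rather than on a GPU target.

```python
import numpy as np

def dense_forward(x, w, b):
    """Fully connected layer as one matrix multiply (a GEMM) plus a bias."""
    return x @ w + b

def magnitude_prune(w, sparsity):
    """Static pruning: zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w).astype(w.dtype)

def rel_err(y, y_ref):
    """Relative Frobenius-norm error of y against a reference output."""
    y32 = y.astype(np.float32)
    return float(np.linalg.norm(y32 - y_ref) / np.linalg.norm(y_ref))

# Hypothetical layer: batch of 32 inputs, 256 features in, 128 features out.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 256)).astype(np.float32)
w = rng.standard_normal((256, 128)).astype(np.float32)
b = np.zeros(128, dtype=np.float32)

y_ref = dense_forward(x, w, b)                      # float32 baseline

w_pruned = magnitude_prune(w, sparsity=0.5)         # half of the weights set to zero
y_pruned = dense_forward(x, w_pruned, b)

y_fp16 = dense_forward(x.astype(np.float16),        # reduced-precision variant
                       w.astype(np.float16),
                       b.astype(np.float16))

print("fraction of zero weights:", float(np.mean(w_pruned == 0)))
print("relative error, pruned  :", rel_err(y_pruned, y_ref))
print("relative error, float16 :", rel_err(y_fp16, y_ref))
```

The pruned matrix here is still stored and multiplied as a dense array, so the sketch shows only the effect on the output, not a speedup. Realizing a performance gain would additionally require a sparse storage format or hardware support for the chosen sparsity pattern, and the accuracy impact on a trained model has to be measured rather than inferred from random weights.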
Back to Workshop III: HPC for Computationally and Data-Intensive Problems