Recently, Artificial Intelligence (AI), exploiting either bio-inspired algorithms, such as Spike-Timing-Dependent Plasticity (STDP), or back-propagation algorithms, as in Deep Neural Networks (DNNs), has become able to perform accurate classification of large amounts of data. However, to further advance AI, novel hardware technologies supporting fast computation are needed. To this end, many algorithms have recently been efficiently mapped onto arrays of Non-Volatile Memories (NVMs), such as Phase-Change Memory (PCM) or Resistive RAM (RRAM).
In this presentation, we summarize recent progress in hardware acceleration of AI, such as the training of Fully Connected (FC) DNNs based on large arrays of PCM devices. In such schemes, weights are encoded in the conductances of resistive devices, with an estimated increase in speed and energy efficiency of orders of magnitude with respect to the current state of the art based on CPUs and GPUs.
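The idea of encoding weights in device conductances can be illustrated with a minimal sketch. This is an assumption-laden toy model, not the authors' implementation: each signed weight is represented as the difference of two non-negative conductances, W = G+ − G−, and a layer's matrix-vector product is obtained in the analog domain by applying read voltages to the array rows and summing column currents (Ohm's law plus Kirchhoff's current law). The conductance scale `G_MAX` is a hypothetical value.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 4, 3
G_MAX = 25e-6  # hypothetical maximum device conductance, in siemens

# Target weights in [-1, 1], each mapped onto a pair of conductances:
# positive weights go on the G+ device, negative weights on the G- device.
W = rng.uniform(-1.0, 1.0, size=(n_out, n_in))
G_plus = np.where(W > 0, W, 0.0) * G_MAX
G_minus = np.where(W < 0, -W, 0.0) * G_MAX

def analog_mvm(x_volts):
    """Differential column currents for a vector of row read voltages."""
    i_plus = G_plus @ x_volts    # currents summed through the G+ devices
    i_minus = G_minus @ x_volts  # currents summed through the G- devices
    return i_plus - i_minus      # differential sensing recovers signed weights

x = rng.uniform(0.0, 0.2, size=n_in)  # small read voltages
out = analog_mvm(x)
```

Differential sensing is what allows signed weights despite conductances being physically non-negative; the recovered currents equal `(W * G_MAX) @ x` up to device nonidealities, which this sketch omits.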
Besides speed and power consumption, the desired chip for FC-DNN training should also provide accuracy equivalent to software training. We recently proposed a novel weight scheme, based on PCM devices and CMOS circuitry, able to achieve software-equivalent training accuracy on MNIST and other small- and medium-size datasets. Results were obtained with a mixed hardware-software experiment in which the CMOS circuitry was accurately simulated while the PCM behavior was measured on real device arrays.
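The flavor of such a mixed hardware-software experiment can be sketched in a few lines. The following is a hedged toy model, not the authors' setup: the "software" side computes exact gradients, while each weight update passes through a simple stand-in for nonideal PCM programming (a bounded weight range and multiplicative update noise, both with hypothetical parameter values). A logistic-regression "layer" on a linearly separable toy task still converges despite the nonidealities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: 2-D points, label 1 if x0 + x1 > 0 (linearly separable).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
W_CLIP = 2.0   # bounded weight range, standing in for finite conductance
NOISE = 0.02   # hypothetical programming-noise amplitude
lr = 0.5

for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # "software" forward pass
    grad = X.T @ (p - y) / len(y)        # "software" gradient
    dw = -lr * grad
    # Nonideal "device" update: multiplicative noise on the programmed step,
    # then clipping to the allowed conductance-encoded range.
    dw += NOISE * rng.normal(size=2) * np.abs(dw)
    w = np.clip(w + dw, -W_CLIP, W_CLIP)

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w)))) > 0.5) == (y > 0.5))
```

Real PCM devices exhibit further effects (asymmetric SET/RESET, drift, limited granularity) that this sketch ignores; its only point is the split of roles, with gradients in software and updates filtered through a device model.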
Finally, we provide design guidelines for the implementation of a multicore chip capable of DNN training. This is achieved with many NVM arrays connected through routing circuitry that efficiently distributes internal signals and external inputs, such as images and labels, and finally delivers the trained weights to the output.
Stefano Ambrogio, Pritish Narayanan, Hsinyu Tsai, Charles Mackin, An Chen, Bob M. Shelby, Geoffrey W. Burr
IBM Research-Almaden, 650 Harry Road, San Jose, CA 95120, USA