We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although the problem has been widely studied empirically, much remains unknown about the theory and practice underlying this trade-off. In this talk, we quantify the trade-off in terms of the gap between the risk for adversarial examples and the risk for non-adversarial examples. The challenge is to provide tight bounds on this computationally intractable quantity in terms of a surrogate loss. We give an optimal upper bound on this quantity in terms of classification-calibrated loss, which matches the lower bound in the worst case. Inspired by our theoretical analysis, we design a new defense method, TRADES, to trade adversarial robustness off against accuracy. The proposed algorithm performs well experimentally on real-world datasets and exhibits strong interpretability. This methodology formed the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won first place out of 1,995 submissions, surpassing the runner-up by 11.41% in mean L2 perturbation distance.
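As a rough illustration of the trade-off idea described above — a natural classification loss for accuracy plus a surrogate regularization term that pushes predictions on a perturbed input toward predictions on the clean input — here is a minimal NumPy sketch of such an objective on a single example. The function names, the use of KL divergence as the surrogate, the direction of the KL term, and the weighting parameter `beta` are illustrative assumptions for this sketch, not the exact published formulation; in practice the perturbed input would be produced by an inner maximization (e.g. projected gradient steps), which is omitted here.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence KL(p || q) between two discrete distributions
    # (assumes strictly positive entries, as softmax outputs are).
    return float(np.sum(p * np.log(p / q)))

def robust_accuracy_loss(logits_nat, logits_adv, label, beta):
    """Sketch of a robustness/accuracy objective on one example:
    cross-entropy on the clean input (accuracy term) plus
    beta * KL between clean and perturbed predictions (robustness term).
    `beta` trades the two terms off against each other."""
    p_nat = softmax(logits_nat)
    p_adv = softmax(logits_adv)
    ce = -np.log(p_nat[label])    # accuracy: clean cross-entropy
    reg = kl(p_nat, p_adv)        # robustness: prediction consistency
    return ce + beta * reg
```

With `beta = 0` the objective reduces to standard (natural) training, while larger `beta` weights prediction consistency under perturbation more heavily, making the accuracy-robustness trade-off an explicit knob.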
Joint work with Yaodong Yu (UC Berkeley), Jiantao Jiao (UC Berkeley), Eric P. Xing (CMU), Laurent El Ghaoui (UC Berkeley), and Michael I. Jordan (UC Berkeley).