Driving requires interaction with a complex, dynamic and uncertain environment. It is remarkable that many perception tasks in this domain have approached human-level performance, for instance visual object detection and pose estimation. Much of this progress has been made possible by large annotated datasets used to train deep neural networks. What is far from human-level is the robustness of the resulting systems to unforeseen or rare disturbances. Deep networks are vulnerable to adversarial perturbations, even for regression tasks such as depth estimation. In the case of stereo, where under mild assumptions the depth of the scene can be inferred uniquely, deep networks have been shown to ignore disparity and make gross errors. Even worse, these errors can be targeted, meaning that one can choose a depth map different from the real one and perturb the images so that the trained model returns the chosen depth map. While one can inoculate the system against such spectacular failures, they are symptoms of the fact that the representation produced by current perception systems is inadequate to support complex planning and control tasks in mission-critical settings. What should a representation return, when we know the task is safe driving? I will discuss a few desiderata that a representation for control should satisfy, based on a separation principle analogous to the classic one for linear-Gaussian systems. Naturally, uncertainty plays a key role in the definition of a representation that is sufficient for control. I will discuss some initial progress and the vast open challenges in this space. Cross-modal validation and exploitation of natural statistics can offer added degrees of robustness against single-modality failure and contribute to the modularity and interpretability of the overall system.
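
To make the targeted-attack claim above concrete, here is a minimal sketch, not the attack from any particular paper: it assumes a hypothetical stereo network stereo_net that maps an image pair to a depth map, and optimizes a small additive perturbation so the prediction is driven toward an arbitrarily chosen target depth.

```python
# Hypothetical sketch of a targeted perturbation against a stereo depth network.
# "stereo_net", "left", "right" and "target_depth" are placeholder names, not any
# specific published model or attack; a single shared perturbation is used for brevity.
import torch
import torch.nn.functional as F

def targeted_perturbation(stereo_net, left, right, target_depth,
                          eps=0.03, steps=200, lr=1e-2):
    delta = torch.zeros_like(left, requires_grad=True)   # additive image perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        pred = stereo_net(left + delta, right + delta)   # depth predicted from the perturbed pair
        loss = F.l1_loss(pred, target_depth)             # pull the output toward the chosen depth map
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                      # keep the perturbation visually negligible
    return (left + delta).detach(), (right + delta).detach()
```

The point of the sketch is only that the objective is a free choice: nothing in the optimization ties target_depth to the true geometry of the scene.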
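
For reference, the classic separation principle alluded to above can be stated as follows: for a linear system with Gaussian process and measurement noise and a quadratic cost, the optimal controller factors into a Kalman filter and a certainty-equivalent linear feedback, each designed independently of the other.

```latex
% Classic linear-Gaussian (LQG) setting referenced in the abstract.
\begin{aligned}
  x_{t+1} &= A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t,
    \qquad w_t,\, v_t \ \text{Gaussian},\\
  \hat{x}_{t\mid t-1} &= A \hat{x}_{t-1} + B u_{t-1}, \qquad
  \hat{x}_t = \hat{x}_{t\mid t-1} + L_t \bigl(y_t - C \hat{x}_{t\mid t-1}\bigr)
    \quad \text{(Kalman filter)},\\
  u_t &= -K \hat{x}_t \quad \text{(LQR gain $K$, computed as if $x_t$ were known exactly)}.
\end{aligned}
```

The filter gain $L_t$ and the control gain $K$ are obtained from two separate Riccati equations; this is the structure that the analogy in the abstract refers to when asking what a learned representation, together with its uncertainty, should hand to the controller.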