Gradient-based optimization has been extremely successful, especially in machine learning, where it scales efficiently to models with billions of parameters. I discuss my research on building a differentiable traffic simulator, with the goal of making traffic optimization easier. Differentiating a practical, discretized traffic microsimulation poses several challenges, owing to inherent discontinuities and a lack of sensitivity (e.g., lane changing and state- or parameter-dependent switching conditions). I explain sufficient conditions for the objective to be (Lipschitz) continuous, and I illustrate the implications and future research directions with several interactive examples.
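
To make the discontinuity and lack-of-sensitivity issue concrete, here is a minimal toy sketch in plain Python. It is not the simulator discussed in the talk: the single-vehicle model, the function names `final_position` and `central_difference`, and all parameter values are assumptions chosen only for illustration. A vehicle drives at constant speed toward a traffic light that turns red at a fixed step; whether it clears the light is a state-dependent switching condition, so the final position jumps as the speed crosses a threshold, and is locally flat (zero derivative) whenever the vehicle ends up held at the light.

```python
def final_position(speed, dt=0.1, n_steps=100, light_pos=50.0, red_from_step=50):
    """Toy discretized simulation (hypothetical, for illustration only).

    One vehicle starts at x = 0 and moves at constant `speed`. A light at
    `light_pos` is red from step `red_from_step` until the end of the run.
    If the vehicle would cross the light while it is red, it is held there.
    """
    x = 0.0
    for k in range(n_steps):
        nxt = x + speed * dt
        # State-dependent switching condition: the light is red and this
        # step would carry the vehicle across the stop line.
        if k >= red_from_step and x < light_pos <= nxt:
            return light_pos  # held at the light for the rest of the run
        x = nxt
    return x


def central_difference(f, p, h=1e-3):
    """Finite-difference estimate of df/dp, standing in for a true gradient."""
    return (f(p + h) - f(p - h)) / (2.0 * h)


# A slower vehicle is caught by the red light; a faster one clears it.
held = final_position(9.0)
cleared = final_position(11.0)

# While the vehicle is held at the light, small speed changes do not move
# the outcome at all, so the derivative carries no information.
flat_slope = central_difference(final_position, 9.0)

# Once the vehicle clears the light, the objective is sensitive again.
free_slope = central_difference(final_position, 11.0)
```

In this sketch the objective is piecewise smooth with a jump between the "held" and "cleared" regimes, so a gradient-based optimizer that starts in the flat regime receives no signal pointing toward the threshold. This is one simple instance of the kind of behavior that continuity conditions on the objective are meant to rule out.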