Recent studies have illustrated the benefit of misfit functions based on optimal transport distances for mitigating cycle skipping, a well-known issue in Full Waveform Inversion. Optimal transport distances allow global comparisons between traces and shot gathers, in contrast to the local, sample-by-sample comparisons associated with L^p norms. This property increases the convexity of the resulting misfit function with respect to the wave velocity model. However, using optimal transport distances to compare oscillatory waveforms is not straightforward, mainly because these distances are designed for comparing positive measures. In this presentation, we review one possible implementation based on the dual form of the Wasserstein-1 distance, and then discuss how this strategy could be extended to more general, and possibly more convex, Wasserstein-p distances.
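The contrast between local and global comparisons can be illustrated with a small numerical sketch. It is not the implementation discussed in the presentation: it uses the closed-form expression of the Wasserstein-1 distance in one dimension (the integral of the absolute difference between cumulative distributions) rather than the dual form, and it assumes positive, unit-mass Gaussian pulses so that the positivity issue for oscillatory waveforms does not arise. The function names, pulse width, and time grid are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_pulse(t, center, width=0.05):
    """Positive pulse normalized to unit mass (assumed stand-in for an arrival)."""
    g = np.exp(-0.5 * ((t - center) / width) ** 2)
    dt = t[1] - t[0]
    return g / (g.sum() * dt)


def l2_misfit(f, g, dt):
    """Least-squares misfit: a local, sample-by-sample comparison."""
    return 0.5 * np.sum((f - g) ** 2) * dt


def w1_distance_1d(f, g, dt):
    """1D Wasserstein-1 distance between two densities of equal mass,
    computed as the integral of |F - G| with F, G the cumulative distributions."""
    F = np.cumsum(f) * dt
    G = np.cumsum(g) * dt
    return np.sum(np.abs(F - G)) * dt


# Time axis and a reference pulse centered at t = 1.0 (illustrative values).
t = np.linspace(0.0, 4.0, 4001)
dt = t[1] - t[0]
reference = gaussian_pulse(t, center=1.0)

# Misfit as a function of the time shift applied to the pulse.
shifts = np.array([0.0, 0.25, 0.5, 1.0, 2.0])
l2_values = np.array(
    [l2_misfit(gaussian_pulse(t, 1.0 + s), reference, dt) for s in shifts]
)
w1_values = np.array(
    [w1_distance_1d(gaussian_pulse(t, 1.0 + s), reference, dt) for s in shifts]
)

for s, l2, w1 in zip(shifts, l2_values, w1_values):
    print(f"shift={s:4.2f}  L2={l2:8.4f}  W1={w1:6.4f}")
```

In this setting the Wasserstein-1 value grows in proportion to the time shift, whereas the least-squares misfit stops changing once the two pulses no longer overlap and therefore carries no information about how far apart they are.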