The Simplex Method is Strongly Polynomial for the Markov Decision Process with Fixed Discount

Yinyu Ye
Stanford University

We prove that the classic simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) for solving the Markov decision process (MDP) with a fixed discount rate is a strongly polynomial-time algorithm. The result seems surprising since this very pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, and the simplex (or simple policy iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP problem regardless of discount rates. As a corollary, the policy iteration method (Howard 1960) is also a strongly polynomial-time algorithm for solving the MDP with fixed discount.

Presentation (PDF File)

Back to Workshop II: Numerical Methods for Continuous Optimization