Outside the laboratories of psychologists, decisions seldom lead to immediate and terminal reward or punishment. Instead, they usually lead to new decision points -- that is, decision making is usually
*sequential* in nature, and decision quality is evaluated over environment *histories* rather than single, immediate outcome states. This is as true in motor control tasks as it is in options trading. This lecture provides some background Markov decision processes (MDPs), which formalize sequential decision problems, and briefly explains the main algorithmic approaches to solving them. Reinforcement learning methods are online algorithms for solving large, initially unknown MDPs from direct experience of the task environment. In addition to being (sometimes) surprisingly effective, they may actually be used by people and animals.