Scheduling Cost-Sensitive Inference of AND-OR Graphs for Video Parsing

Sinisa Todorovic
Oregon State University

Recent work demonstrates that activity representations should consider context; account for parts and their spatiotemporal relations; relate group activities to the actions and interactions of individuals in the group; and even enable tracking of activities for better performance. However, each of these modeling aspects has typically been studied in isolation, disregarding the others. This talk will present a principled framework for their efficient integration. Activities are modeled probabilistically using a spatiotemporal AND-OR graph (ST-AOG), which captures parts and contexts and allows tracking of objects, individual actions, and group activities. The key challenge is that parsing video with the ST-AOG is prohibitively expensive: it would require running a multitude of activity detectors and tracking their detections. This is resolved by formulating cost-sensitive inference as Monte Carlo Tree Search (MCTS), which optimally schedules which detectors and trackers to run and where in the space-time volume to apply them. Evaluation on benchmark datasets demonstrates that MCTS makes this unprecedented modeling complexity feasible, with inference speed-ups of two orders of magnitude relative to standard message passing, and thus leads to superior performance over existing approaches.
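
To make the scheduling idea concrete, the following is a minimal sketch of MCTS choosing which detectors to run under a compute budget. It is not the method presented in the talk: the detector names, costs, gains, the noisy toy reward, and the use of the UCB1 selection rule (the UCT variant of MCTS) are all assumptions made for illustration, and the sketch ignores where in the space-time volume a detector is applied.

```python
import math
import random

# Hypothetical detectors: name -> (cost, expected gain in parse score).
DETECTORS = {
    "person": (1.0, 0.50),
    "group_walk": (3.0, 0.70),
    "object": (2.0, 0.30),
    "tracker": (2.0, 0.60),
}


class Node:
    def __init__(self, schedule, parent=None):
        self.schedule = schedule  # tuple of detector names run so far
        self.parent = parent
        self.children = {}        # detector name -> child Node
        self.visits = 0
        self.total = 0.0          # sum of rollout returns


def spent(schedule):
    return sum(DETECTORS[d][0] for d in schedule)


def actions(schedule, budget):
    """Detectors not yet run that still fit in the remaining budget."""
    left = budget - spent(schedule)
    return [d for d in DETECTORS if d not in schedule and DETECTORS[d][0] <= left]


def noisy_return(schedule, rng):
    """Toy reward (assumed): sum of gains, each perturbed by Gaussian noise."""
    return sum(DETECTORS[d][1] + rng.gauss(0.0, 0.05) for d in schedule)


def uct_child(node, c=1.0):
    """UCB1 rule: mean return plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_schedule(budget, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection: descend while the current node is fully expanded.
        while True:
            acts = actions(node.schedule, budget)
            untried = [a for a in acts if a not in node.children]
            if untried or not acts:
                break
            node = uct_child(node)
        # Expansion: add one untried child, if any remain.
        if untried:
            a = rng.choice(untried)
            node.children[a] = Node(node.schedule + (a,), parent=node)
            node = node.children[a]
        # Rollout: extend the schedule at random until nothing else fits.
        rollout = node.schedule
        while True:
            acts = actions(rollout, budget)
            if not acts:
                break
            rollout += (rng.choice(acts),)
        value = noisy_return(rollout, rng)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Read out the schedule by following the most-visited children.
    schedule, node = [], root
    while node.children:
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        schedule.append(a)
    return schedule
```

Calling `mcts_schedule(budget=5.0)` returns an ordered list of detector names whose total cost does not exceed the budget. In a real parser the reward would presumably come from the ST-AOG posterior rather than from a fixed table of gains.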

Back to Graduate Summer School: Computer Vision