In this talk I will propose a new way of evaluating the performance of machines in providing semantic annotation for images of natural scenes. I will argue that current evaluation practice does not scale to rich representations, and propose instead asking a series of binary questions. The querying is sequential and adaptive, which allows the questions to build on each other to systematically uncover the semantic structure and generate story lines, eventually posing very detailed questions without their answers being almost certainly "no." I will discuss the key concept of "unpredictability" within a statistical and information-theoretic framework, touch on some issues in machine learning, and illustrate the RTT with urban street scenes.