When humans encounter images or videos of the visual world, our visual system can extract a wealth of information in as little as a single glance. Much of this information is semantic, relating to objects, scenes, and purposeful motions.
Matching this ability remains a major challenge for today's computer vision algorithms. In this talk, we will introduce algorithms that perform high-level visual recognition tasks such as object, scene, event, and human motion categorization. Furthermore, we will attempt these recognition tasks under learning conditions that mimic those of humans, such as one-shot learning, unsupervised learning, and incremental learning. In object categorization, we will show two projects, one focusing on one-shot learning and the other on incremental learning of objects. We will also show a recent study of true 3D object categorization. Beyond objects, we will introduce several studies on scene and event categorization. Finally, we will close the talk with a study on unsupervised learning of human motion categories.