Active Object Recognition: a survey of a (re-)emerging domain


A tutorial at

Salerno, Italy – September 2nd, 2019



In 1988, Aloimonos et al. introduced the first general framework for active vision, proving that an active observer can solve basic vision problems more efficiently than a passive one. They defined an active observer as an agent that can engage in some kind of activity whose purpose is to improve the quality of the perceptual results. Active vision has been shown to be particularly effective in coping with problems such as occlusions, limited field of view and limited camera resolution. Active control of the camera viewpoint also helps to focus computational resources on the relevant elements of the scene. Researchers have also suggested that visual attention and the selective aspect of active camera control can help in tasks such as learning more robust models of objects and environments, either from few labeled samples or autonomously. Active vision was a very popular topic in the computer vision community during the late '80s and '90s, with considerable research effort and numerous publications in related fields such as 3D reconstruction, autonomous robots and video surveillance.

During the 2000s, for several reasons including the growing amount of data made available by the internet and the advent of depth sensors, active vision approaches went through a period of low popularity. Since 2010, however, mostly because of a re-emerging interest in robotic vision, the wider availability of low-cost robots and the increasing computational power provided by GPUs, active vision has been experiencing a second wave of popularity, in particular when used in conjunction with reinforcement learning techniques and applied to tasks such as environment exploration and object categorization.

In this tutorial, the speaker will first present an overview of the field throughout its history, from the foundational works published back in the '90s to the most recent approaches relying on deep neural networks and reinforcement learning. Then, the main open problems and challenges will be analyzed in more detail, with the aim of isolating the most promising research directions and the connections with other topics of interest for the CAIP community.


Presentation outline


This tutorial will last for 3 hours, with a single speaker. The tentative outline of the presentation is:


Speaker


Francesco Setti

Dr. Francesco Setti is a temporary assistant professor (RTD-a) at the Department of Computer Science of the University of Verona, working on the H2020 project SARAS (collaborative robotics in surgery). His main research interest is the development of computer vision and machine learning algorithms with applications to social/collaborative robotics and industrial automation. He has co-authored 7 journal papers and more than 20 papers in international peer-reviewed conferences. He has served on the Organizing Committee of the CONTACT workshop @ECCV2014 and on the Program Committee of the GROW workshop @CVPR2015 and the CVSport workshop @ICCV2015, @CVPR2017, and @CVPR2019. He also serves as a reviewer for top-ranked journals and conferences such as Neurocomputing, CVIU and ACM Multimedia.