DEEP LEARNING FOR VISUAL OBJECT TRACKING
In its simplest definition, visual object tracking consists of the persistent recognition and localization of a generic target object in a video. Several challenges, such as object occlusions, pose and scale changes, rotations and shape variations, and the presence of similar objects, must be tackled to accurately keep track of a target’s position. The ultimate goal of generic object tracking is to build robust models capable of overcoming such challenging factors. In the past, these issues were addressed by disparate principles formalizing the concepts of appearance model, motion model, and matching operation. In recent years, algorithms based on deep learning have sought to learn such conceptual blocks by exploiting the ability of deep neural networks to learn complex functions from visual examples. Thanks to these advancements, deep learning-based solutions are now the go-to approach for implementing strong visual tracking algorithms. The goal of this tutorial is to present the latest progress in the use of deep learning for building an accurate visual tracker. After introducing the fundamental concepts of the visual object tracking domain, the session will describe how state-of-the-art solutions employ deep learning architectures and optimization techniques. The tutorial will also cover the datasets, protocols, and metrics available to evaluate deep learning-based trackers, as well as the most popular software tools developed by the community.
Matteo Dunnhofer is a PhD candidate at the University of Udine (Udine, Italy). He received the BSc and MSc degrees in Computer Science from the same institution in 2016 and 2018, respectively. His research focuses on the application and development of deep learning techniques for video understanding. In particular, his PhD thesis addresses various issues in the use of deep learning techniques for visual object tracking. In 2021, he was part of the winning team of the Visual Object Tracking VOT2021 Long-term Challenge held at ICCV 2021.
Christian Micheloni received the M.Sc. and Ph.D. degrees from the University of Udine, Italy, in 2002 and 2006, respectively. He is a Professor with the Department of Mathematics, Computer Science and Physics, University of Udine, Italy. His current interests include active vision for wide-area scene analysis, resource-aware camera networks, pattern recognition, camera network self-reconfiguration, video object tracking, image super-resolution, person re-identification, and machine learning. In 2021, he supervised the winning team of the Visual Object Tracking VOT2021 Long-term Challenge held at ICCV 2021.