Visual tracking is one of the fundamental problems in computer vision. In its most general form, no prior knowledge about the target object is given except for its initial location. The unconstrained nature of this problem makes it particularly difficult, yet applicable to a wide range of scenarios. The same holds for the related problem of Video Object Segmentation, where the task is to predict a pixel-wise segmentation mask of the target. Due to the lack of a priori knowledge in these problems, the method must learn an appearance model of the target online. Cast as a machine learning problem, this imposes several major challenges.
This talk first gives an overview of the so-called Discriminative Correlation Filter (DCF) framework, which has attracted considerable attention among researchers and engineers due to its excellent performance. It uses the Fourier transform to learn a discriminative model efficiently online. Second, I will present the recent ATOM tracker, which combines efficient online learning for target classification with extensive offline training for accurate bounding box estimation. Finally, I will present a generative appearance model seamlessly integrated into a convolutional neural network for the task of video object segmentation. Our approach learns a descriptive model of the scene through a single forward pass, enabling full end-to-end training.
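To give a feel for the DCF idea, the sketch below implements a minimal single-channel correlation filter in the spirit of MOSSE: the filter is solved in the Fourier domain as a closed-form ratio and adapted online with a running average. This is an illustrative toy with NumPy, not the implementation discussed in the talk; the class name, hyperparameters (`sigma`, `lam`, `lr`), and the single-channel restriction are assumptions made for brevity.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    # Desired correlation output: a Gaussian peak centred on the target.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

class ToyDCF:
    """Minimal single-channel DCF (MOSSE-style); illustrative only."""
    def __init__(self, patch, sigma=2.0, lam=1e-4, lr=0.125):
        self.lam, self.lr = lam, lr
        self.G = np.fft.fft2(gaussian_response(patch.shape, sigma))
        F = np.fft.fft2(patch)
        # Closed-form ridge-regression solution, kept as numerator/denominator.
        self.A = self.G * np.conj(F)
        self.B = F * np.conj(F)

    def respond(self, patch):
        # Correlate a new patch with the learned filter in the Fourier domain.
        F = np.fft.fft2(patch)
        H = self.A / (self.B + self.lam)
        return np.real(np.fft.ifft2(H * F))

    def update(self, patch):
        # Running-average update adapts the model to appearance changes online.
        F = np.fft.fft2(patch)
        self.A = (1 - self.lr) * self.A + self.lr * self.G * np.conj(F)
        self.B = (1 - self.lr) * self.B + self.lr * F * np.conj(F)

# The response to the training patch peaks at the target's centre;
# tracking shifts of the target show up as shifts of this peak.
patch = np.random.default_rng(0).random((32, 32))
tracker = ToyDCF(patch)
peak = np.unravel_index(np.argmax(tracker.respond(patch)), (32, 32))
```

The key efficiency point, which the full DCF framework exploits, is that both the closed-form solve and the correlation are elementwise operations in the Fourier domain, so each frame costs only a few FFTs.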