Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Max Planck Institute for Intelligent Systems
*Indicates Equal Contribution
Published at NeurIPS 2023

Abstract

Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.

Affinity matrix \( A \), transition probabilities \( T \), and decoder-predicted transition probabilities \( \hat{T} \) between patches (marked in purple and green) of frame \( \mathbf{x}_{t} \) and patches of the next frame \( \mathbf{x}_{t+1} \), shown for YouTube-VIS 2021 validation videos. Red indicates maximum affinity/probability.
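As a rough illustration of the quantities in the figure, the following sketch computes a patch affinity matrix and the resulting transition probabilities from the features of two consecutive frames. This is a hypothetical minimal implementation, not the paper's code: the cosine normalization and the temperature `tau` are assumptions for the sketch.

```python
import numpy as np

def temporal_similarity_targets(feat_t, feat_tp1, tau=0.1):
    """Affinity matrix A and transition probabilities T between patch
    features of frame t and frame t+1.

    Sketch under assumptions: features are L2-normalized so A is a
    cosine similarity, and `tau` is an assumed softmax temperature.
    """
    # Normalize each patch feature vector to unit length.
    f_t = feat_t / np.linalg.norm(feat_t, axis=-1, keepdims=True)
    f_tp1 = feat_tp1 / np.linalg.norm(feat_tp1, axis=-1, keepdims=True)

    # Affinity between every patch of frame t and every patch of frame t+1.
    A = f_t @ f_tp1.T                              # shape (N, N)

    # Row-wise softmax over the next frame's patches gives, for each
    # patch at time t, a probability distribution over patches at t+1.
    logits = A / tau
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    T = np.exp(logits)
    T /= T.sum(axis=-1, keepdims=True)             # rows sum to 1
    return A, T
```

In this framing, the decoder's prediction \( \hat{T} \) would be trained to match \( T \), e.g. with a cross-entropy loss over each row.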

Interactive visualization of the affinity matrix \( A \), showing the similarities corresponding to each patch.

VideoSAUR examples on YouTube-VIS, MOVi-C and MOVi-E datasets.

Poster

Related Projects

  • DINOSAUR (ICLR 2023): real-world object-centric learning for images using self-supervised feature reconstruction

BibTeX

If you find this work useful, please cite our paper:


@inproceedings{zadaianchuk2023objectcentric,
    title={Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities},
    author={Zadaianchuk, Andrii and Seitzer, Maximilian and Martius, Georg},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)},
    year={2023},
}