Ramy Mounir, Roman Gula, Jorn Theuerkauf, Sudeep Sarkar
We present a self-supervised perceptual prediction framework for temporal event segmentation that builds stable representations of objects over time, and we demonstrate it on long videos spanning several days. The self-learned attention maps localize and track the event-related objects in each frame. The approach requires no labels and only a single pass through the video, with no separate training set.
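The summary above does not spell out the mechanism. As a rough illustration only of how a label-free, single-pass, prediction-based segmenter can work, the sketch below flags frames where a predictor's error spikes relative to its running statistics. Everything in it is an assumption made for illustration: the moving-average predictor stands in for a learned model, and the name `segment_events` and the parameters `alpha`, `threshold`, and `warmup` are hypothetical, not taken from this work.

```python
def segment_events(features, alpha=0.5, threshold=3.0, warmup=5):
    """Return indices of frames whose prediction error is unusually high.

    features: sequence of per-frame feature vectors (lists of floats).
    The predictor here is a simple exponential moving average, a
    hypothetical stand-in for a learned perceptual prediction model.
    """
    boundaries = []
    prediction = None          # predicted features for the current frame
    n, mean_err, m2 = 0, 0.0, 0.0  # running error statistics (Welford)

    for t, feat in enumerate(features):
        if prediction is None:
            # First frame: nothing to compare against yet.
            prediction = list(feat)
            continue

        # Squared error between the prediction and the observed features.
        err = sum((p - f) ** 2 for p, f in zip(prediction, feat))

        # Flag a boundary when the error exceeds mean + threshold * std,
        # but only after a few errors have been observed.
        if n >= warmup:
            std = (m2 / n) ** 0.5
            if err > mean_err + threshold * std:
                boundaries.append(t)

        # Update the running mean and variance of the error in one pass.
        n += 1
        delta = err - mean_err
        mean_err += delta / n
        m2 += delta * (err - mean_err)

        # Update the predictor online; no separate training set is used.
        prediction = [(1 - alpha) * p + alpha * f
                      for p, f in zip(prediction, feat)]

    return boundaries


# Toy input: 20 identical frames followed by 20 frames of a new "event".
frames = [[0.0, 0.0]] * 20 + [[5.0, 5.0]] * 20
detected = segment_events(frames)
```

On this toy input the only flagged frame is the one where the features change. A real system would operate on learned image features and would need a more careful threshold.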



This dataset was made possible through funding from the Polish National Science Centre (grants NCN 2011/01/M/NZ8/03344 and 2018/29/B/NZ8/02312). Province Sud (New Caledonia) issued all permits required for data collection from 2002 to 2020. This research was supported in part by the US National Science Foundation grant IIS 1956050.
@inproceedings{EventSegmentation,
title = {Spatio-Temporal Event Segmentation for Wildlife Extended Videos},
author = {Ramy Mounir and Roman Gula and Jorn Theuerkauf and Sudeep Sarkar},
booktitle = {International Conference on Computer Vision \& Image Processing},
year = {2021},
note = {CVIP},
award = {Oral}
}