I am a fifth-year PhD candidate in the AI+X research group at the University of South Florida (USF) under the supervision of Dr. Sudeep Sarkar. Before that, I received my Bachelor's and Master's degrees in mechanical engineering from USF in 2019.
My research interests include Computer Vision, Multimodal Perception, Hierarchical Representation Learning, Neuroscience, and Cognitive Psychology.
We discuss three perceptual prediction models grounded in Event Segmentation Theory (EST), presented in three progressive versions: temporal segmentation using a perceptual prediction framework; temporal segmentation with event working models based on attention maps; and, finally, spatial and temporal localization of events. These approaches learn robust event representations from a single pass through unlabeled streaming video. They achieve state-of-the-art performance in unsupervised temporal segmentation and spatiotemporal action localization while remaining competitive with fully supervised baselines that require extensive annotation.
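To make the core mechanism concrete, below is a minimal sketch of prediction-error-driven temporal segmentation, assuming precomputed per-frame features, a simple GRU predictor, and a fixed error threshold. The names `FramePredictor`, `segment_stream`, and `error_threshold` are illustrative; the published models use richer encoders and adaptive error gating rather than this toy setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FramePredictor(nn.Module):
    """Illustrative predictor: anticipates the next frame feature from a recurrent state."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, feat, state):
        state = self.rnn(feat, state)
        return self.head(state), state

def segment_stream(features, error_threshold=1.0, lr=1e-3):
    """One online pass over per-frame features (T x D); a boundary is declared
    whenever the prediction error spikes above the threshold."""
    model = FramePredictor(feat_dim=features.shape[1])
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    state = torch.zeros(1, model.hidden_dim)
    prev_pred, boundaries = None, []
    for t in range(features.shape[0]):
        feat = features[t].unsqueeze(0)
        if prev_pred is not None:
            error = F.mse_loss(prev_pred, feat)
            if error.item() > error_threshold:
                boundaries.append(t)              # prediction failure -> event boundary
                state = torch.zeros_like(state)   # reset the event (working) model
            optim.zero_grad()
            error.backward()                      # learn online; no separate training set
            optim.step()
        prev_pred, state = model(feat, state.detach())
    return boundaries
```

With features from a frozen backbone, `segment_stream(torch.randn(1000, 512))` returns candidate boundary indices; in practice an adaptive threshold (e.g., a running mean of the error plus a few standard deviations) is more robust than a fixed one.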
We present a novel self-supervised approach for hierarchical representation learning and segmentation of perceptual inputs in a streaming fashion. Our research addresses how to semantically group streaming inputs into chunks at various levels of a hierarchy while simultaneously learning, for each chunk, robust global representations throughout the domain. To achieve this, we propose STREAMER, an architecture that is trained layer-by-layer, adapting to the complexity of the input domain. Notably, our model is fully self-supervised and trained in a streaming manner, requiring only a single pass over the training data. We evaluate our model on the egocentric EPIC-KITCHENS dataset, focusing on temporal event segmentation. Furthermore, we conduct event retrieval experiments using the learned representations to demonstrate the high quality of our video event representations.
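Below is a minimal sketch of the layer-by-layer stacking idea, assuming each layer buffers inputs from the layer below, tests for a boundary with some callable, and pools each finished chunk into a representation passed upward. The `HierLayer` interface, mean pooling, and cosine-distance boundary test are illustrative assumptions, not STREAMER's actual architecture or training procedure.

```python
import torch
import torch.nn.functional as F

class HierLayer:
    """Illustrative layer: buffers inputs from below into a chunk and emits a
    pooled chunk representation upward when it detects an event boundary."""
    def __init__(self, detect_boundary):
        self.detect_boundary = detect_boundary  # callable(buffer) -> bool
        self.buffer = []

    def step(self, x):
        self.buffer.append(x)
        if len(self.buffer) > 1 and self.detect_boundary(self.buffer):
            chunk = torch.stack(self.buffer[:-1]).mean(dim=0)  # pooled event representation
            self.buffer = self.buffer[-1:]                     # start the next chunk
            return chunk                                       # forwarded to the layer above
        return None

def run_hierarchy(frame_feats, layers):
    """Stream frame features through stacked layers; higher layers see coarser
    chunks, yielding event segments at several levels of the hierarchy."""
    outputs = [[] for _ in layers]
    for feat in frame_feats:
        signal = feat
        for i, layer in enumerate(layers):
            signal = layer.step(signal)
            if signal is None:
                break                   # no boundary yet; higher layers keep waiting
            outputs[i].append(signal)   # chunk representation climbs the hierarchy
    return outputs

# Toy boundary test: a drop in cosine similarity between consecutive inputs.
def cosine_boundary(buffer, thresh=0.5):
    return F.cosine_similarity(buffer[-2], buffer[-1], dim=0) < thresh

levels = [HierLayer(cosine_boundary) for _ in range(3)]
chunks_per_level = run_hierarchy(torch.randn(500, 512), levels)
```

In the actual model, each level learns its representations through prediction rather than mean pooling, and new layers are trained progressively; the sketch only conveys how boundaries at one level become the inputs of the next.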
This work highlights the importance of monitoring wildlife for conservation and conflict management, and the success of AI-based camera traps in planning conservation efforts. The project, part of the NSF-TIH Indo-US partnership, aims to analyze longer bird videos, addressing the challenges of video analysis at feeding and nesting sites. The goal is to create datasets and tools for automated video analysis to understand bird behavior. A major achievement is a dataset of high-quality images of Demoiselle cranes, which reveals shortcomings of current methods on tasks such as segmentation and detection. The ongoing project aims to expand the dataset and develop better video analytics for wildlife monitoring.
Advances in visual perceptual tasks have been mainly driven by the amount, and types, of annotations in large-scale datasets. Inspired by cognitive theories, we present a self-supervised perceptual prediction framework to tackle the problem of temporal event segmentation. Our approach is trained in an online manner on streaming input and requires only a single pass through the video, with no separate training set. Given the lack of long, realistic datasets that include real-world challenges, we introduce a new wildlife video dataset – nest monitoring of the Kagu (a flightless bird from New Caledonia) – to benchmark our approach. Our dataset features 10 days (over 23 million frames) of continuous video monitoring of the Kagu in its natural habitat. We annotate every frame with bounding boxes and event labels. Additionally, each frame is annotated with time-of-day and illumination conditions.
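As an illustration of how frame-level event annotations could be used to benchmark a segmentation model, here is a sketch of boundary-detection F1 with a temporal tolerance window; the function name and the 30-frame tolerance are illustrative choices, not the evaluation protocol of the paper.

```python
def boundary_f1(pred_bounds, gt_bounds, tolerance=30):
    """Boundary-detection F1 for lists of frame indices: a prediction counts as a
    hit if it lies within `tolerance` frames of a still-unmatched ground truth."""
    unmatched = list(gt_bounds)
    tp = 0
    for p in pred_bounds:
        hits = [i for i, g in enumerate(unmatched) if abs(p - g) <= tolerance]
        if hits:
            tp += 1
            unmatched.pop(hits[0])   # each ground-truth boundary matches at most once
    precision = tp / max(len(pred_bounds), 1)
    recall = tp / max(len(gt_bounds), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```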
Graph-based representations are becoming increasingly popular for representing and analyzing video data, especially in object tracking and scene understanding applications. An essential tool in this approach is statistical inference for the graph-valued time series associated with videos. This paper develops a Kalman-smoothing method for estimating graphs from noisy, cluttered, and incomplete data.
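For intuition, below is a minimal sketch of Rauch-Tung-Striebel (Kalman) smoothing applied to a sequence of vectorized adjacency matrices under identity (random-walk) dynamics; the noise levels, the initialization, and the random-walk assumption are illustrative and not the estimator developed in the paper.

```python
import numpy as np

def kalman_smooth_graphs(adj_seq, q=1e-2, r=1e-1):
    """RTS (Kalman) smoothing over a sequence of noisy adjacency matrices
    (T x n x n), treating each vectorized graph as the latent state under a
    random-walk model with identity observations."""
    T, n, _ = adj_seq.shape
    d = n * n
    Y = adj_seq.reshape(T, d)               # observations: vectorized graphs
    Q, R = q * np.eye(d), r * np.eye(d)     # process / observation noise (assumed)

    # Forward Kalman filter.
    x_f = np.zeros((T, d)); P_f = np.zeros((T, d, d))
    x_p = np.zeros((T, d)); P_p = np.zeros((T, d, d))
    x, P = Y[0], np.eye(d)                  # initialize at the first observation (assumed)
    for t in range(T):
        x_p[t], P_p[t] = x, P + Q                       # predict
        K = P_p[t] @ np.linalg.inv(P_p[t] + R)          # Kalman gain
        x_f[t] = x_p[t] + K @ (Y[t] - x_p[t])           # update
        P_f[t] = (np.eye(d) - K) @ P_p[t]
        x, P = x_f[t], P_f[t]

    # Backward Rauch-Tung-Striebel pass.
    x_s, P_s = x_f.copy(), P_f.copy()
    for t in range(T - 2, -1, -1):
        G = P_f[t] @ np.linalg.inv(P_p[t + 1])
        x_s[t] = x_f[t] + G @ (x_s[t + 1] - x_p[t + 1])
        P_s[t] = P_f[t] + G @ (P_s[t + 1] - P_p[t + 1]) @ G.T

    return x_s.reshape(T, n, n)             # smoothed graph estimates
```

Missing or occluded frames could be handled by skipping the update step at those time indices, letting the smoother interpolate through the gap.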
We present a self-supervised perceptual prediction framework capable of temporal event segmentation by building stable representations of objects over time, and demonstrate it on long videos spanning several days. The self-learned attention maps effectively localize and track the event-related objects in each frame. The proposed approach requires no labels and only a single pass through the video, with no separate training set.
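As an illustration of how a self-learned attention map could yield an object box, below is a sketch that normalizes a 2D map, thresholds it, and takes the extent of the salient region; the function name and the 0.5 threshold are illustrative, not the localization procedure used in the work.

```python
import numpy as np

def attention_to_box(attn_map, threshold=0.5):
    """Turn a 2D attention map (H x W) into a single bounding box by
    normalizing, thresholding, and taking the extent of the salient region."""
    attn = attn_map - attn_map.min()
    attn = attn / (attn.max() + 1e-8)
    mask = attn >= threshold
    if not mask.any():
        return None                       # nothing salient in this frame
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())  # x1, y1, x2, y2
```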