Ramy Mounir

PhD student at AI+X - USF

4202 E Fowler Ave
Tampa, FL 33620

Email: ramy (at) usf (dot) edu

I am a fifth-year PhD candidate in the AI+X research group at the University of South Florida (USF) under the supervision of Dr. Sudeep Sarkar. Before that, I received my Bachelor's and Master's degrees in mechanical engineering from USF in 2019.

Interests include Computer Vision, Multimodal Perception, Hierarchical Representation Learning, Neuroscience and Cognitive Psychology.


Book Chapters

Advanced Methods and Deep Learning in Computer Vision (Chapter 12)
Ramy Mounir, Sathyanarayanan N. Aakur, Sudeep Sarkar
Elsevier, ISBN: 9780128221099
book | bibtex

We discuss three perceptual prediction models based on Event Segmentation Theory (EST), in three progressive versions: temporal segmentation using a perceptual prediction framework; temporal segmentation along with event working models based on attention maps; and, finally, spatial and temporal localization of events. The approaches can learn robust event representations from only a single pass through an unlabeled streaming video. They show state-of-the-art performance in unsupervised temporal segmentation and spatio-temporal action localization while remaining competitive with fully supervised baselines that require extensive annotation.


  • STREAMER: Streaming Representation Learning and Event Segmentation in a Hierarchical Manner
    Ramy Mounir, Sujal Vijayaraghavan, Sudeep Sarkar
    Advances in Neural Information Processing Systems (NeurIPS'23)
    paper | webpage | code | docs | bibtex

    We present a novel self-supervised approach for hierarchical representation learning and segmentation of perceptual inputs in a streaming fashion. Our research addresses how to semantically group streaming inputs into chunks at various levels of a hierarchy while simultaneously learning, for each chunk, robust global representations throughout the domain. To achieve this, we propose STREAMER, an architecture that is trained layer-by-layer, adapting to the complexity of the input domain. Notably, our model is fully self-supervised and trained in a streaming manner, enabling a single pass on the training data. We evaluate the performance of our model on the egocentric EPIC-KITCHENS dataset, specifically focusing on temporal event segmentation. Furthermore, we conduct event retrieval experiments using the learned representations to demonstrate the high quality of our video event representations.

  • Long-term Monitoring of Bird Flocks in the Wild
    Kshitiz, Sonu Shreshtha, Ramy Mounir, Mayank Vatsa, Richa Singh, Saket Anand, Sudeep Sarkar, Severam Mali Parihar
    International Joint Conference on Artificial Intelligence (IJCAI'23)
    paper | webpage | code | bibtex

    This work highlights the importance of monitoring wildlife for conservation and conflict management, and the success of AI-based camera traps in planning conservation efforts. The project, part of the NSF-TIH Indo-US partnership, aims to analyze longer bird videos and addresses challenges in video analysis at feeding and nesting sites. The goal is to create datasets and tools for automated video analysis to understand bird behavior. A major achievement is a dataset of high-quality images of Demoiselle cranes, which reveals issues with current methods in tasks such as segmentation and detection. The ongoing project aims to expand the dataset and develop better video analytics for wildlife monitoring.

  • Towards Automated Ethogramming: Cognitively-Inspired Event Segmentation for Wildlife Monitoring
    Ramy Mounir, Ahmed Shahabaz, Roman Gula, Jorn Theuerkauf, Sudeep Sarkar
    International Journal of Computer Vision (IJCV)
    CV4Animals@CVPR'22 (poster presentation)
    paper | webpage | dataset | code | docs | bibtex

    Advances in visual perceptual tasks have been driven mainly by the amount and types of annotations in large-scale datasets. Inspired by cognitive theories, we present a self-supervised perceptual prediction framework to tackle the problem of temporal event segmentation. Our approach is trained in an online manner on streaming input and requires only a single pass through the video, with no separate training set. Given the lack of long, realistic datasets (ones that include real-world challenges), we introduce a new wildlife video dataset – nest monitoring of the Kagu (a flightless bird from New Caledonia) – to benchmark our approach. Our dataset features a video from 10 days (over 23 million frames) of continuous monitoring of the Kagu in its natural habitat. We annotate every frame with bounding boxes and event labels. Additionally, each frame is annotated with time-of-day and illumination conditions.

  • Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration
    Aditi Bal, Ramy Mounir, Sathyanarayanan Aakur, Sudeep Sarkar, Anuj Srivastava
    ECCV'22 (Oral presentation)
    paper | webpage | bibtex

    Graph-based representations are becoming increasingly popular for representing and analyzing video data, especially in object tracking and scene understanding applications. Accordingly, an essential tool in this approach is to generate statistical inferences for graphical time series associated with videos. This paper develops a Kalman-smoothing method for estimating graphs from noisy, cluttered, and incomplete data.

  • Spatio-Temporal Event Segmentation for Wildlife Extended Videos
    Ramy Mounir, Roman Gula, Jorn Theuerkauf, Sudeep Sarkar
    CVIP'21 (Oral presentation)
    CV4Animals@CVPR'21 (Oral presentation)
    paper | webpage | bibtex

    We present a self-supervised perceptual prediction framework capable of temporal event segmentation by building stable representations of objects over time, and demonstrate it on long videos spanning several days. The self-learned attention maps effectively localize and track the event-related objects in each frame. The proposed approach does not require labels. It requires only a single pass through the video, with no separate training set.

  • Polyrhythmic Bimanual Coordination Training using Haptic Force Feedback
    Ramy Mounir, Kyle Reed
    ArXiv Preprint
    paper | webpage | code | bibtex

    This work focuses on training humans to perform a 2:3 polyrhythmic bimanual ratio using haptic force feedback devices (SensAble Phantom OMNI). We implemented an interactive training session to help participants learn to decouple their hand motions quickly.

  • BCI-Controlled Hands-Free Wheelchair Navigation with Obstacle Avoidance
    Ramy Mounir, Redwan Alqasemi, Rajiv Dubey
    IROS'18 Workshop on Haptic-enabled shared control of robotic systems
    paper | webpage | video | slides | bibtex

    Brain-computer interfaces (BCIs) are widely used to read brain signals and convert them into real-world motion. However, the signals produced by a BCI are noisy and hard to analyze. This paper focuses on combining the latest BCI technology with ultrasonic sensors to provide a hands-free wheelchair that can efficiently navigate through crowded environments.

  • Recent Assistive Technology Research at CARRT
    Ramy Mounir, Urvish Trivedi, Andoni Aguirrezabal, Daniel Ashley, Stephen Sundarrao, Redwan Alqasemi, Rajiv Dubey
    paper | webpage | video | bibtex

    This work aims to recover and improve individuals’ functionality so that they can maintain independence and self-sufficiency. This paper introduces four assistive technology devices developed by the Center for Assistive, Rehabilitation and Robotics Technologies (CARRT) at the University of South Florida.

  • Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks
    Ramy Mounir, Redwan Alqasemi, Rajiv Dubey
    IMECE'17 (Oral presentation)
    paper | webpage | slides | bibtex

    This work highlights the different deep learning techniques used to achieve automatic speech recognition (ASR) and how they can be modified to recognize and dictate speech from individuals with speech impediments.

Work experience



Invited Talks

Blog Articles


© You are welcome to copy this website's code for your personal use. Please attribute the source with a link back to this page and remove the analytics in the header.
This template is inspired by great websites I like, such as this one or this one.

Last update: December 2023