Hello, I'm

Ramy Mounir

Researcher at Thousand Brains Project

Redwood City, San Francisco Bay Area, CA

About Me

I received my Ph.D. in Computer Science and a Master's degree in Mechanical Engineering from the University of South Florida, specializing in online compositional learning, biologically inspired perception, and robotics. My research focuses on the learning principles of the brain and the mechanisms underlying perception, reasoning, and prediction.

I am currently a researcher on the Thousand Brains Project, where we build neuroscience-inspired intelligent machines by reverse engineering the neocortex.

Computer Vision · Multimodal Perception · Hierarchical Representation Learning · Neuroscience · Cognitive Psychology

Latest News

Recent updates and announcements

Book Chapters

Contributions to academic publications

Advanced Computer Vision Book
2020

Advanced Methods and Deep Learning in Computer Vision (Chapter 12)

Ramy Mounir, Sathyanarayanan N. Aakur, Sudeep Sarkar

Elsevier, ISBN: 9780128221099

We discuss three progressively refined perceptual prediction models grounded in Event Segmentation Theory (EST)...

Publications

Peer-reviewed research contributions

  • 2024 NeurIPS

    Predictive Attractor Models

    Ramy Mounir, Sudeep Sarkar

    Advances in Neural Information Processing Systems

    We propose PAM, a novel sequence memory architecture with desirable generative properties. PAM is a streaming model that learns a sequence in an online, continuous manner by observing each input only once. Additionally, we find that PAM avoids catastrophic forgetting by uniquely representing past context through lateral inhibition in cortical minicolumns, which prevents new memories from overwriting previously learned knowledge. PAM generates future predictions by sampling from a union set of predicted possibilities; this generative ability is realized through an attractor model trained alongside the predictor. We show that PAM is trained with local computations through Hebbian plasticity rules in a biologically plausible framework. Other desirable traits (e.g., noise tolerance, CPU-based learning, capacity scaling) are discussed throughout the paper.
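
    For intuition only, here is a toy Python sketch of the general flavor of online Hebbian sequence learning with predictions formed as a union over previously seen successors. It is not the PAM implementation from the paper; all names and parameters are illustrative.

        # Toy sketch (not PAM itself): online Hebbian learning of transitions
        # between sparse binary patterns, with a prediction formed as the union
        # of successors previously associated with the current pattern.
        import numpy as np

        rng = np.random.default_rng(0)
        N = 64                       # number of binary units
        W = np.zeros((N, N))         # transition weights (pre -> post)

        def sparse_pattern(k=8):
            """Random sparse binary code with k active units."""
            x = np.zeros(N)
            x[rng.choice(N, size=k, replace=False)] = 1.0
            return x

        def hebbian_update(W, pre, post, lr=1.0):
            """Strengthen connections from active 'pre' units to active 'post' units."""
            return W + lr * np.outer(post, pre)

        def predict_union(W, x, threshold=0.5):
            """Units driven above threshold by x: a union over learned successors."""
            return (W @ x > threshold).astype(float)

        # Learn a short sequence A -> B -> C in a single online pass.
        A, B, C = sparse_pattern(), sparse_pattern(), sparse_pattern()
        for pre, post in [(A, B), (B, C)]:
            W = hebbian_update(W, pre, post)

        pred = predict_union(W, A)
        print("units of B predicted from A:", int(pred @ B), "of", int(B.sum()))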

  • 2023 IJCV

    Towards Automated Ethogramming: Cognitively-Inspired Event Segmentation for Wildlife Monitoring

    Ramy Mounir, Ahmed Shahabaz, Roman Gula, Jorn Theuerkauf, Sudeep Sarkar

    International Journal of Computer Vision

    Advances in visual perceptual tasks have been driven mainly by the amount and types of annotations in large-scale datasets. Inspired by cognitive theories, we present a self-supervised perceptual prediction framework to tackle the problem of temporal event segmentation. Our approach is trained in an online manner on streaming input and requires only a single pass through the video, with no separate training set. Given the lack of long, realistic datasets that include real-world challenges, we introduce a new wildlife video dataset – nest monitoring of the Kagu (a flightless bird from New Caledonia) – to benchmark our approach. Our dataset features around 10 days (over 23 million frames) of continuous monitoring of the Kagu in its natural habitat. We annotate every frame with bounding boxes and event labels, and each frame is additionally annotated with time-of-day and illumination conditions.
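
    As a schematic illustration of the perceptual-prediction idea (not the model from the paper), the sketch below flags an event boundary whenever a per-frame prediction error rises well above its running average; the error trace here is synthetic and the thresholds are illustrative.

        # Schematic sketch: event boundaries where prediction error spikes above
        # a running average (synthetic error trace, illustrative thresholds).
        import numpy as np

        def segment_by_prediction_error(errors, momentum=0.99, factor=3.0):
            """Return frame indices where error exceeds `factor` x running mean."""
            boundaries, running = [], errors[0]
            for t, e in enumerate(errors):
                if e > factor * running:
                    boundaries.append(t)
                running = momentum * running + (1 - momentum) * e
            return boundaries

        rng = np.random.default_rng(1)
        errors = rng.uniform(0.05, 0.15, size=1000)   # mostly low prediction error
        errors[300] = errors[700] = 1.0               # spikes at two event changes
        print(segment_by_prediction_error(errors))    # -> [300, 700]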

  • 2023 IJCAI

    Long-term Monitoring of Bird Flocks in the Wild

    Kshitiz, Sonu Shreshtha, Ramy Mounir, Mayank Vatsa, Richa Singh, Saket Anand, Sudeep Sarkar, Severam Mali Parihar

    International Joint Conference on Artificial Intelligence

    This work highlights the importance of monitoring wildlife for conservation and conflict management, and the success of AI-based camera traps in planning conservation efforts. The project, part of the NSF-TIH Indo-US partnership, aims to analyze longer bird videos, addressing challenges in video analysis at feeding and nesting sites. The goal is to create datasets and tools for automated video analysis to understand bird behavior. A major achievement is a dataset of high-quality images of Demoiselle cranes, which reveals the limitations of current methods in tasks such as segmentation and detection. The ongoing project aims to expand the dataset and develop better video analytics for wildlife monitoring.

  • 2023 NeurIPS

    STREAMER: Streaming Representation Learning and Event Segmentation in a Hierarchical Manner

    Ramy Mounir, Sujal Vijayaraghavan, Sudeep Sarkar

    Advances in Neural Information Processing Systems

    We present a novel self-supervised approach for hierarchical representation learning and segmentation of perceptual inputs in a streaming fashion. Our research addresses how to semantically group streaming inputs into chunks at various levels of a hierarchy while simultaneously learning, for each chunk, robust global representations throughout the domain. To achieve this, we propose STREAMER, an architecture that is trained layer-by-layer, adapting to the complexity of the input domain. Notably, our model is fully self-supervised and trained in a streaming manner, enabling a single pass on the training data. We evaluate the performance of our model on the egocentric EPIC-KITCHENS dataset, specifically focusing on temporal event segmentation. Furthermore, we conduct event retrieval experiments using the learned representations to demonstrate the high quality of our video event representations.
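
    Below is an abstract wiring sketch of a layer-by-layer streaming hierarchy, where each layer chunks its input stream and passes one pooled representation per chunk to the layer above. It is illustrative only; the class, thresholds, and pooling are placeholders, not STREAMER's actual architecture.

        # Illustrative wiring only (not STREAMER): each layer detects boundaries
        # in its input stream and emits a pooled chunk representation upward.
        import numpy as np

        class StreamingLayer:
            def __init__(self, boundary_threshold):
                self.threshold = boundary_threshold
                self.buffer = []

            def step(self, x):
                """Consume one input; return a chunk representation at boundaries."""
                if self.buffer and np.linalg.norm(x - self.buffer[-1]) > self.threshold:
                    chunk = np.mean(self.buffer, axis=0)   # pooled chunk representation
                    self.buffer = [x]
                    return chunk                           # handed to the layer above
                self.buffer.append(x)
                return None

        # Piecewise-constant toy stream: ten 50-frame segments plus noise.
        rng = np.random.default_rng(2)
        means = 3.0 * rng.normal(size=(10, 16))
        stream = np.repeat(means, 50, axis=0) + 0.05 * rng.normal(size=(500, 16))

        layers = [StreamingLayer(t) for t in (0.5, 1.0, 2.0)]
        counts = [0, 0, 0]
        for frame in stream:
            signal = frame
            for level, layer in enumerate(layers):
                signal = layer.step(signal)
                if signal is None:
                    break                   # no boundary at this level yet
                counts[level] += 1
        print("chunks emitted per level:", counts)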

  • 2022 ECCV Oral

    Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration

    Aditi Bal, Ramy Mounir, Sathyanarayanan Aakur, Sudeep Sarkar, Anuj Srivastava

    European Conference on Computer Vision

    Graph-based representations are becoming increasingly popular for representing and analyzing video data, especially in object tracking and scene understanding applications. Accordingly, an essential tool in this approach is to generate statistical inferences for graphical time series associated with videos. This paper develops a Kalman-smoothing method for estimating graphs from noisy, cluttered, and incomplete data.
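
    The graph registration itself is beyond a snippet, but the Kalman-smoothing backbone is standard; below is a minimal Rauch-Tung-Striebel smoother for a scalar random walk observed in noise, with illustrative noise parameters (q, r).

        # Minimal Rauch-Tung-Striebel (Kalman) smoother for a scalar random walk
        # observed in noise; q and r are illustrative process/observation variances.
        import numpy as np

        def rts_smooth(y, q=0.01, r=0.25):
            """Forward Kalman filter followed by a backward smoothing pass."""
            n = len(y)
            xf, pf = np.zeros(n), np.zeros(n)      # filtered means / variances
            x, p = y[0], 1.0
            for t in range(n):
                p = p + q                          # predict (identity transition)
                k = p / (p + r)                    # Kalman gain
                x = x + k * (y[t] - x)             # update with observation y[t]
                p = (1 - k) * p
                xf[t], pf[t] = x, p
            xs = xf.copy()
            for t in range(n - 2, -1, -1):         # backward smoothing pass
                g = pf[t] / (pf[t] + q)            # smoother gain
                xs[t] = xf[t] + g * (xs[t + 1] - xf[t])
            return xs

        rng = np.random.default_rng(3)
        truth = np.cumsum(0.1 * rng.normal(size=200))
        noisy = truth + 0.5 * rng.normal(size=200)
        smoothed = rts_smooth(noisy)
        print(f"raw MAE {np.abs(noisy - truth).mean():.3f}, "
              f"smoothed MAE {np.abs(smoothed - truth).mean():.3f}")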

  • 2021 CVIP Oral

    Spatio-Temporal Event Segmentation for Wildlife Extended Videos

    Ramy Mounir, Roman Gula, Jorn Theuerkauf, Sudeep Sarkar

    International Conference on Computer Vision & Image Processing

    We present a self-supervised perceptual prediction framework capable of temporal event segmentation by building stable representations of objects over time, and demonstrate it on long videos spanning several days. The self-learned attention maps effectively localize and track the event-related objects in each frame. The proposed approach requires no labels and only a single pass through the video, with no separate training set.
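
    Purely for illustration, the snippet below shows one way a spatial attention map could be reduced to a bounding box by thresholding; the map here is a synthetic Gaussian blob, not output from the model.

        # Illustration only: threshold a (synthetic) attention map and take the
        # extent of the surviving pixels as a bounding box around the object.
        import numpy as np

        def attention_to_bbox(attn, quantile=0.95):
            """Bounding box (x0, y0, x1, y1) of pixels above the given quantile."""
            ys, xs = np.where(attn >= np.quantile(attn, quantile))
            return xs.min(), ys.min(), xs.max(), ys.max()

        yy, xx = np.mgrid[0:120, 0:160]            # synthetic 120x160 attention map
        attn = np.exp(-((xx - 80) ** 2 + (yy - 60) ** 2) / (2 * 10 ** 2))
        print(attention_to_bbox(attn))             # a tight box around (80, 60)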

  • 2020 Preprint

    Polyrhythmic Bimanual Coordination Training using Haptic Force Feedback

    Ramy Mounir, Kyle Reed

    ArXiv Preprint

    This work looks specifically at training humans to perform a 2:3 polyrhythmic bimanual coordination task using haptic force-feedback devices (SensAble Phantom OMNI). We implemented an interactive training session to help participants learn to decouple their hand motions quickly.
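
    A toy numerical illustration of the 2:3 ratio and of a corrective force signal of the kind haptic feedback could provide; the frequencies, spring constant, and noise level are arbitrary and not taken from the paper.

        # Toy illustration: reference trajectories at a 2:3 frequency ratio and a
        # simple restoring force toward the reference (all parameters arbitrary).
        import numpy as np

        t = np.linspace(0.0, 3.0, 301)                 # 3 s sampled at 100 Hz
        left = np.sin(2 * np.pi * 2 * t)               # "2" hand: 2 cycles per second
        right = np.sin(2 * np.pi * 3 * t)              # "3" hand: 3 cycles per second

        # Both hands realign once per second, the common cycle of the 2:3 ratio.
        print(np.round(left[::100], 3), np.round(right[::100], 3))

        k = 5.0                                        # illustrative spring constant
        rng = np.random.default_rng(4)
        hand = left + 0.1 * rng.normal(size=t.size)    # noisy actual left-hand motion
        force = -k * (hand - left)                     # restoring force toward reference
        print(f"mean |force| = {np.abs(force).mean():.2f} (toy units)")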

  • 2018 IROS

    BCI-Controlled Hands-Free Wheelchair Navigation with Obstacle Avoidance

    Ramy Mounir, Redwan Alqasemi, Rajiv Dubey

    International Conference on Intelligent Robots and Systems Workshop

    Brain-computer interfaces (BCIs) are widely used to read brain signals and convert them into real-world motion. However, the signals produced by a BCI are noisy and hard to analyze. This paper combines the latest BCI technology with ultrasonic sensors to provide a hands-free wheelchair that can efficiently navigate through crowded environments.
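
    A schematic control-loop sketch (not the system described in the paper): a noisy BCI command is executed only when the forward ultrasonic reading says the path is clear. The function name, commands, and distances are illustrative.

        # Schematic only: override the BCI command when the ultrasonic sensor
        # reports an obstacle closer than a stopping distance (values illustrative).
        def drive_step(bci_command, ultrasonic_cm, stop_distance_cm=50):
            """Return the motor command for one control cycle."""
            if bci_command == "forward" and ultrasonic_cm < stop_distance_cm:
                return "stop"              # obstacle ahead: override the BCI command
            return bci_command             # "forward", "left", "right", or "stop"

        print(drive_step("forward", ultrasonic_cm=30))   # -> stop (wall at 30 cm)
        print(drive_step("left", ultrasonic_cm=30))      # -> left (turning is allowed)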

  • 2018 RESNA

    Recent Assistive Technology Research at CARRT

    Ramy Mounir, Urvish Trivedi, Andoni Aguirrezabal, Daniel Ashley, Stephen Sundarrao, Redwan Alqasemi, Rajiv Dubey

    Rehabilitation Engineering and Assistive Technology Society of North America

    This work aims at restoring and improving individuals' functionality to maintain independence and self-sufficiency. The paper introduces four assistive technology devices developed by the Center for Assistive, Rehabilitation and Robotics Technologies (CARRT) at the University of South Florida.

  • 2017 IMECE Oral

    Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks

    Ramy Mounir, Redwan Alqasemi, Rajiv Dubey

    International Mechanical Engineering Congress & Exposition

    This work highlights the deep learning techniques used for automatic speech recognition (ASR) and how they can be adapted to recognize and transcribe speech from individuals with speech impediments.

Work Experience

Professional journey and career milestones

Funding & Grants

Research support and recognition

Reviewer Service

Contributing to the academic community

  • 2024 CVPR, NeurIPS, ICLR, ECCV, ICML, WACV
  • 2023 CVPR, TPAMI, ICML, NeurIPS, ICCV, WACV, IEEE RA-L
  • 2022 CVPR, ECCV [Outstanding], NeurIPS, ICLR [Highlighted], WACV, IEEE RA-L, ACMMM
  • 2021 CLVision@CVPR, ACMMM

Talks and Presentations

Sharing research with the community

Blog Articles

Technical tutorials and insights

Teaching Experience

Educational contributions and mentorship

Let's Connect

Interested in collaboration or have questions about my research? Feel free to reach out!