---
title: Advanced Real-Time Faint Detection on Video
emoji: 🌍
colorFrom: pink
colorTo: pink
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Advanced Real-Time Faint Detection on Video
---

# Advanced Real-Time Faint Detection on Video

This repository contains a Hugging Face Spaces demo for detecting faint (or post‑faint) scenarios in video files using an ensemble of deep learning models for person detection, pose estimation, tracking, and environmental classification. The application is built in Python and leverages:

- **OpenCV** for video processing.
- **Ultralytics YOLOv11 (small)** for person detection.
- **DeepSORT Realtime** for robust multi‑object tracking.
- **ViTPose** (from [usyd-community/vitpose-base-simple](https://huggingface.co/usyd-community/vitpose-base-simple)) for improved pose estimation.
- **RT‑DETR** (from [PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) for environmental object detection.
- **PyTorch** as the deep learning backend.
- **Gradio** for a user‑friendly web interface.

## Features

- **Video File Input:** Upload an MP4 video file to the demo.
- **Detection of Lying Persons with Enhanced Accuracy:** The system uses YOLOv11 to detect persons and applies a base heuristic (based on bounding box aspect ratio and vertical position) to flag potential lying postures. This is refined using ViTPose, which extracts keypoints to verify a horizontal pose.
- **Velocity-Based Motionlessness Detection:** In addition to pose analysis, the system computes the velocity of the bottom-center point of each person's bounding box over consecutive frames. If the movement is below a preset threshold for a defined duration, the person is considered static.
- **Advanced Tracking:** DeepSORT Realtime is used to maintain persistent identities for each person even in challenging scenarios such as occlusions or crowded scenes.
- **Environmental Classification:** An RT‑DETR model ([PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) detects common furniture (e.g. beds, couches, sofas). If a person’s bounding box overlaps significantly with furniture, that instance is excluded from faint alerts, reducing false positives.
- **Timing and Thresholding:** The demo records the duration for which a person is deemed motionless (via either the pose-based or the velocity-based method). If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the person is flagged as "FAINTED."
- **Annotated Output:** The processed video displays bounding boxes with labels indicating each person’s status (Upright, Lying Down, Static, Fainted, or On Furniture) along with the elapsed static duration.

## How It Works

1. **Detection:** A YOLOv11 model detects persons in each video frame; detections with confidence above a defined threshold are used.
2. **Tracking:** DeepSORT Realtime tracks persons across frames, assigning persistent IDs to maintain continuity (sketched below).
3. **Pose Estimation:** For each person, ViTPose processes a cropped region to extract keypoints. A base heuristic (bounding box aspect ratio and vertical position) is combined with the pose data (vertical difference between shoulders and hips) to judge if the person is lying down (sketched below).
4. **Velocity-Based Motionlessness:** The system also computes the bottom-center position of the bounding box for each tracked person. By comparing its position across frames, the script calculates the displacement (velocity). If the displacement remains below a defined threshold (e.g. less than 3 pixels per frame) over a period that exceeds the user-defined threshold, the person is deemed static (sketched below).
5. **Environmental Classification:** The full frame is analyzed with the RT‑DETR model to detect objects. Detections for furniture (beds, couches, sofas) are used to exclude persons on such objects from static/faint detection (sketched below).
6. **Timing & Status Update:** Whether through the pose-based method or the velocity-based method (or a combination of both), if a person remains motionless beyond the threshold duration (converted to frames based on FPS), they are flagged as "FAINTED."
7. **Output Generation:** Annotated frames (with colored bounding boxes and descriptive labels) are stitched together into an output video and displayed via the Gradio interface.
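The detection-and-tracking loop (steps 1–2) can be pictured as follows. This is a minimal sketch rather than the actual `app.py` implementation: the `yolo11s.pt` weights name, the 0.5 confidence cutoff, the `max_age` value, and the `input.mp4` path are assumptions chosen for illustration.

```python
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

detector = YOLO("yolo11s.pt")   # YOLOv11-small weights (assumed file name)
tracker = DeepSort(max_age=30)  # keeps IDs alive through short occlusions

CONF_THRESHOLD = 0.5            # person-detection confidence cutoff (illustrative)

cap = cv2.VideoCapture("input.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Step 1: detect persons (COCO class 0) above the confidence threshold.
    result = detector(frame, verbose=False)[0]
    detections = []
    for box in result.boxes:
        if int(box.cls) == 0 and float(box.conf) >= CONF_THRESHOLD:
            x1, y1, x2, y2 = map(float, box.xyxy[0])
            # DeepSORT expects ([left, top, width, height], confidence, class).
            detections.append(([x1, y1, x2 - x1, y2 - y1], float(box.conf), "person"))

    # Step 2: track persons across frames with persistent IDs.
    tracks = tracker.update_tracks(detections, frame=frame)
    for track in tracks:
        if not track.is_confirmed():
            continue
        l, t, r, b = track.to_ltrb()
        # Pose, velocity, and furniture checks would run per track here,
        # followed by the status annotation drawn on the frame.
        cv2.rectangle(frame, (int(l), int(t)), (int(r), int(b)), (0, 255, 0), 2)
        cv2.putText(frame, f"ID {track.track_id}", (int(l), int(t) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cap.release()
```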
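The lying-down judgement (step 3) combines the bounding-box heuristic with the ViTPose keypoints. The sketch below assumes the keypoints are already available in COCO-17 order (shoulders at indices 5–6, hips at 11–12); the function name and threshold values are illustrative, not the exact ones used by the app.

```python
import numpy as np

def looks_lying_down(bbox, keypoints, frame_height,
                     aspect_ratio_min=1.2, min_rel_y=0.5, max_vertical_gap=40):
    """Heuristic lying-posture check for one tracked person.

    bbox      -- (x1, y1, x2, y2) in pixels
    keypoints -- (17, 2) array of COCO-order keypoints from ViTPose, or None
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1

    # Base heuristic: a lying person tends to produce a wide box low in the frame.
    wide_box = w / max(h, 1) >= aspect_ratio_min
    low_in_frame = y2 / frame_height >= min_rel_y
    if not (wide_box and low_in_frame):
        return False

    if keypoints is None:
        return True  # fall back to the box heuristic alone

    # Pose refinement: shoulders and hips at roughly the same height
    # indicate a horizontal torso.
    shoulder_y = keypoints[[5, 6], 1].mean()
    hip_y = keypoints[[11, 12], 1].mean()
    return abs(shoulder_y - hip_y) <= max_vertical_gap
```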
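For steps 4 and 6, the bottom-center displacement is accumulated per track, and the user-defined threshold in seconds is converted to a frame count using the video's FPS. A rough sketch with a hypothetical `MotionTimer` helper and illustrative thresholds:

```python
import math

class MotionTimer:
    """Tracks per-person displacement of the bounding-box bottom-center."""

    def __init__(self, fps, static_seconds, max_displacement_px=3.0):
        self.static_frames_needed = int(static_seconds * fps)  # seconds -> frames
        self.max_displacement_px = max_displacement_px
        self.last_point = {}      # track_id -> (x, y)
        self.static_frames = {}   # track_id -> consecutive static frame count

    def update(self, track_id, bbox):
        x1, y1, x2, y2 = bbox
        point = ((x1 + x2) / 2.0, y2)  # bottom-center of the box

        prev = self.last_point.get(track_id)
        self.last_point[track_id] = point
        if prev is None:
            self.static_frames[track_id] = 0
            return "UPRIGHT"

        displacement = math.hypot(point[0] - prev[0], point[1] - prev[1])
        if displacement < self.max_displacement_px:
            self.static_frames[track_id] = self.static_frames.get(track_id, 0) + 1
        else:
            self.static_frames[track_id] = 0  # any real movement resets the timer

        if self.static_frames[track_id] >= self.static_frames_needed:
            return "FAINTED"
        if self.static_frames[track_id] > 0:
            return "STATIC"
        return "UPRIGHT"
```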
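Step 5 suppresses alerts for people resting on furniture. One way to express the overlap test is shown below; defining the overlap as intersection area over person-box area, and the 0.5 cutoff, are assumptions for illustration.

```python
FURNITURE_LABELS = {"bed", "couch", "sofa"}

def on_furniture(person_box, furniture_boxes, min_overlap_ratio=0.5):
    """Return True if the person overlaps a furniture box enough to skip faint alerts.

    person_box      -- (x1, y1, x2, y2)
    furniture_boxes -- list of (label, (x1, y1, x2, y2)) from the RT-DETR pass
    """
    px1, py1, px2, py2 = person_box
    person_area = max((px2 - px1) * (py2 - py1), 1e-6)

    for label, (fx1, fy1, fx2, fy2) in furniture_boxes:
        if label not in FURNITURE_LABELS:
            continue
        # Intersection rectangle between the person and furniture boxes.
        ix = max(0.0, min(px2, fx2) - max(px1, fx1))
        iy = max(0.0, min(py2, fy2) - max(py1, fy1))
        if (ix * iy) / person_area >= min_overlap_ratio:
            return True
    return False
```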
## Running on Hugging Face Spaces

This demo is optimized for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU is activated during video processing to reduce idle resource usage.

### To Deploy:

1. Fork or clone this repository on Hugging Face Spaces.
2. All dependencies in `requirements.txt` will be installed automatically.
3. Launch the Space and upload a video file to test the demo.

## Running Locally

1. **Clone the Repository:**

   ```bash
   git clone https://github.com/your_username/advanced-faint-detection.git
   cd advanced-faint-detection
   ```