Luigi committed
Commit a20e63d · 1 Parent(s): 5357618

update readme

Files changed (1):
  1. README.md +21 -16

README.md CHANGED
@@ -27,43 +27,48 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post

  - **Video File Input:** Upload an MP4 video file to the demo.
  - **Detection of Lying Persons with Enhanced Accuracy:**
-   The system uses YOLOv11 to detect persons and applies a base heuristic (based on bounding box aspect ratio and vertical position) to flag potential lying postures. This is refined using ViTPose, which estimates keypoints to verify a horizontal pose.
  - **Advanced Tracking:**
-   DeepSORT Realtime is used to maintain persistent identities for each person even in complex scenarios such as occlusions or crowded scenes.
  - **Environmental Classification:**
-   An RT‑DETR model ([PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) detects common furniture like beds, couches, and sofas. If a person’s bounding box overlaps significantly with furniture, that instance is excluded from faint alarms, reducing false positives.
  - **Timing and Thresholding:**
-   The demo accumulates the time for which a person is detected as lying down (and not on furniture). If this duration exceeds a user‑defined threshold (from 5 to 600 seconds), the person is flagged as "FAINTED."
  - **Annotated Output:**
-   The processed video displays bounding boxes with labels indicating each person’s status (Upright, Lying Down, Fainted, or On Furniture) along with the elapsed duration.

  ## How It Works

  1. **Detection:**
-    A YOLOv11 model detects persons in each video frame, filtering by confidence.

  2. **Tracking:**
-    DeepSORT Realtime tracks detected persons across frames and assigns unique IDs.

  3. **Pose Estimation:**
-    For each person, ViTPose analyzes the cropped region to extract keypoints and determine if the person is in a horizontal (lying) posture. A base heuristic (e.g., wide bounding boxes in the lower part of the frame) is also applied.

- 4. **Environmental Classification:**
-    The entire frame is processed with the RT‑DETR model (`PekingU/rtdetr_r50vd_coco_o365`) to detect objects. Detections corresponding to furniture (e.g., bed, couch, sofa) are used to check if a person’s bounding box overlaps significantly with furniture, thereby suppressing false alarms.

- 5. **Timing & Status Update:**
-    If a person remains lying down (and not on furniture) for longer than the selected threshold, they are flagged as "FAINTED."

- 6. **Output Generation:**
-    Annotated frames are compiled into an output video which is displayed via the Gradio interface.

  ## Running on Hugging Face Spaces

-   This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU is activated during video processing to optimize resource usage.

  ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
- 2. Dependencies in `requirements.txt` will install automatically.
  3. Launch the Space and upload a video file to test the demo.

  ## Running Locally


  - **Video File Input:** Upload an MP4 video file to the demo.
  - **Detection of Lying Persons with Enhanced Accuracy:**
+   The system uses YOLOv11 to detect persons and applies a base heuristic (based on bounding box aspect ratio and vertical position) to flag potential lying postures. This is refined with ViTPose, which extracts keypoints to verify a horizontal pose.
+ - **Velocity-Based Motionlessness Detection:**
+   In addition to pose analysis, the system computes the velocity of the bottom-center point of each person's bounding box over consecutive frames. If the movement stays below a preset threshold for a defined duration, the person is considered static.
  - **Advanced Tracking:**
+   DeepSORT Realtime maintains persistent identities for each person, even in challenging scenarios such as occlusions or crowded scenes.
  - **Environmental Classification:**
+   An RT‑DETR model ([PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) detects common furniture (e.g., beds, couches, sofas). If a person’s bounding box overlaps significantly with furniture, that instance is excluded from faint alerts, reducing false positives.
  - **Timing and Thresholding:**
+   The demo records how long a person is deemed motionless (via either the pose-based or the velocity-based method). If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the person is flagged as "FAINTED."
  - **Annotated Output:**
+   The processed video displays bounding boxes with labels indicating each person’s status (Upright, Lying Down, Static, Fainted, or On Furniture) along with the elapsed static duration.

  ## How It Works

  1. **Detection:**
+    A YOLOv11 model detects persons in each video frame; only detections above a confidence threshold are kept.
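The filtering step amounts to keeping only high-scoring person detections. A minimal sketch (the `filter_detections` helper, the tuple layout, and the 0.5 default are illustrative assumptions, not the demo's actual code):

```python
CONF_THRESHOLD = 0.5  # assumed default; the demo exposes its own setting

def filter_detections(detections, conf=CONF_THRESHOLD):
    """Keep person detections whose confidence clears the threshold.

    `detections` is a list of (label, score, bbox) tuples, used here as a
    stand-in for the raw output of a YOLOv11 person detector.
    """
    return [d for d in detections if d[0] == "person" and d[1] >= conf]

raw = [("person", 0.91, (10, 20, 80, 200)),
       ("person", 0.32, (300, 40, 360, 180)),  # too uncertain, dropped
       ("dog", 0.88, (200, 150, 260, 210))]    # not a person, dropped
print(filter_detections(raw))  # only the 0.91 person detection remains
```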

  2. **Tracking:**
+    DeepSORT Realtime tracks persons across frames, assigning persistent IDs to maintain continuity.

  3. **Pose Estimation:**
+    For each person, ViTPose processes a cropped region to extract keypoints. A base heuristic (bounding box aspect ratio and vertical position) is combined with the pose data (the vertical difference between shoulders and hips) to judge whether the person is lying down.
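Combining the box heuristic with the shoulder/hip check might look like the sketch below. All names and thresholds (`is_lying`, the 1.3 aspect ratio, the 0.25 height fraction) are illustrative assumptions, not values from the demo's source:

```python
def is_lying(bbox, keypoints=None, frame_height=720,
             aspect_ratio_min=1.3, lower_region=0.5):
    """Illustrative lying-posture check: a wide box low in the frame,
    optionally confirmed by nearly level shoulders and hips."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1

    # Base heuristic: box wider than tall, sitting in the lower half of the frame.
    box_says_lying = (w / max(h, 1) > aspect_ratio_min
                      and y2 > frame_height * lower_region)

    if keypoints is None:
        return box_says_lying

    # Pose refinement: when horizontal, the vertical gap between the shoulder
    # midpoint and the hip midpoint is small relative to the box height.
    (lsx, lsy), (rsx, rsy), (lhx, lhy), (rhx, rhy) = keypoints
    shoulder_y = (lsy + rsy) / 2
    hip_y = (lhy + rhy) / 2
    pose_says_lying = abs(shoulder_y - hip_y) < 0.25 * max(h, 1)

    return box_says_lying and pose_says_lying

# A wide, low box with level shoulders and hips is flagged as lying.
print(is_lying((100, 600, 500, 700),
               keypoints=[(150, 640), (150, 660), (450, 645), (450, 665)]))
```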

+ 4. **Velocity-Based Motionlessness:**
+    The system also computes the bottom-center position of each tracked person's bounding box. By comparing this position across frames, the script calculates the displacement (velocity). If the displacement stays below a defined threshold (e.g., less than 3 pixels per frame) for longer than the user-defined duration, the person is deemed static.
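The per-track bookkeeping this step describes can be sketched as follows (the `MotionTracker` class is a hypothetical stand-in; only the 3-pixel figure comes from the text above):

```python
import math

VELOCITY_THRESHOLD = 3.0  # max pixels of movement per frame to count as static

class MotionTracker:
    """Tracks the bottom-center of each person's box and counts static frames."""

    def __init__(self):
        self.last_pos = {}       # track_id -> (x, y) in the previous frame
        self.static_frames = {}  # track_id -> consecutive near-motionless frames

    def update(self, track_id, bbox):
        x1, y1, x2, y2 = bbox
        pos = ((x1 + x2) / 2, y2)  # bottom-center of the bounding box

        prev = self.last_pos.get(track_id)
        self.last_pos[track_id] = pos
        if prev is None:  # first sighting of this track
            self.static_frames[track_id] = 0
            return 0

        displacement = math.dist(prev, pos)  # pixels moved since last frame
        if displacement < VELOCITY_THRESHOLD:
            self.static_frames[track_id] = self.static_frames.get(track_id, 0) + 1
        else:
            self.static_frames[track_id] = 0  # any real movement resets the count
        return self.static_frames[track_id]
```

Calling `update(track_id, bbox)` once per frame returns the current run of motionless frames, which the timing step can compare against the threshold.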

+ 5. **Environmental Classification:**
+    The full frame is analyzed with the RT‑DETR model to detect objects. Furniture detections (beds, couches, sofas) are used to exclude persons resting on such objects from static/faint detection.
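The overlap test behind this suppression can be sketched like this (the 0.5 overlap ratio and the helper names are assumptions for illustration):

```python
FURNITURE_LABELS = {"bed", "couch", "sofa"}  # labels treated as furniture

def overlap_ratio(person_box, furniture_box):
    """Fraction of the person's box area that lies inside the furniture box."""
    px1, py1, px2, py2 = person_box
    fx1, fy1, fx2, fy2 = furniture_box
    ix1, iy1 = max(px1, fx1), max(py1, fy1)
    ix2, iy2 = min(px2, fx2), min(py2, fy2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    person_area = max(1, (px2 - px1) * (py2 - py1))
    return inter / person_area

def on_furniture(person_box, detections, min_overlap=0.5):
    """True if any furniture detection covers enough of the person's box."""
    return any(
        label in FURNITURE_LABELS
        and overlap_ratio(person_box, box) >= min_overlap
        for label, box in detections
    )
```

Measuring intersection against the person's own area (rather than a symmetric IoU) means a small person box fully on a large bed still triggers the exclusion.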

+ 6. **Timing & Status Update:**
+    Whether via the pose-based method, the velocity-based method, or both, if a person remains motionless beyond the threshold duration (converted to a frame count using the video's FPS), they are flagged as "FAINTED."
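The seconds-to-frames conversion and the resulting status flip reduce to a short calculation; a hedged sketch (the function name and "Static" label usage are illustrative):

```python
def status_for(static_frames, fps, threshold_seconds):
    """Convert the user-selected seconds into a frame count, then compare
    it against how many consecutive motionless frames have been observed."""
    threshold_frames = int(threshold_seconds * fps)
    return "FAINTED" if static_frames >= threshold_frames else "Static"

# At 30 FPS, a 5-second threshold corresponds to 150 motionless frames.
print(status_for(150, fps=30, threshold_seconds=5))  # FAINTED
```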
+
+ 7. **Output Generation:**
+    Annotated frames (with colored bounding boxes and descriptive labels) are stitched into an output video and displayed via the Gradio interface.

  ## Running on Hugging Face Spaces

+   This demo is optimized for Hugging Face Spaces and supports ZeroGPU acceleration; the GPU is activated only during video processing to reduce idle resource usage.

  ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
+ 2. All dependencies in `requirements.txt` are installed automatically.
  3. Launch the Space and upload a video file to test the demo.

  ## Running Locally