Luigi committed
Commit a20e63d · 1 Parent(s): 5357618

update readme

Files changed (1):
  1. README.md +21 -16

README.md CHANGED
@@ -27,43 +27,48 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post

  - **Video File Input:** Upload an MP4 video file to the demo.
  - **Detection of Lying Persons with Enhanced Accuracy:**
-   The system uses YOLOv11 to detect persons and applies a base heuristic (based on bounding box aspect ratio and vertical position) to flag potential lying postures. This is refined using ViTPose, which estimates keypoints to verify a horizontal pose.
  - **Advanced Tracking:**
-   DeepSORT Realtime is used to maintain persistent identities for each person even in complex scenarios such as occlusions or crowded scenes.
  - **Environmental Classification:**
-   An RT‑DETR model ([PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) detects common furniture like beds, couches, and sofas. If a person’s bounding box overlaps significantly with furniture, that instance is excluded from faint alarms, reducing false positives.
  - **Timing and Thresholding:**
-   The demo accumulates the time for which a person is detected as lying down (and not on furniture). If this duration exceeds a user‑defined threshold (from 5 to 600 seconds), the person is flagged as "FAINTED."
  - **Annotated Output:**
-   The processed video displays bounding boxes with labels indicating each person’s status (Upright, Lying Down, Fainted, or On Furniture) along with the elapsed duration.

  ## How It Works

  1. **Detection:**
-    A YOLOv11 model detects persons in each video frame, filtering by confidence.

  2. **Tracking:**
-    DeepSORT Realtime tracks detected persons across frames and assigns unique IDs.

  3. **Pose Estimation:**
-    For each person, ViTPose analyzes the cropped region to extract keypoints and determine if the person is in a horizontal (lying) posture. A base heuristic (e.g., wide bounding boxes in the lower part of the frame) is also applied.

- 4. **Environmental Classification:**
-    The entire frame is processed with the RT‑DETR model (`PekingU/rtdetr_r50vd_coco_o365`) to detect objects. Detections corresponding to furniture (e.g., bed, couch, sofa) are used to check if a person’s bounding box overlaps significantly with furniture, thereby suppressing false alarms.

- 5. **Timing & Status Update:**
-    If a person remains lying down (and not on furniture) for longer than the selected threshold, they are flagged as "FAINTED."

- 6. **Output Generation:**
-    Annotated frames are compiled into an output video which is displayed via the Gradio interface.

  ## Running on Hugging Face Spaces

-   This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU is activated during video processing to optimize resource usage.

  ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
- 2. Dependencies in `requirements.txt` will install automatically.
  3. Launch the Space and upload a video file to test the demo.

  ## Running Locally


  - **Video File Input:** Upload an MP4 video file to the demo.
  - **Detection of Lying Persons with Enhanced Accuracy:**
+   The system uses YOLOv11 to detect persons and applies a base heuristic (based on bounding box aspect ratio and vertical position) to flag potential lying postures. This is refined with ViTPose, which extracts keypoints to verify a horizontal pose.
+ - **Velocity-Based Motionlessness Detection:**
+   In addition to pose analysis, the system computes the velocity of the bottom-center point of each person's bounding box over consecutive frames. If the movement stays below a preset threshold for a defined duration, the person is considered static.
  - **Advanced Tracking:**
+   DeepSORT Realtime maintains persistent identities for each person, even in challenging scenarios such as occlusions or crowded scenes.
  - **Environmental Classification:**
+   An RT‑DETR model ([PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) detects common furniture (e.g., beds, couches, sofas). If a person’s bounding box overlaps significantly with furniture, that instance is excluded from faint alerts, reducing false positives.
  - **Timing and Thresholding:**
+   The demo records how long a person is deemed motionless (via either the pose-based or the velocity-based method). If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the person is flagged as "FAINTED."
  - **Annotated Output:**
+   The processed video displays bounding boxes with labels indicating each person’s status (Upright, Lying Down, Static, Fainted, or On Furniture) along with the elapsed static duration.

  ## How It Works

  1. **Detection:**
+    A YOLOv11 model detects persons in each video frame; only detections above a confidence threshold are kept.
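The filtering step amounts to keeping only high-scoring person detections. A minimal sketch (the `filter_detections` helper, the tuple layout, and the 0.5 default are illustrative assumptions, not the demo's actual code):

```python
CONF_THRESHOLD = 0.5  # assumed default; the demo exposes its own setting

def filter_detections(detections, conf=CONF_THRESHOLD):
    """Keep person detections whose confidence clears the threshold.

    `detections` is a list of (label, score, bbox) tuples, used here as a
    stand-in for the raw output of a YOLOv11 person detector.
    """
    return [d for d in detections if d[0] == "person" and d[1] >= conf]

raw = [("person", 0.91, (10, 20, 80, 200)),
       ("person", 0.32, (300, 40, 360, 180)),  # too uncertain, dropped
       ("dog", 0.88, (200, 150, 260, 210))]    # not a person, dropped
print(filter_detections(raw))  # only the 0.91 person detection remains
```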

  2. **Tracking:**
+    DeepSORT Realtime tracks persons across frames, assigning persistent IDs to maintain continuity.

  3. **Pose Estimation:**
+    For each person, ViTPose processes a cropped region to extract keypoints. A base heuristic (bounding box aspect ratio and vertical position) is combined with the pose data (the vertical difference between shoulders and hips) to judge whether the person is lying down.
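Combining the box heuristic with the shoulder/hip check might look like the sketch below. All names and thresholds (`is_lying`, the 1.3 aspect ratio, the 0.25 height fraction) are illustrative assumptions, not values from the demo's source:

```python
def is_lying(bbox, keypoints=None, frame_height=720,
             aspect_ratio_min=1.3, lower_region=0.5):
    """Illustrative lying-posture check: a wide box low in the frame,
    optionally confirmed by nearly level shoulders and hips."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1

    # Base heuristic: box wider than tall, sitting in the lower half of the frame.
    box_says_lying = (w / max(h, 1) > aspect_ratio_min
                      and y2 > frame_height * lower_region)

    if keypoints is None:
        return box_says_lying

    # Pose refinement: when horizontal, the vertical gap between the shoulder
    # midpoint and the hip midpoint is small relative to the box height.
    (lsx, lsy), (rsx, rsy), (lhx, lhy), (rhx, rhy) = keypoints
    shoulder_y = (lsy + rsy) / 2
    hip_y = (lhy + rhy) / 2
    pose_says_lying = abs(shoulder_y - hip_y) < 0.25 * max(h, 1)

    return box_says_lying and pose_says_lying

# A wide, low box with level shoulders and hips is flagged as lying.
print(is_lying((100, 600, 500, 700),
               keypoints=[(150, 640), (150, 660), (450, 645), (450, 665)]))
```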

+ 4. **Velocity-Based Motionlessness:**
+    The system also computes the bottom-center position of each tracked person's bounding box. By comparing this position across frames, the script calculates the displacement (velocity). If the displacement stays below a defined threshold (e.g., less than 3 pixels per frame) for longer than the user-defined duration, the person is deemed static.
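The per-track bookkeeping this step describes can be sketched as follows (the `MotionTracker` class is a hypothetical stand-in; only the 3-pixel figure comes from the text above):

```python
import math

VELOCITY_THRESHOLD = 3.0  # max pixels of movement per frame to count as static

class MotionTracker:
    """Tracks the bottom-center of each person's box and counts static frames."""

    def __init__(self):
        self.last_pos = {}       # track_id -> (x, y) in the previous frame
        self.static_frames = {}  # track_id -> consecutive near-motionless frames

    def update(self, track_id, bbox):
        x1, y1, x2, y2 = bbox
        pos = ((x1 + x2) / 2, y2)  # bottom-center of the bounding box

        prev = self.last_pos.get(track_id)
        self.last_pos[track_id] = pos
        if prev is None:  # first sighting of this track
            self.static_frames[track_id] = 0
            return 0

        displacement = math.dist(prev, pos)  # pixels moved since last frame
        if displacement < VELOCITY_THRESHOLD:
            self.static_frames[track_id] = self.static_frames.get(track_id, 0) + 1
        else:
            self.static_frames[track_id] = 0  # any real movement resets the count
        return self.static_frames[track_id]
```

Calling `update(track_id, bbox)` once per frame returns the current run of motionless frames, which the timing step can compare against the threshold.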

+ 5. **Environmental Classification:**
+    The full frame is analyzed with the RT‑DETR model to detect objects. Furniture detections (beds, couches, sofas) are used to exclude persons resting on such objects from static/faint detection.
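The overlap test behind this suppression can be sketched like this (the 0.5 overlap ratio and the helper names are assumptions for illustration):

```python
FURNITURE_LABELS = {"bed", "couch", "sofa"}  # labels treated as furniture

def overlap_ratio(person_box, furniture_box):
    """Fraction of the person's box area that lies inside the furniture box."""
    px1, py1, px2, py2 = person_box
    fx1, fy1, fx2, fy2 = furniture_box
    ix1, iy1 = max(px1, fx1), max(py1, fy1)
    ix2, iy2 = min(px2, fx2), min(py2, fy2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    person_area = max(1, (px2 - px1) * (py2 - py1))
    return inter / person_area

def on_furniture(person_box, detections, min_overlap=0.5):
    """True if any furniture detection covers enough of the person's box."""
    return any(
        label in FURNITURE_LABELS
        and overlap_ratio(person_box, box) >= min_overlap
        for label, box in detections
    )
```

Measuring intersection against the person's own area (rather than a symmetric IoU) means a small person box fully on a large bed still triggers the exclusion.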

+ 6. **Timing & Status Update:**
+    Whether via the pose-based method, the velocity-based method, or both, if a person remains motionless beyond the threshold duration (converted to a frame count using the video's FPS), they are flagged as "FAINTED."
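The seconds-to-frames conversion and the resulting status flip reduce to a short calculation; a hedged sketch (the function name and "Static" label usage are illustrative):

```python
def status_for(static_frames, fps, threshold_seconds):
    """Convert the user-selected seconds into a frame count, then compare
    it against how many consecutive motionless frames have been observed."""
    threshold_frames = int(threshold_seconds * fps)
    return "FAINTED" if static_frames >= threshold_frames else "Static"

# At 30 FPS, a 5-second threshold corresponds to 150 motionless frames.
print(status_for(150, fps=30, threshold_seconds=5))  # FAINTED
```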
+
+ 7. **Output Generation:**
+    Annotated frames (with colored bounding boxes and descriptive labels) are stitched into an output video and displayed via the Gradio interface.

  ## Running on Hugging Face Spaces

+   This demo is optimized for Hugging Face Spaces and supports ZeroGPU acceleration; the GPU is activated only during video processing to reduce idle resource usage.

  ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
+ 2. All dependencies in `requirements.txt` are installed automatically.
  3. Launch the Space and upload a video file to test the demo.

  ## Running Locally