update readme

README.md

- **Video File Input:** Upload an MP4 video file to the demo.
- **Detection of Lying Persons with Enhanced Accuracy:**
  The system uses YOLOv11 to detect persons and applies a base heuristic (based on bounding box aspect ratio and vertical position) to flag potential lying postures. This is refined using ViTPose, which extracts keypoints to verify a horizontal pose.
- **Velocity-Based Motionlessness Detection:**
  In addition to pose analysis, the system computes the velocity of the bottom-center point of each person's bounding box over consecutive frames. If the movement stays below a preset threshold for a defined duration, the person is considered static.
- **Advanced Tracking:**
  DeepSORT Realtime maintains a persistent identity for each person, even in challenging scenarios such as occlusions or crowded scenes.
- **Environmental Classification:**
  An RT‑DETR model ([PekingU/rtdetr_r50vd_coco_o365](https://huggingface.co/PekingU/rtdetr_r50vd_coco_o365)) detects common furniture (e.g. beds, couches, sofas). If a person’s bounding box overlaps significantly with furniture, that person is excluded from faint alerts, reducing false positives.
- **Timing and Thresholding:**
  The demo records how long a person remains motionless (via either the pose-based or the velocity-based method). If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the person is flagged as "FAINTED."
- **Annotated Output:**
  The processed video displays bounding boxes with labels indicating each person’s status (Upright, Lying Down, Static, Fainted, or On Furniture) along with the elapsed static duration.

## How It Works

1. **Detection:**
   A YOLOv11 model detects persons in each video frame; only detections with confidence above a defined threshold are kept (a combined detection-and-tracking sketch follows this list).

2. **Tracking:**
   DeepSORT Realtime tracks persons across frames, assigning persistent IDs to maintain continuity.

3. **Pose Estimation:**
   For each person, ViTPose processes a cropped region to extract keypoints. A base heuristic (bounding box aspect ratio and vertical position) is combined with the pose data (vertical difference between shoulders and hips) to judge whether the person is lying down (see the pose-heuristic sketch below).

4. **Velocity-Based Motionlessness:**
   The system also computes the bottom-center position of each tracked person's bounding box. By comparing this position across frames, the script calculates the displacement (velocity). If the displacement remains below a defined threshold (e.g. less than 3 pixels per frame) for a period that exceeds the user-defined threshold, the person is deemed static (see the motionlessness sketch below).

5. **Environmental Classification:**
   The full frame is analyzed with the RT‑DETR model to detect objects. Detections of furniture (beds, couches, sofas) are used to exclude persons on such objects from static/faint detection (see the overlap sketch below).

6. **Timing & Status Update:**
   Whether through the pose-based method, the velocity-based method, or a combination of both, a person who remains motionless beyond the threshold duration (converted to frames based on the video's FPS) is flagged as "FAINTED." The seconds-to-frames conversion appears in the motionlessness sketch below.

7. **Output Generation:**
   Annotated frames (with colored bounding boxes and descriptive labels) are stitched together into an output video and displayed via the Gradio interface (see the output sketch below).
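
Taken together, steps 1 and 2 form a per-frame detect-then-track loop. Below is a minimal sketch assuming the `ultralytics` YOLO API and the `deep-sort-realtime` package; the checkpoint name, confidence cutoff, and `max_age` are illustrative assumptions, not the demo's actual settings.

```python
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

CONF_THRESHOLD = 0.5          # assumed confidence cutoff
model = YOLO("yolo11n.pt")    # any YOLOv11 checkpoint that detects persons
tracker = DeepSort(max_age=30)

cap = cv2.VideoCapture("input.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    detections = []
    for box in model(frame, verbose=False)[0].boxes:
        conf = float(box.conf)
        # class 0 is "person" in COCO-trained YOLO models
        if int(box.cls) == 0 and conf >= CONF_THRESHOLD:
            x1, y1, x2, y2 = map(float, box.xyxy[0])
            # DeepSORT expects ([left, top, width, height], confidence, class)
            detections.append(([x1, y1, x2 - x1, y2 - y1], conf, "person"))
    # DeepSORT assigns a persistent track_id to each confirmed person
    for track in tracker.update_tracks(detections, frame=frame):
        if track.is_confirmed():
            print(track.track_id, track.to_ltrb())
cap.release()
```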
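
Step 3's lying-down test reduces to a small helper. This sketch assumes COCO-style keypoint ordering (indices 5/6 for the shoulders, 11/12 for the hips, each an (x, y) pair) and illustrative thresholds rather than the demo's tuned values.

```python
def is_lying_down(box, keypoints, wide_ratio=1.2, flat_ratio=0.3):
    """Combine the box heuristic with a shoulders-vs-hips check."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Base heuristic: a lying person's box is wider than it is tall
    box_says_lying = w > wide_ratio * h
    # Pose refinement: shoulders and hips at similar heights => horizontal torso
    shoulder_y = (keypoints[5][1] + keypoints[6][1]) / 2
    hip_y = (keypoints[11][1] + keypoints[12][1]) / 2
    pose_says_lying = abs(shoulder_y - hip_y) < flat_ratio * h
    return box_says_lying and pose_says_lying
```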
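
Steps 4 and 6 amount to per-track bookkeeping: count consecutive frames in which the bottom-center point moves less than the velocity threshold, and compare that count against the user-defined duration converted from seconds to frames. A sketch, where the per-track state dictionary is an assumed design:

```python
import math

VELOCITY_THRESHOLD = 3.0  # max displacement in pixels per frame (from the text)

def update_static_state(state, track_id, box, fps, faint_seconds):
    """Return True once a track has been static longer than the threshold.

    `state` maps track_id -> (last_bottom_center, static_frame_count).
    """
    x1, y1, x2, y2 = box
    bottom_center = ((x1 + x2) / 2, y2)
    last, count = state.get(track_id, (bottom_center, 0))
    displacement = math.dist(last, bottom_center)
    count = count + 1 if displacement < VELOCITY_THRESHOLD else 0
    state[track_id] = (bottom_center, count)
    # Convert the user-defined threshold (5-600 s) into a frame count
    return count >= int(faint_seconds * fps)
```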
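
Step 5's exclusion is an overlap test between a person box and the RT‑DETR furniture boxes: how much of the person's area lies inside any piece of furniture. The 0.5 minimum overlap ratio here is an assumption.

```python
def on_furniture(person_box, furniture_boxes, min_overlap=0.5):
    """Return True if the person overlaps a furniture box enough to skip alerts."""
    px1, py1, px2, py2 = person_box
    person_area = max(0.0, (px2 - px1) * (py2 - py1))
    if person_area == 0:
        return False
    for fx1, fy1, fx2, fy2 in furniture_boxes:
        ix = max(0.0, min(px2, fx2) - max(px1, fx1))  # intersection width
        iy = max(0.0, min(py2, fy2) - max(py1, fy1))  # intersection height
        if (ix * iy) / person_area >= min_overlap:
            return True
    return False
```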
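
Step 7 can be handled with OpenCV's `VideoWriter`. A sketch with illustrative labels and colors; the demo's exact annotation style may differ.

```python
import cv2

def write_annotated_video(frames_with_boxes, out_path, fps, size):
    """`frames_with_boxes` yields (frame, [(x1, y1, x2, y2, label), ...])."""
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    for frame, boxes in frames_with_boxes:
        for x1, y1, x2, y2, label in boxes:
            color = (0, 0, 255) if label.startswith("FAINTED") else (0, 255, 0)
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)
            cv2.putText(frame, label, (int(x1), int(y1) - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
        writer.write(frame)
    writer.release()  # Gradio can then serve out_path as the result video
```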

## Running on Hugging Face Spaces

This demo is optimized for Hugging Face Spaces and supports ZeroGPU acceleration: the GPU is activated only during video processing, which reduces idle resource usage (see the sketch below).
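
A minimal sketch of the ZeroGPU pattern, assuming the `spaces` package that Hugging Face provides in ZeroGPU Spaces; the function name and arguments are illustrative:

```python
import spaces

@spaces.GPU  # a GPU is attached only while this function runs
def process_video(video_path, faint_seconds):
    # run detection, tracking, pose estimation, and annotation here
    ...
```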

### To Deploy:

1. Fork or clone this repository on Hugging Face Spaces.
2. All dependencies in `requirements.txt` will be installed automatically.
3. Launch the Space and upload a video file to test the demo.

## Running Locally