Luigi committed on
Commit 7a13da5 · 1 Parent(s): 4b6d068

update readme

Files changed (1)
  1. README.md +25 -23
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Real Time Faint Detection On Video
  emoji: 🌍
  colorFrom: pink
  colorTo: pink
@@ -8,54 +8,56 @@ sdk_version: 5.25.0
  app_file: app.py
  pinned: false
  license: apache-2.0
- short_description: Real-Time Faint Detection on Video
  ---

-
  # Advanced Real-Time Faint Detection on Video

- This repository contains a Hugging Face Spaces demo for detecting faint (or post‑faint) scenarios in video files using an advanced tracking method based on DeepSORT Realtime. The application is built in Python and leverages:

  - **OpenCV** for video processing.
- - **Ultralytics YOLOv8** for person detection.
  - **DeepSORT Realtime** for robust multi‑object tracking.
  - **PyTorch** as the deep learning backend.
  - **Gradio** for a user‑friendly web interface.

  ## Features

  - **Video File Input:** Upload an MP4 video file to the demo.
- - **Detection of Lying Persons:** The demo uses a YOLOv8 model to detect persons. A simple heuristic (aspect ratio and vertical position) is then applied to decide if a person is lying down.
- - **Advanced Tracking:** Integration of DeepSORT Realtime provides robust multi‑person tracking, even in occluded or crowded scenes.
- - **Timing and Thresholding:** The system records the duration that a person is detected as lying down. If they remain motionless longer than a user‑defined threshold (between 5 and 600 seconds), they are flagged as "FAINTED."
- - **Annotated Output:** The processed video displays bounding boxes and labels for each person along with their current status (Upright, Lying Down, or FAINTED).

  ## How It Works

  1. **Detection:**
- The YOLOv8 model (nano version) detects people in each frame of the video. Only detections with a confidence greater than 0.5 are passed on.
-
  2. **Advanced Tracking with DeepSORT:**
- The detections are fed into DeepSORT Realtime, which associates detections across frames and assigns unique IDs to each person. This tracker is robust to occlusions and can maintain consistent identities even in crowded scenes.
-
- 3. **Lying Detection Heuristic:**
- For each tracked person, a simple heuristic determines if the person is lying down:
- - The bounding box is much wider than it is tall (aspect ratio > 1.5).
- - The lower edge of the box is located in the lower half of the frame.
-
  4. **Timing and Status Update:**
- The demo records the first frame when a track meets the lying criteria and computes the duration the person remains in that state. When this duration exceeds the threshold set by the user, the system flags the track as "FAINTED".
-
  5. **Output Generation:**
- Annotated frames (with bounding boxes and labels) are stitched together into an output video that is returned to the user via the Gradio interface.

  ## Running on Hugging Face Spaces

- This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU (e.g., A100) is activated only during processing, optimizing resource usage.

  ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
- 2. The dependencies in `requirements.txt` will be installed automatically.
  3. Launch the Space and upload a video file to test the faint detection functionality.

  ## Running Locally
 
  ---
+ title: Advanced Real-Time Faint Detection on Video
  emoji: 🌍
  colorFrom: pink
  colorTo: pink

  app_file: app.py
  pinned: false
  license: apache-2.0
+ short_description: Advanced Real-Time Faint Detection on Video
  ---

  # Advanced Real-Time Faint Detection on Video

+ This repository contains a Hugging Face Spaces demo for detecting faint (or post‑faint) scenarios in video files using a combination of deep learning models for person detection, pose estimation, and advanced tracking. The application is built in Python and leverages:

  - **OpenCV** for video processing.
+ - **Ultralytics YOLOv8 (medium model)** for person detection.
  - **DeepSORT Realtime** for robust multi‑object tracking.
+ - **ViTPose** (from Hugging Face: [usyd-community/vitpose-base-simple](https://huggingface.co/usyd-community/vitpose-base-simple)) for improved pose estimation.
  - **PyTorch** as the deep learning backend.
  - **Gradio** for a user‑friendly web interface.
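
A rough sketch of how this stack can be wired together (the YOLOv8 weights file and the DeepSORT settings below are assumptions; only the ViTPose checkpoint is named in this README):

```python
# Sketch: loading the detection, tracking, and pose-estimation models.
import torch
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort
from transformers import AutoProcessor, VitPoseForPoseEstimation

device = "cuda" if torch.cuda.is_available() else "cpu"

detector = YOLO("yolov8m.pt")   # YOLOv8 medium (assumed weights file)
tracker = DeepSort(max_age=30)  # assumed: keep IDs alive through short occlusions

pose_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-base-simple")
pose_model = VitPoseForPoseEstimation.from_pretrained(
    "usyd-community/vitpose-base-simple"
).to(device)
```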

  ## Features

  - **Video File Input:** Upload an MP4 video file to the demo.
+ - **Detection of Lying Persons with Enhanced Accuracy:**
+ The system uses a YOLOv8 (medium) model to detect persons and a simple base heuristic (aspect ratio and vertical position) to flag possible lying postures. These detections are then refined using ViTPose to estimate keypoints and verify a horizontal pose.
+ - **Advanced Tracking:**
+ Integration of DeepSORT Realtime provides robust multi‑person tracking. Even in occluded or crowded scenes, unique IDs are maintained for each person.
+ - **Timing and Thresholding:**
+ The system records the duration for which a person is detected as lying down. If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the system flags that person as "FAINTED."
+ - **Annotated Output:**
+ Processed video frames include annotated bounding boxes and labels, indicating the current status (Upright, Lying Down, or FAINTED) along with the elapsed duration in each state.

  ## How It Works

  1. **Detection:**
+ The YOLOv8 medium model detects persons in each frame of the video. Only detections with a confidence greater than 0.5 are considered.
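
A minimal sketch of this step, continuing the model-loading snippet above (variable names are illustrative; the 0.5 confidence cutoff comes from this README):

```python
# Sketch: person detection on a single frame with Ultralytics YOLOv8.
result = detector(frame, verbose=False)[0]
detections = []
for box in result.boxes:
    cls, conf = int(box.cls[0]), float(box.conf[0])
    if cls == 0 and conf > 0.5:  # class 0 is "person" in COCO
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        # DeepSORT Realtime expects ([left, top, width, height], conf, class)
        detections.append(([x1, y1, x2 - x1, y2 - y1], conf, "person"))
```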
+
  2. **Advanced Tracking with DeepSORT:**
+ The detections are passed to DeepSORT Realtime, which associates detections across frames, assigning persistent IDs to each person.
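
Continuing the sketch, the tracker update is a single call per frame:

```python
# Sketch: feeding the current frame's detections to DeepSORT Realtime.
tracks = tracker.update_tracks(detections, frame=frame)
for track in tracks:
    if not track.is_confirmed():
        continue  # skip tentative tracks that are not yet stable
    track_id = track.track_id     # persistent ID across frames
    l, t, r, b = track.to_ltrb()  # current box (left, top, right, bottom)
```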
+
+ 3. **Enhanced Lying Detection with ViTPose:**
+ For each person track, the region of interest is cropped from the frame and fed to the ViTPose model. Keypoint estimation is used to measure the vertical distance between the shoulders and hips. A small vertical difference (relative to the bounding box height) indicates that the person is likely lying down. This refined check is combined with the base heuristic (aspect ratio and vertical position) to improve accuracy.
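
One way to implement the shoulder-hip check, continuing the sketches above (assumptions: COCO keypoint ordering, an illustrative 0.2 cutoff, and an RGB input frame; `app.py` may differ):

```python
import numpy as np
from PIL import Image

def looks_horizontal(frame_rgb, box_ltrb):
    """Sketch: ViTPose check that shoulders and hips sit at similar heights."""
    l, t, r, b = [int(v) for v in box_ltrb]
    # ViTPose is top-down: it takes the image plus person boxes in (x, y, w, h).
    boxes = [np.array([[l, t, r - l, b - t]], dtype=np.float32)]
    inputs = pose_processor(
        Image.fromarray(frame_rgb), boxes=boxes, return_tensors="pt"
    ).to(device)
    with torch.no_grad():
        outputs = pose_model(**inputs)
    pose = pose_processor.post_process_pose_estimation(outputs, boxes=boxes)[0][0]
    kpts = pose["keypoints"]  # COCO order: 5/6 = shoulders, 11/12 = hips
    shoulder_y = float(kpts[5][1] + kpts[6][1]) / 2
    hip_y = float(kpts[11][1] + kpts[12][1]) / 2
    # A small shoulder-hip height gap relative to box height => likely lying down.
    return abs(shoulder_y - hip_y) < 0.2 * max(b - t, 1)
```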
+
  4. **Timing and Status Update:**
+ The demo records the frame index when a person first meets the lying criteria. If they remain in that state beyond the set threshold (converted to frames using the video’s FPS), they are flagged as "FAINTED." Otherwise, they are marked as "Lying Down" or "Upright" accordingly.
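
The bookkeeping can be as simple as a dictionary keyed by track ID (a sketch; `threshold_seconds` and `fps` are assumed to come from the UI slider and the input video):

```python
# Sketch: per-track timing state.
threshold_frames = int(threshold_seconds * fps)  # user threshold (5-600 s) in frames

lying_since = {}  # track_id -> frame index at which lying was first observed

def update_status(track_id, is_lying, frame_idx):
    if not is_lying:
        lying_since.pop(track_id, None)
        return "Upright"
    start = lying_since.setdefault(track_id, frame_idx)
    return "FAINTED" if frame_idx - start >= threshold_frames else "Lying Down"
```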
+
  5. **Output Generation:**
+ Annotated frames are stitched together into an output video that is returned to the user via the Gradio interface.
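
With OpenCV this is typically a `VideoWriter` loop (a sketch; the codec, file name, and frame source are assumptions):

```python
import cv2

# Sketch: stitching annotated frames into an MP4 for the Gradio output.
writer = cv2.VideoWriter(
    "output.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),
    fps,
    (frame_width, frame_height),
)
for annotated in annotated_frames:  # BGR uint8 frames at the declared size
    writer.write(annotated)
writer.release()
```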

  ## Running on Hugging Face Spaces

+ This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU (for example, an A100) is activated only during processing, which helps optimize resource usage.
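
On ZeroGPU Spaces this is conventionally done with the `spaces` package, which attaches a GPU only while the decorated function runs (a sketch; the function signature and slider default are assumptions, while the 5-600 s range comes from this README):

```python
import gradio as gr
import spaces

@spaces.GPU  # ZeroGPU: a GPU is allocated only while this function executes
def process_video(video_path: str, threshold_seconds: float) -> str:
    # ... run detection, tracking, pose refinement, and write output.mp4 ...
    return "output.mp4"

demo = gr.Interface(
    fn=process_video,
    inputs=[gr.Video(), gr.Slider(5, 600, value=30, label="Faint threshold (s)")],
    outputs=gr.Video(),
)
demo.launch()
```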

  ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
+ 2. The dependencies listed in `requirements.txt` will be installed automatically.
  3. Launch the Space and upload a video file to test the faint detection functionality.

  ## Running Locally