add zero gpu support

Files changed:
- README.md +26 -25
- app.py +80 -109
- packages.txt +1 -1
- requirementx.txt +1 -0
README.md
CHANGED

@@ -12,54 +12,55 @@ short_description: Real-Time Faint Detection on Video
 ---
 
 
-# Real-Time Faint Detection on Video
+# Advanced Real-Time Faint Detection on Video
 
-This repository contains a Hugging Face Spaces demo
+This repository contains a Hugging Face Spaces demo for detecting faint (or post‑faint) scenarios in video files using an advanced tracking method based on DeepSORT Realtime. The application is built in Python and leverages:
 
 - **OpenCV** for video processing.
 - **Ultralytics YOLOv8** for person detection.
+- **DeepSORT Realtime** for robust multi‑object tracking.
+- **PyTorch** as the deep learning backend.
 - **Gradio** for a user‑friendly web interface.
 
 ## Features
 
-- **Video File Input:** Upload
-- **Detection of Lying Persons:** The
-- **Tracking
-- **Annotated Output:** The processed video displays bounding boxes and labels
+- **Video File Input:** Upload an MP4 video file to the demo.
+- **Detection of Lying Persons:** The demo uses a YOLOv8 model to detect persons. A simple heuristic (aspect ratio and vertical position) is then applied to decide if a person is lying down.
+- **Advanced Tracking:** Integration of DeepSORT Realtime provides robust multi‑person tracking, even in occluded or crowded scenes.
+- **Timing and Thresholding:** The system records the duration that a person is detected as lying down. If they remain motionless longer than a user‑defined threshold (between 5 and 600 seconds), they are flagged as "FAINTED."
+- **Annotated Output:** The processed video displays bounding boxes and labels for each person along with their current status (Upright, Lying Down, or FAINTED).
 
 ## How It Works
 
 1. **Detection:**
-   The YOLOv8 model (nano version
-2. **Heuristic for Falling:**
-   A person is assumed to be lying down if:
-   - Their bounding box is significantly wider than tall (aspect ratio > 1.5).
-   - The lower part of the bounding box is in the bottom half of the frame (suggesting the person is on the floor).
+   The YOLOv8 model (nano version) detects people in each frame of the video. Only detections with a confidence greater than 0.5 are passed on.
 
+2. **Advanced Tracking with DeepSORT:**
+   The detections are fed into DeepSORT Realtime, which associates detections across frames and assigns unique IDs to each person. This tracker is robust to occlusions and can maintain consistent identities even in crowded scenes.
 
+3. **Lying Detection Heuristic:**
+   For each tracked person, a simple heuristic determines if the person is lying down:
+   - The bounding box is much wider than it is tall (aspect ratio > 1.5).
+   - The lower edge of the box is located in the lower half of the frame.
+
+4. **Timing and Status Update:**
+   The demo records the first frame when a track meets the lying criteria and computes the duration the person remains in that state. When this duration exceeds the threshold set by the user, the system flags the track as "FAINTED".
 
 5. **Output Generation:**
-   Annotated frames are stitched
+   Annotated frames (with bounding boxes and labels) are stitched together into an output video that is returned to the user via the Gradio interface.
 
 ## Running on Hugging Face Spaces
 
-This demo is designed for Hugging Face Spaces and
+This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU (e.g., A100) is activated only during processing, optimizing resource usage.
 
-### To
+### To Deploy:
 1. Fork or clone this repository on Hugging Face Spaces.
-2. The
-3. Launch the Space and upload a video to test the faint detection functionality.
+2. The dependencies in `requirements.txt` will be installed automatically.
+3. Launch the Space and upload a video file to test the faint detection functionality.
 
 ## Running Locally
 
 1. **Clone the Repository:**
   ```bash
-   git clone https://github.com/your_username/
-   cd
+   git clone https://github.com/your_username/advanced-faint-detection.git
+   cd advanced-faint-detection
app.py
CHANGED

@@ -1,28 +1,24 @@
+import spaces
 import cv2
 import numpy as np
-import time
 import os
 import tempfile
 import gradio as gr
-from ultralytics import YOLO  #
+from ultralytics import YOLO  # for YOLOv8 person detection
+from deep_sort_realtime.deepsort_tracker import DeepSort  # advanced multi-object tracker
 
+@spaces.GPU
 def process_video(video_file, threshold_secs):
     """
-    Process
-        video_file: Path to the uploaded video file.
-        threshold_secs: Duration threshold in seconds that a person must remain
-                        lying down to be flagged as 'FAINTED'.
-    Returns:
-        out_path: Path to the processed video with annotations.
+    Process an uploaded video file to detect persons lying motionless and flag them as "FAINTED"
+    after exceeding the specified threshold duration (in seconds). Uses YOLOv8 for detection
+    and DeepSORT for tracking.
     """
-    # Open the
+    # Open the video file
     cap = cv2.VideoCapture(video_file)
     if not cap.isOpened():
         raise ValueError("Error opening the video file.")
 
     fps = cap.get(cv2.CAP_PROP_FPS)
     width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
     height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

@@ -30,125 +26,99 @@ def process_video(video_file, threshold_secs):
     out_path = os.path.join(tempfile.gettempdir(), "output.mp4")
     out = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
 
-    # Load YOLOv8 model (
-    # Here we use the nano model for speed.
+    # Load the YOLOv8 model for person detection (using the nano variant for speed)
     model = YOLO("yolov8n.pt")
 
+    # Initialize DeepSORT tracker
+    tracker = DeepSort(max_age=30, n_init=3, embedder="mobilenet", half=True)
+
+    # Dictionary to keep track of when a given track was first detected as "lying"
+    lying_start_times = {}
+
     frame_index = 0
-    threshold_frames = threshold_secs * fps  # convert seconds to
+    threshold_frames = threshold_secs * fps  # convert seconds to number of frames
 
-    # Main processing loop (frame-by-frame)
     while True:
         ret, frame = cap.read()
         if not ret:
             break
         frame_index += 1
 
         # Run YOLO detection on the current frame
         results = model(frame)[0]
+        # Prepare detections for DeepSORT tracker:
+        # DeepSORT expects each detection in the format:
+        #   [ [x, y, w, h], confidence, class_id ]
+        detections = []
         if results.boxes is not None:
-            boxes = results.boxes.xyxy.cpu().numpy()  #
-            classes = results.boxes.cls.cpu().numpy()
+            boxes = results.boxes.xyxy.cpu().numpy()  # [x1, y1, x2, y2]
+            classes = results.boxes.cls.cpu().numpy()
             confidences = results.boxes.conf.cpu().numpy()
+            for box, cls, conf in zip(boxes, classes, confidences):
+                if int(cls) == 0 and conf > 0.5:  # COCO class 0 is "person"
+                    # Convert bounding box coordinates to int
+                    x1, y1, x2, y2 = box.astype(int)
+                    w = int(x2 - x1)
+                    h = int(y2 - y1)
+                    # DeepSORT requires the bbox to be nested in a list
+                    detections.append([[int(x1), int(y1), w, h], float(conf), 0])
 
+        # Update tracker using DeepSORT.
+        # The tracker internally handles matching detections across frames.
+        tracks = tracker.update_tracks(detections, frame=frame)
+
+        # Process each track for lying detection using our heuristic:
+        # Heuristic: a person is considered "lying down" if:
+        #   - The bounding box is significantly wider than tall (aspect ratio > 1.5)
+        #   - The lower edge of the bounding box is in the lower half of the frame
+        for track in tracks:
+            if not track.is_confirmed():
+                continue
+
+            track_id = track.track_id
+            bbox = track.to_tlbr()  # returns [x1, y1, x2, y2]
+            x1, y1, x2, y2 = [int(coord) for coord in bbox]
             w = x2 - x1
             h = y2 - y1
             aspect_ratio = w / float(h) if h > 0 else 0
+            lying = False
             if aspect_ratio > 1.5 and y2 > height * 0.5:
-                cy = int((y1 + y2) / 2)
-                detections.append((cx, cy, x1, y1, x2, y2))
-            # Simple tracking: match detections to existing tracks based on centroid proximity.
-            updated_track_ids = set()
-            for det in detections:
-                cx, cy, x1, y1, x2, y2 = det
-                matched = None
-                for tid, track in tracks.items():
-                    prev_cx, prev_cy = track["centroid"]
-                    dist = np.sqrt((cx - prev_cx) ** 2 + (cy - prev_cy) ** 2)
-                    if dist < 50:  # if distance is less than 50 pixels, consider it the same person
-                        matched = tid
-                        break
-                if matched is not None:
-                    # Update existing track
-                    tracks[matched]["centroid"] = (cx, cy)
-                    tracks[matched]["last_update"] = frame_index
-                    # If this is the first frame the person is detected as lying down, record the start time.
-                    if "start_time" not in tracks[matched]:
-                        tracks[matched]["start_time"] = frame_index
-                    updated_track_ids.add(matched)
-                    tracks[matched]["box"] = (x1, y1, x2, y2)
-                else:
-                    # Create a new track for a new detection
-                    tracks[next_track_id] = {
-                        "centroid": (cx, cy),
-                        "start_time": frame_index,
-                        "last_update": frame_index,
-                        "box": (x1, y1, x2, y2),
-                        "fainted": False,
-                    }
-                    updated_track_ids.add(next_track_id)
-                    next_track_id += 1
+                lying = True
 
+            # Update or reset the lying start time per track
+            if lying:
+                if track_id not in lying_start_times:
+                    lying_start_times[track_id] = frame_index
+                duration_frames = frame_index - lying_start_times[track_id]
+            else:
+                if track_id in lying_start_times:
+                    del lying_start_times[track_id]
+                duration_frames = 0
 
-        for tid, track in tracks.items():
-            duration_frames = frame_index - track.get("start_time", frame_index)
-            label = "Lying Down"
-            color = (0, 255, 255)  # yellow for lying down
+            # Decide label and color based on duration
             if duration_frames >= threshold_frames:
-                label = "FAINTED"
+                label = f"ID {track_id}: FAINTED ({duration_frames/fps:.1f}s)"
                 color = (0, 0, 255)  # red for fainted
-            # Write the annotated frame to the output video
+            elif lying:
+                label = f"ID {track_id}: Lying Down ({duration_frames/fps:.1f}s)"
+                color = (0, 255, 255)  # yellow for lying down
+            else:
+                label = f"ID {track_id}: Upright"
+                color = (0, 255, 0)  # green for normal upright posture
+
+            # Annotate the frame
+            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
+            cv2.putText(frame, label, (x1, max(y1 - 10, 0)),
+                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
+
         out.write(frame)
 
     cap.release()
     out.release()
     return out_path
 
 
-# Create
+# Create Gradio interface for the demo.
 demo = gr.Interface(
     fn=process_video,
     inputs=[

@@ -156,12 +126,13 @@ demo = gr.Interface(
         gr.Slider(5, 600, value=5, step=1, label="Motionless Duration Threshold (seconds)"),
     ],
     outputs=gr.Video(label="Processed Video"),
-    title="Real-Time Faint Detection on Video",
+    title="Advanced Real-Time Faint Detection on Video",
     description=(
-        "Upload a video file and set a threshold duration (in seconds). "
+        "Upload a video file and set a threshold duration (in seconds). This demo uses YOLOv8 for person detection and "
+        "DeepSORT for advanced tracking. It flags persons as 'FAINTED' if they remain lying motionless (determined by a "
+        "heuristic) for longer than the threshold."
    ),
 )
 
+if __name__ == "__main__":
+    demo.launch()
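
The `@spaces.GPU` decorator added above is what turns on ZeroGPU: the Space requests a GPU only while the decorated function runs and releases it afterwards. A minimal sketch of the general pattern follows; the optional `duration` hint and the small `detect` function are illustrative assumptions, not code from this commit.

```python
# Minimal ZeroGPU usage pattern (a sketch, not this repository's app.py).
import spaces
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # load the model once at startup

@spaces.GPU(duration=120)  # GPU is attached only while this function executes
def detect(image_path: str):
    # Heavy inference happens inside the decorated function, on the ZeroGPU device.
    return model(image_path)[0].boxes
```

Outside a ZeroGPU Space the decorator is expected to act as a pass-through, so the same module should still run locally as long as the `spaces` package is installed.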
packages.txt
CHANGED

@@ -1 +1 @@
-opencv
+python3-opencv
requirementx.txt
CHANGED

@@ -3,3 +3,4 @@ opencv-python
 ultralytics
 torch
 numpy
+deep-sort-realtime
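
To confirm that a local environment matches the dependency lists above, including the newly added `deep-sort-realtime`, a quick import check like the following sketch can help. The package-to-module mapping is the only assumption; it covers just the entries visible in the changed files.

```python
# Quick sanity check (a sketch): verify that the dependencies visible in the
# changed requirement files import under their usual runtime module names.
import importlib

modules = {
    "opencv-python": "cv2",
    "numpy": "numpy",
    "torch": "torch",
    "ultralytics": "ultralytics",
    "deep-sort-realtime": "deep_sort_realtime",
}

for package, module in modules.items():
    importlib.import_module(module)
    print(f"{package}: OK (imports as '{module}')")
```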