use mid-hip to determine if a person is inside the alert zone, but keep using the bottom center for velocity estimation
README.md
CHANGED
@@ -30,7 +30,7 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
 - **Load the first frame** of a video into an image editor.
 - **Draw an alert zone** on the first frame using red strokes.
 - **Preview the extracted alert zone polygon** (displayed in red) before processing.
-  - The faint detection is applied only for persons whose
+  - The faint detection is applied only for persons whose **mid-hip keypoint** falls within the defined alert zone.

 - **Integrated Detection, Tracking, and Pose Estimation:**
   The system uses a single unified **Yolov11spose** model, which returns both bounding boxes and pose keypoints with an integrated tracker.
@@ -38,13 +38,17 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
   It extracts keypoints that can be used to verify if a person is lying down, thus improving the accuracy of faint detection.

 - **Velocity-Based Motionlessness Detection:**
-  The system computes the displacement of each person’s bottom
+  The system computes the displacement of each person’s **bottom-center point** over time. If the movement stays below a set threshold for a defined duration, the person is considered static.

 - **Timing and Thresholding:**
-  The demo tracks how long a person remains static (via integrated pose and velocity analysis). If this duration exceeds a user‑defined threshold (between
+  The demo tracks how long a person remains static (via integrated pose and velocity analysis). If this duration exceeds a user‑defined threshold (between 1 and 600 seconds), the person is flagged as "FAINTED."

 - **Annotated Output:**
-  The processed video displays
+  The processed video displays:
+  - Annotated bounding boxes and labels (e.g., Upright, Static, FAINTED).
+  - **Red polygon** for the alert zone.
+  - **Red dot** = mid-hip reference (used for alert zone inclusion).
+  - **Blue dot** = bottom-center point (used for velocity calculation).

 ## How It Works

@@ -59,17 +63,18 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
 2. **Unified Detection and Tracking:**
    - **Yolov11spose with Integrated Tracker and Pose Estimation:**
      The unified model processes each frame to detect persons, track them across frames, and extract keypoints that reflect the persons’ posture.
-
+
 3. **Faint Detection Logic:**
    - **Pose-Based Analysis:**
      The model’s keypoint outputs are used to assess if the person is lying down by comparing the vertical positions of the shoulders and hips.
    - **Velocity Analysis:**
-     The displacement of the person’s bottom
+     The displacement of the person’s **bottom-center** is computed over consecutive frames. If the movement is below a preset velocity threshold, the individual is considered motionless.
    - **Alert Zone Confinement:**
-
+     A person is analyzed only if their **mid-hip** keypoint is within the drawn alert zone polygon.

 4. **Output Generation:**
-   - Processed frames are annotated with the person’s status (Upright, Static, FAINTED) and
+   - Processed frames are annotated with the person’s status (Upright, Static, FAINTED) and stitched back into a video.
+   - Mid-hip and bottom-center points are drawn for visual inspection.
   - The annotated output video is displayed through the Gradio interface.

 ## Running on Hugging Face Spaces
@@ -86,4 +91,5 @@ This demo is optimized for Hugging Face Spaces and supports GPU acceleration dur
 1. **Clone the Repository:**
    ```bash
    git clone https://github.com/your_username/advanced-faint-detection.git
-   cd advanced-faint-detection
+   cd advanced-faint-detection
+   ```
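The README changes above split the two reference points: the **mid-hip keypoint** decides whether a person is inside the alert zone, while the **bottom-center** of the bounding box drives the velocity check. Below is a minimal standalone sketch of that split, assuming OpenCV and NumPy are available; the helper names `mid_hip_point`, `bottom_center_point`, `in_alert_zone`, and `ema_velocity` are illustrative only and are not the repository's actual API.

```python
import cv2
import numpy as np

def mid_hip_point(keypoints):
    """Midpoint of the left/right hip keypoints (COCO indices 11 and 12)."""
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 3)
    return (float((kp[11][0] + kp[12][0]) / 2), float((kp[11][1] + kp[12][1]) / 2))

def bottom_center_point(box):
    """Bottom-center of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return (float((x1 + x2) / 2), float(y2))

def in_alert_zone(point, alert_zone):
    """True if the point lies inside (or on the edge of) the alert-zone polygon."""
    polygon = np.array(alert_zone, np.int32)
    return cv2.pointPolygonTest(polygon, point, False) >= 0

def ema_velocity(prev_smoothed, current_bottom, fps, alpha=0.8):
    """EMA-smoothed displacement of the bottom-center point, scaled to pixels/second."""
    smoothed = alpha * np.asarray(prev_smoothed, dtype=float) + (1 - alpha) * np.asarray(current_bottom, dtype=float)
    distance = float(np.linalg.norm(smoothed - np.asarray(prev_smoothed, dtype=float)))
    return smoothed, distance * fps
```

A person whose mid-hip point fails the zone test is labeled "Outside Zone" and skipped; otherwise the bottom-center velocity and the pose heuristic feed the Static/FAINTED timer described above.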
app.py
CHANGED
@@ -149,133 +149,113 @@ def process_video_with_zone(video_file, threshold_secs, velocity_threshold, edit
         pts = np.array(alert_zone, np.int32).reshape((-1, 1, 2))
         cv2.polylines(frame, [pts], isClosed=True, color=(0, 0, 255), thickness=2)

-        # Run the unified model (Yolov8spose) on the frame.
         results = yolov8spose_model(frame)[0]
+        boxes = results.boxes
+        kpts = results.keypoints.data

-
-        if results.boxes is not None:
-            # Iterate over each detection in the unified output.
-            for det in results.boxes.data:
-                # The expected format: [x1, y1, x2, y2, confidence, class, keypoints..., track_id]
-                # Adjust slicing based on your model's output format.
-                d = det.cpu().numpy()
-                x1, y1, x2, y2 = d[:4].astype(int)
-                conf = d[4]
-                cls = int(d[5])
-                # Only consider persons (assume class 0 corresponds to person).
-                if cls != 0 or conf < 0.5:
-                    continue
-
-                # Assume the remaining part (except the last element) are keypoints.
-                # Last element is taken as the integrated track ID.
-                num_keypoint_values = len(d) - 6 - 1  # subtract first 6 fields and track_id.
-                if num_keypoint_values > 0:
-                    flat_keypoints = d[6:6+num_keypoint_values]
-                else:
-                    flat_keypoints = []
-                track_id = int(d[-1])
-
-                w = x2 - x1
-                h = y2 - y1
-                person_box = [x1, y1, x2, y2]
-                if flat_keypoints != []:
-                    kp = np.array(flat_keypoints).reshape(-1, 3)
-                    for pair in [
-                        (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),  # arms
-                        (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),  # legs
-                        (5, 11), (6, 12)  # torso
-                    ]:
-                        i, j = pair
-                        if kp[i][2] > 0.3 and kp[j][2] > 0.3:  # confidence check
-                            pt1 = (int(kp[i][0]), int(kp[i][1]))
-                            pt2 = (int(kp[j][0]), int(kp[j][1]))
-                            cv2.line(frame, pt1, pt2, (0, 255, 255), 2)
-                    if len(kp) > 12:
-                        mid_hip = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
-                        pt = (float(mid_hip[0]), float(mid_hip[1]))
-                    else:
-                        current_bottom = bottom_center(person_box)
-                        pt = (float(current_bottom[0]), float(current_bottom[1]))
-                else:
-                    current_bottom = bottom_center(person_box)
-                    pt = (float(current_bottom[0]), float(current_bottom[1]))
-                in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
-
-                # Draw bottom-center marker.
-                cv2.circle(frame, (int(current_bottom[0]), int(current_bottom[1])), 4, (255, 0, 0), -1)  # 🔵 Blue = bottom-center
-                cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1)  # 🔴 Red = mid-hip (used)
-
-                if not in_alert_zone:
-                    status = "Outside Zone"
-                    color = (200, 200, 200)
-                    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
-                    draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
-                    continue
-
-                # Faint detection: using a heuristic based on bounding box aspect ratio and integrated keypoints.
-                base_lying = False
-                aspect_ratio = w / float(h) if h > 0 else 0
-                if aspect_ratio > 1.5 and y2 > height * 0.5:
-                    base_lying = True
-
-                if flat_keypoints != []:
-                    integrated_lying = is_lying_from_keypoints(flat_keypoints, h)
-                else:
-                    integrated_lying = False
-                pose_static = base_lying and integrated_lying
-
-                # Velocity-based detection with EMA smoothing
-                alpha = 0.8  # smoothing factor ← UPDATED
-                if track_id not in velocity_static_info:
-                    velocity_static_info[track_id] = (current_bottom, frame_index)
-                    smoothed_bottom = current_bottom  # ← UPDATED
-                    velocity_val = 0.0
-                    velocity_static = False
-                else:
-                    prev_bottom, _ = velocity_static_info[track_id]
-                    # Apply EMA smoothing ← UPDATED
-                    smoothed_bottom = (
-                        alpha * np.array(prev_bottom) + (1 - alpha) * np.array(current_bottom)
-                    )
-                    velocity_static_info[track_id] = (smoothed_bottom.tolist(), frame_index)
-
-                    distance = compute_distance(smoothed_bottom, prev_bottom)  # ← UPDATED
-                    velocity_val = distance * fps
-                    if distance < velocity_threshold:
-                        velocity_static = True
-                    else:
-                        velocity_static_info[track_id] = (current_bottom, frame_index)
-                        velocity_static = False
-
-                is_static = pose_static or velocity_static
-                if is_static:
-                    if track_id not in lying_start_times:
-                        lying_start_times[track_id] = frame_index
-                    duration_frames = frame_index - lying_start_times[track_id]
-                else:
-                    if track_id in lying_start_times:
-                        del lying_start_times[track_id]
-                    duration_frames = 0
-
-                if duration_frames >= threshold_frames:
-                    status = f"FAINTED ({duration_frames/fps:.1f}s)"
-                    color = (0, 0, 255)
-                elif is_static:
-                    status = f"Static ({duration_frames/fps:.1f}s)"
-                    color = (0, 255, 255)
-                else:
-                    status = "Upright"
-                    color = (0, 255, 0)
-
+        for i in range(len(boxes)):
+            box = boxes[i].xyxy[0].cpu().numpy()
+            x1, y1, x2, y2 = box.astype(int)
+            conf = boxes[i].conf[0].item()
+            cls = int(boxes[i].cls[0].item())
+            track_id = int(boxes[i].id[0].item()) if boxes[i].id is not None else -1
+            if cls != 0 or conf < 0.5:
+                continue
+
+            flat_keypoints = kpts[i].cpu().numpy().flatten().tolist()
+            kp = np.array(flat_keypoints).reshape(-1, 3)
+
+            for pair in [
+                (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),
+                (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),
+                (5, 11), (6, 12)
+            ]:
+                i1, j1 = pair
+                if kp[i1][2] > 0.3 and kp[j1][2] > 0.3:
+                    pt1 = (int(kp[i1][0]), int(kp[i1][1]))
+                    pt2 = (int(kp[j1][0]), int(kp[j1][1]))
+                    cv2.line(frame, pt1, pt2, (0, 255, 255), 2)
+
+            if len(kp) > 12:
+                pt = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
+            else:
+                continue
+
+            pt = (float(pt[0]), float(pt[1]))
+            in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
+            cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1)
+
+            if not in_alert_zone:
+                status = "Outside Zone"
+                color = (200, 200, 200)
+                cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
+                draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
+                continue
+
+            aspect_ratio = (x2 - x1) / float(y2 - y1) if (y2 - y1) > 0 else 0
+            base_lying = aspect_ratio > 1.5 and y2 > height * 0.5
+            integrated_lying = is_lying_from_keypoints(flat_keypoints, y2 - y1)
+            pose_static = base_lying and integrated_lying
+
+            current_bottom = bottom_center((x1, y1, x2, y2))
+
+            if len(kp) > 12:
+                pt = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
+            else:
+                continue
+            pt = (float(pt[0]), float(pt[1]))  # mid-hip
+            in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
+            cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1)  # mid-hip marker
+            cv2.circle(frame, (int(current_bottom[0]), int(current_bottom[1])), 3, (255, 0, 0), -1)  # bottom center marker
+
+            if not in_alert_zone:
+                status = "Outside Zone"
+                color = (200, 200, 200)
                 cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
                 draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
-
-
-
-
-
-
-
+                continue
+
+            alpha = 0.8
+            if track_id not in velocity_static_info:
+                velocity_static_info[track_id] = (current_bottom, frame_index)
+                smoothed = current_bottom
+                velocity_val = 0.0
+                velocity_static = False
+            else:
+                prev_pt, _ = velocity_static_info[track_id]
+                smoothed = alpha * np.array(prev_pt) + (1 - alpha) * np.array(current_bottom)
+                velocity_static_info[track_id] = (smoothed.tolist(), frame_index)
+                distance = compute_distance(smoothed, prev_pt)
+                velocity_val = distance * fps
+                velocity_static = distance < velocity_threshold
+            is_static = pose_static or velocity_static
+            if is_static:
+                if track_id not in lying_start_times:
+                    lying_start_times[track_id] = frame_index
+                duration_frames = frame_index - lying_start_times[track_id]
+            else:
+                lying_start_times.pop(track_id, None)
+                duration_frames = 0
+
+            if duration_frames >= threshold_frames:
+                status = f"FAINTED ({duration_frames/fps:.1f}s)"
+                color = (0, 0, 255)
+            elif is_static:
+                status = f"Static ({duration_frames/fps:.1f}s)"
+                color = (0, 255, 255)
+            else:
+                status = "Upright"
+                color = (0, 255, 0)
+
+            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
+            draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
+            vel_text = f"Vel: {velocity_val:.1f} px/s"
+            text_offset = 15
+            (vt_w, vt_h), vt_baseline = cv2.getTextSize(vel_text, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
+            vel_org = (int(pt[0] - vt_w / 2), int(pt[1] + text_offset + vt_h))
+            cv2.rectangle(frame, (vel_org[0], vel_org[1] - vt_h - vt_baseline),
+                          (vel_org[0] + vt_w, vel_org[1] + vt_baseline), (50, 50, 50), -1)
+            cv2.putText(frame, vel_text, vel_org, cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1, cv2.LINE_AA)

         out.write(frame)

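The loop above calls three helpers that are defined elsewhere in app.py and are not shown in this hunk: `bottom_center`, `compute_distance`, and `is_lying_from_keypoints`. The following is only a plausible sketch of such helpers, inferred from how they are called here and from the README's shoulder/hip description; the actual implementations in the repository may differ, and the confidence and ratio thresholds below are assumed values.

```python
import numpy as np

def bottom_center(box):
    """Bottom-center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, float(y2))

def compute_distance(p1, p2):
    """Euclidean distance between two (x, y) points."""
    return float(np.linalg.norm(np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)))

def is_lying_from_keypoints(flat_keypoints, box_height, min_conf=0.3, ratio=0.3):
    """Heuristic from the README: treat the person as lying down when the vertical
    gap between the shoulders (COCO 5, 6) and the hips (COCO 11, 12) is small
    relative to the bounding-box height."""
    kp = np.asarray(flat_keypoints, dtype=float).reshape(-1, 3)
    if kp.shape[0] <= 12:
        return False
    shoulders, hips = kp[[5, 6]], kp[[11, 12]]
    if min(shoulders[:, 2].min(), hips[:, 2].min()) < min_conf:
        return False
    vertical_gap = abs(hips[:, 1].mean() - shoulders[:, 1].mean())
    return vertical_gap < ratio * max(float(box_height), 1.0)
```

Note that the EMA update in the diff (`smoothed = alpha * prev + (1 - alpha) * current` with `alpha = 0.8`) weights the previous smoothed position heavily, so small frame-to-frame jitter in the bottom-center point is damped before it is converted to a per-second velocity and compared against `velocity_threshold`.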