Luigi committed on
Commit f881a88 · 1 Parent(s): fc238fe

use mid-hip to determine if a person is inside the alert zone, but keep using bottom-center for velocity estimation

Files changed (2)
  1. README.md +15 -9
  2. app.py +103 -123
README.md CHANGED
@@ -30,7 +30,7 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
30
  - **Load the first frame** of a video into an image editor.
31
  - **Draw an alert zone** on the first frame using red strokes.
32
  - **Preview the extracted alert zone polygon** (displayed in red) before processing.
33
- - The faint detection is applied only for persons whose bottom‑center points fall within the defined alert zone.
34
 
35
  - **Integrated Detection, Tracking, and Pose Estimation:**
36
  The system uses a single unified **Yolov11spose** model, which returns both bounding boxes and pose keypoints with an integrated tracker.
@@ -38,13 +38,17 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
38
  - It extracts keypoints that can be used to verify if a person is lying down, thus improving the accuracy of faint detection.
39
 
40
  - **Velocity-Based Motionlessness Detection:**
41
- The system computes the displacement of each person’s bottomcenter position over time. If the movement stays below a set threshold for a defined duration, the person is considered static.
42
 
43
  - **Timing and Thresholding:**
44
- The demo tracks how long a person remains static (via integrated pose and velocity analysis). If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the person is flagged as "FAINTED."
45
 
46
  - **Annotated Output:**
47
- The processed video displays annotated bounding boxes and labels (e.g., Upright, Static, FAINTED) overlaid on the original video. The user‑defined alert zone is shown as a red polygon for clear visual confirmation.
48
 
49
  ## How It Works
50
 
@@ -59,17 +63,18 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
59
  2. **Unified Detection and Tracking:**
60
  - **Yolov11spose with Integrated Tracker and Pose Estimation:**
61
  The unified model processes each frame to detect persons, track them across frames, and extract keypoints that reflect the persons’ posture.
62
-
63
  3. **Faint Detection Logic:**
64
  - **Pose-Based Analysis:**
65
  The model’s keypoint outputs are used to assess if the person is lying down by comparing the vertical positions of the shoulders and hips.
66
  - **Velocity Analysis:**
67
- The displacement of the person’s bottomcenter point is computed over consecutive frames. If the movement is below a preset velocity threshold, the individual is considered motionless.
68
  - **Alert Zone Confinement:**
69
- The detection and analysis are applied only for persons located inside the user‑defined alert zone.
70
 
71
  4. **Output Generation:**
72
- - Processed frames are annotated with the person’s status (Upright, Static, FAINTED) and then stitched back into a video.
 
73
  - The annotated output video is displayed through the Gradio interface.
74
 
75
  ## Running on Hugging Face Spaces
@@ -86,4 +91,5 @@ This demo is optimized for Hugging Face Spaces and supports GPU acceleration dur
86
  1. **Clone the Repository:**
87
  ```bash
88
  git clone https://github.com/your_username/advanced-faint-detection.git
89
- cd advanced-faint-detection
 
 
30
  - **Load the first frame** of a video into an image editor.
31
  - **Draw an alert zone** on the first frame using red strokes.
32
  - **Preview the extracted alert zone polygon** (displayed in red) before processing.
33
+ - The faint detection is applied only for persons whose **mid-hip keypoint** falls within the defined alert zone.
34
 
35
  - **Integrated Detection, Tracking, and Pose Estimation:**
36
  The system uses a single unified **Yolov11spose** model, which returns both bounding boxes and pose keypoints with an integrated tracker.
 
38
  - It extracts keypoints that can be used to verify if a person is lying down, thus improving the accuracy of faint detection.
39
 
40
  - **Velocity-Based Motionlessness Detection:**
41
+ The system computes the displacement of each person’s **bottom-center point** over time. If the movement stays below a set threshold for a defined duration, the person is considered static.
42
 
43
  - **Timing and Thresholding:**
44
+ The demo tracks how long a person remains static (via integrated pose and velocity analysis). If this duration exceeds a user‑defined threshold (between 1 and 600 seconds), the person is flagged as "FAINTED."
45
 
46
  - **Annotated Output:**
47
+ The processed video displays:
48
+ - Annotated bounding boxes and labels (e.g., Upright, Static, FAINTED).
49
+ - **Red polygon** for the alert zone.
50
+ - **Red dot** = mid-hip reference (used for alert zone inclusion).
51
+ - **Blue dot** = bottom-center point (used for velocity calculation).
52
 
53
  ## How It Works
54
 
 
63
  2. **Unified Detection and Tracking:**
64
  - **Yolov11spose with Integrated Tracker and Pose Estimation:**
65
  The unified model processes each frame to detect persons, track them across frames, and extract keypoints that reflect the persons’ posture.
66
+
67
  3. **Faint Detection Logic:**
68
  - **Pose-Based Analysis:**
69
  The model’s keypoint outputs are used to assess if the person is lying down by comparing the vertical positions of the shoulders and hips.
70
  - **Velocity Analysis:**
71
+ The displacement of the person’s **bottom-center** is computed over consecutive frames. If the movement is below a preset velocity threshold, the individual is considered motionless.
72
  - **Alert Zone Confinement:**
73
+ A person is analyzed only if their **mid-hip** keypoint is within the drawn alert zone polygon.
74
 
75
  4. **Output Generation:**
76
+ - Processed frames are annotated with the person’s status (Upright, Static, FAINTED) and stitched back into a video.
77
+ - Mid-hip and bottom-center points are drawn for visual inspection.
78
  - The annotated output video is displayed through the Gradio interface.
79
 
80
  ## Running on Hugging Face Spaces
 
91
  1. **Clone the Repository:**
92
  ```bash
93
  git clone https://github.com/your_username/advanced-faint-detection.git
94
+ cd advanced-faint-detection
95
+ ```
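To make the change in this commit easier to follow outside the diff, here is a minimal, self-contained sketch of the two reference points it distinguishes: the mid-hip keypoint for the alert-zone membership test and the bounding-box bottom-center for velocity estimation. This assumes COCO-ordered keypoints (hips at indices 11 and 12) and an alert zone given as a list of (x, y) vertices; the helper names are illustrative, not the repository's API.

```python
import numpy as np
import cv2

# Assumption: the pose model returns keypoints in COCO order,
# so indices 11 and 12 are the left and right hips.
LEFT_HIP, RIGHT_HIP = 11, 12

def mid_hip_point(keypoints):
    """Mid-hip reference used for the alert-zone test.

    `keypoints` is an (N, 3) array of (x, y, confidence) values.
    """
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 3)
    return (float((kp[LEFT_HIP][0] + kp[RIGHT_HIP][0]) / 2),
            float((kp[LEFT_HIP][1] + kp[RIGHT_HIP][1]) / 2))

def bottom_center_point(box):
    """Bottom-center of an (x1, y1, x2, y2) box, used for velocity estimation."""
    x1, y1, x2, y2 = box
    return (float((x1 + x2) / 2), float(y2))

def inside_alert_zone(point, alert_zone):
    """True if `point` lies inside (or on) the polygon given as (x, y) vertices."""
    polygon = np.array(alert_zone, np.int32)
    return cv2.pointPolygonTest(polygon, point, False) >= 0

# Usage sketch with made-up coordinates:
zone = [(100, 400), (500, 400), (500, 700), (100, 700)]
box = (220, 300, 320, 650)
kps = np.zeros((17, 3))
kps[LEFT_HIP] = (250, 500, 0.9)
kps[RIGHT_HIP] = (290, 505, 0.9)
print(inside_alert_zone(mid_hip_point(kps), zone))  # zone test uses the mid-hip
print(bottom_center_point(box))                     # velocity uses the box bottom-center
```

The commit message does not state a rationale, but a plausible motivation is that the mid-hip follows the body more closely than the box bottom, which can drift when a person falls and the bounding box becomes wide and flat.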
app.py CHANGED
@@ -149,133 +149,113 @@ def process_video_with_zone(video_file, threshold_secs, velocity_threshold, edit
149
  pts = np.array(alert_zone, np.int32).reshape((-1, 1, 2))
150
  cv2.polylines(frame, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
151
 
152
- # Run the unified model (Yolov8spose) on the frame.
153
  results = yolov8spose_model(frame)[0]
154
-
155
- # Check if there are any detections. We assume that results.boxes holds the unified output.
156
- if results.boxes is not None:
157
- # Iterate over each detection in the unified output.
158
- for det in results.boxes.data:
159
- # The expected format: [x1, y1, x2, y2, confidence, class, keypoints..., track_id]
160
- # Adjust slicing based on your model's output format.
161
- d = det.cpu().numpy()
162
- x1, y1, x2, y2 = d[:4].astype(int)
163
- conf = d[4]
164
- cls = int(d[5])
165
- # Only consider persons (assume class 0 corresponds to person).
166
- if cls != 0 or conf < 0.5:
167
- continue
168
-
169
- # Assume the remaining part (except the last element) are keypoints.
170
- # Last element is taken as the integrated track ID.
171
- num_keypoint_values = len(d) - 6 - 1 # subtract first 6 fields and track_id.
172
- if num_keypoint_values > 0:
173
- flat_keypoints = d[6:6+num_keypoint_values]
174
- else:
175
- flat_keypoints = []
176
- track_id = int(d[-1])
177
-
178
- w = x2 - x1
179
- h = y2 - y1
180
- person_box = [x1, y1, x2, y2]
181
- if flat_keypoints != []:
182
- kp = np.array(flat_keypoints).reshape(-1, 3)
183
- for pair in [
184
- (5, 6), (5, 7), (7, 9), (6, 8), (8, 10), # arms
185
- (11, 12), (11, 13), (13, 15), (12, 14), (14, 16), # legs
186
- (5, 11), (6, 12) # torso
187
- ]:
188
- i, j = pair
189
- if kp[i][2] > 0.3 and kp[j][2] > 0.3: # confidence check
190
- pt1 = (int(kp[i][0]), int(kp[i][1]))
191
- pt2 = (int(kp[j][0]), int(kp[j][1]))
192
- cv2.line(frame, pt1, pt2, (0, 255, 255), 2)
193
- if len(kp) > 12:
194
- mid_hip = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
195
- pt = (float(mid_hip[0]), float(mid_hip[1]))
196
- else:
197
- current_bottom = bottom_center(person_box)
198
- pt = (float(current_bottom[0]), float(current_bottom[1]))
199
- else:
200
- current_bottom = bottom_center(person_box)
201
- pt = (float(current_bottom[0]), float(current_bottom[1]))
202
- in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
203
-
204
- # Draw bottom-center marker.
205
- cv2.circle(frame, (int(current_bottom[0]), int(current_bottom[1])), 4, (255, 0, 0), -1) # 🔵 Blue = bottom-center
206
- cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1) # 🔴 Red = mid-hip (used)
207
-
208
- if not in_alert_zone:
209
- status = "Outside Zone"
210
- color = (200, 200, 200)
211
- cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
212
- draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
213
- continue
214
-
215
- # Faint detection: using a heuristic based on bounding box aspect ratio and integrated keypoints.
216
- base_lying = False
217
- aspect_ratio = w / float(h) if h > 0 else 0
218
- if aspect_ratio > 1.5 and y2 > height * 0.5:
219
- base_lying = True
220
-
221
- if flat_keypoints != []:
222
- integrated_lying = is_lying_from_keypoints(flat_keypoints, h)
223
- else:
224
- integrated_lying = False
225
- pose_static = base_lying and integrated_lying
226
-
227
- # Velocity-based detection with EMA smoothing
228
- alpha = 0.8 # smoothing factor ← UPDATED
229
- if track_id not in velocity_static_info:
230
- velocity_static_info[track_id] = (current_bottom, frame_index)
231
- smoothed_bottom = current_bottom # ← UPDATED
232
- velocity_val = 0.0
233
- velocity_static = False
234
- else:
235
- prev_bottom, _ = velocity_static_info[track_id]
236
- # Apply EMA smoothing ← UPDATED
237
- smoothed_bottom = (
238
- alpha * np.array(prev_bottom) + (1 - alpha) * np.array(current_bottom)
239
- )
240
- velocity_static_info[track_id] = (smoothed_bottom.tolist(), frame_index)
241
-
242
- distance = compute_distance(smoothed_bottom, prev_bottom) # ← UPDATED
243
- velocity_val = distance * fps
244
- if distance < velocity_threshold:
245
- velocity_static = True
246
- else:
247
- velocity_static_info[track_id] = (current_bottom, frame_index)
248
- velocity_static = False
249
-
250
- is_static = pose_static or velocity_static
251
- if is_static:
252
- if track_id not in lying_start_times:
253
- lying_start_times[track_id] = frame_index
254
- duration_frames = frame_index - lying_start_times[track_id]
255
- else:
256
- if track_id in lying_start_times:
257
- del lying_start_times[track_id]
258
- duration_frames = 0
259
-
260
- if duration_frames >= threshold_frames:
261
- status = f"FAINTED ({duration_frames/fps:.1f}s)"
262
- color = (0, 0, 255)
263
- elif is_static:
264
- status = f"Static ({duration_frames/fps:.1f}s)"
265
- color = (0, 255, 255)
266
- else:
267
- status = "Upright"
268
- color = (0, 255, 0)
269
 
270
  cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
271
  draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
272
- vel_text = f"Vel: {velocity_val:.1f} px/s"
273
- text_offset = 15
274
- (vt_w, vt_h), vt_baseline = cv2.getTextSize(vel_text, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
275
- vel_org = (int(current_bottom[0] - vt_w / 2), int(current_bottom[1] + text_offset + vt_h))
276
- cv2.rectangle(frame, (vel_org[0], vel_org[1] - vt_h - vt_baseline),
277
- (vel_org[0] + vt_w, vel_org[1] + vt_baseline), (50,50,50), -1)
278
- cv2.putText(frame, vel_text, vel_org, cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, cv2.LINE_AA)
279
 
280
  out.write(frame)
281
 
 
149
  pts = np.array(alert_zone, np.int32).reshape((-1, 1, 2))
150
  cv2.polylines(frame, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
151
 
 
152
  results = yolov8spose_model(frame)[0]
153
+ boxes = results.boxes
154
+ kpts = results.keypoints.data
155
 
156
+ for i in range(len(boxes)):
157
+ box = boxes[i].xyxy[0].cpu().numpy()
158
+ x1, y1, x2, y2 = box.astype(int)
159
+ conf = boxes[i].conf[0].item()
160
+ cls = int(boxes[i].cls[0].item())
161
+ track_id = int(boxes[i].id[0].item()) if boxes[i].id is not None else -1
162
+ if cls != 0 or conf < 0.5:
163
+ continue
164
+
165
+ flat_keypoints = kpts[i].cpu().numpy().flatten().tolist()
166
+ kp = np.array(flat_keypoints).reshape(-1, 3)
167
+
168
+ for pair in [
169
+ (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),
170
+ (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),
171
+ (5, 11), (6, 12)
172
+ ]:
173
+ i1, j1 = pair
174
+ if kp[i1][2] > 0.3 and kp[j1][2] > 0.3:
175
+ pt1 = (int(kp[i1][0]), int(kp[i1][1]))
176
+ pt2 = (int(kp[j1][0]), int(kp[j1][1]))
177
+ cv2.line(frame, pt1, pt2, (0, 255, 255), 2)
178
+
179
+ if len(kp) > 12:
180
+ pt = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
181
+ else:
182
+ continue
183
+
184
+ pt = (float(pt[0]), float(pt[1]))
185
+ in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
186
+ cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1)
187
+
188
+ if not in_alert_zone:
189
+ status = "Outside Zone"
190
+ color = (200, 200, 200)
191
+ cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
192
+ draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
193
+ continue
194
+
195
+ aspect_ratio = (x2 - x1) / float(y2 - y1) if (y2 - y1) > 0 else 0
196
+ base_lying = aspect_ratio > 1.5 and y2 > height * 0.5
197
+ integrated_lying = is_lying_from_keypoints(flat_keypoints, y2 - y1)
198
+ pose_static = base_lying and integrated_lying
199
+
200
+ current_bottom = bottom_center((x1, y1, x2, y2))
201
+
202
+ if len(kp) > 12:
203
+ pt = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
204
+ else:
205
+ continue
206
+ pt = (float(pt[0]), float(pt[1])) # mid-hip
207
+ in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
208
+ cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1) # mid-hip marker
209
+ cv2.circle(frame, (int(current_bottom[0]), int(current_bottom[1])), 3, (255, 0, 0), -1) # bottom center marker
210
+
211
+ if not in_alert_zone:
212
+ status = "Outside Zone"
213
+ color = (200, 200, 200)
214
  cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
215
  draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
216
+ continue
217
+
218
+ alpha = 0.8
219
+ if track_id not in velocity_static_info:
220
+ velocity_static_info[track_id] = (current_bottom, frame_index)
221
+ smoothed = current_bottom
222
+ velocity_val = 0.0
223
+ velocity_static = False
224
+ else:
225
+ prev_pt, _ = velocity_static_info[track_id]
226
+ smoothed = alpha * np.array(prev_pt) + (1 - alpha) * np.array(current_bottom)
227
+ velocity_static_info[track_id] = (smoothed.tolist(), frame_index)
228
+ distance = compute_distance(smoothed, prev_pt)
229
+ velocity_val = distance * fps
230
+ velocity_static = distance < velocity_threshold
231
+ is_static = pose_static or velocity_static
232
+ if is_static:
233
+ if track_id not in lying_start_times:
234
+ lying_start_times[track_id] = frame_index
235
+ duration_frames = frame_index - lying_start_times[track_id]
236
+ else:
237
+ lying_start_times.pop(track_id, None)
238
+ duration_frames = 0
239
+
240
+ if duration_frames >= threshold_frames:
241
+ status = f"FAINTED ({duration_frames/fps:.1f}s)"
242
+ color = (0, 0, 255)
243
+ elif is_static:
244
+ status = f"Static ({duration_frames/fps:.1f}s)"
245
+ color = (0, 255, 255)
246
+ else:
247
+ status = "Upright"
248
+ color = (0, 255, 0)
249
+
250
+ cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
251
+ draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
252
+ vel_text = f"Vel: {velocity_val:.1f} px/s"
253
+ text_offset = 15
254
+ (vt_w, vt_h), vt_baseline = cv2.getTextSize(vel_text, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
255
+ vel_org = (int(pt[0] - vt_w / 2), int(pt[1] + text_offset + vt_h))
256
+ cv2.rectangle(frame, (vel_org[0], vel_org[1] - vt_h - vt_baseline),
257
+ (vel_org[0] + vt_w, vel_org[1] + vt_baseline), (50,50,50), -1)
258
+ cv2.putText(frame, vel_text, vel_org, cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, cv2.LINE_AA)
259
 
260
  out.write(frame)
261
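The loop in the new app.py calls `is_lying_from_keypoints(flat_keypoints, y2 - y1)`, which is defined elsewhere in app.py and does not appear in this diff. Based on the README's description (the pose check compares the vertical positions of the shoulders and hips), a plausible implementation could look like the sketch below; the relative threshold of 0.3 and the exact formulation are assumptions, not the repository's code.

```python
import numpy as np

def is_lying_from_keypoints(flat_keypoints, box_height, rel_threshold=0.3):
    """Heuristic lying-posture check (illustrative sketch, not the repo's code).

    Compares the vertical separation between shoulders (COCO indices 5, 6)
    and hips (indices 11, 12): when a person lies down, shoulders and hips
    sit at roughly the same image height, so their vertical separation is
    small relative to the bounding-box height. `rel_threshold` is assumed.
    """
    kp = np.asarray(flat_keypoints, dtype=float).reshape(-1, 3)
    if len(kp) <= 12 or box_height <= 0:
        return False
    shoulders = kp[[5, 6]]
    hips = kp[[11, 12]]
    # Only trust the heuristic when all four keypoints are reasonably confident.
    if min(shoulders[:, 2].min(), hips[:, 2].min()) < 0.3:
        return False
    shoulder_y = shoulders[:, 1].mean()
    hip_y = hips[:, 1].mean()
    return abs(shoulder_y - hip_y) < rel_threshold * box_height
```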