Luigi committed on
Commit f881a88 · 1 Parent(s): fc238fe

use mid-hip to determine if a person is inside the alert zone, but keep using bottom-center for velocity estimation

Files changed (2)
  1. README.md +15 -9
  2. app.py +103 -123
README.md CHANGED
@@ -30,7 +30,7 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
30
  - **Load the first frame** of a video into an image editor.
31
  - **Draw an alert zone** on the first frame using red strokes.
32
  - **Preview the extracted alert zone polygon** (displayed in red) before processing.
33
- - The faint detection is applied only for persons whose bottom‑center points fall within the defined alert zone.
34
 
35
  - **Integrated Detection, Tracking, and Pose Estimation:**
36
  The system uses a single unified **Yolov11spose** model, which returns both bounding boxes and pose keypoints with an integrated tracker.
@@ -38,13 +38,17 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
38
  - It extracts keypoints that can be used to verify if a person is lying down, thus improving the accuracy of faint detection.
39
 
40
  - **Velocity-Based Motionlessness Detection:**
41
- The system computes the displacement of each person’s bottomcenter position over time. If the movement stays below a set threshold for a defined duration, the person is considered static.
42
 
43
  - **Timing and Thresholding:**
44
- The demo tracks how long a person remains static (via integrated pose and velocity analysis). If this duration exceeds a user‑defined threshold (between 5 and 600 seconds), the person is flagged as "FAINTED."
45
 
46
  - **Annotated Output:**
47
- The processed video displays annotated bounding boxes and labels (e.g., Upright, Static, FAINTED) overlaid on the original video. The user‑defined alert zone is shown as a red polygon for clear visual confirmation.
48
 
49
  ## How It Works
50
 
@@ -59,17 +63,18 @@ This repository contains a Hugging Face Spaces demo for detecting faint (or post
59
  2. **Unified Detection and Tracking:**
60
  - **Yolov11spose with Integrated Tracker and Pose Estimation:**
61
  The unified model processes each frame to detect persons, track them across frames, and extract keypoints that reflect the persons’ posture.
62
-
63
  3. **Faint Detection Logic:**
64
  - **Pose-Based Analysis:**
65
  The model’s keypoint outputs are used to assess if the person is lying down by comparing the vertical positions of the shoulders and hips.
66
  - **Velocity Analysis:**
67
- The displacement of the person’s bottomcenter point is computed over consecutive frames. If the movement is below a preset velocity threshold, the individual is considered motionless.
68
  - **Alert Zone Confinement:**
69
- The detection and analysis are applied only for persons located inside the user‑defined alert zone.
70
 
71
  4. **Output Generation:**
72
- - Processed frames are annotated with the person’s status (Upright, Static, FAINTED) and then stitched back into a video.
 
73
  - The annotated output video is displayed through the Gradio interface.
74
 
75
  ## Running on Hugging Face Spaces
@@ -86,4 +91,5 @@ This demo is optimized for Hugging Face Spaces and supports GPU acceleration dur
86
  1. **Clone the Repository:**
87
  ```bash
88
  git clone https://github.com/your_username/advanced-faint-detection.git
89
- cd advanced-faint-detection
 
 
30
  - **Load the first frame** of a video into an image editor.
31
  - **Draw an alert zone** on the first frame using red strokes.
32
  - **Preview the extracted alert zone polygon** (displayed in red) before processing.
33
+ - The faint detection is applied only for persons whose **mid-hip keypoint** falls within the defined alert zone.
34
 
35
  - **Integrated Detection, Tracking, and Pose Estimation:**
36
  The system uses a single unified **Yolov11spose** model, which returns both bounding boxes and pose keypoints with an integrated tracker.
 
38
  - It extracts keypoints that can be used to verify if a person is lying down, thus improving the accuracy of faint detection.
39
 
40
  - **Velocity-Based Motionlessness Detection:**
41
+ The system computes the displacement of each person’s **bottom-center point** over time. If the movement stays below a set threshold for a defined duration, the person is considered static.
42
 
43
  - **Timing and Thresholding:**
44
+ The demo tracks how long a person remains static (via integrated pose and velocity analysis). If this duration exceeds a user‑defined threshold (between 1 and 600 seconds), the person is flagged as "FAINTED."
45
 
46
  - **Annotated Output:**
47
+ The processed video displays:
48
+ - Annotated bounding boxes and labels (e.g., Upright, Static, FAINTED).
49
+ - **Red polygon** for the alert zone.
50
+ - **Red dot** = mid-hip reference (used for alert zone inclusion).
51
+ - **Blue dot** = bottom-center point (used for velocity calculation).
52
 
53
  ## How It Works
54
 
 
63
  2. **Unified Detection and Tracking:**
64
  - **Yolov11spose with Integrated Tracker and Pose Estimation:**
65
  The unified model processes each frame to detect persons, track them across frames, and extract keypoints that reflect the persons’ posture.
66
+
67
  3. **Faint Detection Logic:**
68
  - **Pose-Based Analysis:**
69
  The model’s keypoint outputs are used to assess if the person is lying down by comparing the vertical positions of the shoulders and hips.
70
  - **Velocity Analysis:**
71
+ The displacement of the person’s **bottom-center** is computed over consecutive frames. If the movement is below a preset velocity threshold, the individual is considered motionless.
72
  - **Alert Zone Confinement:**
73
+ A person is analyzed only if their **mid-hip** keypoint is within the drawn alert zone polygon.
74
 
75
  4. **Output Generation:**
76
+ - Processed frames are annotated with the person’s status (Upright, Static, FAINTED) and stitched back into a video.
77
+ - Mid-hip and bottom-center points are drawn for visual inspection.
78
  - The annotated output video is displayed through the Gradio interface.
79
 
80
  ## Running on Hugging Face Spaces
 
91
  1. **Clone the Repository:**
92
  ```bash
93
  git clone https://github.com/your_username/advanced-faint-detection.git
94
+ cd advanced-faint-detection
95
+ ```
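To make the change in this commit easier to follow outside the diff, here is a minimal, self-contained sketch of the two reference points it distinguishes: the mid-hip keypoint for the alert-zone membership test and the bounding-box bottom-center for velocity estimation. This assumes COCO-ordered keypoints (hips at indices 11 and 12) and an alert zone given as a list of (x, y) vertices; the helper names are illustrative, not the repository's API.

```python
import numpy as np
import cv2

# Assumption: the pose model returns keypoints in COCO order,
# so indices 11 and 12 are the left and right hips.
LEFT_HIP, RIGHT_HIP = 11, 12

def mid_hip_point(keypoints):
    """Mid-hip reference used for the alert-zone test.

    `keypoints` is an (N, 3) array of (x, y, confidence) values.
    """
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 3)
    return (float((kp[LEFT_HIP][0] + kp[RIGHT_HIP][0]) / 2),
            float((kp[LEFT_HIP][1] + kp[RIGHT_HIP][1]) / 2))

def bottom_center_point(box):
    """Bottom-center of an (x1, y1, x2, y2) box, used for velocity estimation."""
    x1, y1, x2, y2 = box
    return (float((x1 + x2) / 2), float(y2))

def inside_alert_zone(point, alert_zone):
    """True if `point` lies inside (or on) the polygon given as (x, y) vertices."""
    polygon = np.array(alert_zone, np.int32)
    return cv2.pointPolygonTest(polygon, point, False) >= 0

# Usage sketch with made-up coordinates:
zone = [(100, 400), (500, 400), (500, 700), (100, 700)]
box = (220, 300, 320, 650)
kps = np.zeros((17, 3))
kps[LEFT_HIP] = (250, 500, 0.9)
kps[RIGHT_HIP] = (290, 505, 0.9)
print(inside_alert_zone(mid_hip_point(kps), zone))  # zone test uses the mid-hip
print(bottom_center_point(box))                     # velocity uses the box bottom-center
```

The commit message does not state a rationale, but a plausible motivation is that the mid-hip follows the body more closely than the box bottom, which can drift when a person falls and the bounding box becomes wide and flat.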
app.py CHANGED
@@ -149,133 +149,113 @@ def process_video_with_zone(video_file, threshold_secs, velocity_threshold, edit
149
  pts = np.array(alert_zone, np.int32).reshape((-1, 1, 2))
150
  cv2.polylines(frame, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
151
 
152
- # Run the unified model (Yolov8spose) on the frame.
153
  results = yolov8spose_model(frame)[0]
154
-
155
- # Check if there are any detections. We assume that results.boxes holds the unified output.
156
- if results.boxes is not None:
157
- # Iterate over each detection in the unified output.
158
- for det in results.boxes.data:
159
- # The expected format: [x1, y1, x2, y2, confidence, class, keypoints..., track_id]
160
- # Adjust slicing based on your model's output format.
161
- d = det.cpu().numpy()
162
- x1, y1, x2, y2 = d[:4].astype(int)
163
- conf = d[4]
164
- cls = int(d[5])
165
- # Only consider persons (assume class 0 corresponds to person).
166
- if cls != 0 or conf < 0.5:
167
- continue
168
-
169
- # Assume the remaining part (except the last element) are keypoints.
170
- # Last element is taken as the integrated track ID.
171
- num_keypoint_values = len(d) - 6 - 1 # subtract first 6 fields and track_id.
172
- if num_keypoint_values > 0:
173
- flat_keypoints = d[6:6+num_keypoint_values]
174
- else:
175
- flat_keypoints = []
176
- track_id = int(d[-1])
177
-
178
- w = x2 - x1
179
- h = y2 - y1
180
- person_box = [x1, y1, x2, y2]
181
- if flat_keypoints != []:
182
- kp = np.array(flat_keypoints).reshape(-1, 3)
183
- for pair in [
184
- (5, 6), (5, 7), (7, 9), (6, 8), (8, 10), # arms
185
- (11, 12), (11, 13), (13, 15), (12, 14), (14, 16), # legs
186
- (5, 11), (6, 12) # torso
187
- ]:
188
- i, j = pair
189
- if kp[i][2] > 0.3 and kp[j][2] > 0.3: # confidence check
190
- pt1 = (int(kp[i][0]), int(kp[i][1]))
191
- pt2 = (int(kp[j][0]), int(kp[j][1]))
192
- cv2.line(frame, pt1, pt2, (0, 255, 255), 2)
193
- if len(kp) > 12:
194
- mid_hip = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
195
- pt = (float(mid_hip[0]), float(mid_hip[1]))
196
- else:
197
- current_bottom = bottom_center(person_box)
198
- pt = (float(current_bottom[0]), float(current_bottom[1]))
199
- else:
200
- current_bottom = bottom_center(person_box)
201
- pt = (float(current_bottom[0]), float(current_bottom[1]))
202
- in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
203
-
204
- # Draw bottom-center marker.
205
- cv2.circle(frame, (int(current_bottom[0]), int(current_bottom[1])), 4, (255, 0, 0), -1) # 🔵 Blue = bottom-center
206
- cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1) # 🔴 Red = mid-hip (used)
207
-
208
- if not in_alert_zone:
209
- status = "Outside Zone"
210
- color = (200, 200, 200)
211
- cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
212
- draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
213
- continue
214
-
215
- # Faint detection: using a heuristic based on bounding box aspect ratio and integrated keypoints.
216
- base_lying = False
217
- aspect_ratio = w / float(h) if h > 0 else 0
218
- if aspect_ratio > 1.5 and y2 > height * 0.5:
219
- base_lying = True
220
-
221
- if flat_keypoints != []:
222
- integrated_lying = is_lying_from_keypoints(flat_keypoints, h)
223
- else:
224
- integrated_lying = False
225
- pose_static = base_lying and integrated_lying
226
-
227
- # Velocity-based detection with EMA smoothing
228
- alpha = 0.8 # smoothing factor ← UPDATED
229
- if track_id not in velocity_static_info:
230
- velocity_static_info[track_id] = (current_bottom, frame_index)
231
- smoothed_bottom = current_bottom # ← UPDATED
232
- velocity_val = 0.0
233
- velocity_static = False
234
- else:
235
- prev_bottom, _ = velocity_static_info[track_id]
236
- # Apply EMA smoothing ← UPDATED
237
- smoothed_bottom = (
238
- alpha * np.array(prev_bottom) + (1 - alpha) * np.array(current_bottom)
239
- )
240
- velocity_static_info[track_id] = (smoothed_bottom.tolist(), frame_index)
241
-
242
- distance = compute_distance(smoothed_bottom, prev_bottom) # ← UPDATED
243
- velocity_val = distance * fps
244
- if distance < velocity_threshold:
245
- velocity_static = True
246
- else:
247
- velocity_static_info[track_id] = (current_bottom, frame_index)
248
- velocity_static = False
249
-
250
- is_static = pose_static or velocity_static
251
- if is_static:
252
- if track_id not in lying_start_times:
253
- lying_start_times[track_id] = frame_index
254
- duration_frames = frame_index - lying_start_times[track_id]
255
- else:
256
- if track_id in lying_start_times:
257
- del lying_start_times[track_id]
258
- duration_frames = 0
259
-
260
- if duration_frames >= threshold_frames:
261
- status = f"FAINTED ({duration_frames/fps:.1f}s)"
262
- color = (0, 0, 255)
263
- elif is_static:
264
- status = f"Static ({duration_frames/fps:.1f}s)"
265
- color = (0, 255, 255)
266
- else:
267
- status = "Upright"
268
- color = (0, 255, 0)
269
 
270
  cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
271
  draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
272
- vel_text = f"Vel: {velocity_val:.1f} px/s"
273
- text_offset = 15
274
- (vt_w, vt_h), vt_baseline = cv2.getTextSize(vel_text, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
275
- vel_org = (int(current_bottom[0] - vt_w / 2), int(current_bottom[1] + text_offset + vt_h))
276
- cv2.rectangle(frame, (vel_org[0], vel_org[1] - vt_h - vt_baseline),
277
- (vel_org[0] + vt_w, vel_org[1] + vt_baseline), (50,50,50), -1)
278
- cv2.putText(frame, vel_text, vel_org, cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, cv2.LINE_AA)
279
 
280
  out.write(frame)
281
 
 
149
  pts = np.array(alert_zone, np.int32).reshape((-1, 1, 2))
150
  cv2.polylines(frame, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
151
 
 
152
  results = yolov8spose_model(frame)[0]
153
+ boxes = results.boxes
154
+ kpts = results.keypoints.data
155
 
156
+ for i in range(len(boxes)):
157
+ box = boxes[i].xyxy[0].cpu().numpy()
158
+ x1, y1, x2, y2 = box.astype(int)
159
+ conf = boxes[i].conf[0].item()
160
+ cls = int(boxes[i].cls[0].item())
161
+ track_id = int(boxes[i].id[0].item()) if boxes[i].id is not None else -1
162
+ if cls != 0 or conf < 0.5:
163
+ continue
164
+
165
+ flat_keypoints = kpts[i].cpu().numpy().flatten().tolist()
166
+ kp = np.array(flat_keypoints).reshape(-1, 3)
167
+
168
+ for pair in [
169
+ (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),
170
+ (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),
171
+ (5, 11), (6, 12)
172
+ ]:
173
+ i1, j1 = pair
174
+ if kp[i1][2] > 0.3 and kp[j1][2] > 0.3:
175
+ pt1 = (int(kp[i1][0]), int(kp[i1][1]))
176
+ pt2 = (int(kp[j1][0]), int(kp[j1][1]))
177
+ cv2.line(frame, pt1, pt2, (0, 255, 255), 2)
178
+
179
+ if len(kp) > 12:
180
+ pt = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
181
+ else:
182
+ continue
183
+
184
+ pt = (float(pt[0]), float(pt[1]))
185
+ in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
186
+ cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1)
187
+
188
+ if not in_alert_zone:
189
+ status = "Outside Zone"
190
+ color = (200, 200, 200)
191
+ cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
192
+ draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
193
+ continue
194
+
195
+ aspect_ratio = (x2 - x1) / float(y2 - y1) if (y2 - y1) > 0 else 0
196
+ base_lying = aspect_ratio > 1.5 and y2 > height * 0.5
197
+ integrated_lying = is_lying_from_keypoints(flat_keypoints, y2 - y1)
198
+ pose_static = base_lying and integrated_lying
199
+
200
+ current_bottom = bottom_center((x1, y1, x2, y2))
201
+
202
+ if len(kp) > 12:
203
+ pt = ((kp[11][0] + kp[12][0]) / 2, (kp[11][1] + kp[12][1]) / 2)
204
+ else:
205
+ continue
206
+ pt = (float(pt[0]), float(pt[1])) # mid-hip
207
+ in_alert_zone = cv2.pointPolygonTest(np.array(alert_zone, np.int32), pt, False) >= 0
208
+ cv2.circle(frame, (int(pt[0]), int(pt[1])), 5, (0, 0, 255), -1) # mid-hip marker
209
+ cv2.circle(frame, (int(current_bottom[0]), int(current_bottom[1])), 3, (255, 0, 0), -1) # bottom center marker
210
+
211
+ if not in_alert_zone:
212
+ status = "Outside Zone"
213
+ color = (200, 200, 200)
214
  cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
215
  draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
216
+ continue
217
+
218
+ alpha = 0.8
219
+ if track_id not in velocity_static_info:
220
+ velocity_static_info[track_id] = (current_bottom, frame_index)
221
+ smoothed = current_bottom
222
+ velocity_val = 0.0
223
+ velocity_static = False
224
+ else:
225
+ prev_pt, _ = velocity_static_info[track_id]
226
+ smoothed = alpha * np.array(prev_pt) + (1 - alpha) * np.array(current_bottom)
227
+ velocity_static_info[track_id] = (smoothed.tolist(), frame_index)
228
+ distance = compute_distance(smoothed, prev_pt)
229
+ velocity_val = distance * fps
230
+ velocity_static = distance < velocity_threshold
231
+ is_static = pose_static or velocity_static
232
+ if is_static:
233
+ if track_id not in lying_start_times:
234
+ lying_start_times[track_id] = frame_index
235
+ duration_frames = frame_index - lying_start_times[track_id]
236
+ else:
237
+ lying_start_times.pop(track_id, None)
238
+ duration_frames = 0
239
+
240
+ if duration_frames >= threshold_frames:
241
+ status = f"FAINTED ({duration_frames/fps:.1f}s)"
242
+ color = (0, 0, 255)
243
+ elif is_static:
244
+ status = f"Static ({duration_frames/fps:.1f}s)"
245
+ color = (0, 255, 255)
246
+ else:
247
+ status = "Upright"
248
+ color = (0, 255, 0)
249
+
250
+ cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
251
+ draw_multiline_text(frame, [f"ID {track_id}: {status}"], (x1, max(y1-10, 0)))
252
+ vel_text = f"Vel: {velocity_val:.1f} px/s"
253
+ text_offset = 15
254
+ (vt_w, vt_h), vt_baseline = cv2.getTextSize(vel_text, cv2.FONT_HERSHEY_SIMPLEX, 0.4, 1)
255
+ vel_org = (int(pt[0] - vt_w / 2), int(pt[1] + text_offset + vt_h))
256
+ cv2.rectangle(frame, (vel_org[0], vel_org[1] - vt_h - vt_baseline),
257
+ (vel_org[0] + vt_w, vel_org[1] + vt_baseline), (50,50,50), -1)
258
+ cv2.putText(frame, vel_text, vel_org, cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255,255,255), 1, cv2.LINE_AA)
259
 
260
  out.write(frame)
261
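The loop in the new app.py calls `is_lying_from_keypoints(flat_keypoints, y2 - y1)`, which is defined elsewhere in app.py and does not appear in this diff. Based on the README's description (the pose check compares the vertical positions of the shoulders and hips), a plausible implementation could look like the sketch below; the relative threshold of 0.3 and the exact formulation are assumptions, not the repository's code.

```python
import numpy as np

def is_lying_from_keypoints(flat_keypoints, box_height, rel_threshold=0.3):
    """Heuristic lying-posture check (illustrative sketch, not the repo's code).

    Compares the vertical separation between shoulders (COCO indices 5, 6)
    and hips (indices 11, 12): when a person lies down, shoulders and hips
    sit at roughly the same image height, so their vertical separation is
    small relative to the bounding-box height. `rel_threshold` is assumed.
    """
    kp = np.asarray(flat_keypoints, dtype=float).reshape(-1, 3)
    if len(kp) <= 12 or box_height <= 0:
        return False
    shoulders = kp[[5, 6]]
    hips = kp[[11, 12]]
    # Only trust the heuristic when all four keypoints are reasonably confident.
    if min(shoulders[:, 2].min(), hips[:, 2].min()) < 0.3:
        return False
    shoulder_y = shoulders[:, 1].mean()
    hip_y = hips[:, 1].mean()
    return abs(shoulder_y - hip_y) < rel_threshold * box_height
```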