Luigi committed on
Commit 8b12e6b · 1 Parent(s): cb0f343

add zero gpu support

Files changed (4)
  1. README.md +26 -25
  2. app.py +80 -109
  3. packages.txt +1 -1
  4. requirementx.txt +1 -0
README.md CHANGED
@@ -12,54 +12,55 @@ short_description: Real-Time Faint Detection on Video
  ---


- # Real-Time Faint Detection on Video

- This repository contains a Hugging Face Spaces demo that implements a real-time faint (or post‑faint) detection system on uploaded video files using ZeroGPU acceleration. The application is built in Python and leverages:

  - **OpenCV** for video processing.
  - **Ultralytics YOLOv8** for person detection.
- - **PyTorch** as the underlying deep learning framework.
  - **Gradio** for a user‑friendly web interface.

  ## Features

- - **Video File Input:** Upload a video file (e.g., MP4) to the demo.
- - **Detection of Lying Persons:** The system uses a YOLO model to detect persons in each frame and applies a simple heuristic (aspect ratio and vertical position) to decide if a person is lying down.
- - **Tracking & Timing:** A basic centroid tracker associates detections across frames, accumulating the duration a person is detected as lying down.
- - **User-defined Threshold:** Set a threshold (from 5 to 600 seconds) via the UI. If a person is motionless longer than this threshold, they are flagged as "FAINTED."
- - **Annotated Output:** The processed video displays bounding boxes and labels (showing IDs, status, and elapsed lying time).

  ## How It Works

  1. **Detection:**
- The YOLOv8 model (nano version for speed) is used to detect people in each frame. Only detections with a confidence above 0.5 are considered.
-
- 2. **Heuristic for Falling:**
- A person is assumed to be lying down if:
- - Their bounding box is significantly wider than tall (aspect ratio > 1.5).
- - The lower part of the bounding box is in the bottom half of the frame (suggesting the person is on the floor).

- 3. **Tracking:**
- Detected persons are tracked using a simple centroid-matching algorithm. For each track, the demo accumulates the frame count during which the person remains lying down.

- 4. **Thresholding:**
- The frame count is converted into seconds (using the video’s FPS). When the accumulated time exceeds the user‑defined threshold, that track is labeled as "FAINTED."

  5. **Output Generation:**
- Annotated frames are stitched back into an output video that is returned via the Gradio interface.

  ## Running on Hugging Face Spaces

- This demo is designed for Hugging Face Spaces and is optimized for ZeroGPU. When deployed, the GPU (such as an A100) is only activated during video processing, keeping idle costs low.

- ### To deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
- 2. The environment will install the required dependencies listed in `requirements.txt`.
- 3. Launch the Space and upload a video to test the faint detection functionality.

  ## Running Locally

  1. **Clone the Repository:**
  ```bash
- git clone https://github.com/your_username/real-time-faint-detection.git
- cd real-time-faint-detection
  ---


+ # Advanced Real-Time Faint Detection on Video

+ This repository contains a Hugging Face Spaces demo for detecting faint (or post‑faint) scenarios in video files using an advanced tracking method based on DeepSORT Realtime. The application is built in Python and leverages:

  - **OpenCV** for video processing.
  - **Ultralytics YOLOv8** for person detection.
+ - **DeepSORT Realtime** for robust multi‑object tracking.
+ - **PyTorch** as the deep learning backend.
  - **Gradio** for a user‑friendly web interface.

  ## Features

+ - **Video File Input:** Upload an MP4 video file to the demo.
+ - **Detection of Lying Persons:** The demo uses a YOLOv8 model to detect persons. A simple heuristic (aspect ratio and vertical position) is then applied to decide if a person is lying down.
+ - **Advanced Tracking:** Integration of DeepSORT Realtime provides robust multi‑person tracking, even in occluded or crowded scenes.
+ - **Timing and Thresholding:** The system records the duration that a person is detected as lying down. If they remain motionless longer than a user‑defined threshold (between 5 and 600 seconds), they are flagged as "FAINTED."
+ - **Annotated Output:** The processed video displays bounding boxes and labels for each person along with their current status (Upright, Lying Down, or FAINTED).

  ## How It Works

  1. **Detection:**
+ The YOLOv8 model (nano version) detects people in each frame of the video. Only detections with a confidence greater than 0.5 are passed on.

+ 2. **Advanced Tracking with DeepSORT:**
+ The detections are fed into DeepSORT Realtime, which associates detections across frames and assigns unique IDs to each person. This tracker is robust to occlusions and can maintain consistent identities even in crowded scenes.

+ 3. **Lying Detection Heuristic:**
+ For each tracked person, a simple heuristic determines if the person is lying down:
+ - The bounding box is much wider than it is tall (aspect ratio > 1.5).
+ - The lower edge of the box is located in the lower half of the frame.
+
+ 4. **Timing and Status Update:**
+ The demo records the first frame in which a track meets the lying criteria and computes the duration the person remains in that state. When this duration exceeds the threshold set by the user, the system flags the track as "FAINTED".

  5. **Output Generation:**
+ Annotated frames (with bounding boxes and labels) are stitched together into an output video that is returned to the user via the Gradio interface.

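The snippet below is a minimal, illustrative sketch of steps 1–4 above (detection filtering, DeepSORT tracking, the lying heuristic, and threshold timing). It mirrors the logic added to `app.py` in this commit, but the helper name `update_faint_status` and the module-level state are only for illustration, not part of the Space itself.

```python
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolov8n.pt")           # person detector (COCO class 0 = person)
tracker = DeepSort(max_age=30, n_init=3)
lying_start = {}                     # track_id -> frame index when lying was first seen

def update_faint_status(frame, frame_index, fps, threshold_secs, frame_height):
    """Return {track_id: status} for one frame; status is Upright, Lying Down, or FAINTED."""
    results = model(frame)[0]
    detections = []
    if results.boxes is not None:
        for box, cls, conf in zip(results.boxes.xyxy.cpu().numpy(),
                                   results.boxes.cls.cpu().numpy(),
                                   results.boxes.conf.cpu().numpy()):
            if int(cls) == 0 and conf > 0.5:                       # step 1: confident persons only
                x1, y1, x2, y2 = [int(v) for v in box]
                detections.append([[x1, y1, x2 - x1, y2 - y1], float(conf), 0])

    statuses = {}
    for track in tracker.update_tracks(detections, frame=frame):   # step 2: DeepSORT association
        if not track.is_confirmed():
            continue
        x1, y1, x2, y2 = [int(v) for v in track.to_tlbr()]
        w, h = x2 - x1, y2 - y1
        lying = h > 0 and w / h > 1.5 and y2 > frame_height * 0.5  # step 3: wide box, low in frame
        if lying:
            lying_start.setdefault(track.track_id, frame_index)
            seconds_down = (frame_index - lying_start[track.track_id]) / fps
            statuses[track.track_id] = ("FAINTED" if seconds_down >= threshold_secs
                                        else "Lying Down")         # step 4: threshold timing
        else:
            lying_start.pop(track.track_id, None)
            statuses[track.track_id] = "Upright"
    return statuses
```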
  ## Running on Hugging Face Spaces

+ This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU (e.g., A100) is activated only during processing, optimizing resource usage.

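Concretely, ZeroGPU support only requires that the GPU-bound function be wrapped with the `spaces.GPU` decorator, which is the pattern `app.py` in this commit uses:

```python
import spaces

@spaces.GPU  # a GPU is allocated only while this function runs, then released
def process_video(video_file, threshold_secs):
    ...
```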
+ ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
+ 2. The dependencies in `requirements.txt` will be installed automatically.
+ 3. Launch the Space and upload a video file to test the faint detection functionality.

  ## Running Locally

  1. **Clone the Repository:**
  ```bash
+ git clone https://github.com/your_username/advanced-faint-detection.git
+ cd advanced-faint-detection
app.py CHANGED
@@ -1,28 +1,24 @@
 
  import cv2
  import numpy as np
- import time
  import os
  import tempfile
  import gradio as gr
- from ultralytics import YOLO  # ultralytics provides YOLOv8 models


  def process_video(video_file, threshold_secs):
      """
-     Process the uploaded video file to detect motionless (fallen) persons.
-
-     Parameters:
-       video_file: Path to the uploaded video file.
-       threshold_secs: Duration threshold in seconds that a person must remain
-         lying down to be flagged as 'FAINTED'.
-
-     Returns:
-       out_path: Path to the processed video with annotations.
      """
-     # Open the input video file
      cap = cv2.VideoCapture(video_file)
      if not cap.isOpened():
          raise ValueError("Error opening the video file.")
-
      fps = cap.get(cv2.CAP_PROP_FPS)
      width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
      height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
@@ -30,125 +26,99 @@ def process_video(video_file, threshold_secs):
      out_path = os.path.join(tempfile.gettempdir(), "output.mp4")
      out = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

-     # Load YOLOv8 model (download if not already present)
-     # Here we use the nano model for speed.
      model = YOLO("yolov8n.pt")

-     # Tracking dictionary: key = track id, value = dict with centroid, start time, last update, etc.
-     tracks = {}  # {track_id: {'centroid': (x, y), 'start_time': frame_index, 'last_update': frame_index, 'box': (x1, y1, x2, y2), 'fainted': bool}}
-     next_track_id = 0

      frame_index = 0
-     threshold_frames = threshold_secs * fps  # convert seconds to frame count

-     # Main processing loop (frame-by-frame)
      while True:
          ret, frame = cap.read()
          if not ret:
              break
          frame_index += 1
-
          # Run YOLO detection on the current frame
          results = model(frame)[0]
          if results.boxes is not None:
-             boxes = results.boxes.xyxy.cpu().numpy()  # each box is [x1, y1, x2, y2]
-             classes = results.boxes.cls.cpu().numpy()  # class IDs (COCO: 0 = person)
              confidences = results.boxes.conf.cpu().numpy()
-         else:
-             boxes, classes, confidences = [], [], []
-
-         # Filter detections: only keep "person" and with a confidence above 0.5.
-         person_boxes = []
-         for box, cls, conf in zip(boxes, classes, confidences):
-             if int(cls) == 0 and conf > 0.5:
-                 person_boxes.append(box)

-         # Use a simple heuristic to decide if the detected person is lying down:
-         # For this demo, we consider a person "fallen" if:
-         #   - the bounding box is much wider than tall (aspect ratio > 1.5), and
-         #   - the bottom of the box is in the lower half of the frame (assume on floor)
-         detections = []  # Each detection: (centroid_x, centroid_y, x1, y1, x2, y2)
-         for box in person_boxes:
-             x1, y1, x2, y2 = box.astype(int)
              w = x2 - x1
              h = y2 - y1
              aspect_ratio = w / float(h) if h > 0 else 0
-             # Heuristic: a person is lying if the box is wide and low in the frame.
              if aspect_ratio > 1.5 and y2 > height * 0.5:
-                 cx = int((x1 + x2) / 2)
-                 cy = int((y1 + y2) / 2)
-                 detections.append((cx, cy, x1, y1, x2, y2))
-
-         # Simple tracking: match detections to existing tracks based on centroid proximity.
-         updated_track_ids = set()
-         for det in detections:
-             cx, cy, x1, y1, x2, y2 = det
-             matched = None
-             for tid, track in tracks.items():
-                 prev_cx, prev_cy = track["centroid"]
-                 dist = np.sqrt((cx - prev_cx) ** 2 + (cy - prev_cy) ** 2)
-                 if dist < 50:  # if distance is less than 50 pixels, consider it the same person
-                     matched = tid
-                     break
-             if matched is not None:
-                 # Update existing track
-                 tracks[matched]["centroid"] = (cx, cy)
-                 tracks[matched]["last_update"] = frame_index
-                 # If this is the first frame the person is detected as lying down, record the start time.
-                 if "start_time" not in tracks[matched]:
-                     tracks[matched]["start_time"] = frame_index
-                 updated_track_ids.add(matched)
-                 tracks[matched]["box"] = (x1, y1, x2, y2)
-             else:
-                 # Create a new track for a new detection
-                 tracks[next_track_id] = {
-                     "centroid": (cx, cy),
-                     "start_time": frame_index,
-                     "last_update": frame_index,
-                     "box": (x1, y1, x2, y2),
-                     "fainted": False,
-                 }
-                 updated_track_ids.add(next_track_id)
-                 next_track_id += 1

-         # Remove tracks that haven’t been updated in the last 5 frames (lost track)
-         remove_ids = []
-         for tid, track in tracks.items():
-             if frame_index - track["last_update"] > 5:
-                 remove_ids.append(tid)
-         for tid in remove_ids:
-             del tracks[tid]

-         # Annotate frame with information from tracks
-         for tid, track in tracks.items():
-             duration_frames = frame_index - track.get("start_time", frame_index)
-             label = "Lying Down"
-             color = (0, 255, 255)  # yellow for lying down
              if duration_frames >= threshold_frames:
-                 label = "FAINTED"
                  color = (0, 0, 255)  # red for fainted
-                 track["fainted"] = True
-
-             if "box" in track:
-                 x1, y1, x2, y2 = track["box"]
-                 cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
-                 cv2.putText(
-                     frame,
-                     f"ID {tid}: {label} ({duration_frames / fps:.1f}s)",
-                     (x1, max(y1 - 10, 0)),
-                     cv2.FONT_HERSHEY_SIMPLEX,
-                     0.5,
-                     color,
-                     2,
-                 )
-         # Write the annotated frame to the output video
          out.write(frame)
-
      cap.release()
      out.release()
      return out_path


- # Create a Gradio interface for the demo.
  demo = gr.Interface(
      fn=process_video,
      inputs=[
@@ -156,12 +126,13 @@ demo = gr.Interface(
          gr.Slider(5, 600, value=5, step=1, label="Motionless Duration Threshold (seconds)"),
      ],
      outputs=gr.Video(label="Processed Video"),
-     title="Real-Time Faint Detection on Video",
      description=(
-         "Upload a video file and set a threshold duration (in seconds). "
-         "This demo detects persons lying motionless (using a YOLO-based heuristic) and flags them as 'FAINTED' "
-         "if they remain down longer than the threshold."
      ),
  )

- demo.launch()
 
 
+ import spaces
  import cv2
  import numpy as np
  import os
  import tempfile
  import gradio as gr
+ from ultralytics import YOLO  # for YOLOv8 person detection
+ from deep_sort_realtime.deepsort_tracker import DeepSort  # advanced multi-object tracker

+ @spaces.GPU
  def process_video(video_file, threshold_secs):
      """
+     Process an uploaded video file to detect persons lying motionless and flag them as "FAINTED"
+     after exceeding the specified threshold duration (in seconds). Uses YOLOv8 for detection
+     and DeepSORT for tracking.
      """
+     # Open the video file
      cap = cv2.VideoCapture(video_file)
      if not cap.isOpened():
          raise ValueError("Error opening the video file.")
+
      fps = cap.get(cv2.CAP_PROP_FPS)
      width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
      height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
      out_path = os.path.join(tempfile.gettempdir(), "output.mp4")
      out = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

+     # Load the YOLOv8 model for person detection (using the nano variant for speed)
      model = YOLO("yolov8n.pt")

+     # Initialize DeepSORT tracker
+     tracker = DeepSort(max_age=30, n_init=3, embedder="mobilenet", half=True)
+
+     # Dictionary to keep track of when a given track was first detected as "lying"
+     lying_start_times = {}
+
      frame_index = 0
+     threshold_frames = threshold_secs * fps  # convert seconds to number of frames

      while True:
          ret, frame = cap.read()
          if not ret:
              break
          frame_index += 1
+
          # Run YOLO detection on the current frame
          results = model(frame)[0]
+         # Prepare detections for DeepSORT tracker:
+         # DeepSORT expects each detection in the format:
+         #   [ [x, y, w, h], confidence, class_id ]
+         detections = []
          if results.boxes is not None:
+             boxes = results.boxes.xyxy.cpu().numpy()  # [x1, y1, x2, y2]
+             classes = results.boxes.cls.cpu().numpy()
              confidences = results.boxes.conf.cpu().numpy()
+             for box, cls, conf in zip(boxes, classes, confidences):
+                 if int(cls) == 0 and conf > 0.5:  # COCO class 0 is "person"
+                     # Convert bounding box coordinates to int
+                     x1, y1, x2, y2 = box.astype(int)
+                     w = int(x2 - x1)
+                     h = int(y2 - y1)
+                     # DeepSORT requires the bbox to be nested in a list
+                     detections.append([[int(x1), int(y1), w, h], float(conf), 0])

+         # Update tracker using DeepSORT.
+         # The tracker internally handles matching detections across frames.
+         tracks = tracker.update_tracks(detections, frame=frame)
+
+         # Process each track for lying detection using our heuristic:
+         # Heuristic: a person is considered "lying down" if:
+         #   - The bounding box is significantly wider than tall (aspect ratio > 1.5)
+         #   - The lower edge of the bounding box is in the lower half of the frame
+         for track in tracks:
+             if not track.is_confirmed():
+                 continue
+
+             track_id = track.track_id
+             bbox = track.to_tlbr()  # returns [x1, y1, x2, y2]
+             x1, y1, x2, y2 = [int(coord) for coord in bbox]
              w = x2 - x1
              h = y2 - y1
              aspect_ratio = w / float(h) if h > 0 else 0
+             lying = False
              if aspect_ratio > 1.5 and y2 > height * 0.5:
+                 lying = True

+             # Update or reset the lying start time per track
+             if lying:
+                 if track_id not in lying_start_times:
+                     lying_start_times[track_id] = frame_index
+                 duration_frames = frame_index - lying_start_times[track_id]
+             else:
+                 if track_id in lying_start_times:
+                     del lying_start_times[track_id]
+                 duration_frames = 0

+             # Decide label and color based on duration
              if duration_frames >= threshold_frames:
+                 label = f"ID {track_id}: FAINTED ({duration_frames/fps:.1f}s)"
                  color = (0, 0, 255)  # red for fainted
+             elif lying:
+                 label = f"ID {track_id}: Lying Down ({duration_frames/fps:.1f}s)"
+                 color = (0, 255, 255)  # yellow for lying down
+             else:
+                 label = f"ID {track_id}: Upright"
+                 color = (0, 255, 0)  # green for normal upright posture
+
+             # Annotate the frame
+             cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
+             cv2.putText(frame, label, (x1, max(y1 - 10, 0)),
+                         cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
+
          out.write(frame)
+
      cap.release()
      out.release()
      return out_path


+ # Create Gradio interface for the demo.
  demo = gr.Interface(
      fn=process_video,
      inputs=[
          gr.Slider(5, 600, value=5, step=1, label="Motionless Duration Threshold (seconds)"),
      ],
      outputs=gr.Video(label="Processed Video"),
+     title="Advanced Real-Time Faint Detection on Video",
      description=(
+         "Upload a video file and set a threshold duration (in seconds). This demo uses YOLOv8 for person detection and "
+         "DeepSORT for advanced tracking. It flags persons as 'FAINTED' if they remain lying motionless (determined by a "
+         "heuristic) for longer than the threshold."
      ),
  )

+ if __name__ == "__main__":
+     demo.launch()
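For a quick local smoke test of the new processing function outside the Gradio UI, something like the following should work (the clip name is a placeholder; this assumes the `spaces` package is installed so the import in `app.py` succeeds, and that outside a ZeroGPU Space the decorator simply runs the function on local hardware):

```python
from app import process_video

# "sample_clip.mp4" is a hypothetical test file containing people in frame.
output_path = process_video("sample_clip.mp4", threshold_secs=10)
print("Annotated video written to:", output_path)
```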
packages.txt CHANGED
@@ -1 +1 @@
- opencv-python
+ python3-opencv
requirementx.txt CHANGED
@@ -3,3 +3,4 @@ opencv-python
  ultralytics
  torch
  numpy
+ deep-sort-realtime