Luigi committed on
Commit 8b12e6b · 1 Parent(s): cb0f343

add zero gpu support

Files changed (4)
  1. README.md +26 -25
  2. app.py +80 -109
  3. packages.txt +1 -1
  4. requirementx.txt +1 -0
README.md CHANGED
@@ -12,54 +12,55 @@ short_description: Real-Time Faint Detection on Video
  ---


- # Real-Time Faint Detection on Video

- This repository contains a Hugging Face Spaces demo that implements a real-time faint (or post‑faint) detection system on uploaded video files using ZeroGPU acceleration. The application is built in Python and leverages:

  - **OpenCV** for video processing.
  - **Ultralytics YOLOv8** for person detection.
- - **PyTorch** as the underlying deep learning framework.
  - **Gradio** for a user‑friendly web interface.

  ## Features

- - **Video File Input:** Upload a video file (e.g., MP4) to the demo.
- - **Detection of Lying Persons:** The system uses a YOLO model to detect persons in each frame and applies a simple heuristic (aspect ratio and vertical position) to decide if a person is lying down.
- - **Tracking & Timing:** A basic centroid tracker associates detections across frames, accumulating the duration a person is detected as lying down.
- - **User-defined Threshold:** Set a threshold (from 5 to 600 seconds) via the UI. If a person is motionless longer than this threshold, they are flagged as "FAINTED."
- - **Annotated Output:** The processed video displays bounding boxes and labels (showing IDs, status, and elapsed lying time).

  ## How It Works

  1. **Detection:**
- The YOLOv8 model (nano version for speed) is used to detect people in each frame. Only detections with a confidence above 0.5 are considered.
-
- 2. **Heuristic for Falling:**
- A person is assumed to be lying down if:
- - Their bounding box is significantly wider than tall (aspect ratio > 1.5).
- - The lower part of the bounding box is in the bottom half of the frame (suggesting the person is on the floor).

- 3. **Tracking:**
- Detected persons are tracked using a simple centroid-matching algorithm. For each track, the demo accumulates the frame count during which the person remains lying down.

- 4. **Thresholding:**
- The frame count is converted into seconds (using the video’s FPS). When the accumulated time exceeds the user‑defined threshold, that track is labeled as "FAINTED."

  5. **Output Generation:**
- Annotated frames are stitched back into an output video that is returned via the Gradio interface.

  ## Running on Hugging Face Spaces

- This demo is designed for Hugging Face Spaces and is optimized for ZeroGPU. When deployed, the GPU (such as an A100) is only activated during video processing, keeping idle costs low.

- ### To deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
- 2. The environment will install the required dependencies listed in `requirements.txt`.
- 3. Launch the Space and upload a video to test the faint detection functionality.

  ## Running Locally

  1. **Clone the Repository:**
  ```bash
- git clone https://github.com/your_username/real-time-faint-detection.git
- cd real-time-faint-detection
  ---


+ # Advanced Real-Time Faint Detection on Video

+ This repository contains a Hugging Face Spaces demo for detecting faint (or post‑faint) scenarios in video files using an advanced tracking method based on DeepSORT Realtime. The application is built in Python and leverages:

  - **OpenCV** for video processing.
  - **Ultralytics YOLOv8** for person detection.
+ - **DeepSORT Realtime** for robust multi‑object tracking.
+ - **PyTorch** as the deep learning backend.
  - **Gradio** for a user‑friendly web interface.

  ## Features

+ - **Video File Input:** Upload an MP4 video file to the demo.
+ - **Detection of Lying Persons:** The demo uses a YOLOv8 model to detect persons. A simple heuristic (aspect ratio and vertical position) is then applied to decide if a person is lying down.
+ - **Advanced Tracking:** Integration of DeepSORT Realtime provides robust multi‑person tracking, even in occluded or crowded scenes.
+ - **Timing and Thresholding:** The system records the duration that a person is detected as lying down. If they remain motionless longer than a user‑defined threshold (between 5 and 600 seconds), they are flagged as "FAINTED."
+ - **Annotated Output:** The processed video displays bounding boxes and labels for each person along with their current status (Upright, Lying Down, or FAINTED).

  ## How It Works

  1. **Detection:**
+ The YOLOv8 model (nano version) detects people in each frame of the video. Only detections with a confidence greater than 0.5 are passed on.

+ 2. **Advanced Tracking with DeepSORT:**
+ The detections are fed into DeepSORT Realtime, which associates detections across frames and assigns unique IDs to each person. This tracker is robust to occlusions and can maintain consistent identities even in crowded scenes.

+ 3. **Lying Detection Heuristic:**
+ For each tracked person, a simple heuristic determines if the person is lying down:
+ - The bounding box is much wider than it is tall (aspect ratio > 1.5).
+ - The lower edge of the box is located in the lower half of the frame.
+
+ 4. **Timing and Status Update:**
+ The demo records the first frame in which a track meets the lying criteria and computes the duration the person remains in that state. When this duration exceeds the threshold set by the user, the system flags the track as "FAINTED".

  5. **Output Generation:**
+ Annotated frames (with bounding boxes and labels) are stitched together into an output video that is returned to the user via the Gradio interface.

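The snippet below is a minimal, illustrative sketch of steps 1–4 above (detection filtering, DeepSORT tracking, the lying heuristic, and threshold timing). It mirrors the logic added to `app.py` in this commit, but the helper name `update_faint_status` and the module-level state are only for illustration, not part of the Space itself.

```python
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolov8n.pt")           # person detector (COCO class 0 = person)
tracker = DeepSort(max_age=30, n_init=3)
lying_start = {}                     # track_id -> frame index when lying was first seen

def update_faint_status(frame, frame_index, fps, threshold_secs, frame_height):
    """Return {track_id: status} for one frame; status is Upright, Lying Down, or FAINTED."""
    results = model(frame)[0]
    detections = []
    if results.boxes is not None:
        for box, cls, conf in zip(results.boxes.xyxy.cpu().numpy(),
                                   results.boxes.cls.cpu().numpy(),
                                   results.boxes.conf.cpu().numpy()):
            if int(cls) == 0 and conf > 0.5:                       # step 1: confident persons only
                x1, y1, x2, y2 = [int(v) for v in box]
                detections.append([[x1, y1, x2 - x1, y2 - y1], float(conf), 0])

    statuses = {}
    for track in tracker.update_tracks(detections, frame=frame):   # step 2: DeepSORT association
        if not track.is_confirmed():
            continue
        x1, y1, x2, y2 = [int(v) for v in track.to_tlbr()]
        w, h = x2 - x1, y2 - y1
        lying = h > 0 and w / h > 1.5 and y2 > frame_height * 0.5  # step 3: wide box, low in frame
        if lying:
            lying_start.setdefault(track.track_id, frame_index)
            seconds_down = (frame_index - lying_start[track.track_id]) / fps
            statuses[track.track_id] = ("FAINTED" if seconds_down >= threshold_secs
                                        else "Lying Down")         # step 4: threshold timing
        else:
            lying_start.pop(track.track_id, None)
            statuses[track.track_id] = "Upright"
    return statuses
```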
  ## Running on Hugging Face Spaces

+ This demo is designed for Hugging Face Spaces and supports ZeroGPU acceleration. The GPU (e.g., A100) is activated only during processing, optimizing resource usage.

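Concretely, ZeroGPU support only requires that the GPU-bound function be wrapped with the `spaces.GPU` decorator, which is the pattern `app.py` in this commit uses:

```python
import spaces

@spaces.GPU  # a GPU is allocated only while this function runs, then released
def process_video(video_file, threshold_secs):
    ...
```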
+ ### To Deploy:
  1. Fork or clone this repository on Hugging Face Spaces.
+ 2. The dependencies in `requirements.txt` will be installed automatically.
+ 3. Launch the Space and upload a video file to test the faint detection functionality.

  ## Running Locally

  1. **Clone the Repository:**
  ```bash
+ git clone https://github.com/your_username/advanced-faint-detection.git
+ cd advanced-faint-detection
app.py CHANGED
@@ -1,28 +1,24 @@
 
  import cv2
  import numpy as np
- import time
  import os
  import tempfile
  import gradio as gr
- from ultralytics import YOLO  # ultralytics provides YOLOv8 models


  def process_video(video_file, threshold_secs):
      """
-     Process the uploaded video file to detect motionless (fallen) persons.
-
-     Parameters:
-       video_file: Path to the uploaded video file.
-       threshold_secs: Duration threshold in seconds that a person must remain
-         lying down to be flagged as 'FAINTED'.
-
-     Returns:
-       out_path: Path to the processed video with annotations.
      """
-     # Open the input video file
      cap = cv2.VideoCapture(video_file)
      if not cap.isOpened():
          raise ValueError("Error opening the video file.")
-
      fps = cap.get(cv2.CAP_PROP_FPS)
      width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
      height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
@@ -30,125 +26,99 @@ def process_video(video_file, threshold_secs):
      out_path = os.path.join(tempfile.gettempdir(), "output.mp4")
      out = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

-     # Load YOLOv8 model (download if not already present)
-     # Here we use the nano model for speed.
      model = YOLO("yolov8n.pt")

-     # Tracking dictionary: key = track id, value = dict with centroid, start time, last update, etc.
-     tracks = {}  # {track_id: {'centroid': (x, y), 'start_time': frame_index, 'last_update': frame_index, 'box': (x1, y1, x2, y2), 'fainted': bool}}
-     next_track_id = 0

      frame_index = 0
-     threshold_frames = threshold_secs * fps  # convert seconds to frame count

-     # Main processing loop (frame-by-frame)
      while True:
          ret, frame = cap.read()
          if not ret:
              break
          frame_index += 1
-
          # Run YOLO detection on the current frame
          results = model(frame)[0]
          if results.boxes is not None:
-             boxes = results.boxes.xyxy.cpu().numpy()  # each box is [x1, y1, x2, y2]
-             classes = results.boxes.cls.cpu().numpy()  # class IDs (COCO: 0 = person)
              confidences = results.boxes.conf.cpu().numpy()
-         else:
-             boxes, classes, confidences = [], [], []
-
-         # Filter detections: only keep "person" and with a confidence above 0.5.
-         person_boxes = []
-         for box, cls, conf in zip(boxes, classes, confidences):
-             if int(cls) == 0 and conf > 0.5:
-                 person_boxes.append(box)

-         # Use a simple heuristic to decide if the detected person is lying down:
-         # For this demo, we consider a person "fallen" if:
-         #   - the bounding box is much wider than tall (aspect ratio > 1.5), and
-         #   - the bottom of the box is in the lower half of the frame (assume on floor)
-         detections = []  # Each detection: (centroid_x, centroid_y, x1, y1, x2, y2)
-         for box in person_boxes:
-             x1, y1, x2, y2 = box.astype(int)
              w = x2 - x1
              h = y2 - y1
              aspect_ratio = w / float(h) if h > 0 else 0
-             # Heuristic: a person is lying if the box is wide and low in the frame.
              if aspect_ratio > 1.5 and y2 > height * 0.5:
-                 cx = int((x1 + x2) / 2)
-                 cy = int((y1 + y2) / 2)
-                 detections.append((cx, cy, x1, y1, x2, y2))
-
-         # Simple tracking: match detections to existing tracks based on centroid proximity.
-         updated_track_ids = set()
-         for det in detections:
-             cx, cy, x1, y1, x2, y2 = det
-             matched = None
-             for tid, track in tracks.items():
-                 prev_cx, prev_cy = track["centroid"]
-                 dist = np.sqrt((cx - prev_cx) ** 2 + (cy - prev_cy) ** 2)
-                 if dist < 50:  # if distance is less than 50 pixels, consider it the same person
-                     matched = tid
-                     break
-             if matched is not None:
-                 # Update existing track
-                 tracks[matched]["centroid"] = (cx, cy)
-                 tracks[matched]["last_update"] = frame_index
-                 # If this is the first frame the person is detected as lying down, record the start time.
-                 if "start_time" not in tracks[matched]:
-                     tracks[matched]["start_time"] = frame_index
-                 updated_track_ids.add(matched)
-                 tracks[matched]["box"] = (x1, y1, x2, y2)
-             else:
-                 # Create a new track for a new detection
-                 tracks[next_track_id] = {
-                     "centroid": (cx, cy),
-                     "start_time": frame_index,
-                     "last_update": frame_index,
-                     "box": (x1, y1, x2, y2),
-                     "fainted": False,
-                 }
-                 updated_track_ids.add(next_track_id)
-                 next_track_id += 1

-         # Remove tracks that haven’t been updated in the last 5 frames (lost track)
-         remove_ids = []
-         for tid, track in tracks.items():
-             if frame_index - track["last_update"] > 5:
-                 remove_ids.append(tid)
-         for tid in remove_ids:
-             del tracks[tid]

-         # Annotate frame with information from tracks
-         for tid, track in tracks.items():
-             duration_frames = frame_index - track.get("start_time", frame_index)
-             label = "Lying Down"
-             color = (0, 255, 255)  # yellow for lying down
              if duration_frames >= threshold_frames:
-                 label = "FAINTED"
                  color = (0, 0, 255)  # red for fainted
-                 track["fainted"] = True
-
-             if "box" in track:
-                 x1, y1, x2, y2 = track["box"]
-                 cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
-                 cv2.putText(
-                     frame,
-                     f"ID {tid}: {label} ({duration_frames / fps:.1f}s)",
-                     (x1, max(y1 - 10, 0)),
-                     cv2.FONT_HERSHEY_SIMPLEX,
-                     0.5,
-                     color,
-                     2,
-                 )
-         # Write the annotated frame to the output video
          out.write(frame)
-
      cap.release()
      out.release()
      return out_path


- # Create a Gradio interface for the demo.
  demo = gr.Interface(
      fn=process_video,
      inputs=[
@@ -156,12 +126,13 @@ demo = gr.Interface(
          gr.Slider(5, 600, value=5, step=1, label="Motionless Duration Threshold (seconds)"),
      ],
      outputs=gr.Video(label="Processed Video"),
-     title="Real-Time Faint Detection on Video",
      description=(
-         "Upload a video file and set a threshold duration (in seconds). "
-         "This demo detects persons lying motionless (using a YOLO-based heuristic) and flags them as 'FAINTED' "
-         "if they remain down longer than the threshold."
      ),
  )

- demo.launch()
 
 
+ import spaces
  import cv2
  import numpy as np
  import os
  import tempfile
  import gradio as gr
+ from ultralytics import YOLO  # for YOLOv8 person detection
+ from deep_sort_realtime.deepsort_tracker import DeepSort  # advanced multi-object tracker

+ @spaces.GPU
  def process_video(video_file, threshold_secs):
      """
+     Process an uploaded video file to detect persons lying motionless and flag them as "FAINTED"
+     after exceeding the specified threshold duration (in seconds). Uses YOLOv8 for detection
+     and DeepSORT for tracking.
      """
+     # Open the video file
      cap = cv2.VideoCapture(video_file)
      if not cap.isOpened():
          raise ValueError("Error opening the video file.")
+
      fps = cap.get(cv2.CAP_PROP_FPS)
      width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
      height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
      out_path = os.path.join(tempfile.gettempdir(), "output.mp4")
      out = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

+     # Load the YOLOv8 model for person detection (using the nano variant for speed)
      model = YOLO("yolov8n.pt")

+     # Initialize DeepSORT tracker
+     tracker = DeepSort(max_age=30, n_init=3, embedder="mobilenet", half=True)
+
+     # Dictionary to keep track of when a given track was first detected as "lying"
+     lying_start_times = {}
+
      frame_index = 0
+     threshold_frames = threshold_secs * fps  # convert seconds to number of frames

      while True:
          ret, frame = cap.read()
          if not ret:
              break
          frame_index += 1
+
          # Run YOLO detection on the current frame
          results = model(frame)[0]
+         # Prepare detections for DeepSORT tracker:
+         # DeepSORT expects each detection in the format:
+         #   [ [x, y, w, h], confidence, class_id ]
+         detections = []
          if results.boxes is not None:
+             boxes = results.boxes.xyxy.cpu().numpy()  # [x1, y1, x2, y2]
+             classes = results.boxes.cls.cpu().numpy()
              confidences = results.boxes.conf.cpu().numpy()
+             for box, cls, conf in zip(boxes, classes, confidences):
+                 if int(cls) == 0 and conf > 0.5:  # COCO class 0 is "person"
+                     # Convert bounding box coordinates to int
+                     x1, y1, x2, y2 = box.astype(int)
+                     w = int(x2 - x1)
+                     h = int(y2 - y1)
+                     # DeepSORT requires the bbox to be nested in a list
+                     detections.append([[int(x1), int(y1), w, h], float(conf), 0])

+         # Update tracker using DeepSORT.
+         # The tracker internally handles matching detections across frames.
+         tracks = tracker.update_tracks(detections, frame=frame)
+
+         # Process each track for lying detection using our heuristic:
+         # Heuristic: a person is considered "lying down" if:
+         #   - The bounding box is significantly wider than tall (aspect ratio > 1.5)
+         #   - The lower edge of the bounding box is in the lower half of the frame
+         for track in tracks:
+             if not track.is_confirmed():
+                 continue
+
+             track_id = track.track_id
+             bbox = track.to_tlbr()  # returns [x1, y1, x2, y2]
+             x1, y1, x2, y2 = [int(coord) for coord in bbox]
              w = x2 - x1
              h = y2 - y1
              aspect_ratio = w / float(h) if h > 0 else 0
+             lying = False
              if aspect_ratio > 1.5 and y2 > height * 0.5:
+                 lying = True

+             # Update or reset the lying start time per track
+             if lying:
+                 if track_id not in lying_start_times:
+                     lying_start_times[track_id] = frame_index
+                 duration_frames = frame_index - lying_start_times[track_id]
+             else:
+                 if track_id in lying_start_times:
+                     del lying_start_times[track_id]
+                 duration_frames = 0

+             # Decide label and color based on duration
              if duration_frames >= threshold_frames:
+                 label = f"ID {track_id}: FAINTED ({duration_frames/fps:.1f}s)"
                  color = (0, 0, 255)  # red for fainted
+             elif lying:
+                 label = f"ID {track_id}: Lying Down ({duration_frames/fps:.1f}s)"
+                 color = (0, 255, 255)  # yellow for lying down
+             else:
+                 label = f"ID {track_id}: Upright"
+                 color = (0, 255, 0)  # green for normal upright posture
+
+             # Annotate the frame
+             cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
+             cv2.putText(frame, label, (x1, max(y1 - 10, 0)),
+                         cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
+
          out.write(frame)
+
      cap.release()
      out.release()
      return out_path


+ # Create Gradio interface for the demo.
  demo = gr.Interface(
      fn=process_video,
      inputs=[
          gr.Slider(5, 600, value=5, step=1, label="Motionless Duration Threshold (seconds)"),
      ],
      outputs=gr.Video(label="Processed Video"),
+     title="Advanced Real-Time Faint Detection on Video",
      description=(
+         "Upload a video file and set a threshold duration (in seconds). This demo uses YOLOv8 for person detection and "
+         "DeepSORT for advanced tracking. It flags persons as 'FAINTED' if they remain lying motionless (determined by a "
+         "heuristic) for longer than the threshold."
      ),
  )

+ if __name__ == "__main__":
+     demo.launch()
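For a quick local smoke test of the new processing function outside the Gradio UI, something like the following should work (the clip name is a placeholder; this assumes the `spaces` package is installed so the import in `app.py` succeeds, and that outside a ZeroGPU Space the decorator simply runs the function on local hardware):

```python
from app import process_video

# "sample_clip.mp4" is a hypothetical test file containing people in frame.
output_path = process_video("sample_clip.mp4", threshold_secs=10)
print("Annotated video written to:", output_path)
```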
packages.txt CHANGED
@@ -1 +1 @@
- opencv-python
+ python3-opencv
requirementx.txt CHANGED
@@ -3,3 +3,4 @@ opencv-python
  ultralytics
  torch
  numpy
+ deep-sort-realtime