ekabaruh committed (verified)
Commit 9867c2c · 1 Parent(s): f38a2ce

Upload 8 files
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/one-by-one-person-detection.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/people-detection.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/store-aisle-detection.mp4 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,10 +1,57 @@
  ---
- title: Real Time People Detection
- emoji: 📊
- colorFrom: gray
- colorTo: gray
- sdk: docker
+ title: Real-time People Detection
+ emoji: 👁️
+ colorFrom: blue
+ colorTo: green
+ sdk: streamlit
+ sdk_version: 1.28.0
+ app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Real-time People Detection
+
+ This Streamlit application demonstrates real-time people detection using YOLOv8, optimized for Hugging Face Spaces.
+
+ ## Features
+
+ - Real-time people detection in video streams
+ - Uses YOLOv8n, the smallest and fastest YOLOv8 model
+ - Interactive control for the detection threshold
+ - Adjustable inference rate for performance tuning
+ - Performance metrics display (FPS, inference time)
+
+ ## How to Use
+
+ 1. Select your model (YOLOv8n is the default and performs best here)
+ 2. Adjust the detection threshold (higher values give fewer but more confident detections)
+ 3. Set your target inference FPS (lower values use fewer resources)
+ 4. Select a demo video
+ 5. Click "Start" to begin detection
+ 6. Use "Stop" to halt the detection process
+
+ ## Demo Videos
+
+ The application includes several demo videos:
+ - One Person - a simple clip of a single person
+ - Store Aisle - people walking through a store
+ - People Detection - various people in different settings
+
+ ## Performance Notes
+
+ - The YOLOv8n model provides the best balance of speed and accuracy for this application
+ - Reducing the target inference FPS improves performance while keeping the display smooth
+ - The actual inference rate may fall below the target in resource-constrained environments
+ - Detection results are reused between frames when inference runs below the video framerate
+
+ ## Deployment on Hugging Face Spaces
+
+ This application is designed to be deployed on Hugging Face Spaces. The deployment uses:
+
+ - Streamlit for the web interface
+ - The YOLOv8n model for efficient people detection
+ - Demo videos for testing (webcam access is disabled in Spaces)
+
+ ## License
+
+ MIT
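The person-only filtering described above comes down to running YOLOv8n on a frame and keeping only detections whose class is 0 ("person" in COCO). A minimal standalone sketch, not part of this commit, assuming a hypothetical test image at `test_frame.jpg`:

```python
import cv2
from ultralytics import YOLO

# Load the nano model; ultralytics downloads the weights if they are not cached.
model = YOLO("yolov8n.pt")

frame = cv2.imread("test_frame.jpg")  # hypothetical test image, not shipped in this repo
results = model(frame, conf=0.5)      # confidence threshold, same default as the app

# Keep only "person" boxes (COCO class 0) as (x1, y1, x2, y2) pixel tuples
people = [
    tuple(map(int, box.xyxy[0].tolist()))
    for box in results[0].boxes
    if int(box.cls) == 0
]
print(f"{len(people)} people detected:", people)
```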
app.py ADDED
@@ -0,0 +1,713 @@
+ """
+ Real-time People Detection Streamlit application.
+
+ This is the main entry point for the Hugging Face Space application.
+ """
+
+ import os
+ import time
+ from pathlib import Path
+ from typing import Tuple, Dict, Any, Optional, List
+
+ import cv2
+ import numpy as np
+ import streamlit as st
+ from PIL import Image
+ import torch
+ from ultralytics import YOLO
+
+
+ # Constants
+ ASSETS_DIR = Path(__file__).parent / "assets"
+ DEMO_VIDEOS = {
+     "One Person": ASSETS_DIR / "one-by-one-person-detection.mp4",
+     "Store Aisle": ASSETS_DIR / "store-aisle-detection.mp4",
+     "People Detection": ASSETS_DIR / "people-detection.mp4"
+ }
+ FRAME_WIDTH = 640
+ FRAME_HEIGHT = 480
+
+
+ class PeopleDetector:
+     """
+     A class for detecting people in images using a pre-trained YOLOv8n model.
+
+     Attributes:
+         model_name: Name or path of the YOLOv8 model to use
+         threshold: Confidence threshold for detection
+         device: Device to run inference on (cuda/cpu)
+         model: The detection model
+     """
+
+     def __init__(
+         self,
+         model_name: str = "yolov8n.pt",
+         threshold: float = 0.5,
+         device: Optional[str] = None,
+     ):
+         """
+         Initialize the people detector with a pre-trained model.
+
+         Args:
+             model_name: YOLOv8 model name to use ('yolov8n.pt' is the smallest one)
+             threshold: Confidence threshold for detection (0.0 to 1.0)
+             device: Device to run inference on (cuda/cpu). If None, will use cuda if available.
+         """
+         self.model_name = model_name
+         self.threshold = threshold
+
+         # Determine the device to use
+         if device is None:
+             self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         else:
+             self.device = device
+
+         # Load the YOLOv8 model
+         self.model = YOLO(model_name)
+
+         # Person class ID is 0 in COCO (YOLOv8 uses COCO classes)
+         self.person_class_id = 0
+
+     def detect(self, image: np.ndarray) -> Tuple[List[Dict[str, Any]], float]:
+         """
+         Detect people in an image.
+
+         Args:
+             image: Input image as numpy array (BGR format from OpenCV)
+
+         Returns:
+             Tuple containing:
+             - List of detection results with keys 'box', 'score', and 'label'
+             - Inference time in seconds
+         """
+         # Start timing
+         start_time = time.time()
+
+         # Run inference with YOLOv8
+         results = self.model(image, conf=self.threshold, device=self.device)
+
+         # Extract detections of people only
+         detections = []
+
+         # Process the results
+         for result in results:
+             boxes = result.boxes
+
+             # Extract coordinates, confidence and class
+             for i, box in enumerate(boxes):
+                 cls = int(box.cls.item())
+                 conf = float(box.conf.item())
+
+                 # Check if it's a person (class 0)
+                 if cls == self.person_class_id:
+                     # Get bounding box
+                     x1, y1, x2, y2 = map(int, box.xyxy.tolist()[0])
+
+                     detections.append({
+                         'box': (x1, y1, x2, y2),
+                         'score': conf,
+                         'label': 'person'
+                     })
+
+         # Calculate inference time
+         inference_time = time.time() - start_time
+
+         return detections, inference_time
+
+     def update_threshold(self, threshold: float) -> None:
+         """
+         Update the detection confidence threshold.
+
+         Args:
+             threshold: New threshold value (0.0 to 1.0)
+         """
+         self.threshold = threshold
+
+
+ class VideoSource:
+     """
+     A class for handling video input from different sources (webcam or file).
+
+     Attributes:
+         source: Camera index (int) or video file path (str)
+         width: Frame width to set (if possible)
+         height: Frame height to set (if possible)
+         fps_buffer_size: Number of frames to average for FPS calculation
+     """
+
+     def __init__(
+         self,
+         source: Any = 0,
+         width: int = 640,
+         height: int = 480,
+         fps_buffer_size: int = 30,
+     ):
+         """
+         Initialize the video source.
+
+         Args:
+             source: Camera index (int) or video file path (str)
+             width: Width to set for the captured frames
+             height: Height to set for the captured frames
+             fps_buffer_size: Number of frames to use for FPS averaging
+         """
+         self.source = source
+         self.width = width
+         self.height = height
+         self.fps_buffer_size = fps_buffer_size
+
+         self.cap = None
+         self.frame_times = []
+         self.is_running = False
+
+     def start(self) -> bool:
+         """
+         Start the video capture.
+
+         Returns:
+             bool: True if capture was started successfully, False otherwise
+         """
+         if self.is_running:
+             return True
+
+         self.cap = cv2.VideoCapture(self.source)
+
+         if not self.cap.isOpened():
+             return False
+
+         # Try to set properties if it's a webcam
+         if isinstance(self.source, int):
+             self.cap.set(cv2.CAP_PROP_FRAME_WIDTH, self.width)
+             self.cap.set(cv2.CAP_PROP_FRAME_HEIGHT, self.height)
+
+         self.is_running = True
+         self.frame_times = []
+         return True
+
+     def stop(self) -> None:
+         """Stop the video capture and release resources."""
+         if self.is_running and self.cap is not None:
+             self.cap.release()
+             self.is_running = False
+
+     def read_frame(self) -> Tuple[bool, Optional[np.ndarray]]:
+         """
+         Read a single frame from the video source.
+
+         Returns:
+             Tuple containing:
+             - Boolean indicating if frame was successfully read
+             - Image as numpy array (or None if no frame was read)
+         """
+         if not self.is_running or self.cap is None:
+             return False, None
+
+         # Record time for FPS calculation
+         current_time = time.time()
+
+         # Read frame
+         ret, frame = self.cap.read()
+
+         if ret:
+             # Update FPS buffer
+             self.frame_times.append(current_time)
+             if len(self.frame_times) > self.fps_buffer_size:
+                 self.frame_times.pop(0)
+
+         return ret, frame
+
+     def get_fps(self) -> float:
+         """
+         Calculate the current FPS based on actual frame timings.
+
+         Returns:
+             float: Current frames per second
+         """
+         if len(self.frame_times) < 2:
+             return 0.0
+
+         # Calculate FPS from time differences
+         time_diff = self.frame_times[-1] - self.frame_times[0]
+         if time_diff > 0:
+             return (len(self.frame_times) - 1) / time_diff
+         return 0.0
+
+
+ def draw_detections(
+     image: np.ndarray,
+     detections: List[Dict[str, Any]],
+     color: Tuple[int, int, int] = (0, 255, 0),
+     thickness: int = 2,
+     font_scale: float = 0.5,
+ ) -> np.ndarray:
+     """
+     Draw bounding boxes and labels for detected people.
+
+     Args:
+         image: Input image to draw on
+         detections: List of detection results from PeopleDetector
+         color: BGR color tuple for bounding boxes
+         thickness: Line thickness for bounding boxes
+         font_scale: Font scale for text labels
+
+     Returns:
+         np.ndarray: Image with drawn detections
+     """
+     annotated_image = image.copy()
+
+     for detection in detections:
+         # Extract bounding box coordinates
+         x_min, y_min, x_max, y_max = detection['box']
+
+         # Draw bounding box
+         cv2.rectangle(
+             annotated_image,
+             (x_min, y_min),
+             (x_max, y_max),
+             color,
+             thickness
+         )
+
+         # Create label with confidence score
+         label = f"Person: {detection['score']:.2f}"
+
+         # Calculate text size and position
+         (text_width, text_height), _ = cv2.getTextSize(
+             label, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness
+         )
+
+         # Draw label background
+         cv2.rectangle(
+             annotated_image,
+             (x_min, y_min - text_height - 5),
+             (x_min + text_width, y_min),
+             color,
+             -1  # Filled rectangle
+         )
+
+         # Draw text
+         cv2.putText(
+             annotated_image,
+             label,
+             (x_min, y_min - 5),
+             cv2.FONT_HERSHEY_SIMPLEX,
+             font_scale,
+             (0, 0, 0),  # Black text
+             thickness
+         )
+
+     return annotated_image
+
+
+ def add_performance_stats(
+     image: np.ndarray,
+     fps: float,
+     inference_time: float,
+     people_count: int,
+     inference_fps: float = 0.0,
+     bg_color: Tuple[int, int, int] = (0, 0, 0),
+     text_color: Tuple[int, int, int] = (255, 255, 255),
+     font_scale: float = 0.5,
+     thickness: int = 1,
+ ) -> np.ndarray:
+     """
+     Add performance statistics to the image.
+
+     Args:
+         image: Input image to add stats to
+         fps: Current FPS value
+         inference_time: Model inference time in seconds
+         people_count: Number of people detected
+         inference_fps: Inference FPS (model predictions per second)
+         bg_color: Background color for stats box
+         text_color: Text color for stats
+         font_scale: Font scale for text
+         thickness: Line thickness for text
+
+     Returns:
+         np.ndarray: Image with added performance stats
+     """
+     stats_image = image.copy()
+
+     # Create stats text
+     fps_text = f"FPS: {fps:.1f}"
+     inference_text = f"Inference: {inference_time*1000:.1f}ms"
+     count_text = f"People: {people_count}"
+     inf_fps_text = f"Inference FPS: {inference_fps:.1f}"
+
+     # Get text sizes
+     (fps_width, fps_height), _ = cv2.getTextSize(
+         fps_text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness
+     )
+     (inf_width, inf_height), _ = cv2.getTextSize(
+         inference_text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness
+     )
+     (count_width, count_height), _ = cv2.getTextSize(
+         count_text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness
+     )
+     (inf_fps_width, inf_fps_height), _ = cv2.getTextSize(
+         inf_fps_text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness
+     )
+
+     # Calculate background box dimensions
+     box_width = max(fps_width, inf_width, count_width, inf_fps_width) + 20
+     box_height = fps_height + inf_height + count_height + inf_fps_height + 30
+
+     # Draw background box
+     cv2.rectangle(
+         stats_image,
+         (10, 10),
+         (10 + box_width, 10 + box_height),
+         bg_color,
+         -1  # Filled rectangle
+     )
+
+     # Draw text
+     y_offset = 10 + fps_height + 5
+     cv2.putText(
+         stats_image,
+         fps_text,
+         (20, y_offset),
+         cv2.FONT_HERSHEY_SIMPLEX,
+         font_scale,
+         text_color,
+         thickness
+     )
+
+     y_offset += inf_height + 5
+     cv2.putText(
+         stats_image,
+         inference_text,
+         (20, y_offset),
+         cv2.FONT_HERSHEY_SIMPLEX,
+         font_scale,
+         text_color,
+         thickness
+     )
+
+     y_offset += count_height + 5
+     cv2.putText(
+         stats_image,
+         count_text,
+         (20, y_offset),
+         cv2.FONT_HERSHEY_SIMPLEX,
+         font_scale,
+         text_color,
+         thickness
+     )
+
+     y_offset += inf_fps_height + 5
+     cv2.putText(
+         stats_image,
+         inf_fps_text,
+         (20, y_offset),
+         cv2.FONT_HERSHEY_SIMPLEX,
+         font_scale,
+         text_color,
+         thickness
+     )
+
+     return stats_image
+
+
+ class PeopleDetectionApp:
+     """
+     Streamlit application for real-time people detection.
+
+     This class handles the Streamlit UI components and orchestrates
+     the video capture and detection processes.
+     """
+
+     def __init__(self):
+         """Initialize the Streamlit application components."""
+         # Set page config
+         st.set_page_config(
+             page_title="Real-time People Detection",
+             page_icon="👁️",
+             layout="wide",
+         )
+
+         # Initialize session state
+         if "video_source" not in st.session_state:
+             st.session_state.video_source = None
+         if "detector" not in st.session_state:
+             st.session_state.detector = None
+         if "is_running" not in st.session_state:
+             st.session_state.is_running = False
+         if "frame_placeholder" not in st.session_state:
+             st.session_state.frame_placeholder = None
+         if "last_inference_time" not in st.session_state:
+             st.session_state.last_inference_time = 0.0
+         if "last_inference_timestamp" not in st.session_state:
+             st.session_state.last_inference_timestamp = 0.0
+         if "frame_count" not in st.session_state:
+             st.session_state.frame_count = 0
+         if "last_frame" not in st.session_state:
+             st.session_state.last_frame = None
+         if "last_detections" not in st.session_state:
+             st.session_state.last_detections = []
+
+     def create_ui(self):
+         """Create the Streamlit UI components."""
+         # Page header
+         st.title("Real-time People Detection")
+         st.markdown(
+             "This application detects people in video streams using YOLOv8."
+         )
+
+         # Sidebar for controls
+         with st.sidebar:
+             st.header("Settings")
+
+             # Model selection
+             model_name = st.selectbox(
+                 "Select detection model",
+                 options=[
+                     "yolov8n.pt",  # Nano model (smallest)
+                 ],
+                 index=0,
+             )
+
+             # Detection threshold
+             detection_threshold = st.slider(
+                 "Detection threshold",
+                 min_value=0.1,
+                 max_value=1.0,
+                 value=0.5,
+                 step=0.05,
+             )
+
+             # Target inference FPS
+             target_fps = st.slider(
+                 "Target inference FPS",
+                 min_value=1,
+                 max_value=30,
+                 value=10,
+                 step=1,
+                 help="Control how many frames per second are sent to the model for inference. Lower values use less resources but may appear less smooth."
+             )
+
+             # For Hugging Face Space, we only provide demo videos (no webcam)
+             source_type = "Demo Video"
+
+             # Let user select which demo video to use
+             demo_selection = st.selectbox(
+                 "Select demo video",
+                 options=list(DEMO_VIDEOS.keys()),
+                 index=0,
+             )
+             video_path = str(DEMO_VIDEOS[demo_selection])
+             source = video_path
+
+             # Control buttons
+             col1, col2 = st.columns(2)
+
+             with col1:
+                 start_button = st.button(
+                     "Start" if not st.session_state.is_running else "Restart",
+                     use_container_width=True,
+                 )
+
+             with col2:
+                 stop_button = st.button(
+                     "Stop",
+                     use_container_width=True,
+                     disabled=not st.session_state.is_running,
+                 )
+
+         # Main area for video display
+         video_column, stats_column = st.columns([3, 1])
+
+         with video_column:
+             st.subheader("Detection Feed")
+             # Create a placeholder for the video frame
+             frame_placeholder = st.empty()
+             st.session_state.frame_placeholder = frame_placeholder
+
+         with stats_column:
+             st.subheader("Performance Stats")
+             # Create placeholders for stats
+             fps_text = st.empty()
+             inference_text = st.empty()
+             people_count = st.empty()
+             inference_fps_text = st.empty()
+
+         # Handle button actions
+         if start_button:
+             self.start_detection(source, model_name, detection_threshold, target_fps)
+
+         if stop_button:
+             self.stop_detection()
+
+         # Return stats placeholders for updating
+         return fps_text, inference_text, people_count, inference_fps_text
+
+     def start_detection(self, source, model_name, threshold, target_fps):
+         """
+         Start the detection process.
+
+         Args:
+             source: Video source (camera ID or file path)
+             model_name: YOLOv8 model to use
+             threshold: Detection confidence threshold
+             target_fps: Target frames per second for inference
+         """
+         # Stop existing detection if running
+         self.stop_detection()
+
+         # Initialize video source
+         video_source = VideoSource(
+             source=source,
+             width=FRAME_WIDTH,
+             height=FRAME_HEIGHT,
+         )
+
+         # Initialize detector
+         detector = PeopleDetector(
+             model_name=model_name,
+             threshold=threshold,
+         )
+
+         # Start video capture
+         if not video_source.start():
+             st.error(f"Failed to open video source: {source}")
+             return
+
+         # Store objects in session state
+         st.session_state.video_source = video_source
+         st.session_state.detector = detector
+         st.session_state.is_running = True
+         st.session_state.target_fps = target_fps
+         st.session_state.last_inference_timestamp = time.time()
+         st.session_state.frame_count = 0
+         st.session_state.last_frame = None
+         st.session_state.last_detections = []
+
+     def stop_detection(self):
+         """Stop the detection process and release resources."""
+         if st.session_state.video_source is not None:
+             st.session_state.video_source.stop()
+             st.session_state.video_source = None
+
+         st.session_state.detector = None
+         st.session_state.is_running = False
+         st.session_state.last_frame = None
+         st.session_state.last_detections = []
+
+     def update_frame(self, fps_text, inference_text, people_count, inference_fps_text):
+         """
+         Update the video frame and stats.
+
+         Args:
+             fps_text: Streamlit element for FPS display
+             inference_text: Streamlit element for inference time display
+             people_count: Streamlit element for people count display
+             inference_fps_text: Streamlit element for inference FPS display
+         """
+         if not st.session_state.is_running:
+             return
+
+         video_source = st.session_state.video_source
+         detector = st.session_state.detector
+         target_fps = st.session_state.target_fps
+
+         if video_source is None or detector is None:
+             return
+
+         # Read a new frame
+         ret, frame = video_source.read_frame()
+
+         if not ret:
+             # If we've reached the end of a video file, restart it
+             if not isinstance(video_source.source, int):
+                 # Restart video
+                 video_source.stop()
+                 if video_source.start():
+                     ret, frame = video_source.read_frame()
+                     if not ret:
+                         st.error("Failed to restart video")
+                         self.stop_detection()
+                         return
+                 else:
+                     st.error("Failed to restart video source")
+                     self.stop_detection()
+                     return
+             else:
+                 st.error("Failed to read frame from camera")
+                 self.stop_detection()
+                 return
+
+         # Calculate current FPS
+         fps = video_source.get_fps()
+
+         # Determine if we should run inference on this frame
+         current_time = time.time()
+         time_since_last_inference = current_time - st.session_state.last_inference_timestamp
+         inference_interval = 1.0 / target_fps
+
+         # Use cached detections or run new detection
+         detections = []
+         inference_time = 0
+
+         # Run a new detection if enough time has passed
+         if time_since_last_inference >= inference_interval:
+             detections, inference_time = detector.detect(frame)
+
+             # Update cache
+             st.session_state.last_frame = frame.copy()
+             st.session_state.last_detections = detections
+             st.session_state.last_inference_time = inference_time
+             st.session_state.last_inference_timestamp = current_time
+         else:
+             # Use cached detections
+             detections = st.session_state.last_detections
+             inference_time = st.session_state.last_inference_time
+
+         # Draw detections on the frame
+         frame_with_detections = draw_detections(frame, detections)
+
+         # Calculate inference FPS
+         if time_since_last_inference > 0:
+             inference_fps = 1.0 / time_since_last_inference
+         else:
+             inference_fps = 0.0
+
+         # Add performance stats to the frame
+         frame_with_stats = add_performance_stats(
+             frame_with_detections,
+             fps,
+             inference_time,
+             len(detections),
+             inference_fps
+         )
+
+         # Display the frame
+         st.session_state.frame_placeholder.image(
+             frame_with_stats,
+             channels="BGR",
+             use_column_width=True
+         )
+
+         # Update stats
+         fps_text.metric("FPS", f"{fps:.1f}")
+         inference_text.metric("Inference Time", f"{inference_time*1000:.1f} ms")
+         people_count.metric("People Detected", len(detections))
+         inference_fps_text.metric("Inference FPS", f"{inference_fps:.1f}")
+
+         # Increment frame counter
+         st.session_state.frame_count += 1
+
+
+ def main():
+     """Main entry point for the application."""
+     app = PeopleDetectionApp()
+     fps_text, inference_text, people_count, inference_fps_text = app.create_ui()
+
+     # Infinite loop for updating the video frame
+     while st.session_state.is_running:
+         app.update_frame(fps_text, inference_text, people_count, inference_fps_text)
+         time.sleep(0.01)  # Small delay to prevent overloading the CPU
+
+
+ if __name__ == "__main__":
+     main()
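The `PeopleDetector` and `VideoSource` classes above do not depend on the Streamlit UI, so they can be smoke-tested directly. A rough sketch, assuming it is run from the repository root with the dependencies from requirements.txt installed:

```python
from app import PeopleDetector, VideoSource

# Bundled demo video and the app's default settings
detector = PeopleDetector(model_name="yolov8n.pt", threshold=0.5)
source = VideoSource(source="assets/people-detection.mp4")

if source.start():
    for _ in range(10):  # inspect only the first few frames
        ok, frame = source.read_frame()
        if not ok:
            break
        detections, dt = detector.detect(frame)
        print(f"people={len(detections)}  inference={dt * 1000:.1f} ms")
    source.stop()
```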
assets/one-by-one-person-detection.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5964aa259099a482a8b360ffc2c57b5a30f84d5919236a4dad01f8e929ac07c
+ size 3291918
assets/people-detection.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18ffe8672d741e3e29c9d891d22c59d453720b086c25b35c88b393d55f92f693
+ size 5482579
assets/store-aisle-detection.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3526fee39cac70d6366471e4324a6d63b337c8f8ab99c015521bee0d60ed6e04
+ size 9214573
packages.txt ADDED
@@ -0,0 +1,2 @@
+ libgl1-mesa-glx
+ libglib2.0-0
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ opencv-python-headless>=4.8.0
+ numpy>=1.24.0
+ streamlit>=1.25.0
+ torch>=2.0.0
+ torchvision>=0.15.0
+ pillow>=10.0.0
+ ultralytics>=8.0.0
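A quick way to confirm these minimum versions (and whether CUDA is visible for the device selection in `PeopleDetector`) after `pip install -r requirements.txt`; a small sketch, not part of the commit:

```python
import cv2, numpy, PIL, streamlit, torch, torchvision, ultralytics

# Print the installed version of each dependency the app imports
for mod in (cv2, numpy, PIL, streamlit, torch, torchvision, ultralytics):
    print(f"{mod.__name__:<12} {mod.__version__}")
print("CUDA available:", torch.cuda.is_available())
```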
yolov8n.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f59b3d833e2ff32e194b5bb8e08d211dc7c5bdf144b90d2c8412c47ccfc83b36
+ size 6549796