metadata
title: SmolVLM2 Video Highlights
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
🎬 SmolVLM2 HuggingFace Segment-Based Video Highlights API
Generate intelligent video highlights using HuggingFace's segment-based approach
This FastAPI service follows HuggingFace's segment-based classification method: it splits each video into fixed-length segments and classifies every segment with SmolVLM2-256M-Video-Instruct to produce consistent highlights.
Features
- Segment-Based Analysis: Processes videos in fixed 5-second segments for consistent AI classification
- Dual Criteria Generation: Creates two different highlight criteria sets and selects the most selective one
- SmolVLM2-256M-Video-Instruct: Compact 256M-parameter vision-language model, chosen for fast inference on video input
- Visual Effects: Optional fade transitions between segments for professional-quality output
- REST API: Upload videos and download processed highlights with job tracking
- Background Processing: Non-blocking video processing with real-time status updates
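The fixed-length segmentation described above can be sketched in a few lines. This is only an illustration, not the service's actual code: the function name `segment_bounds` is hypothetical, and clipping the final window to the video's duration is an assumption about how a trailing partial segment might be handled.

```python
from typing import List, Tuple


def segment_bounds(duration: float, segment_length: float = 5.0) -> List[Tuple[float, float]]:
    """Split [0, duration] into consecutive fixed-length (start, end) windows."""
    if duration <= 0 or segment_length <= 0:
        return []
    bounds = []
    i = 0
    # Multiply by the index instead of accumulating, to avoid float drift.
    while i * segment_length < duration:
        start = i * segment_length
        # Assumption: the last window is clipped so it never runs past the video end.
        end = min(start + segment_length, duration)
        bounds.append((start, end))
        i += 1
    return bounds


print(segment_bounds(12.0))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

Each `(start, end)` pair would then be cut out and classified on its own, so every classification sees a clip of at most `segment_length` seconds.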
API Endpoints
- POST /upload-video - Upload video for processing
- GET /job-status/{job_id} - Check processing status
- GET /download/{filename} - Download generated highlights
- GET /docs - Interactive API documentation
📱 Usage
Via API
# Upload video with optional parameters
curl -X POST \
  -F "video=@your_video.mp4" \
  -F "segment_length=5.0" \
  -F "model_name=HuggingFaceTB/SmolVLM2-256M-Video-Instruct" \
  -F "with_effects=true" \
  https://your-space-url.hf.space/upload-video
# Check processing status
curl https://your-space-url.hf.space/job-status/YOUR_JOB_ID
# Download highlights and analysis
curl -O https://your-space-url.hf.space/download/HIGHLIGHTS.mp4
curl -O https://your-space-url.hf.space/download/ANALYSIS.json
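Because processing runs in the background, a client usually polls the status endpoint until the job finishes. The following is a minimal sketch of such a loop using only the standard library. The JSON field `status` and the values `completed` and `failed` are assumptions, since the response schema is not documented here; check `/docs` on your Space for the real one. The helper names `wait_for_job` and `http_status_fetcher` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def wait_for_job(fetch_status: Callable[[], Dict],
                 poll_seconds: float = 2.0,
                 max_polls: int = 150) -> Dict:
    """Poll until the job reports a terminal state, then return the last payload."""
    for _ in range(max_polls):
        payload = fetch_status()
        # Assumed schema: a "status" key whose terminal values are
        # "completed" or "failed". Adjust to the service's real response.
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")


def http_status_fetcher(base_url: str, job_id: str) -> Callable[[], Dict]:
    """Build a fetcher that calls GET /job-status/{job_id} and decodes the JSON body."""
    def fetch() -> Dict:
        with urllib.request.urlopen(f"{base_url}/job-status/{job_id}") as response:
            return json.load(response)
    return fetch
```

A call such as `wait_for_job(http_status_fetcher("https://your-space-url.hf.space", "YOUR_JOB_ID"))` would block until the job reaches one of the assumed terminal states. Passing the fetcher as a callable keeps the loop testable without network access.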
Via Android App
Use the provided Android client code to integrate with your mobile app.
⚙️ Configuration
Default settings:
- Segment Length: 5 seconds (fixed segments for consistent classification)
- Model: SmolVLM2-256M-Video-Instruct (faster processing)
- Effects: Enabled (fade transitions between segments)
- Dual Criteria: Two prompt variations for robust selection
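One way to read the dual-criteria step is: classify every segment twice, once per criteria set, then keep the result that selects fewer segments. The sketch below illustrates that reading only. The function name `pick_most_selective` is hypothetical, and ignoring a criteria set that selects nothing at all is an assumption, not confirmed behavior of the service.

```python
from typing import List, Sequence


def pick_most_selective(first: Sequence[bool], second: Sequence[bool]) -> List[int]:
    """Return kept segment indices from whichever criteria set keeps fewer segments."""
    kept_first = [i for i, keep in enumerate(first) if keep]
    kept_second = [i for i, keep in enumerate(second) if keep]
    # Assumption: a set that keeps nothing is skipped, so the output is
    # only empty when both sets reject every segment.
    candidates = [kept for kept in (kept_first, kept_second) if kept]
    if not candidates:
        return []
    # On a tie, min() returns the first candidate, i.e. the first criteria set.
    return min(candidates, key=len)


# Per-segment keep/skip decisions from two prompt variations.
print(pick_most_selective([True, True, False, True], [False, True, False, False]))  # [1]
```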
🛠️ Technology Stack
- SmolVLM2-256M-Video-Instruct: Efficient vision-language model optimized for video understanding
- HuggingFace Transformers: Model loading and inference
- FastAPI: Modern web framework for APIs
- FFmpeg: Video processing with advanced filter support
- PyTorch: Deep learning framework with device optimization
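The fade transitions can be expressed with FFmpeg's `fade` video filter, which takes a type (`t`), start time (`st`), and duration (`d`). Below is a hedged sketch of building such a filter string for one segment. It assumes each segment has already been cut into its own clip with timestamps starting at 0, and the 0.5-second default and the name `fade_filter` are illustrative choices, not values taken from this project.

```python
def fade_filter(segment_duration: float, fade_seconds: float = 0.5) -> str:
    """Build an FFmpeg -vf value that fades a clip in at the start and out at the end."""
    # Keep the two fades from overlapping on very short clips.
    fade_seconds = min(fade_seconds, segment_duration / 2)
    fade_out_start = segment_duration - fade_seconds
    return (
        f"fade=t=in:st=0:d={fade_seconds:g},"
        f"fade=t=out:st={fade_out_start:g}:d={fade_seconds:g}"
    )


print(fade_filter(5.0))  # fade=t=in:st=0:d=0.5,fade=t=out:st=4.5:d=0.5
```

The resulting string would be passed as the argument to `ffmpeg -vf` when encoding a segment. Audio would need a matching `afade` filter, which is left out here.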
🎯 Perfect For
- Social media content creators
- Educational video processing
- Meeting/lecture summarization
- Sports highlight generation
- Entertainment content curation
License
Apache 2.0 - Free for commercial and personal use
🤝 Contributing
Built with ❤️ using Hugging Face Transformers and open-source AI models.