# Video Object Segmentation with SAM2
A web application that allows you to upload videos, click on objects, and segment them out using Meta's SAM2 (Segment Anything Model 2) AI model.
## Features
- 📤 Upload video files (MP4, AVI, MOV, MKV)
- 🖼️ Preview first frame of the video
- 🎯 Click on objects to select them for segmentation
- ✂️ AI-powered object segmentation using SAM2
- 🎥 Download segmented video results
- 🎨 Beautiful, responsive user interface
## Requirements
- Python 3.8-3.12 (tested and compatible)
- PyTorch 2.2.0+ with CUDA (recommended for GPU acceleration)
- Flask
- OpenCV
- NumPy
- Segment Anything Model 2
### Python Version Compatibility

- ✅ Python 3.8, 3.9, 3.10, 3.11, and 3.12 are all supported
- 🔄 Automatic torch version selection based on your Python version
- 💡 Python 3.12 users: use the updated requirements (torch 2.2.0+)
## Why use uv?

We recommend using uv for this project because:

- ✅ Faster dependency resolution: uv is significantly faster than pip
- ✅ Better virtual environment management: cleaner and more reliable venvs
- ✅ Deterministic builds: more consistent dependency resolution
- ✅ Modern Python tooling: built with Rust for performance
- ✅ Better compatibility: handles complex dependency trees better
If you're working on Python projects, uv is a great modern alternative to pip + virtualenv!
## Installation
### 1. Clone the repository

```bash
git clone https://github.com/yourusername/video-segmentation-sam2.git
cd video-segmentation-sam2
```
### 2. Install dependencies (using uv, recommended)

First, install uv (a fast Python package installer and virtual environment manager):

```bash
# Install uv
pip install uv

# Create a virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate  # On Windows: .\.venv\Scripts\activate
uv pip install -r requirements.txt
```
### 3. Install SAM2 manually

SAM2 needs to be installed manually from GitHub:

```bash
# Clone the segment-anything repository
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything

# Install the package in development mode
pip install -e .

# Download the model checkpoint (ViT-B recommended)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
mv sam_vit_b_01ec64.pth ..
cd ..

# Clean up
rm -rf segment-anything
```
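Before launching the app, it can save a confusing startup error to verify that the checkpoint actually landed in the project root. A small sketch of such a check (the filename matches the commands above; the size threshold is an assumption based on the ViT-B checkpoint being roughly 375 MB):

```python
from pathlib import Path

def checkpoint_ready(path: str, min_bytes: int = 100_000_000) -> bool:
    """Return True if the checkpoint exists and looks complete.

    min_bytes is a rough lower bound; a much smaller file usually
    means a truncated or failed download.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes

if __name__ == "__main__":
    if not checkpoint_ready("sam_vit_b_01ec64.pth"):
        raise SystemExit("Checkpoint missing or truncated; re-run the wget step.")
```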
Together, steps 2 and 3:

- Create a virtual environment using uv
- Install all Python dependencies
- Install SAM2 from source
- Download the ViT-B model checkpoint
- Set up the necessary directories
### 4. Alternative: standard pip installation

If you prefer not to use uv:

```bash
pip install -r requirements.txt
python setup.py
```
### 5. Download SAM2 model weights

The application uses ViT-B (the smaller, faster model) by default. You need the file `sam_vit_b_01ec64.pth` in the root directory.

Download it from https://github.com/facebookresearch/segment-anything, or use the following command:

```bash
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```
Model options:

- `vit_b` (default): fastest, good for testing - `sam_vit_b_01ec64.pth`
- `vit_l`: medium size and performance - `sam_vit_l_0b3195.pth`
- `vit_h`: best accuracy, largest - `sam_vit_h_4b8939.pth`

You can change the model by modifying `SAM_MODEL_SIZE` in `app.py`.
### 6. Run the application

```bash
python app.py
```

The application will start on http://localhost:5000.
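For orientation, the upload endpoint in `app.py` can be pictured roughly like the minimal sketch below. The route name, folder, and helper are illustrative assumptions, not the app's actual code; production code should additionally sanitize filenames (e.g. with `werkzeug.utils.secure_filename`):

```python
from pathlib import Path
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_FOLDER = Path("uploads")
ALLOWED = {".mp4", ".avi", ".mov", ".mkv"}

def allowed_file(name: str) -> bool:
    # Accept only the extensions listed in the README
    return Path(name).suffix.lower() in ALLOWED

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files.get("video")
    if f is None or not allowed_file(f.filename):
        return jsonify(error="unsupported or missing file"), 400
    UPLOAD_FOLDER.mkdir(exist_ok=True)
    dest = UPLOAD_FOLDER / f.filename
    f.save(dest)
    return jsonify(ok=True, path=str(dest))

if __name__ == "__main__":
    app.run(port=5000)
```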
## Usage

1. **Upload a video**: Click the "Select Video File" button and choose a video file
2. **Select the object**: Click on the object you want to segment in the preview image
3. **Add more points**: Click additional points to help the AI better understand the object
4. **Segment**: Click "Segment Object" to start the segmentation process
5. **Download**: Once processing is complete, preview and download your segmented video
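The preview image shown in the browser is usually scaled down, so clicks have to be mapped back to the original frame's coordinates before they are handed to the model. A sketch of that mapping (names are illustrative, not the app's actual code):

```python
def preview_to_frame(x, y, preview_w, preview_h, frame_w, frame_h):
    """Map a click at (x, y) on the scaled preview to frame pixel coords."""
    fx = int(round(x * frame_w / preview_w))
    fy = int(round(y * frame_h / preview_h))
    # Clamp so rounding at the right/bottom edge stays inside the frame
    return min(fx, frame_w - 1), min(fy, frame_h - 1)
```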
## Configuration

You can configure the application by editing the `.env` file:

```env
FLASK_ENV=development
UPLOAD_FOLDER=uploads
SEGMENTED_FOLDER=segmented
ALLOWED_EXTENSIONS=.mp4,.avi,.mov,.mkv
```
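If the app does not pull in a dotenv library, a file in this simple KEY=VALUE form can be parsed with a few lines of stdlib Python. A sketch under that assumption:

```python
def load_env(path):
    """Parse a simple KEY=VALUE .env file, skipping blanks and # comments."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def allowed_extensions(env):
    # The comma-separated extension list becomes a set for fast lookups
    return set(env.get("ALLOWED_EXTENSIONS", ".mp4").split(","))
```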
## Technical Details
### Backend
- Flask: Web framework
- SAM2: Segment Anything Model 2 for object segmentation
- OpenCV: Video processing and frame manipulation
- PyTorch: Deep learning framework for running SAM2
### Frontend
- HTML5/CSS3: Responsive user interface
- JavaScript: Interactive point selection and AJAX requests
- Base64 encoding: For preview image transfer
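Sending the first frame as a Base64 data URL lets the frontend display it in an `<img>` tag without a separate file endpoint. A minimal sketch of the server side (the JPEG bytes are assumed to come from OpenCV's `imencode`):

```python
import base64

def frame_to_data_url(jpeg_bytes):
    """Wrap encoded JPEG bytes in a data URL usable as an <img> src."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return "data:image/jpeg;base64," + b64
```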
### Processing Pipeline

1. Video upload and first-frame extraction
2. User selects points on the object to segment
3. SAM2 processes each frame with the selected points
4. Masks are applied to each frame
5. Processed frames are combined into a new video
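Applying a mask to a frame comes down to keeping pixels where the mask is true and blanking the rest. The toy sketch below uses plain nested lists for clarity; the real pipeline would do the same with NumPy arrays built from the model's boolean mask output:

```python
def apply_mask(frame, mask, background=(0, 0, 0)):
    """Keep frame pixels where mask is True; fill the rest with background.

    frame: H x W list of (r, g, b) tuples; mask: H x W list of bools.
    """
    return [
        [pixel if keep else background
         for pixel, keep in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]
```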
## Performance Considerations
- GPU recommended: SAM2 runs much faster with CUDA-enabled GPU
- Video length: Longer videos will take more time to process
- Resolution: Higher resolution videos require more processing power
- Point selection: More points can help with complex objects but may slow down processing
## Troubleshooting

### Common Issues
Issue: SAM2 model not found
- Solution: Download the model checkpoint and place it in the root directory
Issue: CUDA out of memory
- Solution: Reduce video resolution or use smaller batch sizes
Issue: Slow processing on CPU
- Solution: Use a machine with GPU or reduce video resolution
Issue: Video format not supported
- Solution: Convert your video to MP4 format
## License
This project is licensed under the MIT License. The SAM2 model is provided by Meta Research under its own license.
## Acknowledgements
- Meta Research for the Segment Anything Model
- Flask team for the web framework
- OpenCV team for computer vision tools
## Future Improvements
- Add support for multiple object segmentation
- Implement background removal options
- Add video trimming functionality
- Support for real-time preview
- Batch processing of multiple videos
- Advanced segmentation parameters (threshold, etc.)
- Cloud deployment options