# Video Object Segmentation with SAM2

A web application that lets you upload videos, click on objects, and segment them out using Meta's SAM2 (Segment Anything Model 2).

## Features

- 📤 Upload video files (MP4, AVI, MOV, MKV)
- 🖼️ Preview the first frame of the video
- 🎯 Click on objects to select them for segmentation
- ✂️ AI-powered object segmentation using SAM2
- 🎥 Download segmented video results
- 🎨 Clean, responsive user interface

## Requirements

- **Python 3.8-3.12** (tested and compatible)
- PyTorch 2.2.0+ with CUDA (recommended for GPU acceleration)
- Flask
- OpenCV
- NumPy
- Segment Anything Model 2

**Python version compatibility:**

- ✅ Python 3.8, 3.9, 3.10, 3.11, and 3.12 are all supported
- 🔄 The torch version is selected automatically based on your Python version
- 💡 Python 3.12 users: use the updated requirements (torch 2.2.0+)

## Why use uv?

We recommend using **uv** for this project because:

- ✅ **Faster dependency resolution**: uv is significantly faster than pip
- ✅ **Better virtual environment management**: cleaner and more reliable venvs
- ✅ **Deterministic builds**: more consistent dependency resolution
- ✅ **Modern Python tooling**: built with Rust for performance
- ✅ **Better compatibility**: handles complex dependency trees better

If you're working on Python projects, uv is a great modern alternative to pip + virtualenv!

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/yourusername/video-segmentation-sam2.git
cd video-segmentation-sam2
```

### 2. Install dependencies (using uv - recommended)

First, install uv (a fast Python package installer and virtual environment manager):

```bash
# Install uv
pip install uv

# Create a virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate  # On Windows: .\.venv\Scripts\activate
uv pip install -r requirements.txt
```

### Alternative: standard pip installation

If you prefer not to use uv:

```bash
pip install -r requirements.txt
python setup.py
```

### 3. Install SAM2 manually

SAM2 needs to be installed manually from GitHub:

```bash
# Clone the segment-anything repository
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything

# Install the package (a regular install rather than `pip install -e .`,
# so the cloned source can safely be deleted afterwards)
pip install .

# Download the model checkpoint (ViT-B recommended)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
mv sam_vit_b_01ec64.pth ..
cd ..

# Clean up
rm -rf segment-anything
```

Once installation is complete, you will have:

- A virtual environment created with uv
- All Python dependencies installed
- SAM2 installed from source
- The ViT-B model checkpoint in the project root
- The necessary directories set up

### 4. Download the SAM2 model weights

The application uses **ViT-B** (the smaller, faster model) by default, so the file `sam_vit_b_01ec64.pth` must be in the root directory. Step 3 above already downloads it; otherwise, download it from [https://github.com/facebookresearch/segment-anything](https://github.com/facebookresearch/segment-anything) or use the following command:

```bash
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```

**Model options:**

- `vit_b` (default): fastest, good for testing (`sam_vit_b_01ec64.pth`)
- `vit_l`: medium size/performance (`sam_vit_l_0b3195.pth`)
- `vit_h`: best accuracy, largest (`sam_vit_h_4b8939.pth`)

You can change the model by modifying `SAM_MODEL_SIZE` in `app.py`.

### 5. Run the application

```bash
python app.py
```

The application will start on `http://localhost:5000`.

## Usage

1. **Upload a video**: Click the "Select Video File" button and choose a video file
2. **Select object**: Click on the object you want to segment in the preview image
3. **Add more points**: Click additional points to help the AI better understand the object
4. **Segment**: Click "Segment Object" to start the segmentation process
5. **Download**: Once processing is complete, preview and download your segmented video
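Under the hood, each point you click becomes a prompt for the model. The sketch below illustrates the idea using the `segment_anything` API installed in step 3: load the ViT-B checkpoint, grab the first frame, encode it for the browser preview, and predict a mask from foreground clicks. This is a simplified illustration, not the application's actual code; the video path and click coordinates are placeholders.

```python
import base64

import cv2
import numpy as np
import torch
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-B checkpoint downloaded during installation
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
sam.to("cuda" if torch.cuda.is_available() else "cpu")
predictor = SamPredictor(sam)

# Grab the first frame of the uploaded video (placeholder path)
cap = cv2.VideoCapture("uploads/example.mp4")
ok, frame = cap.read()
cap.release()
assert ok, "could not read the first frame"

# The preview sent to the browser is just the frame as a base64 JPEG
_, buf = cv2.imencode(".jpg", frame)
preview_b64 = base64.b64encode(buf).decode("ascii")

# SAM expects RGB; OpenCV reads BGR
predictor.set_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

# Clicked points as (x, y) pixel coordinates; label 1 = foreground
point_coords = np.array([[320, 240], [355, 265]])
point_labels = np.array([1, 1])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=False,
)
mask = masks[0]  # boolean (H, W) array covering the selected object
```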
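Given a mask like the one above, compositing the output video is plain OpenCV. Again, this is a hedged sketch rather than the app's exact pipeline: for brevity it reuses one mask for every frame, whereas the real pipeline computes a mask per frame (see the Processing Pipeline under Technical Details below).

```python
import cv2
import numpy as np

def write_masked_video(in_path: str, out_path: str, mask: np.ndarray) -> None:
    """Keep only the masked object in each frame; black out the background."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (
        int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
    )
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame[~mask] = 0  # zero out pixels outside the mask
        out.write(frame)

    cap.release()
    out.release()

# Using the `mask` from the previous sketch (placeholder paths)
write_masked_video("uploads/example.mp4", "segmented/example.mp4", mask)
```

Because the real pipeline runs the model on every frame, processing time grows with video length, which is why the Performance Considerations below recommend a GPU.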
## Configuration

You can configure the application by editing the `.env` file:

```env
FLASK_ENV=development
UPLOAD_FOLDER=uploads
SEGMENTED_FOLDER=segmented
ALLOWED_EXTENSIONS=.mp4,.avi,.mov,.mkv
```

## Technical Details

### Backend

- **Flask**: Web framework
- **SAM2**: Segment Anything Model 2 for object segmentation
- **OpenCV**: Video processing and frame manipulation
- **PyTorch**: Deep learning framework for running SAM2

### Frontend

- **HTML5/CSS3**: Responsive user interface
- **JavaScript**: Interactive point selection and AJAX requests
- **Base64 encoding**: For preview image transfer

### Processing Pipeline

1. Video upload and first-frame extraction
2. User selects points on the object to segment
3. SAM2 processes each frame with the selected points
4. Masks are applied to each frame
5. Processed frames are combined into a new video

## Performance Considerations

- **GPU recommended**: SAM2 runs much faster with a CUDA-enabled GPU
- **Video length**: Longer videos take more time to process
- **Resolution**: Higher-resolution videos require more processing power
- **Point selection**: More points can help with complex objects but may slow down processing

## Troubleshooting

### Common Issues

**Issue: SAM2 model not found**
- Solution: Download the model checkpoint and place it in the root directory

**Issue: CUDA out of memory**
- Solution: Reduce the video resolution or use smaller batch sizes

**Issue: Slow processing on CPU**
- Solution: Use a machine with a GPU or reduce the video resolution

**Issue: Video format not supported**
- Solution: Convert your video to MP4 format

## License

This project is licensed under the MIT License. The SAM2 model is provided by Meta Research under its own license.

## Acknowledgements

- Meta Research for the Segment Anything Model
- The Flask team for the web framework
- The OpenCV team for computer vision tools

## Future Improvements

- [ ] Add support for multiple object segmentation
- [ ] Implement background removal options
- [ ] Add video trimming functionality
- [ ] Support for real-time preview
- [ ] Batch processing of multiple videos
- [ ] Advanced segmentation parameters (threshold, etc.)
- [ ] Cloud deployment options