# Video Object Segmentation with SAM2

A web application that lets you upload a video, click on objects in it, and segment them out using Meta's SAM2 (Segment Anything Model 2).

## Features

- 📤 Upload video files (MP4, AVI, MOV, MKV)
- 🖼️ Preview first frame of the video
- 🎯 Click on objects to select them for segmentation
- ✂️ AI-powered object segmentation using SAM2
- 🎥 Download segmented video results
- 🎨 Beautiful, responsive user interface

## Requirements

- **Python 3.8-3.12** (tested and compatible)
- PyTorch 2.2.0+ with CUDA (recommended for GPU acceleration)
- Flask
- OpenCV
- NumPy
- Segment Anything Model 2

**Python Version Compatibility:**
- ✅ Python 3.8, 3.9, 3.10, 3.11, 3.12 all supported
- 🔄 Automatic torch version selection based on Python version
- 💡 Python 3.12 users: Use the updated requirements (torch 2.2.0+)

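
The supported Python range listed above can be checked before the application starts. The following is a minimal sketch, not code from this project; the function name `is_supported_python` and the constants are hypothetical.

```python
import sys
from typing import Tuple

# Supported range as stated in the requirements above.
MIN_PYTHON: Tuple[int, int] = (3, 8)
MAX_PYTHON: Tuple[int, int] = (3, 12)


def is_supported_python(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) lies in the supported range, inclusive."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


if __name__ == "__main__":
    if not is_supported_python():
        raise SystemExit(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is outside "
            "the supported range 3.8-3.12"
        )
    print("Python version OK")
```
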
## Why use uv?

We recommend using **uv** for this project because:

- ✅ **Faster dependency resolution**: uv is significantly faster than pip
- ✅ **Better virtual environment management**: Cleaner and more reliable venvs
- ✅ **Deterministic builds**: More consistent dependency resolution
- ✅ **Modern Python tooling**: Built with Rust for performance
- ✅ **Better compatibility**: Handles complex dependency trees better

If you're working on Python projects, uv is a great modern alternative to pip + virtualenv!

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/yourusername/video-segmentation-sam2.git
cd video-segmentation-sam2
```

### 2. Install dependencies (using uv - recommended)

First, install uv (a fast Python package installer and virtual environment manager):

```bash
# Install uv
pip install uv

# Create virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate  # On Windows: .\.venv\Scripts\activate
uv pip install -r requirements.txt
```

### 3. Install SAM2 manually

SAM2 needs to be installed manually from GitHub. Note that the commands below use the `segment-anything` repository and a `sam_vit_b` checkpoint, which belong to the original Segment Anything Model (SAM); SAM 2 itself is published separately in the `facebookresearch/sam2` repository with its own checkpoints.

```bash
# Clone the segment-anything repository
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything

# Install the package into the active virtual environment
# (use `pip install .` if you are not using uv).
# This is a regular, non-editable install, so the clone can be removed afterwards.
uv pip install .

# Download the model checkpoint (ViT-B recommended)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
mv sam_vit_b_01ec64.pth ..
cd ..

# Clean up
rm -rf segment-anything
```

Steps 2 and 3 together will:
- Create a virtual environment using uv
- Install all Python dependencies
- Install the model package from source
- Download the ViT-B model checkpoint
- Set up necessary directories

### Alternative to step 2: Standard pip installation

If you prefer not to use uv:

```bash
pip install -r requirements.txt
python setup.py
```

### Download SAM2 model weights

Skip this if you already downloaded the checkpoint in step 3.

The application uses **ViT-B** (the smallest and fastest model) by default. It expects the file `sam_vit_b_01ec64.pth` in the project root directory.

Download from: [https://github.com/facebookresearch/segment-anything](https://github.com/facebookresearch/segment-anything)

Or use the following command:

```bash
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```

**Model Options:**
- `vit_b` (default): Fastest, good for testing - `sam_vit_b_01ec64.pth`
- `vit_l`: Medium size/performance - `sam_vit_l_0b3195.pth`
- `vit_h`: Best accuracy, largest - `sam_vit_h_4b8939.pth`

You can change the model by modifying `SAM_MODEL_SIZE` in `app.py`.

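
The model options listed above map a model size to a checkpoint file. The sketch below shows one way to resolve, download, and load a checkpoint from Python, which can help on systems without `wget`. It is not the project's actual code: the helper names (`checkpoint_url`, `ensure_checkpoint`, `load_predictor`) are hypothetical, and the URLs for `vit_l` and `vit_h` are assumed to follow the same pattern as the `vit_b` URL given above. `sam_model_registry` and `SamPredictor` come from the `segment_anything` package installed from the repository named in this README.

```python
import urllib.request
from pathlib import Path

# Base URL and file names as listed in this README.
BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything/"
CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def checkpoint_url(model_size: str) -> str:
    """Build the download URL for a model size (assumed URL pattern)."""
    if model_size not in CHECKPOINTS:
        raise ValueError(f"Unknown model size: {model_size!r}")
    return BASE_URL + CHECKPOINTS[model_size]


def ensure_checkpoint(model_size: str = "vit_b", root: Path = Path(".")) -> Path:
    """Return the local checkpoint path, downloading the file if it is missing."""
    url = checkpoint_url(model_size)  # also validates model_size
    path = Path(root) / CHECKPOINTS[model_size]
    if not path.exists():
        urllib.request.urlretrieve(url, str(path))
    return path


def load_predictor(model_size: str = "vit_b", device: str = "cpu"):
    """Load the model and wrap it in a SamPredictor."""
    # Imported lazily so the helpers above work without the package installed.
    from segment_anything import SamPredictor, sam_model_registry

    checkpoint = ensure_checkpoint(model_size)
    sam = sam_model_registry[model_size](checkpoint=str(checkpoint))
    sam.to(device)
    return SamPredictor(sam)
```
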
### 4. Run the application

```bash
python app.py
```

The application will start on `http://localhost:5000`.

## Usage

1. **Upload a video**: Click the "Select Video File" button and choose a video file
2. **Select object**: Click on the object you want to segment in the preview image
3. **Add more points**: Click additional points to help the AI better understand the object
4. **Segment**: Click "Segment Object" to start the segmentation process
5. **Download**: Once processing is complete, preview and download your segmented video

## Configuration

You can configure the application by editing the `.env` file:

```env
FLASK_ENV=development
UPLOAD_FOLDER=uploads
SEGMENTED_FOLDER=segmented
ALLOWED_EXTENSIONS=.mp4,.avi,.mov,.mkv
```

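
As an illustration of how these settings might be consumed, the sketch below reads them from the process environment and validates an uploaded file name against `ALLOWED_EXTENSIONS`. It assumes the variables from `.env` have already been loaded into the environment (for example with a tool such as python-dotenv); the function names `load_config` and `allowed_file` are hypothetical and may not match `app.py`.

```python
import os
from typing import Collection, Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the settings shown above, falling back to the documented values."""
    raw = env.get("ALLOWED_EXTENSIONS", ".mp4,.avi,.mov,.mkv")
    # Split the comma-separated list and normalise case and whitespace.
    extensions = frozenset(e.strip().lower() for e in raw.split(",") if e.strip())
    return {
        "UPLOAD_FOLDER": env.get("UPLOAD_FOLDER", "uploads"),
        "SEGMENTED_FOLDER": env.get("SEGMENTED_FOLDER", "segmented"),
        "ALLOWED_EXTENSIONS": extensions,
    }


def allowed_file(filename: str, extensions: Collection[str]) -> bool:
    """Return True if the file extension (case-insensitive) is allowed."""
    return os.path.splitext(filename)[1].lower() in extensions
```

Extensions are compared in lower case, so `clip.MP4` is accepted when `.mp4` is configured.
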
## Technical Details

### Backend

- **Flask**: Web framework
- **SAM2**: Segment Anything Model 2 for object segmentation
- **OpenCV**: Video processing and frame manipulation
- **PyTorch**: Deep learning framework for running SAM2

### Frontend

- **HTML5/CSS3**: Responsive user interface
- **JavaScript**: Interactive point selection and AJAX requests
- **Base64 encoding**: For preview image transfer

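
The Base64 preview transfer mentioned above could look roughly like the following on the server side. This is a sketch under assumptions: the function names are hypothetical, JPEG is assumed as the preview format, and the actual response format used by `app.py` is not documented here.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def encode_first_frame(video_path: str) -> str:
    """Read the first frame of a video and return it as a JPEG data URI."""
    import cv2  # imported lazily; only needed when a video is actually read

    cap = cv2.VideoCapture(video_path)
    try:
        ok, frame = cap.read()
    finally:
        cap.release()
    if not ok:
        raise ValueError(f"Could not read a frame from {video_path}")

    ok, buffer = cv2.imencode(".jpg", frame)
    if not ok:
        raise ValueError("JPEG encoding failed")
    return to_data_uri(buffer.tobytes())
```
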
### Processing Pipeline

1. Video upload and first frame extraction
2. User selects points on the object to segment
3. SAM2 processes each frame with the selected points
4. Masks are applied to each frame
5. Processed frames are combined into a new video

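
Steps 3 to 5 above can be sketched as a frame loop. This is an illustrative outline, not the project's implementation: the function names are hypothetical, the predictor is assumed to be a `SamPredictor` from the `segment_anything` package, and blacking out the background is only one possible way to "apply" a mask.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]


def build_point_prompt(points: Sequence[Point]) -> Tuple[List[List[float]], List[int]]:
    """Turn clicked (x, y) points into coordinates plus foreground labels (1)."""
    coords = [[float(x), float(y)] for x, y in points]
    labels = [1] * len(coords)
    return coords, labels


def make_segmenter(predictor, points: Sequence[Point]) -> Callable:
    """Wrap a SamPredictor into a function mapping a BGR frame to a boolean mask."""
    import cv2
    import numpy as np

    coords, labels = build_point_prompt(points)
    point_coords = np.array(coords, dtype=np.float32)
    point_labels = np.array(labels, dtype=np.int32)

    def segment_frame(frame_bgr):
        # SamPredictor expects RGB input; OpenCV decodes frames as BGR.
        predictor.set_image(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        masks, _scores, _logits = predictor.predict(
            point_coords=point_coords,
            point_labels=point_labels,
            multimask_output=False,
        )
        return masks[0]  # boolean array of shape (H, W)

    return segment_frame


def segment_video(video_path: str, output_path: str, segment_frame: Callable) -> int:
    """Apply segment_frame to every frame and write the result; return frame count."""
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    size = (
        int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
    )
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    frames_written = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            mask = segment_frame(frame)
            # Keep pixels inside the mask and black out everything else.
            segmented = np.where(mask[..., None], frame, 0).astype(np.uint8)
            writer.write(segmented)
            frames_written += 1
    finally:
        cap.release()
        writer.release()
    return frames_written
```

Because the same points are reused on every frame, an object that moves away from the clicked positions may no longer be selected in later frames.
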
## Performance Considerations

- **GPU recommended**: SAM2 runs much faster on a CUDA-enabled GPU
- **Video length**: Processing time grows with the number of frames, so longer videos take longer
- **Resolution**: Higher resolution videos require more processing power
- **Points selection**: More points can help with complex objects but may slow down processing

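
To make the first two points concrete, the sketch below picks a device and gives a rough time estimate from the frame count. Both helpers are hypothetical, and `seconds_per_frame` is a value you would have to measure on your own hardware; no timing figures are implied here.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def estimate_processing_seconds(
    duration_s: float, fps: float, seconds_per_frame: float
) -> float:
    """Rough estimate: every frame is processed once, at a measured per-frame cost."""
    frame_count = round(duration_s * fps)
    return frame_count * seconds_per_frame
```

For example, a 10 second clip at 30 fps has 300 frames, so a measured cost of 0.5 seconds per frame gives an estimate of 150 seconds.
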
## Troubleshooting

### Common Issues

**Issue: SAM2 model not found**
- Solution: Download the model checkpoint and place it in the project root directory

**Issue: CUDA out of memory**
- Solution: Reduce the video resolution, use smaller batch sizes, or switch to a smaller model such as `vit_b`

**Issue: Slow processing on CPU**
- Solution: Use a machine with a GPU or reduce the video resolution

**Issue: Video format not supported**
- Solution: Convert your video to MP4 format

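
Several of the solutions above suggest reducing the video resolution. A minimal sketch of computing a smaller frame size that preserves the aspect ratio is shown below; the function name `scaled_size` and the default of 720 pixels are assumptions for illustration.

```python
from typing import Tuple


def scaled_size(width: int, height: int, max_side: int = 720) -> Tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / longest
    # Round down to even numbers; many video codecs expect even dimensions.
    new_width = max(2, int(width * scale) // 2 * 2)
    new_height = max(2, int(height * scale) // 2 * 2)
    return new_width, new_height
```

The result can be passed to OpenCV as `cv2.resize(frame, scaled_size(w, h))` before a frame is handed to the model.
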
## License

This project is licensed under the MIT License. The SAM2 model is provided by Meta Research under its own license.

## Acknowledgements

- Meta Research for the Segment Anything Model
- Flask team for the web framework
- OpenCV team for computer vision tools

## Future Improvements

- [ ] Add support for multiple object segmentation
- [ ] Implement background removal options
- [ ] Add video trimming functionality
- [ ] Support for real-time preview
- [ ] Batch processing of multiple videos
- [ ] Advanced segmentation parameters (threshold, etc.)
- [ ] Cloud deployment options