
Video Object Segmentation with SAM2

A web application that allows you to upload videos, click on objects, and segment them out using Meta's SAM2 (Segment Anything Model 2) AI model.

Features

  • 📤 Upload video files (MP4, AVI, MOV, MKV)
  • 🖼️ Preview first frame of the video
  • 🎯 Click on objects to select them for segmentation
  • ✂️ AI-powered object segmentation using SAM2
  • 🎥 Download segmented video results
  • 🎨 Beautiful, responsive user interface

Requirements

  • Python 3.8-3.12 (tested and compatible)
  • PyTorch 2.2.0+ with CUDA (recommended for GPU acceleration)
  • Flask
  • OpenCV
  • NumPy
  • Segment Anything Model 2

Python Version Compatibility:

  • Python 3.8, 3.9, 3.10, 3.11, 3.12 all supported
  • 🔄 Automatic torch version selection based on Python version
  • 💡 Python 3.12 users: Use the updated requirements (torch 2.2.0+)
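The per-version torch selection can be expressed with standard pip environment markers. A hypothetical requirements.txt fragment (the exact version pins here are illustrative, not taken from this project's requirements file):

```
torch>=2.2.0; python_version >= "3.12"
torch>=2.0.0,<2.2.0; python_version < "3.12"
```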

Why use uv?

We recommend using uv for this project because:

  • Faster dependency resolution: uv is significantly faster than pip
  • Better virtual environment management: cleaner and more reliable venvs
  • Deterministic builds: more consistent dependency resolution
  • Modern Python tooling: built with Rust for performance
  • Better compatibility: handles complex dependency trees better

If you're working on Python projects, uv is a great modern alternative to pip + virtualenv!

Installation

1. Clone the repository

git clone https://github.com/yourusername/video-segmentation-sam2.git
cd video-segmentation-sam2

2. Set up the environment with uv

First, install uv (a fast Python package installer and virtual environment manager):

# Install uv
pip install uv

# Create virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate  # On Windows: .\.venv\Scripts\activate
uv pip install -r requirements.txt

3. Install SAM2 manually

SAM2 needs to be installed manually from GitHub:

# Clone the segment-anything repository
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything

# Install SAM2 in development mode
pip install -e .

# Download the model checkpoint (ViT-B recommended)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
mv sam_vit_b_01ec64.pth ..
cd ..

# Clean up
rm -rf segment-anything

After steps 1-3 you will have:

  • A virtual environment created with uv
  • All Python dependencies installed
  • SAM2 installed from source
  • The ViT-B model checkpoint downloaded into the project root
  • The necessary directories set up

4. Alternative: standard pip installation

If you prefer not to use uv:

pip install -r requirements.txt
python setup.py

5. Download SAM2 model weights

The application uses ViT-B (the smallest, fastest model) by default and expects the file sam_vit_b_01ec64.pth in the project root. If you followed step 3 above, you already have it.

Download from: https://github.com/facebookresearch/segment-anything

Or use the following command:

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

Model Options:

  • vit_b (default): Fastest, good for testing - sam_vit_b_01ec64.pth
  • vit_l: Medium size/performance - sam_vit_l_0b3195.pth
  • vit_h: Best accuracy, largest - sam_vit_h_4b8939.pth

You can change the model by modifying SAM_MODEL_SIZE in app.py.
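A sketch of how that configuration might look inside app.py (only SAM_MODEL_SIZE is confirmed to exist in the app; the SAM_CHECKPOINTS mapping and CHECKPOINT_PATH names are illustrative, though the filenames are the official segment-anything release checkpoints listed above):

```python
# Map each SAM model size to its checkpoint file
# (filenames from the official segment-anything release).
SAM_CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",  # fastest, good for testing
    "vit_l": "sam_vit_l_0b3195.pth",  # medium size/performance
    "vit_h": "sam_vit_h_4b8939.pth",  # best accuracy, largest
}

# Change this value to switch models; the matching checkpoint
# must be present in the project root.
SAM_MODEL_SIZE = "vit_b"
CHECKPOINT_PATH = SAM_CHECKPOINTS[SAM_MODEL_SIZE]
```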

6. Run the application

python app.py

The application will start on http://localhost:5000

Usage

  1. Upload a video: Click the "Select Video File" button and choose a video file
  2. Select object: Click on the object you want to segment in the preview image
  3. Add more points: Click additional points to help the AI better understand the object
  4. Segment: Click "Segment Object" to start the segmentation process
  5. Download: Once processing is complete, preview and download your segmented video

Configuration

You can configure the application by editing the .env file:

FLASK_ENV=development
UPLOAD_FOLDER=uploads
SEGMENTED_FOLDER=segmented
ALLOWED_EXTENSIONS=.mp4,.avi,.mov,.mkv
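How the app might consume these settings is sketched below using only the standard library; the variable names mirror the .env example above, but the is_allowed helper is illustrative (the real app may load the file with python-dotenv instead of plain environment variables):

```python
import os

# Defaults mirror the .env example; environment variables override them.
UPLOAD_FOLDER = os.environ.get("UPLOAD_FOLDER", "uploads")
SEGMENTED_FOLDER = os.environ.get("SEGMENTED_FOLDER", "segmented")
ALLOWED_EXTENSIONS = {
    ext.strip().lower()
    for ext in os.environ.get("ALLOWED_EXTENSIONS", ".mp4,.avi,.mov,.mkv").split(",")
}

def is_allowed(filename: str) -> bool:
    """Check an uploaded filename against the allowed extension set."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS
```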

Technical Details

Backend

  • Flask: Web framework
  • SAM2: Segment Anything Model 2 for object segmentation
  • OpenCV: Video processing and frame manipulation
  • PyTorch: Deep learning framework for running SAM2

Frontend

  • HTML5/CSS3: Responsive user interface
  • JavaScript: Interactive point selection and AJAX requests
  • Base64 encoding: For preview image transfer
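The Base64 preview transfer amounts to wrapping the JPEG bytes of the first frame in a data URL. A minimal stdlib sketch (in the app itself the bytes would come from encoding the extracted frame, e.g. via cv2.imencode; frame_to_data_url is an illustrative name):

```python
import base64

def frame_to_data_url(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes in a data URL the browser can show in an <img> tag."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"
```

On the frontend, the returned string can be assigned directly to img.src.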

Processing Pipeline

  1. Video upload and first frame extraction
  2. User selects points on the object to segment
  3. SAM2 processes each frame with the selected points
  4. Masks are applied to each frame
  5. Processed frames are combined into a new video
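Step 4 (applying a mask to a frame) reduces to keeping pixels inside the mask and blacking out the rest. A deliberately simplified pure-Python sketch of the idea; the real pipeline would do the same operation on NumPy arrays produced by OpenCV:

```python
def apply_mask(frame, mask):
    """Keep pixels where mask is True, black out the rest.

    frame: 2D list of pixel values; mask: 2D list of booleans, same shape.
    """
    return [
        [pixel if keep else 0 for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]
```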

Performance Considerations

  • GPU recommended: SAM2 runs much faster with CUDA-enabled GPU
  • Video length: Longer videos will take more time to process
  • Resolution: Higher resolution videos require more processing power
  • Points selection: More points can help with complex objects but may slow down processing

Troubleshooting

Common Issues

Issue: SAM2 model not found

  • Solution: Download the model checkpoint and place it in the root directory

Issue: CUDA out of memory

  • Solution: Reduce video resolution or use smaller batch sizes

Issue: Slow processing on CPU

  • Solution: Use a machine with GPU or reduce video resolution

Issue: Video format not supported

  • Solution: Convert your video to MP4 format

License

This project is licensed under the MIT License. The SAM2 model is provided by Meta Research under its own license.

Acknowledgements

  • Meta Research for the Segment Anything Model
  • Flask team for the web framework
  • OpenCV team for computer vision tools

Future Improvements

  • Add support for multiple object segmentation
  • Implement background removal options
  • Add video trimming functionality
  • Support for real-time preview
  • Batch processing of multiple videos
  • Advanced segmentation parameters (threshold, etc.)
  • Cloud deployment options