About Segment Anything | Meta AI
Segment Anything (SAM) is a promptable segmentation system developed by Meta AI that generalizes zero-shot to unfamiliar objects and images without requiring additional training. It allows users to "cut out" any object in any image with a single click, and it accepts a variety of input prompts to perform a wide range of segmentation tasks. The model was trained on 11 million images and more than 1 billion masks collected through a model-in-the-loop "data engine."
Top use cases
- Cutting out objects in images with a single click
- Tracking object masks in videos
- Enabling image editing applications
- Lifting object masks to 3D
- Creative tasks like collaging
- Text-to-object segmentation
Key features
- Promptable segmentation with zero-shot generalization
- Interactive point and box prompts (see the sketch after this list)
- Automatic segmentation of entire images
- Integration with other AI systems
- Extensible outputs for use in other applications
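As a minimal sketch of the promptable and automatic workflows listed above, the snippet below uses the official segment-anything Python package; the checkpoint filename, image path, and click coordinates are placeholders you would replace with your own.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# Load a SAM checkpoint (downloaded separately from the GitHub repository).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")  # a GPU is recommended for the heavy image encoder

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# Promptable use: one foreground click "cuts out" the object under it.
predictor = SamPredictor(sam)
predictor.set_image(image)                       # run the image encoder once
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),         # (x, y) pixel of the click
    point_labels=np.array([1]),                  # 1 = foreground point
    multimask_output=True,                       # return several candidate masks
)
best_mask = masks[np.argmax(scores)]             # boolean HxW array

# Automatic use: segment everything in the image with no prompts at all.
auto_masks = SamAutomaticMaskGenerator(sam).generate(image)
print(len(auto_masks), "masks found")
```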
Pros & cons
Pros
- Zero-shot generalization to unfamiliar objects and images
- Flexible promptable design
- Efficient model design for web-browser use
- Large dataset for training (SA-1B)
- Integration with other AI systems
Cons
- Currently only supports images or individual video frames
- Requires a GPU for efficient image encoder inference
- Does not produce mask labels, only object masks
- Text prompts are explored in the paper but not released
Company information
- Support email for customer service: [email protected]
- Company: Meta
- GitHub: https://github.com/facebookresearch/segment-anything
Frequently asked questions
What type of prompts are supported?
Foreground/background points, bounding box, and mask prompts are supported. Text prompts are explored in our paper but the capability is not released.
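A short sketch of the three released prompt types, assuming a predictor set up as in the earlier example; the point, box, and refinement values are illustrative only.

```python
import numpy as np

# Foreground/background points: label 1 = foreground, 0 = background.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375], [620, 400]]),
    point_labels=np.array([1, 0]),
    multimask_output=True,
)

# Bounding box prompt: [x0, y0, x1, y1] in pixel coordinates.
box_masks, _, _ = predictor.predict(
    box=np.array([425, 300, 700, 525]),
    multimask_output=False,
)

# Mask prompt: feed a previous low-resolution mask back in to refine it.
best = int(np.argmax(scores))
refined, _, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    mask_input=logits[best][None, :, :],   # 1x256x256 logits from the prior call
    multimask_output=False,
)
```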
What is the structure of the model?
A ViT-H image encoder that runs once per image and outputs an image embedding. A prompt encoder that embeds input prompts such as clicks or boxes. A lightweight transformer-based mask decoder that predicts object masks from the image embedding and prompt embeddings.
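The same three components appear as attributes of the released model. The sketch below only inspects them and then reuses a single image embedding for several prompts; it assumes an RGB array named image loaded as in the earlier example, and the click coordinates are placeholders.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# The three parts described above are attributes of the Sam module.
print(type(sam.image_encoder).__name__)   # ViT-H image encoder
print(type(sam.prompt_encoder).__name__)  # prompt encoder
print(type(sam.mask_decoder).__name__)    # lightweight mask decoder

# The encoder runs once per image; the decoder runs once per prompt.
predictor = SamPredictor(sam)
predictor.set_image(image)                # expensive: image encoder forward pass
for click in [(500, 375), (200, 150)]:    # cheap: one decode per click
    masks, _, _ = predictor.predict(
        point_coords=np.array([click]),
        point_labels=np.array([1]),
        multimask_output=False,
    )
```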
What data was the model trained on?
The model was trained on our SA-1B dataset of 11 million images and more than 1 billion masks. See our dataset viewer.
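SA-1B ships images alongside per-image JSON annotation files whose masks are stored in COCO run-length encoding. A rough sketch of reading one follows; the filename is illustrative and the field names reflect my understanding of the dataset release, so they are worth checking against the SA-1B documentation.

```python
import json
from pycocotools import mask as mask_utils

# One SA-1B annotation file corresponds to one image.
with open("sa_223750.json") as f:
    record = json.load(f)

for ann in record["annotations"]:
    # Masks are stored as COCO RLE; decode to a binary HxW numpy array.
    binary_mask = mask_utils.decode(ann["segmentation"])
    print(ann["area"], binary_mask.shape)
```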
Does the model produce mask labels?
No, the model predicts object masks only and does not generate labels.
Does the model work on videos?
Currently the model only supports images or individual frames from videos.
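Because only single frames are supported, applying SAM to video means running the predictor frame by frame. A rough sketch with OpenCV is shown below, assuming the predictor from the earlier examples; it reuses one fixed click across frames, whereas a real tracking setup would update the prompt per frame.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.mp4")        # illustrative file name
click = np.array([[500, 375]])            # fixed prompt; a tracker could update this
frame_masks = []

while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    predictor.set_image(frame_rgb)        # re-encode every frame (the expensive step)
    masks, _, _ = predictor.predict(
        point_coords=click,
        point_labels=np.array([1]),
        multimask_output=False,
    )
    frame_masks.append(masks[0])

cap.release()
```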