About Segment Anything | Meta AI
Segment Anything (SAM) is a promptable segmentation system developed by Meta AI that generalizes zero-shot to unfamiliar objects and images without requiring additional training. It allows users to "cut out" any object in any image with a single click, and it accepts a variety of input prompts to perform a wide range of segmentation tasks. The model was trained on 11 million images and more than 1 billion masks collected through a model-in-the-loop "data engine."
Top use cases
- Cutting out objects in images with a single click
- Tracking object masks in videos
- Enabling image editing applications
- Lifting object masks to 3D
- Creative tasks like collaging
- Text-to-object segmentation
Key features
- Promptable segmentation with zero-shot generalization
- Interactive point and box prompts (see the sketch after this list)
- Automatic segmentation of entire images
- Integration with other AI systems
- Extensible outputs for use in other applications
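As a minimal sketch of the promptable and automatic workflows listed above, the snippet below uses the official segment-anything Python package; the checkpoint filename, image path, and click coordinates are placeholders you would replace with your own.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# Load a SAM checkpoint (downloaded separately from the GitHub repository).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")  # a GPU is recommended for the heavy image encoder

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# Promptable use: one foreground click "cuts out" the object under it.
predictor = SamPredictor(sam)
predictor.set_image(image)                       # run the image encoder once
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),         # (x, y) pixel of the click
    point_labels=np.array([1]),                  # 1 = foreground point
    multimask_output=True,                       # return several candidate masks
)
best_mask = masks[np.argmax(scores)]             # boolean HxW array

# Automatic use: segment everything in the image with no prompts at all.
auto_masks = SamAutomaticMaskGenerator(sam).generate(image)
print(len(auto_masks), "masks found")
```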
Pros & cons
Pros
- Zero-shot generalization to unfamiliar objects and images
- Flexible promptable design
- Efficient model design for web-browser use
- Large dataset for training (SA-1B)
- Integration with other AI systems
Cons
- Currently only supports images or individual video frames
- Requires a GPU for efficient image encoder inference
- Does not produce mask labels, only object masks
- Text prompts are explored in the paper but not released
Company information
- Support email for customer service: [email protected]
- Company: Meta
- GitHub: https://github.com/facebookresearch/segment-anything
Frequently asked questions
What type of prompts are supported?
Foreground/background points, bounding box, and mask prompts are supported. Text prompts are explored in our paper but the capability is not released.
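A short sketch of the three released prompt types, assuming a predictor set up as in the earlier example; the point, box, and refinement values are illustrative only.

```python
import numpy as np

# Foreground/background points: label 1 = foreground, 0 = background.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375], [620, 400]]),
    point_labels=np.array([1, 0]),
    multimask_output=True,
)

# Bounding box prompt: [x0, y0, x1, y1] in pixel coordinates.
box_masks, _, _ = predictor.predict(
    box=np.array([425, 300, 700, 525]),
    multimask_output=False,
)

# Mask prompt: feed a previous low-resolution mask back in to refine it.
best = int(np.argmax(scores))
refined, _, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    mask_input=logits[best][None, :, :],   # 1x256x256 logits from the prior call
    multimask_output=False,
)
```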
What is the structure of the model?
A ViT-H image encoder that runs once per image and outputs an image embedding. A prompt encoder that embeds input prompts such as clicks or boxes. A lightweight transformer-based mask decoder that predicts object masks from the image embedding and prompt embeddings.
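The same three components appear as attributes of the released model. The sketch below only inspects them and then reuses a single image embedding for several prompts; it assumes an RGB array named image loaded as in the earlier example, and the click coordinates are placeholders.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# The three parts described above are attributes of the Sam module.
print(type(sam.image_encoder).__name__)   # ViT-H image encoder
print(type(sam.prompt_encoder).__name__)  # prompt encoder
print(type(sam.mask_decoder).__name__)    # lightweight mask decoder

# The encoder runs once per image; the decoder runs once per prompt.
predictor = SamPredictor(sam)
predictor.set_image(image)                # expensive: image encoder forward pass
for click in [(500, 375), (200, 150)]:    # cheap: one decode per click
    masks, _, _ = predictor.predict(
        point_coords=np.array([click]),
        point_labels=np.array([1]),
        multimask_output=False,
    )
```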
What data was the model trained on?
The model was trained on our SA-1B dataset of 11 million images and more than 1 billion masks. See our dataset viewer.
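SA-1B ships images alongside per-image JSON annotation files whose masks are stored in COCO run-length encoding. A rough sketch of reading one follows; the filename is illustrative and the field names reflect my understanding of the dataset release, so they are worth checking against the SA-1B documentation.

```python
import json
from pycocotools import mask as mask_utils

# One SA-1B annotation file corresponds to one image.
with open("sa_223750.json") as f:
    record = json.load(f)

for ann in record["annotations"]:
    # Masks are stored as COCO RLE; decode to a binary HxW numpy array.
    binary_mask = mask_utils.decode(ann["segmentation"])
    print(ann["area"], binary_mask.shape)
```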
Does the model produce mask labels?
No, the model predicts object masks only and does not generate labels.
Does the model work on videos?
Currently the model only supports images or individual frames from videos.
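Because only single frames are supported, applying SAM to video means running the predictor frame by frame. A rough sketch with OpenCV is shown below, assuming the predictor from the earlier examples; it reuses one fixed click across frames, whereas a real tracking setup would update the prompt per frame.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.mp4")        # illustrative file name
click = np.array([[500, 375]])            # fixed prompt; a tracker could update this
frame_masks = []

while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    predictor.set_image(frame_rgb)        # re-encode every frame (the expensive step)
    masks, _, _ = predictor.predict(
        point_coords=click,
        point_labels=np.array([1]),
        multimask_output=False,
    )
    frame_masks.append(masks[0])

cap.release()
```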