Auto labeling with Segment Anything Model
Segment Anything Model is a new segmentation model from Meta. Trained on 11M images and 1B segmentation masks, it can segment objects in an image without having been trained on those specific objects. This makes Segment Anything a good candidate for auto labeling, even for new objects.
- Click the Brain button on the left side to activate auto labeling.
- Select one of the Segment Anything models from the Model dropdown menu. Accuracy and speed vary by model: Segment Anything Model (ViT-B) is the fastest and least accurate, while Segment Anything Model (ViT-H) is the slowest and most accurate. Quant in a model name indicates a quantized version of the model.
- Use the Auto segmentation marking tools to mark the object:
  - +Point: Add a point that belongs to the object.
  - -Point: Add a point on a region that you want to exclude from the object.
  - +Rect: Draw a rectangle that contains the object. Segment Anything will automatically segment the object.
  - Clear: Clear all auto segmentation markings.
  - Finish Object (f): Finish the current marking. After finishing the object, you can enter the label name and save the object.
- The first time you run any model, AnyLabeling needs to download it from the server, so it may take a while depending on your network speed.
- The first AI inference also takes time. Please be patient.
- A background task runs the Segment Anything encoder ahead of time, so auto segmentation may be faster on the next images.
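The background pre-computation mentioned in the last note can be pictured as a small cache that encodes upcoming images on a worker thread and hands back the stored result when the user reaches them. The following is only a sketch of that idea, not AnyLabeling's actual code: `EmbeddingCache`, `encode_fn`, and `max_items` are hypothetical names, and `encode_fn` stands in for the heavy encoder.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class EmbeddingCache:
    """Hypothetical cache: encode images in the background and reuse the results.

    Intended to be called from a single (UI) thread; the worker thread only
    runs encode_fn.
    """

    def __init__(self, encode_fn, max_items=8):
        self._encode_fn = encode_fn      # stand-in for the heavy encoder
        self._max_items = max_items      # upper bound on cached embeddings
        self._futures = OrderedDict()    # image id -> Future holding the embedding
        self._pool = ThreadPoolExecutor(max_workers=1)

    def prefetch(self, image_id):
        """Schedule encoding for an image unless it is already scheduled."""
        if image_id not in self._futures:
            self._futures[image_id] = self._pool.submit(self._encode_fn, image_id)
            # Drop the oldest entries so memory use stays bounded.
            while len(self._futures) > self._max_items:
                self._futures.popitem(last=False)
        return self._futures[image_id]

    def get(self, image_id):
        """Return the embedding, blocking only if encoding has not finished yet."""
        return self.prefetch(image_id).result()

    def close(self):
        self._pool.shutdown(wait=True)
```

In this sketch, a labeling tool would call `prefetch` for the next few images while the user works on the current one, and `get` when an image is opened; a second `get` for the same image returns the stored embedding without running the encoder again.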
Segment Anything Model is divided into two parts: a heavy Encoder and a lightweight Decoder. The Encoder extracts an image embedding from the input image. Based on the embedding and an input prompt (points, boxes, masks), the Decoder produces the output mask(s). The Decoder can run in single-mask or multiple-mask mode.
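To make the prompt side concrete, here is a minimal sketch of how point and box marks could be packed into the coordinate and label arrays that the Decoder consumes. The label values (1 for a positive point, 0 for a negative point, 2 and 3 for the top-left and bottom-right box corners, -1 for a padding point when no box is given) follow the convention of Meta's ONNX export example; that AnyLabeling uses exactly this layout is an assumption. `build_prompt` is a hypothetical helper, and rescaling coordinates to the encoder's input resolution is omitted.

```python
import numpy as np


def build_prompt(points=(), box=None):
    """Pack user marks into (coords, labels) arrays for the decoder.

    points: iterable of (x, y, is_positive) marks (+Point / -Point).
    box:    optional (x1, y1, x2, y2) rectangle (+Rect).
    """
    coords = [(x, y) for x, y, _ in points]
    labels = [1 if positive else 0 for _, _, positive in points]
    if box is not None:
        x1, y1, x2, y2 = box
        coords += [(x1, y1), (x2, y2)]   # top-left and bottom-right corners
        labels += [2, 3]
    else:
        coords.append((0.0, 0.0))        # padding point used when no box is given
        labels.append(-1)
    coords = np.asarray(coords, dtype=np.float32)[None, :, :]  # shape (1, N, 2)
    labels = np.asarray(labels, dtype=np.float32)[None, :]     # shape (1, N)
    return coords, labels


# One positive and one negative point, no rectangle.
coords, labels = build_prompt(points=[(120, 80, True), (200, 150, False)])
print(coords.shape, labels.tolist())  # (1, 3, 2) [[1.0, 0.0, -1.0]]
```

Because the prompt is just a pair of small arrays, it can be rebuilt and passed to the Decoder every time the user adds or removes a mark.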
Segment Anything in AnyLabeling
In the web demo, Meta runs the Encoder on their server, while the Decoder runs in real time in the user's browser, where users can input points and boxes and receive the output immediately. In AnyLabeling, we also run the Encoder only once per image. After that, whenever the user changes the prompt (points, boxes), the Decoder is run to produce an output mask. We added a postprocessing step that finds the contours and produces shapes (polygons, rectangles, etc.) for labeling.
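As an illustration of the simplest part of that postprocessing, the sketch below turns a decoder output into a rectangle shape. It assumes the Decoder returns mask logits that are thresholded at 0.0, as in Meta's reference implementation; `mask_to_rectangle` is a hypothetical helper, not a function from AnyLabeling. Producing polygons would additionally need a contour-tracing routine (for example OpenCV's `cv2.findContours`), which is left out here.

```python
import numpy as np


def mask_to_rectangle(mask_logits, threshold=0.0):
    """Return (x1, y1, x2, y2) around the predicted mask, or None if it is empty."""
    mask = np.asarray(mask_logits) > threshold   # logits above the threshold are foreground
    ys, xs = np.nonzero(mask)                    # row and column indices of foreground pixels
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy 6x8 "logit" map with a foreground block in rows 2-4 and columns 3-6.
logits = np.full((6, 8), -5.0)
logits[2:5, 3:7] = 4.0
print(mask_to_rectangle(logits))  # (3, 2, 6, 4)
```

The returned coordinates are inclusive pixel indices, so a drawing layer that expects exclusive bounds would add one to `x2` and `y2`.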
To reduce dependencies, instead of using the segment_anything package from Meta, we rewrote the code to use only ONNX Runtime and NumPy. The source code for running the ONNX models can be found here, and the ONNX models were uploaded to AnyLabeling Assets.