Model Details
Model Description
The MediaPipe Selfie Segmentation model segments the portrait of a person and can be used to replace or modify the background of an image (a minimal sketch of this use case follows the details below). The model outputs two categories: background at index 0 and person at index 1.
Developed by: Google
Shared by:
Model type: Segmentation model
License:
Resources for more information:
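As a hedged illustration of the background-replacement use case mentioned above, the sketch below composites a new background from the model's class-index mask. The function name, variable names, and array shapes are assumptions for illustration, not part of the model card:

```python
import numpy as np

def replace_background(frame: np.ndarray, new_background: np.ndarray,
                       mask: np.ndarray) -> np.ndarray:
    """Composite `new_background` behind the segmented person.

    Assumes `mask` is an (H, W) array of class indices (0 = background,
    1 = person) and both images are (H, W, 3) arrays of the same size.
    """
    person = (mask == 1)[..., np.newaxis]           # (H, W, 1) boolean mask
    return np.where(person, frame, new_background)  # keep person pixels, swap the rest
```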
Training Details
Training Data
The majority of dataset images were captured on a diverse set of front- and back-facing smartphone cameras. These images were captured in a real-world environment with different light, noise, and motion conditions via an AR (Augmented Reality) application.
Testing Details
Metrics
The performance of the model is evaluated by computing the intersection-over-union (IoU) for the person class: the ratio of the intersection of the predicted mask with the ground-truth mask to their union (a minimal sketch follows the results table below). Typical errors occur along the boundary of the true segmentation mask and may shift it by a few pixels or lose thin features.
The evaluation dataset consists of 1594 images: 100 images from each of 17 geographical subregions, except for 2 subregions (Melanesia + Micronesia + Polynesia, and Middle Africa), which contribute fewer than 100 images each.
Results are taken from .
| Region | IoU (%) with 95% confidence interval |
| --- | --- |
| Western Africa (worst) | 94.71 ± 1.57 |
| Eastern Asia (best) | 97.27 ± 0.49 |
| Average | 95.99 ± 0.87 |
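For reference, the person-class IoU described above can be computed from two class-index masks as follows (a minimal sketch, not the authors' evaluation code):

```python
import numpy as np

def person_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU for the person class (index 1) between two (H, W) class-index masks."""
    pred_person = pred == 1
    gt_person = gt == 1
    intersection = np.logical_and(pred_person, gt_person).sum()
    union = np.logical_or(pred_person, gt_person).sum()
    return float(intersection) / float(union) if union > 0 else 1.0
```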
Technical Specifications
Input/Output Details
Input:
- Name: input
- Info: NCHW BGR un-normalized image

Output:
- Name: output
- Info: Per-pixel class of the segmented object: 1 = person, 0 = background
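If you run the model outside of a DepthAI pipeline, the input must match this layout. A minimal preprocessing sketch, assuming the 256x144 variant listed under Throughput (OpenCV already loads images as BGR; the file path is a placeholder):

```python
import cv2
import numpy as np

frame = cv2.imread("selfie.jpg")                 # placeholder path; BGR, HWC, uint8
resized = cv2.resize(frame, (256, 144))          # cv2.resize takes (width, height)
tensor = resized.transpose(2, 0, 1)[np.newaxis]  # (1, 3, 144, 256), NCHW
# No normalization: the model expects un-normalized pixel values.
```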
Model Architecture
It is a convolutional neural network based on a MobileNetV3-like backbone with custom decoder blocks, designed for real-time segmentation of prominent human figures in a scene.
Please consult the resources listed above for more information on the model architecture.
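Purely as an illustration of this encoder-decoder shape (the actual MediaPipe decoder blocks are not reproduced here, and the class name and layer sizes below are invented for the sketch), a PyTorch version using torchvision's MobileNetV3-Small backbone might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v3_small

class SelfieSegSketch(nn.Module):
    """Illustrative encoder-decoder sketch, not the real MediaPipe model."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.encoder = mobilenet_v3_small(weights=None).features  # MobileNetV3-like backbone
        self.head = nn.Conv2d(576, num_classes, kernel_size=1)    # stand-in decoder/classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.encoder(x))
        # Upsample per-pixel class scores back to the input resolution.
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear", align_corners=False)
```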
Throughput
Model variant: mediapipe-selfie-segmentation:256x144
* Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained from the package.
Quantization
The RVC4 version of the model was quantized using a custom dataset, created by taking a 40-image subset of the dataset.
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the information flow linking the device, the inference model, and the output parser (as defined in the model head(s)).
Below, we present the most important utilization steps for this particular model.
Please consult the docs for more information.
The model's output is post-processed by SegmentationParser, which outputs a dai.ImgFrame message (the segmentation mask for the 2 classes: background and foreground).
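A minimal pipeline sketch for setting this up, assuming DepthAI v3, the ParsingNeuralNetwork helper from the depthai-nodes package, and a luxonis/ HubAI namespace for the model slug (the exact import path and slug are assumptions and may differ):

```python
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork  # assumed import path

with dai.Pipeline() as pipeline:
    camera = pipeline.create(dai.node.Camera).build()
    # Model slug assumed from the variant name in the Throughput section.
    nn = pipeline.create(ParsingNeuralNetwork).build(
        camera, "luxonis/mediapipe-selfie-segmentation:256x144"
    )
    parser_output_queue = nn.out.createOutputQueue()
    pipeline.start()
```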
Get parsed output(s):
```python
while pipeline.isRunning():
    parser_output: dai.ImgFrame = parser_output_queue.get()
```
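Inside the loop, the mask can be read back as a numpy array (a sketch; the array shape and class semantics follow the Input/Output Details above):

```python
# Continuing inside the while loop above:
mask = parser_output.getFrame()  # (H, W) array of class indices
person = mask == 1               # boolean mask of the person class
```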
Example
You can quickly run the model using our script.
It automatically downloads the model, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
To try it out, run: