Model Details
Model Description
The MediaPipe Selfie Segmentation model segments the portrait of a person and can be used to replace or modify the background of an image (a minimal sketch of this use case follows the details below). The model outputs two categories: background at index 0 and person at index 1.
Developed by: Google
Shared by:
Model type: Segmentation model
License:
Resources for more information:
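As a hedged illustration of the background-replacement use case mentioned above, the sketch below composites a new background from the model's class-index mask. The function name, variable names, and array shapes are assumptions for illustration, not part of the model card:

```python
import numpy as np

def replace_background(frame: np.ndarray, new_background: np.ndarray,
                       mask: np.ndarray) -> np.ndarray:
    """Composite `new_background` behind the segmented person.

    Assumes `mask` is an (H, W) array of class indices (0 = background,
    1 = person) and both images are (H, W, 3) arrays of the same size.
    """
    person = (mask == 1)[..., np.newaxis]           # (H, W, 1) boolean mask
    return np.where(person, frame, new_background)  # keep person pixels, swap the rest
```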
Training Details
Training Data
The majority of dataset images were captured on a diverse set of front- and back-facing smartphone cameras. These images were captured in a real-world environment with different light, noise, and motion conditions via an AR (Augmented Reality) application.
Testing Details
Metrics
The performance of the model is evaluated by computing the intersection-over-union (IoU) for the person class: the ratio of the intersection of the predicted mask with the ground-truth mask to their union (a minimal sketch follows the results table below). Typical errors occur along the boundary of the true segmentation mask and may shift it by a few pixels or lose thin features.
The evaluation dataset consists of 1594 images: 100 images from each of 17 geographical subregions, except for 2 subregions (Melanesia + Micronesia + Polynesia, and Middle Africa), which contribute fewer than 100 images each.
Results are taken from .
| Region | IoU (%) with 95% confidence interval |
| --- | --- |
| Western Africa (worst) | 94.71 ± 1.57 |
| Eastern Asia (best) | 97.27 ± 0.49 |
| Average | 95.99 ± 0.87 |
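For reference, the person-class IoU described above can be computed from two class-index masks as follows (a minimal sketch, not the authors' evaluation code):

```python
import numpy as np

def person_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU for the person class (index 1) between two (H, W) class-index masks."""
    pred_person = pred == 1
    gt_person = gt == 1
    intersection = np.logical_and(pred_person, gt_person).sum()
    union = np.logical_or(pred_person, gt_person).sum()
    return float(intersection) / float(union) if union > 0 else 1.0
```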
Technical Specifications
Input/Output Details
Input:
- Name: input
- Info: NCHW BGR un-normalized image

Output:
- Name: output
- Info: Per-pixel class of the segmented object: 1 = person, 0 = background
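If you run the model outside of a DepthAI pipeline, the input must match this layout. A minimal preprocessing sketch, assuming the 256x144 variant listed under Throughput (OpenCV already loads images as BGR; the file path is a placeholder):

```python
import cv2
import numpy as np

frame = cv2.imread("selfie.jpg")                 # placeholder path; BGR, HWC, uint8
resized = cv2.resize(frame, (256, 144))          # cv2.resize takes (width, height)
tensor = resized.transpose(2, 0, 1)[np.newaxis]  # (1, 3, 144, 256), NCHW
# No normalization: the model expects un-normalized pixel values.
```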
Model Architecture
It is a convolutional neural network based on a MobileNetV3-like backbone with custom decoder blocks, designed for real-time segmentation of prominent human figures in a scene.
Please consult the resources listed above for more information on the model architecture.
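Purely as an illustration of this encoder-decoder shape (the actual MediaPipe decoder blocks are not reproduced here, and the class name and layer sizes below are invented for the sketch), a PyTorch version using torchvision's MobileNetV3-Small backbone might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v3_small

class SelfieSegSketch(nn.Module):
    """Illustrative encoder-decoder sketch, not the real MediaPipe model."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.encoder = mobilenet_v3_small(weights=None).features  # MobileNetV3-like backbone
        self.head = nn.Conv2d(576, num_classes, kernel_size=1)    # stand-in decoder/classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.encoder(x))
        # Upsample per-pixel class scores back to the input resolution.
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear", align_corners=False)
```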
Throughput
Model variant: mediapipe-selfie-segmentation:256x144
* Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained from the package.
Quantization
The RVC4 version of the model was quantized using a custom dataset, created by taking a 40-image subset of the dataset.
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the information flow linking the device, the inference model, and the output parser (as defined in the model head(s)).
Below, we present the most important utilization steps for this particular model.
Please consult the docs for more information.
The model's output is post-processed by SegmentationParser, which outputs a dai.ImgFrame message (the segmentation mask for the 2 classes: background and foreground).
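A minimal pipeline sketch for setting this up, assuming DepthAI v3, the ParsingNeuralNetwork helper from the depthai-nodes package, and a luxonis/ HubAI namespace for the model slug (the exact import path and slug are assumptions and may differ):

```python
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork  # assumed import path

with dai.Pipeline() as pipeline:
    camera = pipeline.create(dai.node.Camera).build()
    # Model slug assumed from the variant name in the Throughput section.
    nn = pipeline.create(ParsingNeuralNetwork).build(
        camera, "luxonis/mediapipe-selfie-segmentation:256x144"
    )
    parser_output_queue = nn.out.createOutputQueue()
    pipeline.start()
```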
Get parsed output(s):
```python
while pipeline.isRunning():
    parser_output: dai.ImgFrame = parser_output_queue.get()
```
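Inside the loop, the mask can be read back as a numpy array (a sketch; the array shape and class semantics follow the Input/Output Details above):

```python
# Continuing inside the while loop above:
mask = parser_output.getFrame()  # (H, W) array of class indices
person = mask == 1               # boolean mask of the person class
```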
Example
You can quickly run the model using our script.
It automatically downloads the model, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
To try it out, run: