Name	Version	Available For	Created At	Deploy
		RVC2, RVC3, RVC4	Over 1 year ago

Model Details

Model Description

SCRFD is a fast, efficient, high-accuracy face-detection model that can detect faces at different scales. For this reason, it is suitable for detecting faces both near the camera and far away from it. SCRFD performs well in crowded scenes and challenging lighting conditions. Besides face detection, it also detects 5 keypoints on every face (2 for the eyes, 1 for the nose, and 2 for the corners of the mouth).

Developed by: InsightFace
Shared by:
Model type: Computer Vision
License:
Resources for more information:

Training Details

Training Data

The model was trained on the WIDERFace dataset, which is split into training, validation, and test sets. WIDERFace is a face detection benchmark whose images are selected from the publicly available WIDER dataset. Its authors chose 32,203 images and labeled 393,703 faces with a high degree of variability in scale, pose, and occlusion. The dataset is organized into 61 event classes; for each event class, 40%/10%/50% of the data is randomly selected for the training, validation, and test sets, respectively. Based on the detection rate of EdgeBox (Zitnick & Dollár, 2014), three levels of difficulty (Easy, Medium, and Hard) are defined by incrementally incorporating hard samples.

Testing Details

Metrics

Evaluation of the model on WIDERFace dataset for all three categories: Easy, Medium and Hard. Results are taken from .

Category	mAP
Easy	95.16
Medium	93.87
Hard	83.05

Technical Specifications

Input/Output Details

Input:
- Name: input.1
  - Info: NCHW BGR un-normalized image
Output:
- Name: Multiple (please consult NN archive config.json)
  - Info: Classification scores, bounding boxes, and keypoints for a multitude of detections.

Model Architecture

Backbone: ResNet backbone
Neck: Path Aggregation Feature Pyramid Network (PAFPN) neck
Head: Simple head consisting of stacked 3 × 3 convolutional layers

Please consult the for more information on model architecture.

Throughput

Model variant: scrfd-face-detection:10g-640x640

• Input shape: [1, 3, 640, 640]
• Output shapes: [[1, 12800, 1], [1, 3200, 1], [1, 800, 1], [1, 12800, 4], [1, 3200, 4], [1, 800, 4], [1, 12800, 10], [1, 3200, 10], [1, 800, 10]]
• Params (M): 4.226
• GFLOPs: 13.414
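The output shapes listed above are consistent with three detection levels at strides 8, 16, and 32 with 2 anchors per grid cell. That layout is an assumption based on the InsightFace reference implementation, not something stated on this page, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
def anchors_per_level(input_size=640, strides=(8, 16, 32), anchors_per_cell=2):
    """Count candidate detections per level (strides and anchor count are assumed)."""
    counts = []
    for stride in strides:
        cells_per_side = input_size // stride  # e.g. 640 // 8 = 80
        counts.append(cells_per_side * cells_per_side * anchors_per_cell)
    return counts


counts = anchors_per_level()
print(counts)  # [12800, 3200, 800]
```

Under the same assumption, the last dimension of each output would hold 1 classification score, 4 bounding-box values, or 10 keypoint values (5 keypoints with 2 coordinates each) per candidate.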

Platform	Precision	Throughput (infs/sec)	Power Consumption (W)
RVC2	FP16	11.38	N/A
RVC4	INT8	347.03	4.50

* Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).

* Parameters and FLOPs are obtained from the package.
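To compare the platforms, the throughput figures can be converted into an average time per inference and, where power is reported, an approximate energy per inference. The sketch below uses only the numbers from the table; the helper names are hypothetical. Because the benchmark runs 2 threads, the time per inference is an average at full throughput and not the latency of a single frame.

```python
# Figures copied from the throughput table above.
benchmarks = {
    "RVC2": {"throughput": 11.38, "power_w": None},  # power not reported (N/A)
    "RVC4": {"throughput": 347.03, "power_w": 4.50},
}


def ms_per_inference(throughput):
    """Average milliseconds per inference at the measured throughput."""
    return 1000.0 / throughput


def millijoules_per_inference(throughput, power_w):
    """Approximate energy per inference in mJ, or None when power is unknown."""
    if power_w is None:
        return None
    return 1000.0 * power_w / throughput


for platform, b in benchmarks.items():
    t_ms = ms_per_inference(b["throughput"])
    e_mj = millijoules_per_inference(b["throughput"], b["power_w"])
    energy = "N/A" if e_mj is None else f"{e_mj:.2f} mJ"
    print(f"{platform}: {t_ms:.2f} ms/inference, {energy}")
```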

Quantization

The RVC4 version of the model was quantized using a custom dataset, created by taking a 40-image subset of the dataset.

Utilization

Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)). Below, we present the most crucial utilization steps for the particular model. Please consult the docs for more information.

Install DAIv3 and depthai-nodes libraries:

pip install depthai
pip install depthai-nodes

Define model:

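import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork

model_description = dai.NNModelDescription(
    "luxonis/scrfd-face-detection:10g-640x640"
)

# `pipeline` is an existing dai.Pipeline; <CameraNode> is a placeholder
# for the camera node you have already created in it.
nn = pipeline.create(ParsingNeuralNetwork).build(
    <CameraNode>, model_description
)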

Inspect model head(s):

SCRFDParser, which outputs an ImgDetectionsExtended message (bounding boxes and confidence scores for every detected face).
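The parser performs the decoding of raw network outputs for you. For intuition only, the sketch below shows how a single raw prediction could be turned into a box and keypoints, assuming the distance-based encoding used in the InsightFace reference post-processing (box values are left/top/right/bottom distances from an anchor center, keypoint values are x/y offsets, both in stride units). The function name and the sample values are hypothetical, and the actual SCRFDParser implementation may differ.

```python
def decode_detection(center, stride, box_distances, kp_offsets):
    """Decode one raw prediction under the assumed distance-based encoding.

    center:        (cx, cy) anchor center in input-image pixels
    stride:        stride of the feature level the prediction came from
    box_distances: (left, top, right, bottom) distances in stride units
    kp_offsets:    flat (dx, dy) pairs, one pair per keypoint, in stride units
    """
    cx, cy = center
    left, top, right, bottom = (d * stride for d in box_distances)
    box = (cx - left, cy - top, cx + right, cy + bottom)  # x1, y1, x2, y2
    keypoints = [
        (cx + kp_offsets[i] * stride, cy + kp_offsets[i + 1] * stride)
        for i in range(0, len(kp_offsets), 2)
    ]
    return box, keypoints


# Made-up values for one candidate on the stride-8 level.
box, kps = decode_detection(
    center=(320, 320),
    stride=8,
    box_distances=(2.0, 3.0, 2.0, 3.0),
    kp_offsets=(-1.0, -1.0, 1.0, -1.0, 0.0, 0.0, -1.0, 1.0, 1.0, 1.0),
)
print(box)  # (304.0, 296.0, 336.0, 344.0)
print(len(kps))  # 5
```

In a full post-processing step, candidates would first be filtered by classification score and then passed through non-maximum suppression; that part is omitted here.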

Get parsed output(s):

# parser_output_queue is assumed to be an output queue created from the parser output
while pipeline.isRunning():
    parser_output: ImgDetectionsExtended = parser_output_queue.get()

Example

You can quickly run the model using our script. It automatically downloads the model, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool. To try it out, run:

python3 main.py \
    --model luxonis/scrfd-face-detection:10g-640x640

Face detection model.
License	Apache 2.0 Commercial use
Downloads	3618
Tasks	Object Detection
Model Types	ONNX