ArcFace | Luxonis



Model Details

Model Description

The ArcFace model is a deep face recognition model with a ResNet100 backbone, trained with the Additive Angular Margin Loss (ArcFace). ArcFace introduces a supervisory signal, the additive angular margin, which is added to the angle between a feature and its target class center inside the softmax loss to enhance the discriminative power of that loss.
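
To make the margin idea concrete, here is a minimal NumPy sketch of how an additive angular margin can be applied before the softmax. It is an illustration, not the training code used for this model: the function names are hypothetical, and the defaults `s=64.0` and `m=0.5` are assumed values for the scale and margin.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Scaled logits with an additive angular margin on the target class.

    embeddings: (N, D) features, weights: (C, D) class centers,
    labels: (N,) integer class ids. s and m are assumed hyperparameters.
    """
    # L2-normalize both sides so that dot products are cosines of angles.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)       # (N, C)
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    theta[rows, labels] += m                # margin only on the ground-truth angle
    return s * np.cos(theta)

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of the softmax over the given logits."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the margin lowers the target logit, the network has to pull each feature closer to its class center to reach the same loss, which is where the extra discriminative power comes from.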

Developed by: Jiankang Deng et al.
Shared by:
Model type: Computer Vision
License: MIT
Resources for more information:

Training Details

Training Data

The model was trained using the MS1MV3 and IBUG500K datasets with a ResNet100 backbone, as these datasets were found to deliver the best performance on the evaluation benchmarks (LFW and YTF).

Training Datasets:
- The MS1MV3 dataset (also called MS1M-RetinaFace) is a cleaned version of the MSCeleb1M dataset; all its face images have been pre-processed by the Retina-Face detector and are of size 112 × 112 pixels. It consists of 93K identities and 5.1 million images/videos.
- The IBUG500K dataset, proposed in the ArcFace paper, consists of 493K identities and 11.96 million images/videos.
Evaluation Datasets:
- The LFW (Labeled Faces in the Wild) dataset is a public benchmark for face verification, focusing on unconstrained face recognition. It contains over 13,000 face images collected from the web, each labeled with the individual's name. Among them, 1,680 people have two or more distinct photos.
- The YTF (YouTube Faces) dataset is a public benchmark of face videos designed for studying the problem of unconstrained face recognition in videos. The dataset contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube. An average of 2.15 videos are available for each subject. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames.

Testing Details

Metrics

These results show the performance of ArcFace on the LFW and YTF benchmarks, which are the most widely used benchmarks for unconstrained face verification on images and videos, respectively. The results are obtained from the .

Evaluation Dataset	Training Dataset	Recognition accuracy (%)
LFW	MS1MV3	99.83
LFW	IBUG500K	99.83
YTF	MS1MV3	98.02
YTF	IBUG500K	98.01
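
For readers unfamiliar with how a verification accuracy like the ones above is computed, the sketch below scores pairs of embeddings by cosine similarity and counts how often a threshold decision matches the ground truth. It is a simplified illustration: the function names and the threshold are assumptions, and benchmark protocols usually choose the threshold by cross-validation rather than fixing it by hand.

```python
import numpy as np

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (N, D) arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def verification_accuracy(emb_a, emb_b, same_identity, threshold):
    """Fraction of pairs whose same/different decision matches the label.

    same_identity: (N,) booleans, True when both images show one person.
    threshold: hypothetical decision threshold on the cosine score.
    """
    scores = cosine_similarity(emb_a, emb_b)
    predicted_same = scores >= threshold
    return float(np.mean(predicted_same == np.asarray(same_identity)))
```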

Technical Specifications

Input/Output Details

Input:
- Name: images
  - Info: NCHW, BGR un-normalized image
Output:
- Name: output
  - Info: NF, the output embeddings of the model
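
When testing the model outside a DepthAI pipeline, the input layout above has to be produced by hand. The following is a minimal sketch, assuming an already aligned 112 × 112 face crop (the size of the variant listed on this page) stored as an HWC `uint8` BGR array, for example as loaded by OpenCV. The helper name and the `float32` output type are assumptions.

```python
import numpy as np

def to_model_input(face_bgr):
    """Convert an HWC uint8 BGR crop into an NCHW batch of one image.

    Pixel values are left in the 0..255 range because the model expects an
    un-normalized image. The float32 dtype is an assumption.
    """
    face_bgr = np.asarray(face_bgr)
    if face_bgr.shape != (112, 112, 3):
        raise ValueError(f"expected a (112, 112, 3) crop, got {face_bgr.shape}")
    chw = np.transpose(face_bgr, (2, 0, 1))       # HWC -> CHW
    return chw[np.newaxis].astype(np.float32)     # add the batch dimension
```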

Model Architecture

ArcFace is built around the Additive Angular Margin Loss, which enhances class separability and discriminative power. It uses Sub-center ArcFace, where each class has multiple sub-centers to manage label noise by grouping clean and challenging samples separately. This improves model robustness in real-world conditions. Additionally, ArcFace can generate identity-preserved face images using gradients and Batch Normalization, without needing extra generative components. ArcFace delivers strong performance in both face recognition and face synthesis tasks.
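
The sub-center idea can be sketched in a few lines: each class keeps K centers, and the class score for a sample is the largest cosine over those K centers. This is an illustrative NumPy sketch with a hypothetical function name, not the training implementation.

```python
import numpy as np

def subcenter_cosines(embeddings, subcenters):
    """Per-class cosine scores when every class has K sub-centers.

    embeddings: (N, D) features, subcenters: (C, K, D) centers.
    Returns an (N, C) array holding, for each class, the maximum cosine
    over its K sub-centers.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    w = np.asarray(subcenters, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    w = w / np.linalg.norm(w, axis=2, keepdims=True)
    cos_all = np.einsum("nd,ckd->nck", x, w)   # cosine to every sub-center
    return cos_all.max(axis=2)                 # keep the closest sub-center
```

A sample only needs to be close to one sub-center of its class, so noisy or hard samples can gather around a secondary center instead of distorting the dominant one.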

Throughput

Model variant: arcface:lfw-112x112

• Input shape: [1, 3, 112, 112] • Output shape: [1, 512]

Platform	Precision	Throughput (infs/sec)	Power Consumption (W)
RVC2	FP16	7.61	N/A
RVC4	FP16	110.98	6.91

* Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).

* Parameters and FLOPs are obtained from the package.
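
As a rough guide to reading the table, throughput can be converted into an average time per inference, and together with the power figure into energy per inference. The sketch below only rearranges the numbers above. Since the benchmark ran with 2 threads, the result is an average over the run and not necessarily the latency of a single request.

```python
# Values copied from the throughput table (inferences per second, watts).
throughput = {"RVC2": 7.61, "RVC4": 110.98}
power_w = {"RVC4": 6.91}  # no power figure is given for RVC2

# Average time per inference in milliseconds.
avg_time_ms = {name: 1000.0 / fps for name, fps in throughput.items()}

# Energy per inference in joules, where power is available.
energy_j = {name: power_w[name] / throughput[name] for name in power_w}

for name, ms in avg_time_ms.items():
    print(f"{name}: about {ms:.1f} ms per inference")
for name, joules in energy_j.items():
    print(f"{name}: about {joules * 1000:.0f} mJ per inference")
```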

Utilization

Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)). Below, we present the most crucial utilization steps for the particular model. Please consult the docs for more information.

Install the DepthAI v3 (DAIv3) and depthai-nodes libraries:

pip install depthai
pip install depthai-nodes

Define model:

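import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork

# Reference the model variant from the model zoo
model_description = dai.NNModelDescription(
    "luxonis/arcface:lfw-112x112"
)

# Replace <CameraNode> with the node that supplies the input frames
nn = pipeline.create(ParsingNeuralNetwork).build(
    <CameraNode>, model_description
)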

Inspect model head(s):

EmbeddingsParser that outputs dai.NNData containing the output embeddings of the model.

Get parsed output(s):

while pipeline.isRunning():
    parser_output: dai.NNData = parser_output_queue.get()
    embeddings = parser_output.getTensor("output")  # embeddings of shape (1, 512)
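
An embedding is only useful once it is compared against known faces. Below is a minimal host-side sketch of that step, assuming a gallery of reference embeddings collected beforehand. The function name, the gallery structure, and the threshold value are all hypothetical and would need tuning on real data.

```python
import numpy as np

def identify(embedding, gallery, names, threshold=0.5):
    """Return (name, score) for the closest gallery entry, or (None, score).

    embedding: model output of shape (1, 512) or (512,).
    gallery: (M, 512) reference embeddings, names: list of M labels.
    threshold: assumed minimum cosine similarity to accept a match.
    """
    query = np.asarray(embedding, dtype=np.float32).reshape(-1)
    query = query / np.linalg.norm(query)
    refs = np.asarray(gallery, dtype=np.float32)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    scores = refs @ query                    # cosine similarity to each entry
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])     # nobody in the gallery is close enough
    return names[best], float(scores[best])
```

Normalizing both sides first makes the dot product a cosine similarity, so scores are comparable across faces regardless of embedding magnitude.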

Example

You can check out the example. You can run it with:

python3 main.py \
    -det luxonis/scrfd-face-detection:10g-640x640 \
    -rec luxonis/arcface:lfw-112x112

A deep face recognition model with ResNet100 backbone and ArcFace loss
License	MIT Commercial use
Tasks	Image Embedding
Model Types	ONNX
