Model Details
Model Description
The L2CS-Net Gaze Estimation model is a CNN-based gaze estimation framework designed to predict 3D gaze directions in unconstrained environments. Unlike traditional methods that regress both gaze angles jointly, L2CS-Net predicts each angle (yaw, pitch) separately, enhancing accuracy and generalization. It employs a ResNet-50 backbone and a dual-loss approach with classification and regression components to improve prediction robustness. Achieving state-of-the-art accuracy on the MPIIGaze and Gaze360 datasets, L2CS-Net is effective in diverse conditions, making it suitable for applications in human-robot interaction and virtual reality.
Developed by: Ahmed A. Abdelrahman et al.
Shared by:
Model type: Computer Vision
License:
Resources for more information:
Training Details
Training Data
The training data comes from two datasets:
MPIIGaze - This dataset consists of 213,659 images from 15 participants, captured in daily life settings over several months. The images feature diverse backgrounds, lighting conditions, and head poses.
Gaze360 - Known for its wide range of 3D gaze annotations, Gaze360 contains data from 238 subjects of various ages, genders, and ethnicities. This dataset includes images captured in both indoor and outdoor settings using a Ladybug multi-camera system, providing coverage of a 360-degree gaze range.
Testing Details
Metrics
| Dataset | Mean Angular Error (°) |
| --- | --- |
| MPIIGaze | 3.92 |
| Gaze360 (Front 180°) | 10.41 |
| Gaze360 (Front-facing) | 9.02 |
Please consult the paper for more information.
Technical Specifications
Input/Output Details
Input:
Name: input
Output:
Name: pitch
Info: Pitch value.
Name: yaw
Info: Yaw value.
Model Architecture
L2CS-Net uses a CNN-based architecture with a ResNet-50 backbone to extract spatial gaze features from images. The model has two distinct fully-connected layers, each dedicated to predicting one of the 3D gaze angles (yaw and pitch) independently. This approach allows for separate angle regression, enhancing accuracy.
The model includes a dual-loss function for each gaze angle:
* Classification: a softmax layer with cross-entropy loss classifies the gaze angle into discrete bins.
* Regression: the softmax expectation over the bins yields a continuous angle prediction, which is refined with a mean-squared-error loss.
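To make the dual-loss idea concrete, here is a minimal PyTorch-style sketch of the per-angle loss. The bin layout (number of bins and bin width) and the loss weight `alpha` are assumptions chosen for illustration; the actual values depend on the training configuration in the original repository.

```python
import torch
import torch.nn.functional as F

def per_angle_loss(logits, target_deg, num_bins=90, bin_width=4.0, alpha=1.0):
    """Combined classification + regression loss for one gaze angle (yaw or pitch).

    logits:     (batch, num_bins) raw scores from the angle-specific FC layer
    target_deg: (batch,) ground-truth angle in degrees
    The bin layout (num_bins * bin_width centered on 0 degrees) is illustrative.
    """
    half_range = num_bins * bin_width / 2.0

    # Classification part: cross-entropy against the index of the bin
    # containing the ground-truth angle.
    bin_idx = ((target_deg + half_range) / bin_width).long().clamp(0, num_bins - 1)
    cls_loss = F.cross_entropy(logits, bin_idx)

    # Regression part: the softmax expectation over bin centers gives a
    # continuous angle prediction, penalized with mean-squared error.
    probs = F.softmax(logits, dim=1)
    bin_centers = (torch.arange(num_bins, dtype=probs.dtype, device=probs.device)
                   * bin_width - half_range + bin_width / 2.0)
    pred_deg = (probs * bin_centers).sum(dim=1)
    reg_loss = F.mse_loss(pred_deg, target_deg)

    return cls_loss + alpha * reg_loss
```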
* Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained from the package.
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the flow of information between the device, the inference model, and the output parser (as defined in the model head(s)).
Below, we present the most important utilization steps for this particular model.
Please consult the docs for more information.
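As a rough illustration of these steps, the sketch below builds a single-stage pipeline around the parsed model and converts the predicted angles into image-plane offsets. The import path, build signature, model slug, and parsed-message fields are assumptions here; please verify the exact names against the DepthAI V3 and depthai-nodes docs and this model's page.

```python
import numpy as np
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork  # assumed import path

with dai.Pipeline() as pipeline:
    # Camera feeding the gaze model; the model slug below is a placeholder.
    camera = pipeline.create(dai.node.Camera).build()
    nn = pipeline.create(ParsingNeuralNetwork).build(camera, "luxonis/l2cs-net:<variant>")
    out_queue = nn.out.createOutputQueue()

    pipeline.start()
    while pipeline.isRunning():
        msg = out_queue.get()
        # Assumed attribute names for the parsed output (check the parser docs).
        pitch, yaw = msg.pitch, msg.yaw  # predicted angles in radians

        # Turn the angles into a 2D offset for drawing a gaze arrow; the sign
        # conventions may need flipping depending on the coordinate system.
        length = 100.0
        dx = -length * np.sin(yaw) * np.cos(pitch)
        dy = -length * np.sin(pitch)
```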
The final dx and dy are the offsets from the origin pointing in the direction the person is looking.
Example
You can quickly run the model using our example.
The example demonstrates how to build a 2-stage DepthAI pipeline consisting of a face detection model and the gaze estimation model.
It automatically downloads the model(s), creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
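For orientation, host-side visualization of the two-stage output could look roughly like the sketch below. The function name, the face box format, and the angle units are hypothetical; the actual example renders results with the DepthAI visualizer instead.

```python
import cv2
import numpy as np

def draw_gaze(frame, face_bbox, pitch, yaw, length=100.0):
    """Draw the stage-1 face box and a stage-2 gaze arrow on the frame.

    face_bbox is assumed to be (x_min, y_min, x_max, y_max) in pixels;
    pitch and yaw are assumed to be in radians.
    """
    x_min, y_min, x_max, y_max = map(int, face_bbox)
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

    # Arrow starts at the face center; dx/dy use the same conversion as above.
    cx, cy = (x_min + x_max) // 2, (y_min + y_max) // 2
    dx = -length * np.sin(yaw) * np.cos(pitch)
    dy = -length * np.sin(pitch)
    cv2.arrowedLine(frame, (cx, cy), (int(cx + dx), int(cy + dy)),
                    (0, 0, 255), 2, cv2.LINE_AA, tipLength=0.2)
    return frame
```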