Model Details
Model Description
The L2CS-Net Gaze Estimation model is a CNN-based gaze estimation framework designed to predict 3D gaze directions in unconstrained environments. Unlike traditional methods that regress both gaze angles jointly, L2CS-Net predicts each angle (yaw, pitch) separately, enhancing accuracy and generalization. It employs a ResNet-50 backbone and a dual-loss approach with classification and regression components to improve prediction robustness. Achieving state-of-the-art accuracy on the MPIIGaze and Gaze360 datasets, L2CS-Net is effective in diverse conditions, making it suitable for applications in human-robot interaction and virtual reality.
Developed by: Ahmed A. Abdelrahman et al.
Shared by:
Model type: Computer Vision
License:
Resources for more information:
Training Details
Training Data
The training data comes from two datasets:
MPIIGaze - This dataset consists of 213,659 images from 15 participants, captured in daily life settings over several months. The images feature diverse backgrounds, lighting conditions, and head poses.
Gaze360 - Known for its wide range of 3D gaze annotations, Gaze360 contains data from 238 subjects of various ages, genders, and ethnicities. This dataset includes images captured in both indoor and outdoor settings using a Ladybug multi-camera system, providing coverage of a 360-degree gaze range.
Testing Details
Metrics
| Dataset | Mean Angular Error (°) |
| --- | --- |
| MPIIGaze | 3.92 |
| Gaze360 (Front 180°) | 10.41 |
| Gaze360 (Front-facing) | 9.02 |
Please consult the paper for more information.
Technical Specifications
Input/Output Details
Input:
Name: input
Output:
Name: pitch
Info: Pitch value.
Name: yaw
Info: Yaw value.
Model Architecture
L2CS-Net uses a CNN-based architecture with a ResNet-50 backbone to extract spatial gaze features from images. The model has two distinct fully-connected layers, each dedicated to predicting one of the 3D gaze angles (yaw and pitch) independently. This approach allows for separate angle regression, enhancing accuracy.
The model includes a dual-loss function for each gaze angle:
* Classification: a softmax layer with cross-entropy loss classifies the gaze angle into discrete bins.
* Regression: the softmax expectation over the bins yields a continuous angle prediction, which is refined with a mean-squared-error loss.
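To make the dual-loss idea concrete, here is a minimal PyTorch-style sketch of the per-angle loss. The bin layout (number of bins and bin width) and the loss weight `alpha` are assumptions chosen for illustration; the actual values depend on the training configuration in the original repository.

```python
import torch
import torch.nn.functional as F

def per_angle_loss(logits, target_deg, num_bins=90, bin_width=4.0, alpha=1.0):
    """Combined classification + regression loss for one gaze angle (yaw or pitch).

    logits:     (batch, num_bins) raw scores from the angle-specific FC layer
    target_deg: (batch,) ground-truth angle in degrees
    The bin layout (num_bins * bin_width centered on 0 degrees) is illustrative.
    """
    half_range = num_bins * bin_width / 2.0

    # Classification part: cross-entropy against the index of the bin
    # containing the ground-truth angle.
    bin_idx = ((target_deg + half_range) / bin_width).long().clamp(0, num_bins - 1)
    cls_loss = F.cross_entropy(logits, bin_idx)

    # Regression part: the softmax expectation over bin centers gives a
    # continuous angle prediction, penalized with mean-squared error.
    probs = F.softmax(logits, dim=1)
    bin_centers = (torch.arange(num_bins, dtype=probs.dtype, device=probs.device)
                   * bin_width - half_range + bin_width / 2.0)
    pred_deg = (probs * bin_centers).sum(dim=1)
    reg_loss = F.mse_loss(pred_deg, target_deg)

    return cls_loss + alpha * reg_loss
```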
* Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained from the package.
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the flow of information between the device, the inference model, and the output parser (as defined in the model head(s)).
Below, we present the most important utilization steps for this particular model.
Please consult the docs for more information.
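As a rough illustration of these steps, the sketch below builds a single-stage pipeline around the parsed model and converts the predicted angles into image-plane offsets. The import path, build signature, model slug, and parsed-message fields are assumptions here; please verify the exact names against the DepthAI V3 and depthai-nodes docs and this model's page.

```python
import numpy as np
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork  # assumed import path

with dai.Pipeline() as pipeline:
    # Camera feeding the gaze model; the model slug below is a placeholder.
    camera = pipeline.create(dai.node.Camera).build()
    nn = pipeline.create(ParsingNeuralNetwork).build(camera, "luxonis/l2cs-net:<variant>")
    out_queue = nn.out.createOutputQueue()

    pipeline.start()
    while pipeline.isRunning():
        msg = out_queue.get()
        # Assumed attribute names for the parsed output (check the parser docs).
        pitch, yaw = msg.pitch, msg.yaw  # predicted angles in radians

        # Turn the angles into a 2D offset for drawing a gaze arrow; the sign
        # conventions may need flipping depending on the coordinate system.
        length = 100.0
        dx = -length * np.sin(yaw) * np.cos(pitch)
        dy = -length * np.sin(pitch)
```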
The final dx and dy are the offsets from the origin pointing in the direction the person is looking.
Example
You can quickly run the model using our example.
The example demonstrates how to build a 2-stage DepthAI pipeline consisting of a face detection model and the gaze estimation model.
It automatically downloads the model(s), creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
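For orientation, host-side visualization of the two-stage output could look roughly like the sketch below. The function name, the face box format, and the angle units are hypothetical; the actual example renders results with the DepthAI visualizer instead.

```python
import cv2
import numpy as np

def draw_gaze(frame, face_bbox, pitch, yaw, length=100.0):
    """Draw the stage-1 face box and a stage-2 gaze arrow on the frame.

    face_bbox is assumed to be (x_min, y_min, x_max, y_max) in pixels;
    pitch and yaw are assumed to be in radians.
    """
    x_min, y_min, x_max, y_max = map(int, face_bbox)
    cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

    # Arrow starts at the face center; dx/dy use the same conversion as above.
    cx, cy = (x_min + x_max) // 2, (y_min + y_max) // 2
    dx = -length * np.sin(yaw) * np.cos(pitch)
    dy = -length * np.sin(pitch)
    cv2.arrowedLine(frame, (cx, cy), (int(cx + dx), int(cy + dy)),
                    (0, 0, 255), 2, cv2.LINE_AA, tipLength=0.2)
    return frame
```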