    Model Details
    Model Description
The L2CS-Net Gaze Estimation model is a CNN-based gaze estimation framework designed to predict 3D gaze directions in unconstrained environments. Unlike traditional methods that regress both gaze angles together, L2CS-Net predicts each angle (yaw, pitch) separately, enhancing accuracy and generalization. It employs a ResNet-50 backbone and a dual-loss approach with classification and regression components to improve prediction robustness. Achieving state-of-the-art accuracy on the MPIIGaze and Gaze360 datasets, L2CS-Net is effective in diverse conditions, making it suitable for applications such as human-robot interaction and virtual reality.
• Developed by: Ahmed A. Abdelrahman et al.
    • Shared by:
    • Model type: Computer Vision
• License: MIT
    • Resources for more information:
    Training Details
    Training Data
    The training dataset consists of two main datasets:
    1. MPIIGaze - This dataset consists of 213,659 images from 15 participants, captured in daily life settings over several months. The images feature diverse backgrounds, lighting conditions, and head poses.
    2. Gaze360 - Known for its wide range of 3D gaze annotations, Gaze360 contains data from 238 subjects of various ages, genders, and ethnicities. This dataset includes images captured in both indoor and outdoor settings using a Ladybug multi-camera system, providing coverage of a 360-degree gaze range.
    Testing Details
    Metrics
Dataset                | Mean Angular Error (°)
MPIIGaze               | 3.92
Gaze360 (Front 180°)   | 10.41
Gaze360 (Front-facing) | 9.02
Please consult the original paper for more information.
    Technical Specifications
    Input/Output Details
    • Input:
      • Name: input
    • Output:
  • Name: pitch
        • Info: Pitch value.
      • Name: yaw
        • Info: Yaw value.
    Model Architecture
    L2CS-Net uses a CNN-based architecture with a ResNet-50 backbone to extract spatial gaze features from images. The model has two distinct fully-connected layers, each dedicated to predicting one of the 3D gaze angles (yaw and pitch) independently. This approach allows for separate angle regression, enhancing accuracy.
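For illustration, the sketch below shows this two-headed design, assuming PyTorch and torchvision; the bin count and layer wiring are illustrative assumptions rather than the exact training configuration.
import torch
import torch.nn as nn
from torchvision import models

class TwoHeadGazeSketch(nn.Module):
    # ResNet-50 feature extractor with two independent fully-connected heads,
    # one for yaw and one for pitch (bin count is an illustrative assumption).
    def __init__(self, num_bins: int = 90):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the original fc layer
        self.fc_yaw = nn.Linear(2048, num_bins)
        self.fc_pitch = nn.Linear(2048, num_bins)

    def forward(self, x):
        f = torch.flatten(self.features(x), 1)
        return self.fc_yaw(f), self.fc_pitch(f)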
    The model includes a dual-loss function for each gaze angle:
    1. Classification with cross-entropy loss and a softmax layer to classify gaze angles into bins.
    2. Regression using mean-squared error to fine-tune predictions.
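A minimal sketch of this dual loss for a single angle is shown below, assuming PyTorch; the bin width, angle range, and loss weighting are illustrative assumptions, not the paper's exact hyperparameters.
import torch
import torch.nn as nn

num_bins = 90
bin_width = 4.0  # degrees per bin (assumed)
bin_centers = torch.arange(num_bins) * bin_width - 180.0  # map bin index to degrees (assumed range)

ce = nn.CrossEntropyLoss()
mse = nn.MSELoss()

def dual_loss(logits, target_deg, alpha=1.0):
    # 1) classification: cross-entropy against the bin the continuous target falls into
    target_bin = ((target_deg + 180.0) / bin_width).long().clamp(0, num_bins - 1)
    cls_loss = ce(logits, target_bin)
    # 2) regression: softmax expectation over bin centers gives a continuous angle in degrees
    expected_deg = torch.softmax(logits, dim=1) @ bin_centers
    reg_loss = mse(expected_deg, target_deg)
    return cls_loss + alpha * reg_loss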
    Throughput
    Model variant: l2cs-net:448x448
• Input shape: [1, 3, 448, 448]
• Output shapes: [[1], [1]]
• Params (M): 23.850
• GFLOPs: 16.462
Platform | Precision | Throughput (infs/sec) | Power Consumption (W)
RVC2     | FP16      | 4.30                  | N/A
RVC4     | FP16      | 99.64                 | 5.58
    * Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).
    * Parameters and FLOPs are obtained from the package.
Utilization
    Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)). Below, we present the most crucial utilization steps for the particular model. Please consult the docs for more information.
    Install DAIv3 and depthai-nodes libraries:
    pip install depthai
    pip install depthai-nodes
    
    Define model:
import depthai as dai
from depthai_nodes import ParsingNeuralNetwork  # in some depthai-nodes versions: from depthai_nodes.node import ParsingNeuralNetwork

pipeline = dai.Pipeline()

model_description = dai.NNModelDescription(
    "luxonis/l2cs-net:448x448"
)

nn = pipeline.create(ParsingNeuralNetwork).build(
    <CameraNode>, model_description
)
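Here, <CameraNode> stands for a camera node created on the same pipeline. One possible way to create it in DepthAI v3 is sketched below; the exact builder arguments may differ between releases.
cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_A)  # socket choice is an assumption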
    
    Inspect model head(s):
• RegressionParser that outputs a Predictions message (pitch value).
• RegressionParser that outputs a Predictions message (yaw value).
    The model is multi-headed. You can set up the queues as follows:
    pitch_parser_output_queue = nn.getOutput(0).createOutputQueue()
    yaw_parser_output_queue = nn.getOutput(1).createOutputQueue()
    
    Get parsed output(s):
pipeline.start()
while pipeline.isRunning():
    pitch_parser_output: Predictions = pitch_parser_output_queue.get()
    yaw_parser_output: Predictions = yaw_parser_output_queue.get()
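Each parser output is a Predictions message. The scalar angle in degrees can be read out roughly as sketched below; the accessor names are assumed from depthai-nodes and may differ between versions.
pitch = pitch_parser_output.predictions[0].prediction  # assumed accessor: value of the first Prediction
yaw = yaw_parser_output.predictions[0].prediction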
    
A simple post-processing step converts the predicted angles (in degrees) into a 2D gaze offset:
import numpy as np

pitch = pitch * np.pi / 180  # degrees to radians
yaw = yaw * np.pi / 180
line_length = 50  # length of the drawn gaze line, in pixels
dx = -line_length * np.sin(pitch) * np.cos(yaw)
dy = -line_length * np.sin(yaw)
    
    The final dx and dy are the offsets from the origin pointing in the direction the person is looking.
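As a usage sketch, the offsets can be drawn onto the frame with OpenCV; the gaze origin (for example, the detected face or eye centre) is an assumption here.
import cv2

def draw_gaze(frame, origin, dx, dy, color=(0, 255, 0)):
    # Draw a line from the gaze origin (x, y) in the direction the person is looking.
    x0, y0 = origin
    cv2.line(frame, (int(x0), int(y0)), (int(x0 + dx), int(y0 + dy)), color, 2)
    return frame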
    Example
    You can quickly run the model using our example.
The example demonstrates how to build a 2-stage DepthAI pipeline consisting of a face detection model and the gaze estimation model. It automatically downloads the model(s), creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
    To try it out, run:
    python3 main.py
    
    L2CS-Net
    Gaze estimation model.
License: MIT (commercial use permitted)
Downloads: 179
Tasks: Regression
Model Types: ONNX
Model Variants:
Name             | Available For | Created At
l2cs-net:448x448 | RVC2, RVC4    | 7 months ago