Luxonis
    Model Details
    Model Description
    FoundationStereo is a large foundation model for stereo depth estimation that achieves strong zero-shot generalization without per-domain fine-tuning, trained on a large-scale (1M image pairs) high-fidelity synthetic dataset with high diversity and photorealism.
    • Developed by:
    • Shared by:
    • Model type: Computer Vision
    • License: Not Defined
    • Resources for more information:
    Training Details
    Training Data
    The model was trained on a large-scale synthetic dataset consisting of 1M stereo image pairs, covering structured indoor/outdoor scenes as well as more randomized scenes with challenging flying objects and higher geometry and texture diversity.
    Testing Details
    Metrics
    These results showcase the performance of FoundationStereo on two popular public benchmarks, Middlebury and ETH3D. The results are taken from the respective benchmark leaderboard pages.
    Evaluation Dataset | AvgErr | Bad1.0 | Bad2.0
    ETH3D              | 0.09   | 0.26   | 0.08
    Middlebury         | 0.78   | 4.39   | 1.84
    For the evaluation, the three following popular metrics are considered:
    • AvgErr: average error (the smaller the better).
    • Bad1.0: percentage of pixels with disparity error larger than 1 pixel (the smaller the better).
    • Bad2.0: percentage of pixels with disparity error larger than 2 pixels (the smaller the better).
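The three metrics above can be sketched with a few lines of NumPy. This is an illustrative helper, not the official benchmark evaluation code; `disparity_metrics` and its `valid`-mask handling are assumptions for the sketch.

```python
import numpy as np

def disparity_metrics(pred, gt, valid=None):
    """Compute AvgErr, Bad1.0 and Bad2.0 for a predicted disparity map.

    pred, gt: float arrays of equal shape (disparity in pixels).
    valid: optional boolean mask of pixels that have ground truth.
    """
    if valid is None:
        valid = np.isfinite(gt)
    err = np.abs(pred[valid] - gt[valid])
    return {
        "AvgErr": float(err.mean()),                     # mean absolute error in pixels
        "Bad1.0": float((err > 1.0).mean() * 100.0),     # % of pixels with error > 1 px
        "Bad2.0": float((err > 2.0).mean() * 100.0),     # % of pixels with error > 2 px
    }
```

Benchmark implementations additionally distinguish occluded from non-occluded regions, which the `valid` mask can encode.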
    Technical Specifications
    Input/Output Details
    • Input:
      • Name: left
      • Info: NCHW, BGR normalized image
      • Name: right
      • Info: NCHW, BGR normalized image
    • Output:
      • Name: disp
      • Info: NCHW, stereo disparity map
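A minimal sketch of turning an OpenCV-style HxWx3 BGR frame into the NCHW tensor layout described above. The plain [0, 1] scaling is an assumption for illustration; the exact normalization the exported model expects should be checked against the model's preprocessing code.

```python
import numpy as np

def to_nchw(img_bgr):
    """Convert an HxWx3 uint8 BGR image (OpenCV channel order) into a
    1x3xHxW float32 tensor.

    Assumption: plain scaling to [0, 1]; verify whether the exported
    FoundationStereo model expects mean/std normalization instead.
    """
    x = img_bgr.astype(np.float32) / 255.0   # scale to [0, 1]
    x = np.transpose(x, (2, 0, 1))           # HWC -> CHW
    return x[np.newaxis, ...]                # add batch dim -> NCHW
```

Both the `left` and `right` inputs would go through the same conversion, after stereo rectification and resizing to the model's input resolution.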
    Model Architecture
    FoundationStereo couples a lightweight EdgeNeXt-S “side-tuning” CNN to a frozen DepthAnything V2 ViT so that pixel-sharp local cues are blended with rich monocular depth priors; it then forms a hybrid 4-D cost volume that keeps both group-wise correlations and those fused features. An Attentive Hybrid Cost Filter denoises the volume, after which a soft-argmin gives an initial disparity map that three coarse-to-fine ConvGRU iterations polish into crisp, detail-preserving depth.
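The soft-argmin step mentioned above can be illustrated in isolation: it turns a per-pixel cost over disparity hypotheses into a single differentiable disparity estimate. The shapes and NumPy implementation below are a simplified sketch, not the model's actual internals.

```python
import numpy as np

def soft_argmin(cost_volume):
    """Soft-argmin over a cost volume of shape (D, H, W), where a lower
    cost means a more likely disparity hypothesis.

    Returns an (H, W) disparity map as the probability-weighted average
    of the disparity indices 0..D-1 (a differentiable argmin surrogate).
    """
    d = cost_volume.shape[0]
    logits = -cost_volume                            # low cost -> high logit
    logits -= logits.max(axis=0, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)        # softmax over disparities
    disp_values = np.arange(d, dtype=np.float32).reshape(d, 1, 1)
    return (probs * disp_values).sum(axis=0)
```

When the cost distribution is sharply peaked, the weighted average collapses to the index of the minimum cost, which is what the subsequent ConvGRU iterations then refine.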
    Throughput
    Model variant: foundation-stereo:640x416
    • Input shapes: [[1, 3, 416, 640], [1, 3, 416, 640]]
    • Output shape: [1, 1, 416, 640]
    • Params (M): 371.687
    • GFLOPs: 7144.733
    Model variant: foundation-stereo:1280x800
    • Input shapes: [[1, 3, 800, 1280], [1, 3, 800, 1280]]
    • Output shape: [1, 1, 800, 1280]
    • Params (M): 371.687
    • GFLOPs: 31306.855
    * Parameters and FLOPs are reported by a model profiling package.
    Utilization
    This script demonstrates how to generate a neural stereo disparity map using the FoundationStereo ONNX model on OAK devices with DepthAI v3.
    Install required libraries:
    pip install -r requirements.txt
    
    Load ONNX model:
    onnx_session = load_onnx_model(ONNX_MODEL_PATH)
    
    ...
    left = preprocess_image(rectified_left, (INFERENCE_H, INFERENCE_W))
    right = preprocess_image(rectified_right, (INFERENCE_H, INFERENCE_W))
    
    nn_disparity = run_onnx_inference(onnx_session, left, right)[0][0, 0]
    
    Once inference is running, press F to generate the FoundationStereo disparity.
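With a disparity map in hand, metric depth follows from the rectified stereo geometry as Z = f · B / d. The helper below is a sketch; the focal length and baseline used in the test are hypothetical example values, and the real ones should be read from the device calibration.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to depth (in meters) for a
    rectified stereo pair: Z = f * B / d.

    focal_px: focal length in pixels at the inference resolution.
    baseline_m: stereo baseline in meters (read both from the device
    calibration; the values below are illustrative only).
    """
    depth = np.zeros_like(disp, dtype=np.float32)
    valid = disp > eps                       # zero disparity: invalid / at infinity
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth
```

Note that depth error grows quadratically with distance for a fixed disparity error, which is why the sub-pixel accuracy reflected in the AvgErr metric matters in practice.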
    Example
    Check out the complete example.