CREStereo
    An advanced method for stereo matching based on a cascaded recurrent stereo matching network.
    Model Details
    Model Description
    The CREStereo model is an advanced stereo matching network with a cascaded recurrent architecture and an Adaptive Group Correlation Layer (AGCL). To tackle the problem of practical stereo matching, CREStereo employs a hierarchical coarse-to-fine refinement strategy, using recurrent updates to iteratively enhance disparity estimates. The model ranks 1st on both the Middlebury and ETH3D benchmarks, outperforming existing state-of-the-art methods.
    • Developed by: Jiankun Li et al.
    • Shared by: Luxonis
    • Model type: Computer Vision
    • License: Apache 2.0
    • Resources for more information:
    Training Details
    Training Data
    The CREStereo model was trained using a diverse set of synthetic and real-world datasets, as these datasets were found to significantly improve performance on major stereo matching benchmarks (Middlebury, ETH3D, and KITTI).
    • Evaluation Datasets:
      • Middlebury: features 23 high-resolution stereo image pairs taken under various lighting conditions. These images, captured using large-baseline stereo cameras, can exhibit disparities of over 600 pixels.
      • ETH3D: includes 27 monochrome stereo image pairs, with disparities measured using a laser scanner, and encompasses a mix of indoor and outdoor scenes.
    Testing Details
    Metrics
    These results showcase the performance of CREStereo on two popular public benchmarks, Middlebury and ETH3D. The results are taken from the original paper.
    Evaluation Dataset    AvgErr    Bad1.0
    Middlebury            1.15      8.25
    ETH3D                 0.13      0.98
    For the evaluation, the following two popular metrics are considered (see the sketch after the list):
    • AvgErr: average disparity error (the smaller the better).
    • Bad1.0: percentage of pixels with a disparity error larger than 1 pixel (the smaller the better).
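    Both metrics are straightforward to compute. Below is a minimal NumPy sketch, assuming dense ground-truth disparity; the official benchmarks additionally mask invalid and occluded pixels:
    import numpy as np

    def avg_err(pred, gt):
        """Average absolute disparity error (lower is better)."""
        return np.abs(pred - gt).mean()

    def bad(pred, gt, threshold=1.0):
        """Percentage of pixels whose disparity error exceeds `threshold` pixels (lower is better)."""
        return 100.0 * (np.abs(pred - gt) > threshold).mean()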
    Technical Specifications
    Input/Output Details
    • Inputs:
      • Name: left
        • Tensor: float32[1,3,H,W] - H and W support multiple resolutions (120x160, 240x320, 360x640)
        • Info: NCHW, un-normalized BGR image
      • Name: right
        • Tensor: float32[1,3,H,W] - H and W support multiple resolutions (120x160, 240x320, 360x640)
        • Info: NCHW, un-normalized BGR image
    • Output:
      • Name: output
        • Tensor: float32[1,2,H,W] - H and W support multiple resolutions (120x160, 240x320, 360x640)
        • Info: NCHW, stereo disparity map
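    If you run the exported ONNX model directly (outside a DepthAI pipeline), the input tensors can be prepared as sketched below with OpenCV and NumPy; the file names are placeholders:
    import cv2
    import numpy as np

    def to_tensor(path, size=(320, 240)):   # (W, H); one of the supported resolutions
        img = cv2.imread(path)               # OpenCV loads BGR, matching the expected layout
        img = cv2.resize(img, size)
        img = img.transpose(2, 0, 1)[None]   # HWC -> NCHW, add batch dimension
        return img.astype(np.float32)        # un-normalized, as the model expects

    left_tensor = to_tensor("left.png")      # placeholder path; float32[1,3,240,320]
    right_tensor = to_tensor("right.png")    # placeholder path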
    Model Architecture
    The CREStereo model features a hierarchical network with a cascaded recurrent structure for coarse-to-fine disparity refinement. It uses a Feature Extraction Network to create a multi-level pyramid and Recurrent Update Modules (RUMs) with Adaptive Group Correlation Layers (AGCL) for iterative updates. More details can be found in the original paper.
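    To make the coarse-to-fine idea concrete, here is a toy NumPy sketch of cascaded recurrent refinement. It is not the actual network: `refine_step` is a crude stand-in for a Recurrent Update Module with AGCL, and the photometric update rule is purely illustrative:
    import numpy as np

    def refine_step(disparity, left, right):
        """Stand-in for one RUM step: nudge disparity to reduce photometric mismatch."""
        h, w = disparity.shape
        # Warp the right image toward the left using the current disparity
        xs = np.clip(np.arange(w)[None, :] - disparity.astype(int), 0, w - 1)
        warped = right[np.arange(h)[:, None], xs]
        # Crude correction signal (the real model uses adaptive group correlations)
        return disparity + np.sign(left - warped) * 0.1

    def cascaded_refinement(left_pyr, right_pyr, iters_per_level=3):
        """Coarse-to-fine: refine at the coarsest level, upsample, and repeat."""
        disparity = np.zeros_like(left_pyr[-1])          # init at coarsest resolution
        for level in reversed(range(len(left_pyr))):
            l_img, r_img = left_pyr[level], right_pyr[level]
            if disparity.shape != l_img.shape:
                # Upsample to the next (finer) level; disparity values scale by 2
                disparity = np.kron(disparity, np.ones((2, 2))) * 2.0
            for _ in range(iters_per_level):
                disparity = refine_step(disparity, l_img, r_img)
        return disparity
    At each pyramid level, the disparity estimated at the coarser level is upsampled (and scaled by 2, since disparities are measured in pixels) before further recurrent refinement; that cascade is the essence of the architecture.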
    Throughput
    Platform    Model Variant    Throughput FP16 [infs/sec]    Throughput Quant [infs/sec]
    RVC2*       iter2_120x160    3.18                          not compatible
    RVC2*       iter2_240x320    1.44                          not compatible
    RVC4**      iter5_240x320    not compatible                27.93
    RVC4**      iter4_360x640    not compatible                8.92
    * Benchmarked with 2 threads, using 8 SHAVE cores; ** benchmarked on the DSP runtime in default mode.
    Utilization
    Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines are used to define the information flow linking the device, the inference model, and the output parser (as defined in the model head(s)). Below, we present the most crucial utilization steps for this particular model. Please consult the docs for more information.
    Install the DepthAI v3 and depthai-nodes libraries:
    pip install depthai
    pip install depthai-nodes
    
    Set up the DAI pipeline using the code below:
    import depthai as dai
    from depthai_nodes.ml.messages import Map2D
    from depthai_nodes.parser_generator import ParserGenerator
    
    # Get the model from the HubAI
    model_description = dai.NNModelDescription(
        "luxonis/crestereo:iter2-320x240", platform="RVC2"
    )
    archivePath = dai.getModelFromZoo(model_description)
    nn_archive = dai.NNArchive(archivePath)
    
    # Get the input shapes; NCHW [1,3,H,W] -> (W, H) for ImageManip resizing
    inputs = nn_archive.getConfig().model.inputs
    inputs_shapes = [input.shape[2:][::-1] for input in inputs]
    
    with dai.Pipeline() as pipeline:
        # Set up cameras
        left = pipeline.create(dai.node.MonoCamera)
        left.setFps(2)
        left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
        left.setBoardSocket(dai.CameraBoardSocket.CAM_B)
    
        right = pipeline.create(dai.node.MonoCamera)
        right.setFps(2)
        right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
        right.setBoardSocket(dai.CameraBoardSocket.CAM_C)
    
        network = pipeline.create(dai.node.NeuralNetwork)
        network.setFromModelZoo(model_description, useCached=True)
    
        manip_left = pipeline.create(dai.node.ImageManip)
        manip_left.initialConfig.setResize(inputs_shapes[0][0], inputs_shapes[0][1])
        manip_left.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)
    
        manip_right = pipeline.create(dai.node.ImageManip)
        manip_right.initialConfig.setResize(inputs_shapes[1][0], inputs_shapes[1][1])
        manip_right.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)
    
        # Set up the parser
        parsers = pipeline.create(ParserGenerator).build(nn_archive)
        parser = parsers[0]
    
        # Linking
        left.out.link(manip_left.inputImage)
        right.out.link(manip_right.inputImage)
        manip_left.out.link(network.inputs["left"])
        manip_right.out.link(network.inputs["right"])
        network.out.link(parser.input)
    
        # Set up queues
        parser_queue = parser.out.createOutputQueue()
    
        pipeline.start()
    
    while pipeline.isRunning():
        parser_output: Map2D = parser_queue.get()
        output_map = parser_output.map  # 2D disparity map
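
    Once the disparity map is parsed, it can be converted to metric depth with the standard pinhole-stereo relation depth = f * B / d. A minimal sketch follows; the focal length and baseline below are placeholders, so substitute your device's calibration values:
    import numpy as np

    FOCAL_PX = 451.0    # placeholder: focal length in pixels (from calibration)
    BASELINE_M = 0.075  # placeholder: stereo baseline in meters

    def disparity_to_depth(disparity):
        """Pinhole stereo: depth = f * B / disparity, guarding against zeros."""
        disparity = np.asarray(disparity, dtype=np.float32)
        depth = np.zeros_like(disparity)
        valid = disparity > 0               # avoid division by zero
        depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
        return depth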
    
    Example
    Check out the complete example.