CREStereo
    An advanced method for stereo matching based on a cascaded recurrent stereo matching network.
    Model Details
    Model Description
    The CREStereo model is an advanced stereo matching network with a cascaded recurrent architecture and an Adaptive Group Correlation Layer (AGCL). To tackle the problem of practical stereo matching, CREStereo employs a hierarchical coarse-to-fine refinement strategy, using recurrent updates to iteratively enhance disparity estimates. The model ranks 1st on both the Middlebury and ETH3D benchmarks, outperforming existing state-of-the-art methods.
    • Developed by: Jiankun Li et al.
    • Shared by: Luxonis
    • Model type: Computer Vision
    • License: Apache 2.0
    • Resources for more information:
    Training Details
    Training Data
    The CREStereo model was trained using a diverse set of synthetic and real-world datasets, as these datasets were found to significantly improve performance on major stereo matching benchmarks (Middlebury, ETH3D, and KITTI).
    • Evaluation Datasets:
      • Middlebury: features 23 high-resolution stereo image pairs taken under various lighting conditions. These images, captured using large-baseline stereo cameras, can exhibit disparities of over 600 pixels.
      • ETH3D: includes 27 monochrome stereo image pairs, with disparities measured using a laser scanner, and encompasses a mix of indoor and outdoor scenes.
    Testing Details
    Metrics
    These results showcase the performance of CREStereo on two popular public benchmarks, Middlebury and ETH3D. The results are taken from the original paper.
    Evaluation Dataset    AvgErr    Bad1.0
    Middlebury            1.15      8.25
    ETH3D                 0.13      0.98
    For the evaluation, the following two popular metrics are considered (see the sketch after the list):
    • AvgErr: average disparity error (the smaller the better).
    • Bad1.0: percentage of pixels with a disparity error larger than 1 pixel (the smaller the better).
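    Both metrics are straightforward to compute. Below is a minimal NumPy sketch, assuming dense ground-truth disparity; the official benchmarks additionally mask invalid and occluded pixels:
    import numpy as np

    def avg_err(pred, gt):
        """Average absolute disparity error (lower is better)."""
        return np.abs(pred - gt).mean()

    def bad(pred, gt, threshold=1.0):
        """Percentage of pixels whose disparity error exceeds `threshold` pixels (lower is better)."""
        return 100.0 * (np.abs(pred - gt) > threshold).mean()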
    Technical Specifications
    Input/Output Details
    • Inputs:
      • Name: left
        • Tensor: float32[1,3,H,W] - H and W support multiple resolutions (120x160, 240x320, 360x640)
        • Info: NCHW, un-normalized BGR image
      • Name: right
        • Tensor: float32[1,3,H,W] - H and W support multiple resolutions (120x160, 240x320, 360x640)
        • Info: NCHW, un-normalized BGR image
    • Output:
      • Name: output
        • Tensor: float32[1,2,H,W] - H and W support multiple resolutions (120x160, 240x320, 360x640)
        • Info: NCHW, stereo disparity map
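    If you run the exported ONNX model directly (outside a DepthAI pipeline), the input tensors can be prepared as sketched below with OpenCV and NumPy; the file names are placeholders:
    import cv2
    import numpy as np

    def to_tensor(path, size=(320, 240)):   # (W, H); one of the supported resolutions
        img = cv2.imread(path)               # OpenCV loads BGR, matching the expected layout
        img = cv2.resize(img, size)
        img = img.transpose(2, 0, 1)[None]   # HWC -> NCHW, add batch dimension
        return img.astype(np.float32)        # un-normalized, as the model expects

    left_tensor = to_tensor("left.png")      # placeholder path; float32[1,3,240,320]
    right_tensor = to_tensor("right.png")    # placeholder path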
    Model Architecture
    The CREStereo model features a hierarchical network with a cascaded recurrent structure for coarse-to-fine disparity refinement. It uses a Feature Extraction Network to create a multi-level pyramid and Recurrent Update Modules (RUMs) with Adaptive Group Correlation Layers (AGCL) for iterative updates. More details can be found in the original paper.
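    To make the coarse-to-fine idea concrete, here is a toy NumPy sketch of cascaded recurrent refinement. It is not the actual network: `refine_step` is a crude stand-in for a Recurrent Update Module with AGCL, and the photometric update rule is purely illustrative:
    import numpy as np

    def refine_step(disparity, left, right):
        """Stand-in for one RUM step: nudge disparity to reduce photometric mismatch."""
        h, w = disparity.shape
        # Warp the right image toward the left using the current disparity
        xs = np.clip(np.arange(w)[None, :] - disparity.astype(int), 0, w - 1)
        warped = right[np.arange(h)[:, None], xs]
        # Crude correction signal (the real model uses adaptive group correlations)
        return disparity + np.sign(left - warped) * 0.1

    def cascaded_refinement(left_pyr, right_pyr, iters_per_level=3):
        """Coarse-to-fine: refine at the coarsest level, upsample, and repeat."""
        disparity = np.zeros_like(left_pyr[-1])          # init at coarsest resolution
        for level in reversed(range(len(left_pyr))):
            l_img, r_img = left_pyr[level], right_pyr[level]
            if disparity.shape != l_img.shape:
                # Upsample to the next (finer) level; disparity values scale by 2
                disparity = np.kron(disparity, np.ones((2, 2))) * 2.0
            for _ in range(iters_per_level):
                disparity = refine_step(disparity, l_img, r_img)
        return disparity
    At each pyramid level, the disparity estimated at the coarser level is upsampled (and scaled by 2, since disparities are measured in pixels) before further recurrent refinement; that cascade is the essence of the architecture.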
    Throughput
    Platform    Model Variant    Throughput FP16 [infs/sec]    Throughput Quant [infs/sec]
    RVC2*       iter2_120x160    3.18                          not compatible
    RVC2*       iter2_240x320    1.44                          not compatible
    RVC4**      iter5_240x320    not compatible                27.93
    RVC4**      iter4_360x640    not compatible                8.92
    * Benchmarked with 2 threads, using 8 SHAVE cores; ** benchmarked on the DSP runtime in default mode.
    Utilization
    Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines are used to define the information flow linking the device, the inference model, and the output parser (as defined in the model head(s)). Below, we present the most crucial utilization steps for this particular model. Please consult the docs for more information.
    Install the DepthAI v3 and depthai-nodes libraries:
    pip install depthai
    pip install depthai-nodes
    
    Set up the DAI pipeline using the code below:
    import depthai as dai
    from depthai_nodes.ml.messages import Map2D
    from depthai_nodes.parser_generator import ParserGenerator
    
    # Get the model from the HubAI
    model_description = dai.NNModelDescription(
        "luxonis/crestereo:iter2-320x240", platform="RVC2"
    )
    archivePath = dai.getModelFromZoo(model_description)
    nn_archive = dai.NNArchive(archivePath)
    
    # Get the input shapes; NCHW [1,3,H,W] -> (W, H) for ImageManip resizing
    inputs = nn_archive.getConfig().model.inputs
    inputs_shapes = [input.shape[2:][::-1] for input in inputs]
    
    with dai.Pipeline() as pipeline:
        # Set up cameras
        left = pipeline.create(dai.node.MonoCamera)
        left.setFps(2)
        left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
        left.setBoardSocket(dai.CameraBoardSocket.CAM_B)
    
        right = pipeline.create(dai.node.MonoCamera)
        right.setFps(2)
        right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
        right.setBoardSocket(dai.CameraBoardSocket.CAM_C)
    
        network = pipeline.create(dai.node.NeuralNetwork)
        network.setFromModelZoo(model_description, useCached=True)
    
        manip_left = pipeline.create(dai.node.ImageManip)
        manip_left.initialConfig.setResize(inputs_shapes[0][0], inputs_shapes[0][1])
        manip_left.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)
    
        manip_right = pipeline.create(dai.node.ImageManip)
        manip_right.initialConfig.setResize(inputs_shapes[1][0], inputs_shapes[1][1])
        manip_right.initialConfig.setFrameType(dai.ImgFrame.Type.BGR888p)
    
        # Set up the parser
        parsers = pipeline.create(ParserGenerator).build(nn_archive)
        parser = parsers[0]
    
        # Linking
        left.out.link(manip_left.inputImage)
        right.out.link(manip_right.inputImage)
        manip_left.out.link(network.inputs["left"])
        manip_right.out.link(network.inputs["right"])
        network.out.link(parser.input)
    
        # Set up queues
        parser_queue = parser.out.createOutputQueue()
    
        pipeline.start()
    
    while pipeline.isRunning():
        parser_output: Map2D = parser_queue.get()
        output_map = parser_output.map  # 2D disparity map
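
    Once the disparity map is parsed, it can be converted to metric depth with the standard pinhole-stereo relation depth = f * B / d. A minimal sketch follows; the focal length and baseline below are placeholders, so substitute your device's calibration values:
    import numpy as np

    FOCAL_PX = 451.0    # placeholder: focal length in pixels (from calibration)
    BASELINE_M = 0.075  # placeholder: stereo baseline in meters

    def disparity_to_depth(disparity):
        """Pinhole stereo: depth = f * B / disparity, guarding against zeros."""
        disparity = np.asarray(disparity, dtype=np.float32)
        depth = np.zeros_like(disparity)
        valid = disparity > 0               # avoid division by zero
        depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
        return depth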
    
    Example
    Check out the complete example.