XFeat
    Feature points detector and descriptor.
    Model Details
    Model Description
    For applications in mobile robotics and augmented reality, it is critical that models can run on hardware-constrained computers. To this end, XFeat was designed as an agnostic solution focusing on both accuracy and efficiency in an image-matching pipeline. It has compact descriptors (64D) and simple architecture components that facilitate deployment on embedded devices. Performance is comparable to known deep local features such as SuperPoint while being significantly faster and more lightweight. Also, XFeat exhibits much better robustness to viewpoint and illumination changes than classic local features such as ORB and SIFT.
    • Developed by: VeRLab: Laboratory of Computer Vision and Robotics
    • Shared by: Luxonis
    • Model type: keypoint-detection model
    • License: Apache 2.0
    • Resources for more information:
    Training Details
    Training Data
    The model is trained on a mix of MegaDepth scenes and synthetically warped pairs generated from raw, unlabeled COCO images, in a 6:4 proportion, respectively.
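    For intuition, a synthetically warped pair can be produced by sampling a random homography and warping an unlabeled image onto itself. A minimal sketch (the perturbation range is an assumption, not the exact value used for training):
    import cv2
    import numpy as np

    def make_warped_pair(image, max_shift=0.25):
        """Create a (source, warped, H) pair by random homography warping."""
        h, w = image.shape[:2]
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
        # Perturb each corner by up to max_shift of the image size (assumed range).
        noise = (np.random.rand(4, 2) - 0.5) * 2 * max_shift * np.float32([w, h])
        H, _ = cv2.findHomography(corners, corners + noise)
        warped = cv2.warpPerspective(image, H, (w, h))
        return image, warped, H  # H provides pixel-level correspondences for supervision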
    Testing Details
    Metrics
    Megadepth-1500 relative camera pose estimation:
    Metric      Value
    AUC@5°      50.20
    AUC@10°     65.40
    AUC@20°     77.10
    ACC@10°     85.1
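    The AUC@t° metric summarizes relative pose accuracy: for each image pair the pose error (the maximum of the angular rotation and translation errors) is computed, and the area under the cumulative error curve is taken up to the threshold t. A minimal sketch of this computation, following the common MegaDepth-1500 evaluation protocol (not the exact evaluation code):
    import numpy as np

    def pose_auc(errors_deg, threshold_deg):
        """AUC of the cumulative pose-error curve up to `threshold_deg` degrees."""
        errors = np.sort(np.asarray(errors_deg, dtype=np.float64))
        recall = (np.arange(len(errors)) + 1) / len(errors)
        # Truncate the curve at the threshold; recall stays flat past the last kept
        # error. Assumes at least one error falls below the threshold.
        keep = errors <= threshold_deg
        e = np.concatenate(([0.0], errors[keep], [threshold_deg]))
        r = np.concatenate(([0.0], recall[keep], [recall[keep][-1]]))
        return float(np.trapz(r, e) / threshold_deg)

    # e.g. pose_auc(per_pair_errors, 5.0) -> AUC@5°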
    Technical Specifications
    Input/Output Details
    • Input:
      • Name: images
        • Tensor: float32[1,3,352,640]
        • Info: NCHW BGR un-normalized image (see the input-preparation sketch after this list)
    • Output:
      • Name: Multiple (please consult NN archive config.json)
        • Tensor: Multiple (please consult NN archive config.json)
        • Info: Features, keypoints, and heatmaps that need additional post-processing.
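    The input tensor can be prepared from a plain OpenCV image as follows (a minimal sketch; the file name is hypothetical):
    import cv2
    import numpy as np

    frame = cv2.imread("frame.png")        # OpenCV loads images as BGR by default
    frame = cv2.resize(frame, (640, 352))  # (width, height) to match the model input
    # HWC -> CHW, add the batch dimension, keep raw 0-255 values (no normalization).
    tensor = frame.transpose(2, 0, 1)[np.newaxis].astype(np.float32)
    assert tensor.shape == (1, 3, 352, 640)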
    Model Architecture
    • Backbone: composed of six convolutional blocks that progressively reduce the spatial resolution while increasing the depth of the network. The backbone includes a combination of basic layers with 2D convolutions, ReLU activations, and Batch Normalization, following a modular design to balance depth and computational efficiency. A sketch of this basic-layer pattern is given below.
    • Descriptor Head: feature pyramid and basic layers.
    • Keypoint Head: It uses a minimalist approach with 1×1 convolutions on an 8×8 grid-transformed image to efficiently regress keypoint coordinates.
    Please consult the XFeat paper for more information on the model architecture.
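    For illustration, the basic-layer pattern named above (2D convolution, Batch Normalization, ReLU) can be sketched in PyTorch. This paraphrases the public XFeat code and is not the exact implementation:
    import torch.nn as nn

    class BasicLayer(nn.Module):
        """Conv2d -> BatchNorm2d -> ReLU, the building block of the XFeat backbone."""
        def __init__(self, in_channels, out_channels, stride=1):
            super().__init__()
            self.layer = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.layer(x)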
    Throughput
    Platform    Throughput [infs/sec]
    RVC2*       55.96
    RVC4**      174.84
    * Benchmarked with 2 threads; ** Benchmarked on DSP fixed point runtime with balanced mode.
    Quantization
    The RVC4 version of the model was quantized using the HubAI General dataset.
    Utilization
    Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines define the information flow linking the device, the inference model, and the output parser (as defined in the model head(s)). Below, we present the most crucial utilization steps for this particular model. Please consult the docs for more information.
    Install the DepthAI v3 (DAIv3) and depthai-nodes libraries:
    pip install depthai
    pip install depthai-nodes
    
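    The snippets below assume the following imports. The depthai import is standard; the depthai-nodes module paths are assumptions and may differ between library versions, so consult the depthai-nodes documentation if they fail:
    import cv2
    import depthai as dai
    from depthai_nodes import ParsingNeuralNetwork, ParserGenerator  # path may vary by version
    from depthai_nodes.ml.parsers import XFeatMonoParser, XFeatStereoParser  # path may vary
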
    Define model:
    model_description = dai.NNModelDescription(
        "luxonis/xfeat:mono-320x240"
    )
    
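    The following snippets also assume a pipeline and a camera node have been created, e.g. (the CAM_A socket is an assumption; use the socket your device exposes):
    pipeline = dai.Pipeline()
    camera = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_A)
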
    Mono version
    nn = pipeline.create(ParsingNeuralNetwork).build(
        <CameraNode>, model_description
    )
    
    The mono version requires setting a reference frame to which all upcoming frames will be matched. This can be achieved by triggering the parser inside the pipeline loop, e.g. on a key press:
    # Get the parser
    parser: XFeatMonoParser = nn.getParser(0)
    
    # Inside pipeline loop
    if cv2.waitKey(1) == ord('s'):
        parser.setTrigger()
    
    Stereo version
    The stereo version requires two cameras (left and right). Therefore, the usage is slightly different.
    First, adjust the model description to get the stereo version and download NN archive:
    model_description = dai.NNModelDescription(
        "luxonis/xfeat:stereo-320x240"
    )
    nn_archive_path = dai.getModelFromZoo(model_description)
    nn_archive = dai.NNArchive(nn_archive_path)
    
    Manually extract the model input shape from the loaded NN archive:
    input_shape = nn_archive.getConfig().model.inputs[0].shape[2:][::-1]
    
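    This indexing works because the NN archive stores the input shape in NCHW order, while requestOutput() expects (width, height). For example:
    shape = [1, 3, 240, 320]              # NCHW: batch, channels, height, width
    assert shape[2:] == [240, 320]        # (height, width)
    assert shape[2:][::-1] == [320, 240]  # (width, height), as requestOutput() expects
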
    You need two cameras for stereo mode and two NN nodes:
    left_cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    right_cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)
    
    left_network = pipeline.create(dai.node.NeuralNetwork).build(
        left_cam.requestOutput(input_shape, type=dai.ImgFrame.Type.BGR888p),
        nn_archive
    )
    left_network.setNumInferenceThreads(2)

    right_network = pipeline.create(dai.node.NeuralNetwork).build(
        right_cam.requestOutput(input_shape, type=dai.ImgFrame.Type.BGR888p),
        nn_archive
    )
    right_network.setNumInferenceThreads(2)
    
    Here we create the parser with the ParserGenerator node, because the neural network nodes are already in place:
    parsers = pipeline.create(ParserGenerator).build(nn_archive)
    parser: XFeatStereoParser = parsers[0]
    
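    The neural network outputs then need to be linked to the parser's two inputs, and a host queue created for the parsed output. The input names below (reference_input for the left stream, target_input for the right) are taken from the depthai-nodes XFeat example and may differ between versions:
    left_network.out.link(parser.reference_input)
    right_network.out.link(parser.target_input)
    # Host-side queue used in the output loop below.
    parser_output_queue = parser.out.createOutputQueue()
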
    Inspect model head(s):
    • XFeatMonoParser or XFeatStereoParser, which outputs a dai.TrackedFeatures message (detected features over time, with age=0 for the left image and age=1 for the right image).
    Get parsed output(s):
    # `parser_output_queue` is created from the parser output,
    # e.g. parser_output_queue = parser.out.createOutputQueue()
    while pipeline.isRunning():
        parser_output: dai.TrackedFeatures = parser_output_queue.get()
    
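    Each dai.TrackedFeatures message carries a list of tracked feature points; matched points share an id, and age distinguishes the two images (see the head description above). A minimal sketch of consuming the matches:
    while pipeline.isRunning():
        msg: dai.TrackedFeatures = parser_output_queue.get()
        # Group features by id: age 0 = left (or reference) image, age 1 = right image.
        left = {f.id: f for f in msg.trackedFeatures if f.age == 0}
        right = {f.id: f for f in msg.trackedFeatures if f.age == 1}
        for fid, lf in left.items():
            rf = right.get(fid)
            if rf is not None:
                print(f"match {fid}: ({lf.position.x:.1f}, {lf.position.y:.1f})"
                      f" -> ({rf.position.x:.1f}, {rf.position.y:.1f})")
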
    Example
    You can quickly run the model using our example.
    We offer two modes of operation: mono and stereo. In mono mode, a single camera is used as input and every frame is matched against a reference image, which can be set by pressing the s key in the visualizer; the matches between each frame and the reference are visualized. In stereo mode, two cameras are used and the matches between the left and right frames are visualized.
    It automatically downloads the model, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
    To try it out, run:
    python3 main.py
    