    Model Details
    Model Description
    ConvNeXt is a family of modern, pure convolutional neural networks (ConvNets) that incorporate design ideas inspired by Vision Transformers (ViTs) while maintaining the simplicity and efficiency of standard ConvNets.
    • Developed by: Facebook AI Research (FAIR)
    • Shared by: Luxonis
    • Model type: Computer vision
    • License: MIT
    • Resources for more information:
      • Paper: A ConvNet for the 2020s (Liu et al., CVPR 2022)
    Training Details
    Training Data
    ConvNeXt models are trained on ImageNet-1K for classification tasks, with some variants pre-trained on ImageNet-22K and then fine-tuned on ImageNet-1K.
    Testing Details
    Metrics
    Model Variant | Top-1 Acc.
    ConvNeXt-T    | 82.90%
    ConvNeXt-S    | 84.60%
    ConvNeXt-B    | 85.80%
    ConvNeXt-L    | 86.60%
    ConvNeXt-XL   | 87.00%
    The evaluation results above are on ImageNet-1K, with the models trained at 224×224 resolution. For more information, please check the paper.
    Technical Specifications
    Input/Output Details
    • Input:
      • Name: input
      • Info: NCHW BGR 0-255 image.
    • Output:
      • Name: output
      • Info: A vector of raw class scores (logits) for 1000 classes; softmax is not applied. See the sketch below.
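    To make the I/O contract concrete, here is a minimal NumPy/OpenCV sketch of preparing an input and post-processing the output outside of a DepthAI pipeline (the file name and the stand-in logits are placeholders, not part of the Luxonis API):
    import cv2
    import numpy as np
    
    img = cv2.imread("example.jpg")                            # OpenCV loads BGR, HWC, uint8
    img = cv2.resize(img, (384, 384))                          # e.g. the base-384x384 variant
    tensor = img.transpose(2, 0, 1)[None].astype(np.float32)   # HWC -> NCHW, values kept in 0-255
    
    logits = np.zeros((1, 1000), dtype=np.float32)             # stand-in for the model's [1, 1000] output
    
    # numerically stable softmax over the 1000 classes
    probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
    top5 = np.argsort(probs[0])[::-1][:5]                      # five highest-scoring class indices
    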
    Model Architecture
    The ConvNeXt architecture consists of:
    1. Stem (“patchify” stem): A 4×4 convolution with stride 4 that downsamples the input (mirroring the patch-embedding idea in ViTs), instead of the older ResNet-style stem of a large-kernel convolution followed by pooling.
    2. Body / Stages: Four stages of repeated ConvNeXt blocks. Key components of a ConvNeXt block are a depthwise convolution, pointwise convolutions, and Layer Normalization (inspired by ViTs); a minimal sketch of one block follows this list.
    3. Head: After the body, global average pooling is applied, followed by a fully connected layer that produces class logits; softmax then yields class probabilities.
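    For intuition, here is a minimal PyTorch sketch of a single ConvNeXt block; layer scale and stochastic depth from the original implementation are omitted for brevity:
    import torch
    import torch.nn as nn
    
    class ConvNeXtBlock(nn.Module):
        """One ConvNeXt block: depthwise conv -> LayerNorm -> pointwise MLP -> residual."""
    
        def __init__(self, dim: int):
            super().__init__()
            self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise convolution
            self.norm = nn.LayerNorm(dim)            # Layer Normalization, as in ViTs
            self.pwconv1 = nn.Linear(dim, 4 * dim)   # pointwise (1x1) convolution as Linear, 4x expansion
            self.act = nn.GELU()
            self.pwconv2 = nn.Linear(4 * dim, dim)   # pointwise convolution back to dim
    
        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
            residual = x
            x = self.dwconv(x)
            x = x.permute(0, 2, 3, 1)                # to (N, H, W, C) for LayerNorm/Linear
            x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
            x = x.permute(0, 3, 1, 2)                # back to (N, C, H, W)
            return residual + x
    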
    Throughput
    Model variant: convnext:base-384x384
    • Input shape: [1, 3, 384, 384]
    • Output shape: [1, 1000]
    • Params (M): 88.591
    • GFLOPs: 45.849
    Platform | Precision | Throughput (infs/sec) | Power Consumption (W)
    RVC4     | FP16      | 53.58                 | 8.16
    Model variant: convnext:large-224x224
    • Input shape: [1, 3, 224, 224]
    • Output shape: [1, 1000]
    • Params (M): 197.767
    • GFLOPs: 34.732
    Platform | Precision | Throughput (infs/sec) | Power Consumption (W)
    RVC4     | FP16      | 47.97                 | 10.55
    • Benchmarked using 2 threads (and the DSP runtime in balanced mode for RVC4).
    • Parameters and FLOPs are obtained from a model-profiling package.
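    The parameter count can be sanity-checked independently. A minimal sketch, assuming torchvision's reference ConvNeXt implementation (not necessarily the package used for the numbers above):
    from torchvision.models import convnext_base
    
    model = convnext_base()                                      # reference ConvNeXt-B
    params_m = sum(p.numel() for p in model.parameters()) / 1e6  # total parameters, in millions
    print(f"{params_m:.3f} M parameters")                        # close to the 88.591 M listed above
    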
    Utilization
    Models converted for RVC platforms can be used for inference on OAK devices. DepthAI pipelines define the flow of information between the device, the inference model, and the output parser (as defined in the model head(s)). Below we present the most important utilization steps for this particular model. Please consult the docs for more information.
    Install the DepthAI v3 and depthai-nodes libraries:
    pip install depthai
    pip install depthai-nodes
    
    Define model:
    import depthai as dai
    from depthai_nodes.node import ParsingNeuralNetwork
    
    pipeline = dai.Pipeline()
    
    model_description = dai.NNModelDescription(
        "luxonis/convnext:base-384x384"
    )
    
    nn = pipeline.create(ParsingNeuralNetwork).build(
        <CameraNode>, model_description  # replace <CameraNode> with a camera node instance
    )
    
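    The <CameraNode> placeholder must be replaced with an actual camera node. A minimal sketch, assuming the DepthAI v3 Camera node API and the default CAM_A socket:
    cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_A)  # socket choice is an assumption
    nn = pipeline.create(ParsingNeuralNetwork).build(cam, model_description)
    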
    Inspect model head(s):
    • ClassificationParser that outputs a Classifications message (detected classes and scores).
    Get parsed output(s):
    parser_output_queue = nn.out.createOutputQueue()  # queue delivering parsed messages
    pipeline.start()
    while pipeline.isRunning():
        parser_output: Classifications = parser_output_queue.get()
    
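    Inside the loop, the parsed message can then be inspected. A minimal sketch (the classes and scores attribute names are assumptions based on depthai-nodes' Classifications message; verify against your installed version):
    print(parser_output.classes[0], parser_output.scores[0])  # top-1 class name and score (attributes assumed)
    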
    Example
    You can quickly run the model using our main.py script. It automatically downloads the model, creates a DepthAI pipeline, runs inference, and displays the results using our DepthAI visualizer tool. To try it out, run:
    python3 main.py \
        --model luxonis/convnext:base-384x384
    
    ConvNeXt
    ConvNeXt modernizes classic CNNs with ViT-style design.
    License: MIT (commercial use permitted)
    Downloads: 275
    Tasks: Classification
    Model Types: ONNX
    Model Variants
    Name | Version | Available For | Created At
    —    | —       | RVC4          | 8 months ago
    —    | —       | RVC4          | 8 months ago