Our new Model ZOO works with DepthAI v3. Find out more in our documentation.
Model Details
Model Description
ConvNeXt is a family of modern pure convolutional neural networks (ConvNets), which incorporate ideas inspired by Vision Transformers (ViTs), while maintaining the simplicity and efficiency of standard ConvNets.
Developed by: Facebook AI Research (FAIR)
Shared by:
Model type: Computer vision
License:
Resources for more information:
Paper: A ConvNet for the 2020s (arXiv:2201.03545)
Training Details
Training Data
ConvNeXt models are trained on ImageNet-1K for classification tasks, with some variants pre-trained on ImageNet-22K and then fine-tuned on ImageNet-1K.
Testing Details
Metrics
| Model Variant | Top-1 Acc. |
|---------------|------------|
| ConvNeXt-T    | 82.90%     |
| ConvNeXt-S    | 84.60%     |
| ConvNeXt-B    | 85.80%     |
| ConvNeXt-L    | 86.60%     |
| ConvNeXt-XL   | 87.00%     |
The evaluation results above are on ImageNet-1K, with the models trained at 224x224 resolution.
For more information, please check the original paper.
Technical Specifications
Input/Output Details
Input:
Name: input
Info: NCHW BGR 0-255 image.
Output:
Name: output
Info: A vector of raw (non-softmaxed) logits for the 1000 classes.
Model Architecture
The ConvNeXt architecture consists of:
- Stem ("patchify" stem): a 4×4 convolution with stride 4 that downsamples the input (mirroring the patch-embedding idea in ViTs) instead of the older ResNet large-kernel-plus-pooling approach.
- Body / Stages: four stages of repeated ConvNeXt blocks. Key components of a ConvNeXt block include a depthwise convolution, pointwise convolutions, and Layer Normalization (inspired by ViTs); see the sketch after this list.
- Head: after the body, global average pooling followed by a fully connected layer produces the class logits, with softmax applied for classification.
Benchmarked using 2 threads (and the DSP runtime in balanced mode for RVC4).
Parameters and FLOPs are obtained from a model profiling package.
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)).
Below, we present the most crucial utilization steps for this particular model.
Please consult the docs for more information.
The model uses a ClassificationParser that outputs a Classifications message (detected classes and scores).
Get parsed output(s):

```python
while pipeline.isRunning():
    parser_output: Classifications = parser_output_queue.get()
```
Example
You can quickly run the model using our script.
It automatically downloads the model, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
To try it out, run: