Our new model ZOO works with DepthAI V3. Find out more in our documentation.
Model Details
Model Description
DINOv3 is a family of vision foundation models trained using self-supervised learning (SSL), which produces high-quality dense features and achieves outstanding performance across diverse vision tasks without fine-tuning.
Developed by: Facebook AI Research (FAIR)
Shared by:
Model type: Computer vision - Vision Foundation Model
License:
Resources for more information:
Paper:
Website:
Training Details
Training Data
DINOv3 models are trained on LVD-1689M (a curated web dataset with 1.7 billion images) for general-purpose visual representations. Some variants are also trained on SAT-493M (493 million satellite images) for satellite imagery applications.
Testing Details
Metrics
DINOv3 models are evaluated on various downstream tasks, including:
Image Classification
Object Detection
Semantic Segmentation
Instance Segmentation
Depth Estimation (Monocular)
Video Classification
Video Tracking
As an indication, we report results on a set of global and dense benchmarks: classification (IN-ReaL, IN-R, ObjectNet), retrieval (Oxford-H), segmentation (ADE20k), monocular depth (NYU, lower is better), tracking (DAVIS at 960 px), and keypoint matching (NAVI, SPair).
| Model | IN-ReaL | IN-R | Obj. | Ox-H | ADE20k | NYU↓ | DAVIS | NAVI | SPair |
|---|---|---|---|---|---|---|---|---|---|
| DINOv3-S | 87.0 | 60.4 | 50.9 | 49.5 | 47.0 | 0.403 | 72.7 | 56.3 | 50.4 |
| DINOv3-S+ | 88.0 | 68.8 | 54.6 | 50.0 | 48.8 | 0.399 | 75.5 | 57.3 | 55.2 |
| DINOv3-B | 89.3 | 76.7 | 64.1 | 55.6 | 51.8 | 0.354 | 78.2 | 59.4 | 57.0 |
| DINOv3-L | 90.3 | 86.7 | 71.2 | 65.8 | 53.8 | 0.337 | 79.4 | 63.7 | 62.7 |
| DINOv3-H+ | 90.3 | 90.0 | 78.6 | 64.5 | 54.8 | 0.352 | 79.3 | 63.3 | 56.3 |
For more comprehensive evaluation results, please refer to the resources linked above.
Technical Specifications
Input/Output Details
Input:
Name: image
Info: NCHW BGR 0-255 image.
Output:
Name: embeddings
Info: Dense visual features suitable for various downstream tasks
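Since the model expects an NCHW BGR image with raw 0-255 values, any frame grabbed as HWC RGB needs a small amount of preprocessing. The sketch below is a dependency-free illustration of that layout conversion; the 224×224 input resolution is an assumption for illustration only — check the actual input shape of the converted model.

```python
import numpy as np

def preprocess(frame_rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Convert an HWC RGB uint8 frame into the NCHW BGR 0-255 layout
    the model expects. `size` is a hypothetical input resolution;
    verify it against the converted model's input shape."""
    # Nearest-neighbour resize via index sampling (keeps the example
    # dependency-free; a real pipeline would use cv2.resize or similar).
    h, w = frame_rgb.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = frame_rgb[ys][:, xs]
    bgr = resized[:, :, ::-1]                 # RGB -> BGR
    nchw = bgr.transpose(2, 0, 1)[None, ...]  # HWC -> CHW, add batch dim
    # Values stay in 0-255; no normalization is applied here.
    return np.ascontiguousarray(nchw.astype(np.float32))
```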
Benchmarked with , using 2 threads (and the DSP runtime in balanced mode for RVC4).
Parameters and FLOPs are approximate estimates. For exact values, use the package.
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)).
Below, we present the most crucial utilization steps for this particular model.
Please consult the docs for more information.
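Once the pipeline's parser returns the dense `embeddings`, they are typically consumed directly by downstream logic, e.g. retrieval-style matching between frames. A minimal sketch, assuming the features arrive as NumPy arrays (the shape and any pooling step depend on the chosen model variant and parser):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened feature vectors.
    Returns a value in [-1, 1]; higher means more similar."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: compare embeddings of two frames to decide
# whether they show similar content.
# score = cosine_similarity(embeddings_frame_a, embeddings_frame_b)
```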