Model Details
Model Description
The Omni-Scale Network (OSNet) is a deep convolutional neural network (CNN) designed for person re-identification (re-ID), a task that requires discriminative features to distinguish between individuals across different camera views. OSNet excels in learning omni-scale features—features that capture both homogeneous and heterogeneous spatial scales—crucial for recognizing people in varied poses, clothing, and environments.
Developed by: Kaiyang Zhou et al.
Model type: Computer Vision
Resources for more information: "Omni-Scale Feature Learning for Person Re-Identification" (Zhou et al., ICCV 2019) and the deep-person-reid GitHub repository
Training Details
Training Data
The model has been trained and evaluated (mainly) on four different datasets:
Market1501 dataset: 32217 images in total, of which 12936 are training images, 3368 are query images, and 15913 are gallery images.
DukeMTMC dataset: 36411 images in total, of which 16522 are training images, 2228 are query images, and 17661 are gallery images.
MSMT17 dataset: 124068 images in total, of which 30248 are training images, 11659 are query images, and 82161 are gallery images.
CUHK03 dataset: 14097 images in total, of which 7365 are training images, 1400 are query images, and 5332 are gallery images.
Testing Details
Metrics
These results showcase the performance of OSNet on the four datasets: Market1501, DukeMTMC, MSMT17, and CUHK03. R1 denotes Rank-1 accuracy and mAP denotes mean average precision; the figures are taken from the project's official repository and the accompanying paper.
Same-domain ReID
The models below are trained and evaluated on the same (single) dataset.
Dataset      R1 (%)   mAP (%)
Market1501   94.2     82.6
DukeMTMC     87.0     70.2
MSMT17       74.9     43.8
CUHK03       72.3     67.8
Multi-source domain generalization
The models below are trained using multiple source datasets.
Source Datasets              Target Dataset   R1 (%)   mAP (%)
MSMT17+DukeMTMC+CUHK03       Market1501       72.5     44.2
MSMT17+Market1501+CUHK03     DukeMTMC         65.2     47.0
MSMT17+DukeMTMC+Market1501   CUHK03           23.9     23.3
DukeMTMC+Market1501+CUHK03   MSMT17           33.2     12.6
Technical Specifications
Input/Output Details
Input:
Name: images
Info: un-normalized BGR image in NCHW layout
Output:
Name: output
Info: NF, the output embeddings of the model
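For illustration, preparing such an input with OpenCV could look as follows. The 256x128 (height x width) resolution is an assumption based on common OSNet re-ID inputs; check the model card for the actual input shape.

import cv2
import numpy as np

frame = cv2.imread("person_crop.jpg")          # BGR, HWC, uint8 (OpenCV default)
frame = cv2.resize(frame, (128, 256))          # cv2.resize takes (width, height)
tensor = frame.transpose(2, 0, 1)[np.newaxis]  # HWC -> CHW, then add batch dim
# No scaling or mean subtraction: the model expects un-normalized 0-255 values.
assert tensor.shape == (1, 3, 256, 128)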
Model Architecture
OSNet uses a unique residual block with multiple convolutional streams, each focusing on different scales, and a unified aggregation gate that dynamically fuses these multi-scale features with input-dependent channel-wise weights. By employing pointwise and depthwise convolutions, OSNet efficiently models spatial-channel correlations while avoiding overfitting.
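For intuition, below is a minimal PyTorch sketch of such an omni-scale residual block. This is an illustrative simplification (the real block also wraps the streams in 1x1 bottleneck convolutions); see the deep-person-reid repository for the reference implementation.

import torch
import torch.nn as nn

class LiteConv(nn.Module):
    """Factorized 3x3 convolution: a pointwise (1x1) conv followed by a
    depthwise 3x3 conv, as used by OSNet to keep the block lightweight."""
    def __init__(self, channels: int):
        super().__init__()
        self.pw = nn.Conv2d(channels, channels, 1, bias=False)
        self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                            groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.dw(self.pw(x))))

class OmniScaleBlock(nn.Module):
    """Residual block with parallel streams of growing receptive field,
    fused by a shared, input-dependent aggregation gate."""
    def __init__(self, channels: int, num_streams: int = 4):
        super().__init__()
        # Stream t stacks (t + 1) LiteConv units, giving effective
        # receptive fields of 3x3, 5x5, 7x7, 9x9.
        self.streams = nn.ModuleList(
            nn.Sequential(*[LiteConv(channels) for _ in range(t + 1)])
            for t in range(num_streams)
        )
        # Aggregation gate: global pooling + tiny bottleneck MLP producing
        # per-channel weights in (0, 1); shared across all streams.
        hidden = max(channels // 16, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Re-weight each stream's output channel-wise, sum, add the residual.
        fused = sum(out * self.gate(out) for out in (s(x) for s in self.streams))
        return torch.relu(fused + x)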
* Benchmarked using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained with a model profiling package.
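For reference, such counts can be reproduced with a profiling package. The sketch below uses ptflops together with the torchreid model builder, both named purely as an illustration; the card does not specify the exact tooling used.

import torchreid
from ptflops import get_model_complexity_info

# Build an OSNet variant; "osnet_x1_0" is the standard width-1.0 model.
model = torchreid.models.build_model(name="osnet_x1_0", num_classes=1000)
macs, params = get_model_complexity_info(
    model, (3, 256, 128), as_strings=True, print_per_layer_stat=False
)
print(f"MACs: {macs}, parameters: {params}")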
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)).
Below, we present the key steps for using this particular model.
Please consult the docs for more information.
The model's output is decoded by the EmbeddingsParser, which outputs a dai.NNData message containing the output embeddings of the model.
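A minimal pipeline sketch, assuming the DepthAI v3 API and the ParsingNeuralNetwork helper from depthai_nodes; the model slug is a placeholder, so substitute the actual HubAI model reference:

import depthai as dai
from depthai_nodes import ParsingNeuralNetwork  # import path may differ by version

with dai.Pipeline() as pipeline:
    camera = pipeline.create(dai.node.Camera).build()
    # ParsingNeuralNetwork attaches the parser defined in the model head(s).
    nn = pipeline.create(ParsingNeuralNetwork).build(
        camera, "luxonis/osnet:<variant>"  # placeholder slug
    )
    parser_output_queue = nn.out.createOutputQueue()
    pipeline.start()
    # The polling loop below ("Get parsed output(s)") runs inside this context.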
Get parsed output(s):
while pipeline.isRunning():
    parser_output: dai.NNData = parser_output_queue.get()
    embeddings = parser_output.getTensor("output")  # embeddings of shape (1, 512)
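The embeddings themselves are compared with a distance measure to decide whether two crops show the same person. A hypothetical cosine-similarity check (the threshold is illustrative and application-specific):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb_query and emb_gallery are embeddings obtained as shown above.
same_person = cosine_similarity(emb_query, emb_gallery) > 0.6  # example threshold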
Example
You can quickly run the model using our example.
The example demonstrates how to build a 2-stage DepthAI pipeline consisting of a detection model and a recognition model.
It automatically downloads the models, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.