Model Details
Model Description
The Omni-Scale Network (OSNet) is a deep convolutional neural network (CNN) designed for person re-identification (re-ID), a task that requires discriminative features to distinguish between individuals across different camera views. OSNet excels in learning omni-scale features—features that capture both homogeneous and heterogeneous spatial scales—crucial for recognizing people in varied poses, clothing, and environments.
Developed by: Kaiyang Zhou et al.
Model type: Computer Vision
Resources for more information: "Omni-Scale Feature Learning for Person Re-Identification" (Zhou et al., ICCV 2019) and the deep-person-reid GitHub repository
Training Details
Training Data
The model has been trained and evaluated (mainly) on four different datasets:
Market1501 dataset: 32217 images in total, of which 12936 are training images, 3368 are query images, and 15913 are gallery images.
DukeMTMC dataset: 36411 images in total, of which 16522 are training images, 2228 are query images, and 17661 are gallery images.
MSMT17 dataset: 124068 images in total, of which 30248 are training images, 11659 are query images, and 82161 are gallery images.
CUHK03 dataset: 14097 images in total, of which 7365 are training images, 1400 are query images, and 5332 are gallery images.
Testing Details
Metrics
These results showcase the performance of OSNet on the four datasets: Market1501, DukeMTMC, MSMT17, and CUHK03. R1 denotes Rank-1 accuracy and mAP denotes mean average precision; the figures are taken from the project's official repository and the accompanying paper.
Same-domain ReID
The models below are trained and evaluated on the same (single) dataset.
Dataset      R1 (%)   mAP (%)
Market1501   94.2     82.6
DukeMTMC     87.0     70.2
MSMT17       74.9     43.8
CUHK03       72.3     67.8
Multi-source domain generalization
The models below are trained using multiple source datasets.
Source Datasets              Target Dataset   R1 (%)   mAP (%)
MSMT17+DukeMTMC+CUHK03       Market1501       72.5     44.2
MSMT17+Market1501+CUHK03     DukeMTMC         65.2     47.0
MSMT17+DukeMTMC+Market1501   CUHK03           23.9     23.3
DukeMTMC+Market1501+CUHK03   MSMT17           33.2     12.6
Technical Specifications
Input/Output Details
Input:
Name: images
Info: un-normalized BGR image in NCHW layout
Output:
Name: output
Info: NF, the output embeddings of the model
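For illustration, preparing such an input with OpenCV could look as follows. The 256x128 (height x width) resolution is an assumption based on common OSNet re-ID inputs; check the model card for the actual input shape.

import cv2
import numpy as np

frame = cv2.imread("person_crop.jpg")          # BGR, HWC, uint8 (OpenCV default)
frame = cv2.resize(frame, (128, 256))          # cv2.resize takes (width, height)
tensor = frame.transpose(2, 0, 1)[np.newaxis]  # HWC -> CHW, then add batch dim
# No scaling or mean subtraction: the model expects un-normalized 0-255 values.
assert tensor.shape == (1, 3, 256, 128)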
Model Architecture
OSNet uses a unique residual block with multiple convolutional streams, each focusing on different scales, and a unified aggregation gate that dynamically fuses these multi-scale features with input-dependent channel-wise weights. By employing pointwise and depthwise convolutions, OSNet efficiently models spatial-channel correlations while avoiding overfitting.
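For intuition, below is a minimal PyTorch sketch of such an omni-scale residual block. This is an illustrative simplification (the real block also wraps the streams in 1x1 bottleneck convolutions); see the deep-person-reid repository for the reference implementation.

import torch
import torch.nn as nn

class LiteConv(nn.Module):
    """Factorized 3x3 convolution: a pointwise (1x1) conv followed by a
    depthwise 3x3 conv, as used by OSNet to keep the block lightweight."""
    def __init__(self, channels: int):
        super().__init__()
        self.pw = nn.Conv2d(channels, channels, 1, bias=False)
        self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                            groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.dw(self.pw(x))))

class OmniScaleBlock(nn.Module):
    """Residual block with parallel streams of growing receptive field,
    fused by a shared, input-dependent aggregation gate."""
    def __init__(self, channels: int, num_streams: int = 4):
        super().__init__()
        # Stream t stacks (t + 1) LiteConv units, giving effective
        # receptive fields of 3x3, 5x5, 7x7, 9x9.
        self.streams = nn.ModuleList(
            nn.Sequential(*[LiteConv(channels) for _ in range(t + 1)])
            for t in range(num_streams)
        )
        # Aggregation gate: global pooling + tiny bottleneck MLP producing
        # per-channel weights in (0, 1); shared across all streams.
        hidden = max(channels // 16, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Re-weight each stream's output channel-wise, sum, add the residual.
        fused = sum(out * self.gate(out) for out in (s(x) for s in self.streams))
        return torch.relu(fused + x)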
* Benchmarked using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained with a model profiling package.
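For reference, such counts can be reproduced with a profiling package. The sketch below uses ptflops together with the torchreid model builder, both named purely as an illustration; the card does not specify the exact tooling used.

import torchreid
from ptflops import get_model_complexity_info

# Build an OSNet variant; "osnet_x1_0" is the standard width-1.0 model.
model = torchreid.models.build_model(name="osnet_x1_0", num_classes=1000)
macs, params = get_model_complexity_info(
    model, (3, 256, 128), as_strings=True, print_per_layer_stat=False
)
print(f"MACs: {macs}, parameters: {params}")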
Utilization
Models converted for RVC Platforms can be used for inference on OAK devices.
DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)).
Below, we present the key steps for using this particular model.
Please consult the docs for more information.
The model's output is decoded by the EmbeddingsParser, which outputs a dai.NNData message containing the output embeddings of the model.
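A minimal pipeline sketch, assuming the DepthAI v3 API and the ParsingNeuralNetwork helper from depthai_nodes; the model slug is a placeholder, so substitute the actual HubAI model reference:

import depthai as dai
from depthai_nodes import ParsingNeuralNetwork  # import path may differ by version

with dai.Pipeline() as pipeline:
    camera = pipeline.create(dai.node.Camera).build()
    # ParsingNeuralNetwork attaches the parser defined in the model head(s).
    nn = pipeline.create(ParsingNeuralNetwork).build(
        camera, "luxonis/osnet:<variant>"  # placeholder slug
    )
    parser_output_queue = nn.out.createOutputQueue()
    pipeline.start()
    # The polling loop below ("Get parsed output(s)") runs inside this context.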
Get parsed output(s):
while pipeline.isRunning():
    parser_output: dai.NNData = parser_output_queue.get()
    embeddings = parser_output.getTensor("output")  # embeddings of shape (1, 512)
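The embeddings themselves are compared with a distance measure to decide whether two crops show the same person. A hypothetical cosine-similarity check (the threshold is illustrative and application-specific):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb_query and emb_gallery are embeddings obtained as shown above.
same_person = cosine_similarity(emb_query, emb_gallery) > 0.6  # example threshold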
Example
You can quickly run the model using our example.
The example demonstrates how to build a 2-stage DepthAI pipeline consisting of a detection model and a recognition model.
It automatically downloads the models, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.