OSNet
A deep ReID CNN for omni-scale feature learning
    Model Details
    Model Description
    The Omni-Scale Network (OSNet) is a deep convolutional neural network (CNN) designed for person re-identification (re-ID), a task that requires discriminative features to distinguish between individuals across different camera views. OSNet excels in learning omni-scale features—features that capture both homogeneous and heterogeneous spatial scales—crucial for recognizing people in varied poses, clothing, and environments.
• Developed by: Kaiyang Zhou et al.
• Shared by: Luxonis
• Model type: Computer Vision
• License: MIT
• Resources for more information: "Omni-Scale Feature Learning for Person Re-Identification" (Zhou et al., ICCV 2019) and the accompanying deep-person-reid (torchreid) repository
    Training Details
    Training Data
    The model has been trained and evaluated (mainly) on four different datasets:
1. Market1501: consists of 32,217 images, of which 12,936 are training images, 3,368 are query images, and 15,913 are gallery images.
2. DukeMTMC: consists of 36,411 images, of which 16,522 are training images, 2,228 are query images, and 17,661 are gallery images.
3. MSMT17: consists of 124,068 images, of which 30,248 are training images, 11,659 are query images, and 82,161 are gallery images.
4. CUHK03: consists of 14,097 images, of which 7,365 are training images, 1,400 are query images, and 5,332 are gallery images.
    Testing Details
    Metrics
These results showcase the performance of OSNet on the four datasets: Market1501, DukeMTMC, MSMT17, and CUHK03. The results are obtained from the project's official repository and the accompanying paper. R1 denotes rank-1 accuracy and mAP denotes mean average precision.
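For reference, the standard definitions behind these metrics, in conventional notation (not taken from this page):

\[
\mathrm{mAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q),
\qquad
\mathrm{AP}(q) = \frac{1}{N_q} \sum_{k=1}^{|G|} P_q(k)\,\mathrm{rel}_q(k)
\]

where Q is the query set, G the gallery, N_q the number of gallery images sharing query q's identity, P_q(k) the precision among the top-k ranked gallery images, and rel_q(k) an indicator that the k-th ranked image is a correct match.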
    Same-domain ReID
    The models below are trained and evaluated on the same (single) dataset.
Dataset      R1 (%)   mAP (%)
Market1501   94.2     82.6
DukeMTMC     87.0     70.2
MSMT17       74.9     43.8
CUHK03       72.3     67.8
    Multi-source domain generalization
    The models below are trained using multiple source datasets.
Source Datasets              Target Dataset   R1 (%)   mAP (%)
MSMT17+DukeMTMC+CUHK03       Market1501       72.5     44.2
MSMT17+Market1501+CUHK03     DukeMTMC         65.2     47.0
MSMT17+DukeMTMC+Market1501   CUHK03           23.9     23.3
DukeMTMC+Market1501+CUHK03   MSMT17           33.2     12.6
    Technical Specifications
    Input/Output Details
    • Input:
      • Name: images
    • Info: NCHW, BGR un-normalized image (see the preprocessing sketch below)
    • Output:
      • Name: output
        • Info: NF, the output embeddings of the model
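For illustration, a minimal preprocessing sketch (assuming OpenCV and NumPy; the file name is hypothetical) that produces the expected layout for the 128x256 variants, which take width 128 and height 256:

import cv2
import numpy as np

frame = cv2.imread("person_crop.jpg")  # OpenCV loads images as BGR by default
frame = cv2.resize(frame, (128, 256))  # (width, height) -> W=128, H=256
tensor = frame.transpose(2, 0, 1)      # HWC -> CHW
tensor = np.expand_dims(tensor, 0)     # add batch dim -> NCHW, shape (1, 3, 256, 128)
tensor = tensor.astype(np.uint8)       # un-normalized raw 0-255 values

In a DepthAI pipeline this conversion is typically handled on-device, so manual preprocessing is mainly relevant when running the ONNX model directly.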
    Model Architecture
    OSNet uses a unique residual block with multiple convolutional streams, each focusing on different scales, and a unified aggregation gate that dynamically fuses these multi-scale features with input-dependent channel-wise weights. By employing pointwise and depthwise convolutions, OSNet efficiently models spatial-channel correlations while avoiding overfitting.
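To make the aggregation-gate idea concrete, here is a simplified PyTorch sketch of the mechanism (an illustration, not the authors' reference implementation): a small shared gate network turns each stream's global context into channel-wise weights, and the re-weighted streams are summed.

import torch
import torch.nn as nn

class AggregationGate(nn.Module):
    """Sketch of OSNet-style dynamic fusion: each convolutional stream is
    re-weighted channel-wise by an input-dependent gate, then summed."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global context
            nn.Conv2d(channels, channels // reduction, 1),  # pointwise squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # pointwise expand
            nn.Sigmoid(),                                   # weights in (0, 1)
        )

    def forward(self, streams: list[torch.Tensor]) -> torch.Tensor:
        # Dynamically fuse multi-scale streams: weight each stream by its
        # own gate output, then sum.
        return sum(self.gate(x) * x for x in streams)

# Example: fuse four streams that model increasingly large receptive fields.
streams = [torch.randn(1, 64, 32, 16) for _ in range(4)]
fused = AggregationGate(64)(streams)  # shape (1, 64, 32, 16)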
    Throughput
    Model variant: osnet:market1501-128x256
    • Input shape: [1, 3, 256, 128] • Output shape: [1, 512]
    • Params (M): 2.160 • GFLOPs: 1.002
Platform   Precision   Throughput (infs/sec)   Power Consumption (W)
RVC2       FP16        49.09                   N/A
RVC4       FP16        571.04                  3.37
    Model variant: osnet:imagenet-128x256
    • Input shape: [1, 3, 256, 128] • Output shape: [1, 512]
    • Params (M): 2.160 • GFLOPs: 1.002
Platform   Precision   Throughput (infs/sec)   Power Consumption (W)
RVC2       FP16        49.01                   N/A
RVC4       FP16        571.55                  3.75
    Model variant: osnet:multi-source-domain-128x256
    • Input shape: [1, 3, 256, 128] • Output shape: [1, 512]
    • Params (M): 2.160 • GFLOPs: 1.002
Platform   Precision   Throughput (infs/sec)   Power Consumption (W)
RVC2       FP16        49.16                   N/A
RVC4       FP16        571.11                  3.36
* Benchmarked using 2 threads (and the DSP runtime in balanced mode for RVC4).
* Parameters and FLOPs are obtained with a model-profiling package.
    Utilization
    Models converted for RVC Platforms can be used for inference on OAK devices. DepthAI pipelines are used to define the information flow linking the device, inference model, and the output parser (as defined in model head(s)). Below, we present the most crucial utilization steps for the particular model. Please consult the docs for more information.
Install the DepthAI v3 (depthai) and depthai-nodes libraries:
    pip install depthai
    pip install depthai-nodes
    
    Define model:
    model_description = dai.NNModelDescription(
        "luxonis/osnet:imagenet-128x256"
    )
    
    nn = pipeline.create(ParsingNeuralNetwork).build(
        <CameraNode>, model_description
    )
    
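For context, a minimal sketch of the surrounding setup, assuming DepthAI v3's dai.node.Camera stands in for <CameraNode> and assuming the current depthai-nodes import path (these specifics are not stated on this page):

import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork  # assumed import path

pipeline = dai.Pipeline()
camera = pipeline.create(dai.node.Camera).build()  # assumed default color camera

model_description = dai.NNModelDescription("luxonis/osnet:imagenet-128x256")
nn = pipeline.create(ParsingNeuralNetwork).build(camera, model_description)

parser_output_queue = nn.out.createOutputQueue()  # consumed in the next step
pipeline.start()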
    Inspect model head(s):
    • EmbeddingsParser that outputs dai.NNData containing the output embeddings of the model.
    Get parsed output(s):
while pipeline.isRunning():
    parser_output: dai.NNData = parser_output_queue.get()
    embeddings = parser_output.getTensor("output")  # embeddings of shape (1, 512)
    
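Re-identification then reduces to comparing embeddings. A short cosine-similarity sketch (the 0.8 threshold mirrors the -cos flag in the example below and is an assumption here, not a recommendation from this page):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.flatten(), b.flatten()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# emb_a and emb_b stand in for (1, 512) embeddings parsed from two person crops.
emb_a = np.random.rand(1, 512).astype(np.float32)
emb_b = np.random.rand(1, 512).astype(np.float32)
same_person = cosine_similarity(emb_a, emb_b) > 0.8  # hypothetical match threshold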
    Example
    You can quickly run the model using our example.
    The example demonstrates how to build a 2-stage DepthAI pipeline consisting of a detection model and a recognition model. It automatically downloads the models, creates a DepthAI pipeline, runs the inference, and displays the results using our DepthAI visualizer tool.
    To try it out, run:
    python3 main.py \
        -det luxonis/scrfd-person-detection:25g-640x640 \
        -rec luxonis/osnet:imagenet-128x256 \
        -cos 0.8
    
License
MIT (commercial use permitted)
Downloads
753
Tasks
Image Embedding
Model Types
ONNX
Model Variants
Name                                Available For   Created At
osnet:market1501-128x256            RVC2, RVC4      8 months ago
osnet:imagenet-128x256              RVC2, RVC4      8 months ago
osnet:multi-source-domain-128x256   RVC2, RVC4      8 months ago