---
pipeline_tag: object-detection
inference: false
datasets:
- Mai0313/coco-pose-2017
tags:
- Pose Estimation
- YOLO-NAS-Pose
- Jetson Orin
- JetPack 5.1.1
- TensorRT 8.5.2
---

We offer TensorRT engines in several precisions (INT8, FP16, FP32, and mixed). They were converted from the PyTorch pre-trained weights of Deci-AI's [YOLO-NAS-Pose](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS-POSE.md), which are licensed for non-commercial use only.
These TensorRT engines are compatible with JetPack 5.1.1 and were benchmarked and tested on a Jetson Orin Nano Developer Kit.

~~Note that all quantization introduced during the conversion is purely static, so the corresponding model may be noticeably less accurate than the original.~~

Todo: ~~use [cppe-5](https://huggingface.co/datasets/cppe-5) dataset to calibrate int8 model~~

For more information on calibration for post-training quantization, see [this slide deck](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf).
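
INT8 calibration picks a fixed per-tensor scale from activations recorded on a calibration set. The linked slides describe TensorRT's entropy (KL-divergence) calibrator. The sketch below swaps in the simpler max-absolute-value ("max") calibration to show what a *static* scale means: the scale is computed once from the calibration data and then reused for every input. This is not TensorRT's implementation, and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def max_calibration_scale(calibration_batches: Iterable[Sequence[float]]) -> float:
    """Choose one symmetric per-tensor scale so the largest |activation| seen maps to 127.

    Simplified "max" calibration, used here only for illustration; TensorRT's
    entropy calibrator chooses the clipping threshold differently.
    """
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    return amax / 127.0 if amax > 0 else 1.0


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Round to the nearest INT8 step and clip values outside the calibrated range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map INT8 codes back to real values with the same fixed scale."""
    return [x * scale for x in q]


if __name__ == "__main__":
    calibration = [[0.1, -0.5, 0.25], [1.27, -0.9]]  # made-up activations
    scale = max_calibration_scale(calibration)
    q = quantize([0.5, -1.27, 2.0], scale)
    print(scale, q, dequantize(q, scale))
```

Here `2.0` lies outside the calibrated range `[-1.27, 1.27]`, so it saturates at `127`. This clipping, together with rounding inside the range, is where static quantization loses accuracy. It is also why a representative calibration dataset matters.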

# Large

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_l_fp16.onnx.best.engine | FP16 | FP32+FP16+INT8 | 46.7231 qps |
| yolo_nas_pose_l_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 29.6093 qps |
| yolo_nas_pose_l_fp32.onnx.best.engine | FP32 | FP32+FP16+INT8 | 47.4032 qps |
| yolo_nas_pose_l_fp32.onnx.engine | FP32 | FP32 | 15.0654 qps |
| yolo_nas_pose_l_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 29.0005 qps |
| yolo_nas_pose_l_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 47.9071 qps |
| yolo_nas_pose_l_int8.onnx.best.engine | INT8 | FP32+FP16+INT8 | 36.9695 qps |
| yolo_nas_pose_l_int8.onnx.int8.engine | INT8 | FP32+INT8 | 30.9676 qps |

# Medium

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_m_fp16.onnx.best.engine | FP16 | FP32+FP16+INT8 | 58.254 qps |
| yolo_nas_pose_m_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 37.8547 qps |
| yolo_nas_pose_m_fp32.onnx.best.engine | FP32 | FP32+FP16+INT8 | 58.0306 qps |
| yolo_nas_pose_m_fp32.onnx.engine | FP32 | FP32 | 18.9603 qps |
| yolo_nas_pose_m_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 37.193 qps |
| yolo_nas_pose_m_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 59.9746 qps |
| yolo_nas_pose_m_int8.onnx.best.engine | INT8 | FP32+FP16+INT8 | 44.8046 qps |
| yolo_nas_pose_m_int8.onnx.int8.engine | INT8 | FP32+INT8 | 38.6757 qps |

# Small

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_s_fp16.onnx.best.engine | FP16 | FP32+FP16+INT8 | 84.7072 qps |
| yolo_nas_pose_s_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 66.0151 qps |
| yolo_nas_pose_s_fp32.onnx.best.engine | FP32 | FP32+FP16+INT8 | 85.5718 qps |
| yolo_nas_pose_s_fp32.onnx.engine | FP32 | FP32 | 33.5963 qps |
| yolo_nas_pose_s_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 65.4357 qps |
| yolo_nas_pose_s_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 86.3202 qps |
| yolo_nas_pose_s_int8.onnx.best.engine | INT8 | FP32+FP16+INT8 | 74.2494 qps |
| yolo_nas_pose_s_int8.onnx.int8.engine | INT8 | FP32+INT8 | 63.7546 qps |


# Nano

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_n_fp16.onnx.best.engine | FP16 | FP32+FP16+INT8 | 91.8287 qps |
| yolo_nas_pose_n_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 85.4187 qps |
| yolo_nas_pose_n_fp32.onnx.best.engine | FP32 | FP32+FP16+INT8 | 105.519 qps |
| yolo_nas_pose_n_fp32.onnx.engine | FP32 | FP32 | 47.8265 qps |
| yolo_nas_pose_n_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 82.3834 qps |
| yolo_nas_pose_n_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 88.0719 qps |
| yolo_nas_pose_n_int8.onnx.best.engine | INT8 | FP32+FP16+INT8 | 80.8271 qps |
| yolo_nas_pose_n_int8.onnx.int8.engine | INT8 | FP32+INT8 | 74.2658 qps |
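
The throughput tables are easier to compare as ratios. The sketch below uses only the numbers from the tables above. For each model size, it finds the engine with the highest measured throughput and computes its speedup over the pure-FP32 engine. It assumes the `*_fp32.onnx.engine` file (FP32 ONNX, FP32 TensorRT) is the baseline, following the naming in the tables. These ratios cover throughput only and say nothing about accuracy.

```python
from typing import Dict, Tuple

# Throughput in qps, copied from the tables above (Jetson Orin Nano, JetPack 5.1.1).
THROUGHPUT_QPS: Dict[str, Dict[str, float]] = {
    "Large": {
        "yolo_nas_pose_l_fp16.onnx.best.engine": 46.7231,
        "yolo_nas_pose_l_fp16.onnx.fp16.engine": 29.6093,
        "yolo_nas_pose_l_fp32.onnx.best.engine": 47.4032,
        "yolo_nas_pose_l_fp32.onnx.engine": 15.0654,
        "yolo_nas_pose_l_fp32.onnx.fp16.engine": 29.0005,
        "yolo_nas_pose_l_fp32.onnx.int8.engine": 47.9071,
        "yolo_nas_pose_l_int8.onnx.best.engine": 36.9695,
        "yolo_nas_pose_l_int8.onnx.int8.engine": 30.9676,
    },
    "Medium": {
        "yolo_nas_pose_m_fp16.onnx.best.engine": 58.254,
        "yolo_nas_pose_m_fp16.onnx.fp16.engine": 37.8547,
        "yolo_nas_pose_m_fp32.onnx.best.engine": 58.0306,
        "yolo_nas_pose_m_fp32.onnx.engine": 18.9603,
        "yolo_nas_pose_m_fp32.onnx.fp16.engine": 37.193,
        "yolo_nas_pose_m_fp32.onnx.int8.engine": 59.9746,
        "yolo_nas_pose_m_int8.onnx.best.engine": 44.8046,
        "yolo_nas_pose_m_int8.onnx.int8.engine": 38.6757,
    },
    "Small": {
        "yolo_nas_pose_s_fp16.onnx.best.engine": 84.7072,
        "yolo_nas_pose_s_fp16.onnx.fp16.engine": 66.0151,
        "yolo_nas_pose_s_fp32.onnx.best.engine": 85.5718,
        "yolo_nas_pose_s_fp32.onnx.engine": 33.5963,
        "yolo_nas_pose_s_fp32.onnx.fp16.engine": 65.4357,
        "yolo_nas_pose_s_fp32.onnx.int8.engine": 86.3202,
        "yolo_nas_pose_s_int8.onnx.best.engine": 74.2494,
        "yolo_nas_pose_s_int8.onnx.int8.engine": 63.7546,
    },
    "Nano": {
        "yolo_nas_pose_n_fp16.onnx.best.engine": 91.8287,
        "yolo_nas_pose_n_fp16.onnx.fp16.engine": 85.4187,
        "yolo_nas_pose_n_fp32.onnx.best.engine": 105.519,
        "yolo_nas_pose_n_fp32.onnx.engine": 47.8265,
        "yolo_nas_pose_n_fp32.onnx.fp16.engine": 82.3834,
        "yolo_nas_pose_n_fp32.onnx.int8.engine": 88.0719,
        "yolo_nas_pose_n_int8.onnx.best.engine": 80.8271,
        "yolo_nas_pose_n_int8.onnx.int8.engine": 74.2658,
    },
}


def speedup_over_fp32(engines: Dict[str, float]) -> Tuple[str, float]:
    """Return the fastest engine and its throughput ratio to the pure-FP32 engine."""
    # Assumption: the FP32-ONNX / FP32-TensorRT engine is the one ending in "_fp32.onnx.engine".
    baseline = next(qps for name, qps in engines.items() if name.endswith("_fp32.onnx.engine"))
    fastest = max(engines, key=engines.get)
    return fastest, engines[fastest] / baseline


if __name__ == "__main__":
    for size, engines in THROUGHPUT_QPS.items():
        name, ratio = speedup_over_fp32(engines)
        print(f"{size}: {name} is {ratio:.2f}x the FP32 engine")
```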

![Throughput benchmark of the TensorRT engines](benchmark.png "Benchmark")