---
pipeline_tag: object-detection
datasets:
- Mai0313/coco-pose-2017
tags:
- TensorRT
- Pose Estimation
- YOLO-NAS-Pose
- Jetson Orin
---

We offer TensorRT engines in several precisions (INT8, FP16, FP32, and mixed), converted from Deci-AI's [YOLO-NAS-Pose](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS-POSE.md) pre-trained PyTorch weights.
These engines are compatible with Jetson Orin Nano hardware.
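
The engine file names listed in the tables below appear to follow a `yolo_nas_pose_<size>_<ONNX precision>.onnx[.<build mode>].engine` pattern. The following is a minimal sketch, not part of any released tooling, that decodes a file name into those parts. The helper name `parse_engine_name` and the mapping from build-mode suffix to enabled precisions are assumptions inferred from the tables.

```python
import re
from typing import NamedTuple

# Build-mode suffix -> precisions TensorRT was allowed to use.
# Assumption: inferred from the "TensorRT Precision" column of the tables.
TRT_PRECISIONS = {
    None: "FP32",
    "fp16": "FP32+FP16",
    "int8": "FP32+INT8",
    "best": "FP32+FP16+INT8",
}
SIZES = {"n": "Nano", "s": "Small", "m": "Medium", "l": "Large"}

_PATTERN = re.compile(
    r"^yolo_nas_pose_(?P<size>[nsml])_(?P<onnx>fp16|fp32|int8)\.onnx"
    r"(?:\.(?P<mode>best|fp16|int8))?\.engine$"
)


class EngineInfo(NamedTuple):
    size: str
    onnx_precision: str
    trt_precision: str


def parse_engine_name(name: str) -> EngineInfo:
    """Hypothetical helper: split an engine file name into its documented parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized engine file name: {name!r}")
    return EngineInfo(
        size=SIZES[match["size"]],
        onnx_precision=match["onnx"].upper(),
        # The optional group is None when no build-mode suffix is present.
        trt_precision=TRT_PRECISIONS[match["mode"]],
    )


print(parse_engine_name("yolo_nas_pose_l_fp32.onnx.int8.engine"))
```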

Note that all quantization introduced during conversion is purely static, so the converted models may have worse accuracy than the original.

TODO: use the [coco-pose-2017](https://huggingface.co/datasets/Mai0313/coco-pose-2017) dataset to calibrate the INT8 model.

For more information on calibration for post-training quantization, see [these slides](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf).
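
To illustrate why calibration matters, here is a minimal pure-Python sketch of symmetric INT8 quantization. It is not the conversion code used for these engines; the helper names, the toy activation values, and the hand-picked threshold of `0.5` are all made up for illustration. It compares a scale stretched to cover the absolute maximum with a scale based on a smaller, saturating threshold of the kind a calibration step would choose from data.

```python
def quantize(values, threshold):
    """Symmetric INT8 quantization: map [-threshold, threshold] onto [-127, 127]."""
    scale = threshold / 127.0
    quantized = []
    for v in values:
        q = round(v / scale)
        quantized.append(max(-127, min(127, q)))  # saturate values beyond the threshold
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate real values."""
    return [q * scale for q in quantized]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy activations: many small values plus a single large outlier.
activations = [0.01 * i for i in range(-50, 51)] + [20.0]

# No calibration: the scale is stretched to cover the outlier.
q_max, s_max = quantize(activations, max(abs(v) for v in activations))
# Hand-picked threshold standing in for a calibrated one: the outlier is clipped,
# but the typical values keep much finer resolution.
q_cal, s_cal = quantize(activations, 0.5)

bulk = activations[:-1]
err_max = mean_abs_error(bulk, dequantize(q_max[:-1], s_max))
err_cal = mean_abs_error(bulk, dequantize(q_cal[:-1], s_cal))
print(f"error on typical values, max-based scale:  {err_max:.5f}")
print(f"error on typical values, clipped scale:    {err_cal:.5f}")
```

In this toy case the clipped scale reproduces the typical values far more closely, at the cost of saturating the outlier.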

# Large

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_l_fp16.onnx.best.engine |  FP16 | FP32+FP16+INT8 | 46.7231 qps |
| yolo_nas_pose_l_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 29.6093 qps |
| yolo_nas_pose_l_fp32.onnx.best.engine |  FP32 | FP32+FP16+INT8 | 47.4032 qps |
| yolo_nas_pose_l_fp32.onnx.engine | FP32 | FP32 | 15.0654 qps |
| yolo_nas_pose_l_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 29.0005 qps |
| yolo_nas_pose_l_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 47.9071 qps |
| yolo_nas_pose_l_int8.onnx.best.engine |  INT8 | FP32+FP16+INT8 | 36.9695 qps |
| yolo_nas_pose_l_int8.onnx.int8.engine |  INT8 | FP32+INT8 | 30.9676 qps |

# Medium

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_m_fp16.onnx.best.engine |  FP16 | FP32+FP16+INT8 | 58.254 qps |
| yolo_nas_pose_m_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 37.8547 qps |
| yolo_nas_pose_m_fp32.onnx.best.engine |  FP32 | FP32+FP16+INT8 | 58.0306 qps |
| yolo_nas_pose_m_fp32.onnx.engine | FP32 | FP32 | 18.9603 qps |
| yolo_nas_pose_m_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 37.193 qps |
| yolo_nas_pose_m_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 59.9746 qps |
| yolo_nas_pose_m_int8.onnx.best.engine |  INT8 | FP32+FP16+INT8 | 44.8046 qps |
| yolo_nas_pose_m_int8.onnx.int8.engine |  INT8 | FP32+INT8 | 38.6757 qps |

# Small

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_s_fp16.onnx.best.engine | FP16 | FP32+FP16+INT8 | 84.7072 qps |
| yolo_nas_pose_s_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 66.0151 qps |
| yolo_nas_pose_s_fp32.onnx.best.engine |  FP32 | FP32+FP16+INT8 | 85.5718 qps |
| yolo_nas_pose_s_fp32.onnx.engine | FP32 | FP32 | 33.5963 qps |
| yolo_nas_pose_s_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 65.4357 qps |
| yolo_nas_pose_s_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 86.3202 qps |
| yolo_nas_pose_s_int8.onnx.best.engine |  INT8 | FP32+FP16+INT8 | 74.2494 qps |
| yolo_nas_pose_s_int8.onnx.int8.engine |  INT8 | FP32+INT8 | 63.7546 qps |


# Nano

| Model Name | ONNX Precision | TensorRT Precision | Throughput (TensorRT) |
|---|---|---|---|
| yolo_nas_pose_n_fp16.onnx.best.engine |  FP16 | FP32+FP16+INT8 | 91.8287 qps |
| yolo_nas_pose_n_fp16.onnx.fp16.engine | FP16 | FP32+FP16 | 85.4187 qps |
| yolo_nas_pose_n_fp32.onnx.best.engine | FP32 | FP32+FP16+INT8 | 105.519 qps |
| yolo_nas_pose_n_fp32.onnx.engine | FP32 | FP32 | 47.8265 qps |
| yolo_nas_pose_n_fp32.onnx.fp16.engine | FP32 | FP32+FP16 | 82.3834 qps |
| yolo_nas_pose_n_fp32.onnx.int8.engine | FP32 | FP32+INT8 | 88.0719 qps |
| yolo_nas_pose_n_int8.onnx.best.engine |  INT8 | FP32+FP16+INT8 | 80.8271 qps |
| yolo_nas_pose_n_int8.onnx.int8.engine |  INT8 | FP32+INT8 | 74.2658 qps |

![Benchmark](benchmark.png "Benchmark")
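
As one way to read the tables, the following sketch computes the speedup of each Large engine relative to the pure FP32 engine. The throughput values are copied from the Large table above; the variable names are for illustration only.

```python
# Throughput in queries per second (qps), copied from the Large table.
large_qps = {
    "yolo_nas_pose_l_fp16.onnx.best.engine": 46.7231,
    "yolo_nas_pose_l_fp16.onnx.fp16.engine": 29.6093,
    "yolo_nas_pose_l_fp32.onnx.best.engine": 47.4032,
    "yolo_nas_pose_l_fp32.onnx.engine": 15.0654,
    "yolo_nas_pose_l_fp32.onnx.fp16.engine": 29.0005,
    "yolo_nas_pose_l_fp32.onnx.int8.engine": 47.9071,
    "yolo_nas_pose_l_int8.onnx.best.engine": 36.9695,
    "yolo_nas_pose_l_int8.onnx.int8.engine": 30.9676,
}

# Use the engine built with FP32 only as the baseline.
baseline = large_qps["yolo_nas_pose_l_fp32.onnx.engine"]
speedups = {name: qps / baseline for name, qps in large_qps.items()}

# Print engines from fastest to slowest.
for name, speedup in sorted(speedups.items(), key=lambda item: -item[1]):
    print(f"{name:40s} {speedup:.2f}x")
```

On these numbers, the FP32 ONNX model built with INT8 enabled is the fastest Large engine, at roughly 3.2 times the throughput of the pure FP32 engine.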