# MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
This repository is a fork of [MOTRv2](https://github.com/megvii-research/MOTRv2) ([paper](https://arxiv.org/abs/2211.09791)). We will release the code for our CO-MOT later.
## Main Results
### DanceTrack
| **HOTA** | **DetA** | **AssA** | **MOTA** | **IDF1** | **URL** |
| :------: | :------: | :------: | :------: | :------: | :-----------------------------------------------------------------------------------------: |
| 69.9 | 83.0 | 59.0 | 91.9 | 71.7 | [model](https://drive.google.com/file/d/1EA4lndu2yQcVgBKR09KfMe5efbf631Th/view?usp=share_link) |
### Visualization
<!-- |OC-SORT|MOTRv2| -->
| VISAM |
| :---: |
| ![](https://raw.githubusercontent.com/BingfengYan/MOTSAM/main/visam.gif) |
## Installation
The codebase is built on top of [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR) and [MOTR](https://github.com/megvii-research/MOTR).
### Requirements
* Install PyTorch using conda (optional)
```bash
conda create -n motrv2 python=3.9
conda activate motrv2
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
```
* Other requirements
```bash
pip install -r requirements.txt
```
* Build MultiScaleDeformableAttention
```bash
cd ./models/ops
sh ./make.sh
```
## Usage
### Dataset preparation
1. Download the YOLOX detection database (`det_db_motrv2.json`) from [here](https://drive.google.com/file/d/1cdhtztG4dbj7vzWSVSehLL6s0oPalEJo/view?usp=share_link).
2. Download [DanceTrack](https://dancetrack.github.io/) and [CrowdHuman](https://www.crowdhuman.org/) and unzip them into the following layout:
```
/data/Dataset/mot
├── crowdhuman
│   ├── annotation_train.odgt
│   ├── annotation_trainval.odgt
│   ├── annotation_val.odgt
│   └── Images
├── DanceTrack
│   ├── test
│   ├── train
│   └── val
└── det_db_motrv2.json
```
You can generate the CrowdHuman trainval annotation with:
```bash
cat annotation_train.odgt annotation_val.odgt > annotation_trainval.odgt
```
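To double-check that the detection database from step 1 is in place, here is a quick, purely illustrative peek; the path is assumed from the layout above:
```bash
# Purely illustrative: load det_db_motrv2.json and report its size (path assumed from the layout above).
python - <<'EOF'
import json

with open("/data/Dataset/mot/det_db_motrv2.json") as f:
    det_db = json.load(f)

print(type(det_db).__name__, "with", len(det_db), "top-level entries")
if isinstance(det_db, dict):
    # Print one key so you can see how frames are indexed in the file.
    print("example key:", next(iter(det_db)))
EOF
```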
### Training
You may download the COCO-pretrained weights from [Deformable DETR (+ iterative bounding box refinement)](https://github.com/fundamentalvision/Deformable-DETR#:~:text=config%0Alog-,model,-%2B%2B%20two%2Dstage%20Deformable) and set the `--pretrained` argument to the path of those weights. Then train MOTRv2 on 8 GPUs as follows:
```bash
./tools/train.sh configs/motrv2.args
```
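For example, assuming `--pretrained` sits on its own line in `configs/motrv2.args` (editing the file by hand works just as well), you could point it at the downloaded checkpoint like this; the checkpoint path is a placeholder:
```bash
# Placeholder path: replace with wherever you saved the COCO-pretrained Deformable DETR checkpoint.
sed -i 's|^--pretrained .*|--pretrained /path/to/coco_pretrained_checkpoint.pth|' configs/motrv2.args
```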
### Inference on DanceTrack Test Set
1. Download the SAM weights from the [ViT-H SAM model](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth).
2. Run:
```bash
# run a simple inference on our pretrained weights
./tools/simple_inference.sh ./motrv2_dancetrack.pth
# Or evaluate an experiment run
# ./tools/eval.sh exps/motrv2/run1
# then zip the results
zip motrv2.zip tracker/ -r
```
If you want to run on your own data, please first obtain detection results from [ByteTrackInference](https://github.com/zyayoung/ByteTrackInference).
## Acknowledgements
- [MOTR](https://github.com/megvii-research/MOTR)
- [ByteTrack](https://github.com/ifzhang/ByteTrack)
- [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX)
- [OC-SORT](https://github.com/noahcao/OC_SORT)
- [DanceTrack](https://github.com/DanceTrack/DanceTrack)
- [BDD100K](https://github.com/bdd100k/bdd100k)