---
license: apache-2.0
---

## Model Card: dstc11-simmc2.1-scut-bds-lab

**Team**: [scut-bds-lab](https://github.com/scut-bds)

## Recent Update

- 👏🏻 2022.10.10: The repository `dstc11-simmc2.1-scut-bds-lab` for [DSTC11 Track 1](https://github.com/facebookresearch/simmc2) was created.
- 👏🏻 2022.10.28: The models were released on Hugging Face; see [https://huggingface.co/scutcyr/dstc11-simmc2.1-scut-bds-lab](https://huggingface.co/scutcyr/dstc11-simmc2.1-scut-bds-lab) for details.

## Overview

The [SIMMC2.1](https://github.com/facebookresearch/simmc2) challenge aims to lay the foundations for real-world assistant agents that can handle multimodal inputs and perform multimodal actions. It comprises four tasks: Ambiguous Candidate Identification, Multimodal Coreference Resolution, Multimodal Dialog State Tracking, and Response Generation. We treat the joint input of the textual context, the tokenized objects, and the scene as the multimodal input, and we compare the performance of single-task training against multi-task joint training.

For Subtask 4, we additionally use the system belief state (act and slot values) as a prompt for response generation. Non-visual metadata is also incorporated by adding its embedding to the corresponding object representation.
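
As a purely hypothetical illustration of this idea (the special tokens and layout below are our own, not the repository's actual preprocessing format), the flattened Subtask 4 input could combine the dialog history, candidate object tokens, and the system belief state roughly as follows:

```python
# Hypothetical sketch of flattening the multimodal context into one sequence.
# The token names (<OBJ_12>, "=> Belief State:") and the belief-state string
# are illustrative only; the repository's preprocessing defines the real format.
dialog_history = "User : Do you have that jacket in a smaller size?"
object_tokens = "<OBJ_12> <OBJ_57>"            # candidate objects in the scene
system_belief = "REQUEST:GET [ size = S ] ()"  # act and slot values

model_input = f"{dialog_history} {object_tokens} => Belief State: {system_belief}"
print(model_input)
```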

## Model Date

The models were originally released in October 2022.

## Model Type

**mt-bart**, **mt-bart-sys**, and **mt-bart-sys-nvattr** share the same model framework (a Transformer encoder-decoder with multi-task heads) and are finetuned on [SIMMC2.1](https://github.com/facebookresearch/simmc2) from the pretrained [BART-Large](https://huggingface.co/facebook/bart-large) checkpoint. The [code repository](https://github.com/scutcyr/dstc11-simmc2.1-scut-bds-lab) also contains the code to finetune the models.
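
As a rough sketch of how one of the released checkpoints might be loaded with 🤗 Transformers (the local path below is hypothetical, and the multi-task heads for Subtasks 1–3 require the repository's own model classes rather than a vanilla Transformers class):

```python
# Minimal loading sketch, not the repository's training/inference pipeline.
# Assumptions: the checkpoint directory is hypothetical (one sub-folder per
# model variant), and it is readable as a standard BART-style seq2seq model.
import os

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint_dir = os.path.expanduser(
    "~/pretrained_model/dstc11-simmc2.1-scut-bds-lab/mt-bart-sys"  # hypothetical path
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)
```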

## Results

### devtest results

| Model | Subtask-1 Amb. Candi. F1 | Subtask-2 MM Coref F1 | Subtask-3 MM DST Slot F1 | Subtask-3 MM DST Intent F1 | Subtask-4 Response Gen. BLEU-4 |
|:----:|:----:|:----:|:----:|:----:|:----:|
| mt-bart-ensemble | 0.68466 | 0.77860 | 0.91816 | 0.97828 | 0.34496 |
| mt-bart-dstcla | 0.67589 | 0.78407 | 0.92013 | 0.97468 | |
| mt-bart-dstcla-ensemble | 0.67777 | 0.78640 | 0.92055 | 0.97456 | |
| mt-bart-sys | | | | | 0.39064 |
| mt-bart-sys-2 | | | | | 0.3909 |
| mt-bart-sys-ensemble | | | | | 0.3894 |
| mt-bart-sys-nvattr | | | | | 0.38995 |

### teststd results

The teststd results are provided in [teststd-result](https://github.com/scutcyr/dstc11-simmc2.1-iflytek/blob/main/results/teststd-result); each subfolder corresponds to one model.

## Using with Transformers

(1) First, download the model from Hugging Face using the following commands:

```bash
cd ~
mkdir pretrained_model
cd pretrained_model
git lfs install
git clone https://huggingface.co/scutcyr/dstc11-simmc2.1-scut-bds-lab
```

(2) Then clone our code using the following commands:

```bash
cd ~
git clone https://github.com/scutcyr/dstc11-simmc2.1-scut-bds-lab.git
```

(3) Follow the [README](https://github.com/scutcyr/dstc11-simmc2.1-scut-bds-lab#readme) to use the model.
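
As a quick, purely illustrative smoke test of response generation (reusing the `tokenizer` and `model` from the loading sketch in the Model Type section; the input string below does not follow the repository's real preprocessing format):

```python
# Purely illustrative: the real input format (object tokens, belief-state
# prompt, special tokens) is produced by the repository's preprocessing
# scripts; this dialog string is made up for demonstration only.
dialog_context = "User : Can I see the price of the red jacket on the left?"

inputs = tokenizer(dialog_context, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```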

## References

```
@inproceedings{kottur-etal-2021-simmc,
    title = "{SIMMC} 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations",
    author = "Kottur, Satwik and
      Moon, Seungwhan and
      Geramifard, Alborz and
      Damavandi, Babak",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.401",
    doi = "10.18653/v1/2021.emnlp-main.401",
    pages = "4903--4912",
}

@inproceedings{lee-etal-2022-learning,
    title = "Learning to Embed Multi-Modal Contexts for Situated Conversational Agents",
    author = "Lee, Haeju and
      Kwon, Oh Joon and
      Choi, Yunseon and
      Park, Minho and
      Han, Ran and
      Kim, Yoonhyung and
      Kim, Jinhyeon and
      Lee, Youngjune and
      Shin, Haebin and
      Lee, Kangwook and
      Kim, Kee-Eung",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.61",
    doi = "10.18653/v1/2022.findings-naacl.61",
    pages = "813--830",
}
```

## Acknowledgements

* We would like to express our gratitude to the authors of [Hugging Face's Transformers🤗](https://huggingface.co/) and its open-source community for their excellent design for using pretrained models.
* We would like to express our gratitude to [Meta Research | Facebook AI Research](https://github.com/facebookresearch) for the SIMMC2.1 dataset and the baseline code.
* We would like to express our gratitude to [KAIST-AILab](https://github.com/KAIST-AILab/DSTC10-SIMMC) for the basic research framework on SIMMC2.0.