---
license: apache-2.0
---

# Model Card: dstc11-simmc2.1-scut-bds-lab

**Team:** scut-bds-lab

## Recent Update

## Overview

The SIMMC 2.1 challenge aims to lay the foundations for real-world assistant agents that can handle multimodal inputs and perform multimodal actions. It comprises four tasks: Ambiguous Candidate Identification, Multimodal Coreference Resolution, Multimodal Dialog State Tracking, and Response Generation. We treat the joint input of textual context, tokenized objects, and the scene as the multimodal input, and we compare the performance of single-task training with multi-task joint training. For Subtask 4, we additionally use the system belief state (act and slot values) as the prompt for response generation. Non-visual metadata is also incorporated by adding its embedding to the object representation.
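To make the input construction concrete, here is a minimal illustrative sketch of how a flattened multimodal input with a system-belief-state prompt might be assembled. The special tokens (`<SOO>`, `<OBJ_*>`, `<EOB>`) and the belief-state layout are assumptions for illustration only; the exact serialization is defined in the team's repository.

```python
# Illustrative only: the token names and belief-state layout below are
# assumptions; see the team's repository for the real serialization.
dialog_context = "User : Do you have this jacket in a smaller size?"
object_tokens = "<SOO> <OBJ_11> <OBJ_42> <EOO>"  # tokenized scene objects
system_belief = "=> Belief State : REQUEST:GET [ type = jacket, size = S ] <EOB>"

# Subtask 4 input: textual context + object tokens, prompted with the belief state
model_input = f"{dialog_context} {object_tokens} {system_belief}"
print(model_input)
```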

## Model Date

The model was originally released in October 2022.

## Model Type

The mt-bart, mt-bart-sys, and mt-bart-sys-nvattr models share the same framework (a Transformer with multi-task heads) and are fine-tuned on SIMMC 2.1 from the pretrained BART-large model. This repository also contains the code to fine-tune the model.
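As a quick check that a checkpoint loads, here is a minimal sketch using the standard Transformers API. It assumes the downloaded checkpoint directory follows the usual Hugging Face BART layout (`config.json`, tokenizer files, weights); the task-specific heads are added by the training code in the team's repository, so this loads only the BART backbone.

```python
# Minimal loading sketch (BART backbone only). Point model_path at the Hub
# repo or at whichever local model directory you actually downloaded.
from transformers import AutoTokenizer, BartForConditionalGeneration

model_path = "scutcyr/dstc11-simmc2.1-scut-bds-lab"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = BartForConditionalGeneration.from_pretrained(model_path)
print(model.config.model_type)  # "bart"
```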

## Results

### Devtest results

| Model | Subtask-1 Amb. Candi. F1 | Subtask-2 MM Coref F1 | Subtask-3 MM DST Slot F1 | Subtask-3 MM DST Intent F1 | Subtask-4 Response Gen. BLEU-4 |
|---|---|---|---|---|---|
| mt-bart-ensemble | 0.68466 | 0.77860 | 0.91816 | 0.97828 | 0.34496 |
| mt-bart-dstcla | 0.67589 | 0.78407 | 0.92013 | 0.97468 | – |
| mt-bart-dstcla-ensemble | 0.67777 | 0.78640 | 0.92055 | 0.97456 | – |
| mt-bart-sys | – | – | – | – | 0.39064 |
| mt-bart-sys-2 | – | – | – | – | 0.3909 |
| mt-bart-sys-ensemble | – | – | – | – | 0.3894 |
| mt-bart-sys-nvattr | – | – | – | – | 0.38995 |

### Teststd results

The teststd results are provided in the teststd-result folder; each subfolder corresponds to one model.

## Using with Transformers

(1) First, download the model from Hugging Face using the following script:

```bash
cd ~
mkdir pretrained_model
cd pretrained_model
git lfs install
git clone https://huggingface.co/scutcyr/dstc11-simmc2.1-scut-bds-lab
```

(2) Then clone our code using the following script:

```bash
cd ~
git clone https://github.com/scutcyr/dstc11-simmc2.1-scut-bds-lab.git
```

(3) Follow the README in that repository to use the model.
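For a quick smoke test of a downloaded checkpoint, the following sketch runs plain BART generation. The directory name, input string, and decoding parameters are assumptions for illustration; the repository's README documents the real preprocessing and evaluation pipeline.

```python
# Hypothetical smoke test: the path, input string, and decoding settings
# are illustrative only; see the repository README for the real pipeline.
import os
from transformers import AutoTokenizer, BartForConditionalGeneration

model_dir = os.path.expanduser("~/pretrained_model/dstc11-simmc2.1-scut-bds-lab")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = BartForConditionalGeneration.from_pretrained(model_dir)

inputs = tokenizer("User : Can I see that black jacket?", return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```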

## References

```bibtex
@inproceedings{kottur-etal-2021-simmc,
    title = "{SIMMC} 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations",
    author = "Kottur, Satwik  and
      Moon, Seungwhan  and
      Geramifard, Alborz  and
      Damavandi, Babak",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.401",
    doi = "10.18653/v1/2021.emnlp-main.401",
    pages = "4903--4912",
}

@inproceedings{lee-etal-2022-learning,
    title = "Learning to Embed Multi-Modal Contexts for Situated Conversational Agents",
    author = "Lee, Haeju  and
      Kwon, Oh Joon  and
      Choi, Yunseon  and
      Park, Minho  and
      Han, Ran  and
      Kim, Yoonhyung  and
      Kim, Jinhyeon  and
      Lee, Youngjune  and
      Shin, Haebin  and
      Lee, Kangwook  and
      Kim, Kee-Eung",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.61",
    doi = "10.18653/v1/2022.findings-naacl.61",
    pages = "813--830",
}
```

## Acknowledgements

- We would like to express our gratitude to the authors of Hugging Face's Transformers 🤗 and its open-source community for their excellent design for working with pretrained models.
- We would like to express our gratitude to Meta Research | Facebook AI Research for the SIMMC 2.1 dataset and the baseline code.
- We would like to express our gratitude to KAIST-AILab for the basic research framework on SIMMC 2.0.