Commit d918e00 by eugene-yang (parent: 954fc3e): git update readme

README.md CHANGED

@@ -1,5 +1,5 @@
---
language:
- en
- zh
- fa

@@ -35,9 +35,9 @@ license: mit
Multilingual Translate-Distill is a training technique that produces state-of-the-art MLIR dense retrieval models through translation and distillation.
`plaidx-large-neuclir-mtd-mix-entries-mt5xxl-engeng` is trained with a KL-divergence loss against the `mt5xxl` MonoT5 reranker
[`unicamp-dl/mt5-13b-mmarco-100k`](https://huggingface.co/unicamp-dl/mt5-13b-mmarco-100k)
run as a teacher over English MS MARCO training queries and passages.
The teacher scores can be found in
[`hltcoe/tdist-msmarco-scores`](https://huggingface.co/datasets/hltcoe/tdist-msmarco-scores/blob/main/t53b-monot5-msmarco-engeng.jsonl.gz).
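
Concretely, the distillation objective pushes the student's score distribution over each query's sampled passages toward the teacher's. The snippet below is a minimal sketch of such a KL-divergence loss, assuming PyTorch; the function name, tensor shapes, and temperature are illustrative, not the actual PLAID-X training code.

```python
import torch
import torch.nn.functional as F

def distill_kl_loss(student_scores: torch.Tensor,
                    teacher_scores: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the sampled passages of each query.

    Both tensors have shape [batch, n_passages] (e.g. n_passages = 6,
    matching the mixing strategies described below).
    """
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction='batchmean')

# Example with 2 queries and 6 sampled passages each:
student = torch.randn(2, 6)   # student scores (e.g. ColBERT-X MaxSim)
teacher = torch.randn(2, 6)   # MonoT5 teacher scores from the JSONL file
loss = distill_kl_loss(student, teacher)
```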

### Training Parameters

@@ -49,18 +49,18 @@ The teacher scores can be found in

### Mixing Strategies

- `mix-passages`: languages are randomly assigned to the 6 sampled passages for a given query during training.
- `mix-entries`: all passages in a given query-passage set are randomly assigned to the same language.
- `round-robin-entires`: for each query, the query-passage set is repeated `n` times to iterate through all languages (see the sketch after this list).
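
The three strategies differ only in how document languages are assigned to a training entry's passages. The sketch below is hypothetical (the real logic lives in the PLAID-X training code); `LANGS` lists only the languages visible in this diff's front matter, and the function names are illustrative.

```python
import random

# Languages from the front matter shown above; the full list is
# truncated in this diff.
LANGS = ['en', 'zh', 'fa']

def mix_passages(n_passages: int = 6) -> list[str]:
    # Each passage in the entry gets an independently sampled language.
    return [random.choice(LANGS) for _ in range(n_passages)]

def mix_entries(n_passages: int = 6) -> list[str]:
    # One language is sampled per entry and shared by all of its passages.
    lang = random.choice(LANGS)
    return [lang] * n_passages

def round_robin_entries(n_passages: int = 6) -> list[list[str]]:
    # The whole entry is repeated once per language.
    return [[lang] * n_passages for lang in LANGS]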
```

## Usage

To properly load ColBERT-X models from the Huggingface Hub, please use the following version of PLAID-X.
```bash
pip install 'PLAID-X>=0.3.1'
```

The following code snippet loads the model through the Huggingface API.
```python
from colbert.modeling.checkpoint import Checkpoint
from colbert.infra import ColBERTConfig

# Downloads the ColBERT-X weights from the Hub and wraps them for inference.
ckpt = Checkpoint('hltcoe/plaidx-large-neuclir-mtd-mix-entries-mt5xxl-engeng',
                  colbert_config=ColBERTConfig())
```
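
Once loaded, the checkpoint can encode text directly. The continuation below is a sketch that assumes PLAID-X keeps ColBERT's `queryFromText`/`docFromText` helpers and standard MaxSim scoring; consult the PLAID-X documentation for the exact API.

```python
# Continues the snippet above. Assumption: ColBERT's text-encoding helpers
# are available on the PLAID-X Checkpoint object.
queries = ['what is multilingual information retrieval']
passages = ['MLIR systems rank documents written in many languages for one query.']

q_embs = ckpt.queryFromText(queries)    # [n_queries, query_len, dim]
d_embs = ckpt.docFromText(passages)     # [n_passages, doc_len, dim]

# Late-interaction (MaxSim) relevance: each query token takes its best match
# over passage tokens; the per-token maxima are summed into one score.
score = (q_embs[0] @ d_embs[0].T).max(dim=-1).values.sum()
```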

For a full tutorial, please refer to the [PLAID-X Jupyter Notebook](https://colab.research.google.com/github/hltcoe/clir-tutorial/blob/main/notebooks/clir_tutorial_plaidx.ipynb),
which is part of the [SIGIR 2023 CLIR Tutorial](https://github.com/hltcoe/clir-tutorial).

## BibTeX entry and Citation Info

Please cite the following two papers if you use the model.

@@ -93,5 +93,6 @@
```bibtex
  title = {Distillation for Multilingual Information Retrieval},
  booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (Short Paper)},
  year = {2024},
  url = {https://arxiv.org/abs/2405.00977}
}
```