Muennighoff committed
Commit • a7c87ea
1 Parent(s): 65566d0
Update model family table

README.md CHANGED
@@ -671,95 +671,89 @@ model-index:
- **Point of Contact:** [Niklas Muennighoff](mailto:[email protected])
- **Languages:** Refer to [BLOOM](https://huggingface.co/bigscience/bloom) for pretraining & [xP3](https://huggingface.co/datasets/bigscience/xP3) for finetuning language proportions. It understands both pretraining & finetuning languages.
- **BLOOMZ & mT0 Model Family:**
-
-[removed: an earlier HTML model-family table; its cell contents were lost in extraction and are not reproduced here]
-<table>
-<tr>
-<td>One</td>
-<td>Two</td>
-</tr>
-<tr>
-<td colspan="2">Three</td>
-</tr>
-</table>
-
-
-|Name|Explanation|
-|----|-----------|
-|[bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)| 560M parameter multitask finetuned version of [bloom-560m](https://huggingface.co/bigscience/bloom-560m) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1)| 1.1B parameter multitask finetuned version of [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7)| 1.7B parameter multitask finetuned version of [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-3b](https://huggingface.co/bigscience/bloomz-3b)| 3B parameter multitask finetuned version of [bloom-3b](https://huggingface.co/bigscience/bloom-3b) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz](https://huggingface.co/bigscience/bloomz)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|||
-|[bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/datasets/xP3mt). **Better than [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) when prompting in non-English**|
-|[bloomz-mt](https://huggingface.co/bigscience/bloomz-mt)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/datasets/xP3mt). **Better than [bloomz](https://huggingface.co/bigscience/bloomz) when prompting in non-English**|
-|||
-|[bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)**|
-|[bloomz-p3](https://huggingface.co/bigscience/bloomz)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz](https://huggingface.co/bigscience/bloomz)**|
-|||
-|||
-|[mt0-small](https://huggingface.co/bigscience/mt0-xxl)|300M parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-base](https://huggingface.co/bigscience/mt0-xxl)|580M parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-large](https://huggingface.co/bigscience/mt0-xxl)|1.2B parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-xl](https://huggingface.co/bigscience/mt0-xxl)|3.7B parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|||
-|[mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). **Better than [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) when prompting in non-English**|
-|||
-|[mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)**|
<table>
<tr>
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3>xP3</a>. Recommended for prompting in English.</th>
</tr>
<tr>
+<td>Parameters</td>
+<td>300M</td>
+<td>580M</td>
+<td>1.2B</td>
+<td>3.7B</td>
+<td>13B</td>
+<td>560M</td>
+<td>1.1B</td>
+<td>1.7B</td>
+<td>3B</td>
+<td>7.1B</td>
+<td>176B</td>
</tr>
<tr>
+<td>Finetuned Model</td>
+<td><a href=https://huggingface.co/bigscience/mt0-small>mt0-small</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-base>mt0-base</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-large>mt0-large</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xl>mt0-xl</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl>mt0-xxl</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-560m>bloomz-560m</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-1b1>bloomz-1b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-1b7>bloomz-1b7</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-3b>bloomz-3b</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1>bloomz-7b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz>bloomz</a></td>
</tr>
+<tr>
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a>. Recommended for prompting in non-English.</th>
</tr>
<tr>
+<td>Finetuned Model</td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl-mt>mt0-xxl-mt</a></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1-mt>bloomz-7b1-mt</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-mt>bloomz-mt</a></td>
</tr>
+<tr>
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/P3>P3</a>. Released for research purposes only. Strictly inferior to above models!</th>
+</tr>
+<tr>
+<td>Finetuned Model</td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl-p3>mt0-xxl-p3</a></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1-p3>bloomz-7b1-p3</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-p3>bloomz-p3</a></td>
+</tr>
+<tr>
+<th colspan="12">Original pretrained checkpoints. Not recommended.</th>
+</tr>
+<tr>
+<td>Pretrained Model</td>
+<td><a href=https://huggingface.co/google/mt5-small>mt5-small</a></td>
+<td><a href=https://huggingface.co/google/mt5-base>mt5-base</a></td>
+<td><a href=https://huggingface.co/google/mt5-large>mt5-large</a></td>
+<td><a href=https://huggingface.co/google/mt5-xl>mt5-xl</a></td>
+<td><a href=https://huggingface.co/google/mt5-xxl>mt5-xxl</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-560m>bloom-560m</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-1b1>bloom-1b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-1b7>bloom-1b7</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-3b>bloom-3b</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-7b1>bloom-7b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom>bloom</a></td>
</tr>
</table>

# Use
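The family table above splits into two architectures that load through different `transformers` classes: BLOOMZ checkpoints are decoder-only causal LMs, while mT0 checkpoints are encoder-decoder (mT5-based) models. Below is a minimal sketch, not part of this commit, showing one checkpoint from each branch; the model IDs come from the table, and the prompt is purely illustrative. It assumes `transformers` and `torch` are installed.

```python
# Minimal sketch: loading one checkpoint from each branch of the family table.
# bloomz-* are decoder-only causal LMs; mt0-* are encoder-decoder (mT5-based) models.
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

prompt = "Translate to English: Je t'aime."  # illustrative prompt

# BLOOMZ branch (decoder-only): the prompt is part of the generated sequence.
bloomz_tok = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
bloomz = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")
inputs = bloomz_tok(prompt, return_tensors="pt")
outputs = bloomz.generate(**inputs, max_new_tokens=20)
print(bloomz_tok.decode(outputs[0]))

# mT0 branch (encoder-decoder): the prompt is consumed by the encoder,
# and generate() produces only the continuation.
mt0_tok = AutoTokenizer.from_pretrained("bigscience/mt0-small")
mt0 = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-small")
inputs = mt0_tok(prompt, return_tensors="pt")
outputs = mt0.generate(**inputs, max_new_tokens=20)
print(mt0_tok.decode(outputs[0], skip_special_tokens=True))
```

Per the table, the `-mt` variants (e.g. bloomz-7b1-mt, mt0-xxl-mt) are the recommended substitutes when prompting in non-English, and the `-p3` variants are for research comparison only.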