---
license: apache-2.0
datasets:
- SkelterLabsInc/JaQuAD
language:
- ja
---

# MambaSan-370m-instruct 🐍

MambaSan-instruct is the first Japanese chat language model based on a state-space model architecture (Mamba).

The model is based on Albert Gu's and Tri Dao's work *Mamba: Linear-Time Sequence Modeling with Selective State Spaces* ([paper](https://arxiv.org/abs/2312.00752)) as well as their model implementation. This work was also inspired by heavenq's mamba-chat implementation in English.
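
For readers new to the architecture, the core of a Mamba block is a selective state-space recurrence. A simplified sketch, following the paper's notation:

$$h_t = \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t, \qquad y_t = C_t\, h_t$$

where $\bar{A}_t$ and $\bar{B}_t$ are discretized from continuous parameters using an input-dependent step size $\Delta_t$, and $\Delta_t$, $B_t$, $C_t$ are all computed from the current input $x_t$ (the "selective" part). Because the recurrence carries a fixed-size state instead of attending over all past tokens, inference cost grows linearly with sequence length.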

MambaSan-370m-instruct is based on MambaSan-370m and was fine-tuned on 31.7k samples of the SkelterLabsInc/JaQuAD dataset. To learn more, you can:

- Take a look at the model on [Huggingface](https://huggingface.co/loiccabannes/MambaSan-370m-instruct) 🤗
- Talk to MambaSan-370m-instruct on [Google Colab](https://colab.research.google.com/drive/1ZqHOC_RHU8ilAKreUMc_WNbo_melmNJX?usp=sharing)
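
Below is a minimal inference sketch. It assumes the checkpoint loads through the reference `mamba_ssm` package (as the upstream mamba-chat code does) and that this repository ships its own tokenizer; the prompt is only illustrative, since the exact chat template used for fine-tuning is not documented in this card:

```python
# Minimal inference sketch (assumptions: mamba_ssm installed via
# `pip install mamba-ssm causal-conv1d`, a CUDA GPU available, and the
# HF repo shipping its own tokenizer).
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

repo = "loiccabannes/MambaSan-370m-instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = MambaLMHeadModel.from_pretrained(repo, device="cuda", dtype=torch.float16)

# Illustrative prompt only; the chat template used for fine-tuning
# is not documented in this card.
prompt = "日本の首都はどこですか？"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

out = model.generate(
    input_ids=input_ids,
    max_length=200,   # prompt + completion, in tokens
    temperature=0.7,
    top_k=40,         # mamba_ssm decodes greedily unless top_k > 1
    top_p=0.9,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```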

The code used for pretraining and fine-tuning will soon be published on my GitHub: https://github.com/lcabannes

## Citation

```bibtex
@misc{lcabannes2024MambaSan-370m-instruct,
  title = {MambaSan-370m-instruct},
  author = {Loïc Cabannes},
  year = {2024},
  howpublished = {HuggingFace},
  url = {https://huggingface.co/loiccabannes/MambaSan-370m-instruct/}
}
```