loiccabannes committed on
Commit
d132d33
1 Parent(s): 3ec46e9

Update README.md

  1. README.md +27 -0
README.md CHANGED
@@ -1,3 +1,30 @@
  ---
  license: apache-2.0
+ datasets:
+ - SkelterLabsInc/JaQuAD
+ language:
+ - ja
  ---
+
+ MambaSan-370m-instruct 🐍
+
+ MambaSan-instruct is the first Japanese chat language model based on a state-space model architecture (Mamba).
+
+ The model is based on Albert Gu and Tri Dao's work, Mamba: Linear-Time Sequence Modeling with Selective State Spaces (paper), as well as their model implementation. This work was also inspired by havenhq's English-language mamba-chat implementation.
+
+ MambaSan-instruct is based on MambaSan-370m and was fine-tuned on 31.7k examples from the SkelterLabsInc/JaQuAD dataset. To learn more, you can:
+
+ - Take a look at the model on [Hugging Face](https://huggingface.co/loiccabannes/MambaSan-370m-instruct) 🤗
+ - Talk to MambaSan-instruct on [Google Colab](https://colab.research.google.com/drive/1ZqHOC_RHU8ilAKreUMc_WNbo_melmNJX?usp=sharing)
+
+ The code used for pretraining and fine-tuning will soon be published on my GitHub: https://github.com/lcabannes
+ Citation
+
+ bibtex
+ @misc{lcabannes2024MambaSan-370m-instruct,
+ title = {MambaSan-370m-instruct},
+ author = {Loïc Cabannes},
+ year = {2024},
+ howpublished = {HuggingFace},
+ url = {https://huggingface.co/loiccabannes/MambaSan-370m-instruct/}
+ }