laion/anh-bloomz-7b1-mt-cross-lingual


cahya committed on Apr 6, 2023

Commit 3ea080c (parent: 9e72d44)

Update README.md

Files changed (1)

README.md +41 -0

README.md CHANGED

@@ -1,3 +1,44 @@
 ---
 license: bigscience-openrail-m
+datasets:
+- laion/Anh
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- pytorch
+- causal-lm
+- multilingual
+- instruct
+- bloomz
 ---
+### Model description
+This model is the [`bloomz-7b1-mt`](https://huggingface.co/bigscience/bloomz-7b1-mt) model fine-tuned on the instruction dataset `cross_lingual.jsonl` from [`laion/Anh`](https://huggingface.co/datasets/laion/Anh).
+### How to use
+The `anh-bloomz-7b1-mt-cross-lingual` model can be loaded and used with the following code:
+```python
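+import re
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "laion/anh-bloomz-7b1-mt-cross-lingual"
+model = AutoModelForCausalLM.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+# Newlines and double spaces are replaced by placeholder strings before tokenization.
+whitespace_tokens_map = {'\n': '<n>', '  ': '<w>'}
+# Indonesian prompt: "What happened at the Battle of Cannae? Answer in Chinese."
+text = "User: Apa yang terjadi pada pertempuran Cannae? Jawab dalam bahasa China.\n"
+for k, v in whitespace_tokens_map.items():
+    text = text.replace(k, v)
+
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+# The default generation length limit is short, so set max_new_tokens explicitly.
+tokens = model.generate(**inputs, max_new_tokens=256)
+output = tokenizer.decode(tokens[0], skip_special_tokens=True)
+
+# Remove whitespace that follows a placeholder in the decoded text,
+# then map the placeholders back to real whitespace.
+for v in whitespace_tokens_map.values():
+    output = re.sub(rf"{re.escape(v)}\s+(\S+)", rf"{v}\1", output)
+for k, v in whitespace_tokens_map.items():
+    output = output.replace(v, k)
+print(output)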
+```
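The whitespace handling in the README snippet can be pulled out into two small helpers and checked without downloading the 7B model. This is a minimal sketch: the names `WHITESPACE_TOKENS_MAP`, `encode_whitespace` and `decode_whitespace` are hypothetical, and the logic simply mirrors the `whitespace_tokens_map` loops from the snippet. It assumes the placeholders exist to carry newlines and double spaces through the tokenizer; the commit itself does not state the reason.

```python
import re

# Hypothetical constant mirroring whitespace_tokens_map from the README snippet.
WHITESPACE_TOKENS_MAP = {"\n": "<n>", "  ": "<w>"}


def encode_whitespace(text: str) -> str:
    """Replace newlines and double spaces with placeholder strings (prompt side)."""
    for k, v in WHITESPACE_TOKENS_MAP.items():
        text = text.replace(k, v)
    return text


def decode_whitespace(output: str) -> str:
    """Restore whitespace in decoded model output (hypothetical helper)."""
    for v in WHITESPACE_TOKENS_MAP.values():
        # Same regex as the README: drop whitespace between a placeholder
        # and the next run of non-space characters.
        output = re.sub(rf"{re.escape(v)}\s+(\S+)", rf"{v}\1", output)
    for k, v in WHITESPACE_TOKENS_MAP.items():
        # Map each placeholder back to the whitespace it stands for.
        output = output.replace(v, k)
    return output


if __name__ == "__main__":
    prompt = encode_whitespace("User: hi\n")
    print(prompt)
    print(repr(decode_whitespace("User: hi<n> Assistant: ok")))
```

The mapping is not fully lossless: a run of three spaces is encoded as `<w>` plus one space, and the decoding regex then removes that trailing space, so it comes back as two spaces.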