de-anna / README.md
Cedille's picture
Update README.md
3d4e8d1
|
raw
history blame
2.32 kB
---
language: de
license: mit
tags:
- pytorch
- causal-lm
datasets:
- c4
---
# Cedille AI
Cedille is a project to bring large language models to non-English languages.
## de-anna
Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) codebase.
Anna was trained on German text with a similar methodology to [Boris](https://huggingface.co/Cedille/fr-boris), our French model. We started training from GPT-J, which has been trained on [The Pile](https://pile.eleuther.ai/). As a consequence the model still has good performance in English language. Anna makes use of the unmodified GPT-2 tokenizer.
# How to run
## Loading the model
### Base (requires 48+ GB of RAM)
```
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
```
### Lower memory usage
Loading a model with Huggingface requires two copies of the weights, so 48+ GB of RAM for [GPT-J models](https://huggingface.co/docs/transformers/v4.15.0/model_doc/gptj) in float32 precision.
The first trick would be to load the model with the specific argument below to load only one copy of the weights.
```
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cup_mem_usage=True)
```
We are planning on adding an fp16 branch soon. Combined with the lower memory loading above, loading could be done on 12.1GB of RAM.
## Generation example
```
model.eval()
input_sentence = "Wo hast du unsere Sprache gelernt?"
input_ids = tokenizer.encode(input_sentence, return_tensors='pt')
beam_outputs = model.generate(
input_ids,
max_length=100,
do_sample=True,
top_k=50,
top_p=0.95,
num_return_sequences=1
)
print(tokenizer.decode(beam_outputs[0], skip_special_tokens=True))
```
## Contact us
For any custom development please contact us at [email protected].
## Links
* [Official website](https://en.cedille.ai/)
* [Blog](https://en.cedille.ai/blog)
* [GitHub](https://github.com/coteries/cedille-ai)
* [Twitter](https://twitter.com/CedilleAI)