---
language: de
license: mit
tags:
- pytorch
- causal-lm
datasets:
- c4
---

# Cedille AI

Cedille is a project to bring large language models to non-English languages.

## de-anna

Anna is a 6B-parameter autoregressive language model based on the GPT-J architecture and trained using the [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) codebase.

Anna was trained on German text with a methodology similar to that of [Boris](https://huggingface.co/Cedille/fr-boris), our French model. We started training from GPT-J, which was trained on [The Pile](https://pile.eleuther.ai/). As a consequence, the model still performs well in English. Anna uses the unmodified GPT-2 tokenizer.

# How to run

## Loading the model

### Base (requires 48+ GB of RAM)

```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
```

### Lower memory usage

By default, loading a model with Hugging Face Transformers holds two copies of the weights in memory, which means 48+ GB of RAM for [GPT-J models](https://huggingface.co/docs/transformers/v4.15.0/model_doc/gptj) in float32 precision.

The first trick is to load the model with the argument shown below, which keeps only one copy of the weights in memory.

```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
# low_cpu_mem_usage=True avoids holding a second copy of the weights while loading
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
```

We plan to add an fp16 branch soon. Combined with the low-memory loading above, the model could then be loaded with 12.1 GB of RAM.

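The memory figures quoted in this section can be checked with some quick arithmetic. The sketch below is only an estimate: it assumes the parameter count listed on the upstream GPT-J-6B model card (the exact count for de-anna is not stated here), counts weights only, and ignores activations and framework overhead. The helper name `weights_gb` is made up for this illustration.

```python
# Assumption: parameter count from the upstream GPT-J-6B model card,
# not a number published for de-anna itself.
N_PARAMS = 6_053_381_344

# Storage size of one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_gb(dtype: str, copies: int = 1) -> float:
    """Approximate RAM in decimal GB needed to hold `copies` of the weights."""
    return N_PARAMS * BYTES_PER_PARAM[dtype] * copies / 1e9


# Default loading: two float32 copies in memory.
print(f"float32, two copies: {weights_gb('float32', 2):.1f} GB")
# Low-memory loading: a single float32 copy.
print(f"float32, one copy:   {weights_gb('float32', 1):.1f} GB")
# Possible fp16 weights with low-memory loading: a single float16 copy.
print(f"float16, one copy:   {weights_gb('float16', 1):.1f} GB")
```

Under these assumptions the three cases come out at roughly 48.4 GB, 24.2 GB and 12.1 GB, which lines up with the "48+ GB" and "12.1 GB" figures above.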
## Generation example

```
model.eval()  # inference mode: disables dropout

# German prompt: "Where did you learn our language?"
input_sentence = "Wo hast du unsere Sprache gelernt?"
input_ids = tokenizer.encode(input_sentence, return_tensors='pt')

# Sample one continuation with top-k and nucleus (top-p) sampling
sample_outputs = model.generate(
    input_ids,
    max_length=100,  # total length in tokens, prompt included
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1
)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
```

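To give an intuition for the `top_k` and `top_p` arguments used in the generation example, here is a small pure-Python sketch of top-k followed by nucleus (top-p) filtering on a toy list of logits. It is a simplified illustration and not the Transformers implementation; the function name `top_k_top_p_filter` and the toy logits are made up for this example.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=50, top_p=0.95):
    """Return {token_id: probability} after top-k, then top-p filtering."""
    # Softmax over all logits (subtract the max for numerical stability).
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens and renormalise.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)
    k_probs = {i: probs[i] / k_total for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += k_probs[i]
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so they form a distribution.
    kept_total = sum(k_probs[i] for i in kept)
    return {i: k_probs[i] / kept_total for i in kept}


# Toy vocabulary of four tokens; sample the next token from the filtered set.
dist = top_k_top_p_filter([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.9)
next_token = random.choices(list(dist), weights=list(dist.values()))[0]
print(dist, next_token)
```

With these toy values only the two most likely tokens survive the filter, so low-probability tokens can never be sampled. Lower `top_k` or `top_p` values make the output more conservative; higher values make it more varied.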
## Contact us

For any custom development, please contact us at [email protected].

## Links

* [Official website](https://en.cedille.ai/)
* [Blog](https://en.cedille.ai/blog)
* [GitHub](https://github.com/coteries/cedille-ai)
* [Twitter](https://twitter.com/CedilleAI)