---
language: de
license: mit
tags:
- pytorch
- causal-lm
datasets:
- c4
---

# Cedille AI
Cedille is a project to bring large language models to non-English languages.

## de-anna
Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) codebase.

Anna was trained on German text with a methodology similar to that of [Boris](https://huggingface.co/Cedille/fr-boris), our French model. We started training from GPT-J, which was pretrained on [The Pile](https://pile.eleuther.ai/); as a consequence, the model also retains good performance in English. Anna uses the unmodified GPT-2 tokenizer.
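
Since the tokenizer is unchanged from GPT-2, whose vocabulary was built on English text, German words are often split into several subword units. A quick check (the exact split depends on the vocabulary):

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")

# German words typically decompose into multiple byte-pair subtokens,
# since the GPT-2 vocabulary was learned mostly on English text.
print(tokenizer.tokenize("Wo hast du unsere Sprache gelernt?"))
```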

# How to run

## Loading the model
### Base (requires 48+ GB of RAM)
```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
```
### Lower memory usage
By default, loading a model with Hugging Face Transformers keeps two copies of the weights in memory, hence the 48+ GB of RAM needed for [GPT-J models](https://huggingface.co/docs/transformers/v4.15.0/model_doc/gptj) in float32 precision. Passing `low_cpu_mem_usage=True`, as shown below, loads only a single copy of the weights.
```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
```

We are planning to add an fp16 branch soon. Combined with the low-memory loading above, the model could then be loaded in about 12.1 GB of RAM.
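
Until that branch is published, a sketch of what half-precision loading could look like (the `float16` revision name below is an assumption about the planned branch, not an existing one):

```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained(
    "Cedille/de-anna",
    revision="float16",           # hypothetical name of the planned fp16 branch
    torch_dtype=torch.float16,    # keep the weights in half precision
    low_cpu_mem_usage=True,       # load a single copy of the weights
)
```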

## Generation example
```
model.eval()
input_sentence = "Wo hast du unsere Sprache gelernt?"
input_ids = tokenizer.encode(input_sentence, return_tensors="pt")

# Sampling-based decoding (top-k plus nucleus sampling), not beam search
sample_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
```
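
On a machine with a CUDA GPU, generation is considerably faster if the model and inputs are moved to the device first; a minimal sketch, assuming the model and tokenizer were already loaded as above:

```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Inputs must live on the same device as the model
input_ids = tokenizer.encode("Wo hast du unsere Sprache gelernt?", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(input_ids, max_length=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```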
## Contact us
For any custom development, please contact us at [email protected].

## Links
* [Official website](https://en.cedille.ai/)
* [Blog](https://en.cedille.ai/blog)
* [GitHub](https://github.com/coteries/cedille-ai)
* [Twitter](https://twitter.com/CedilleAI)