
	---
	language: de
	license: mit
	tags:
	- pytorch
	- causal-lm
	datasets:
	- c4
	---

	# Cedille AI
	Cedille is a project to bring large language models to non-English languages.

	## de-anna
	Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) codebase.

	Anna was trained on German text with a methodology similar to that of [Boris](https://huggingface.co/Cedille/fr-boris), our French model. Training started from GPT-J, which was itself trained on [The Pile](https://pile.eleuther.ai/). As a consequence, the model still performs well in English. Anna uses the unmodified GPT-2 tokenizer.

	# How to run

	## Loading the model
	### Base (requires 48+ GB of RAM)
	```
	from transformers import AutoTokenizer, AutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
	model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
	```
	### Lower memory usage
	By default, loading a model with Hugging Face Transformers holds two copies of the weights in memory, which is why [GPT-J models](https://huggingface.co/docs/transformers/v4.15.0/model_doc/gptj) need 48+ GB of RAM in float32 precision.
	A first way to reduce this is to pass the `low_cpu_mem_usage=True` argument, as below, so that only one copy of the weights is loaded.
	```
	from transformers import AutoTokenizer, AutoModelForCausalLM

	tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
	model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
	```

	We are planning to add an fp16 branch soon. Combined with the low-memory loading above, the model could then be loaded with 12.1 GB of RAM.
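
The memory figures above follow from simple arithmetic on the parameter count. The sketch below is only a rough estimate: the parameter count of about 6.05 billion is an assumption (this card only says "6B"), `weights_gb` is a hypothetical helper, and the estimate covers the weights alone, not activations or framework overhead.

```python
# Assumption: roughly 6.05 billion parameters (the card only states "6B").
N_PARAMS = 6.05e9

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_gb(dtype, copies=1):
    """Estimate RAM (in GB, 1 GB = 1e9 bytes) for `copies` copies of the weights."""
    return N_PARAMS * BYTES_PER_PARAM[dtype] * copies / 1e9


# Default loading: two float32 copies.
print(f"float32, two copies: {weights_gb('float32', copies=2):.1f} GB")
# Low-memory loading: a single float32 copy.
print(f"float32, one copy:   {weights_gb('float32'):.1f} GB")
# Hypothetical fp16 weights combined with low-memory loading.
print(f"float16, one copy:   {weights_gb('float16'):.1f} GB")
```

Under these assumptions, the estimates come out at about 48.4 GB, 24.2 GB and 12.1 GB respectively, in line with the figures quoted in this section.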

	## Generation example
	```
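	model.eval()  # inference mode (disables dropout)
	input_sentence = "Wo hast du unsere Sprache gelernt?"
	input_ids = tokenizer.encode(input_sentence, return_tensors='pt')

	# Sample one continuation using top-k and nucleus (top-p) sampling.
	sample_outputs = model.generate(
	    input_ids,
	    max_length=100,          # prompt plus generated tokens
	    do_sample=True,          # sample instead of greedy decoding
	    top_k=50,
	    top_p=0.95,
	    num_return_sequences=1
	)
	print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))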
	```
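
The `top_k` and `top_p` arguments in the generation example restrict which tokens can be sampled at each step. The sketch below illustrates the idea in plain Python on a toy list of logits. It is a simplified illustration, not the Transformers implementation, and `top_k_top_p_filter` is a hypothetical helper name.

```python
import math


def top_k_top_p_filter(logits, top_k=50, top_p=0.95):
    """Return {token_index: probability} for the tokens that remain sampleable."""
    # Top-k: keep only the k highest-scoring token indices.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    ranked = ranked[:top_k]

    # Softmax over the surviving logits (subtract the max for stability).
    peak = logits[ranked[0]]
    exps = {i: math.exp(logits[i] - peak) for i in ranked}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


# Toy example with a four-token vocabulary.
filtered = top_k_top_p_filter([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.9)
print(sorted(filtered))
```

In this toy example, top-k first drops the lowest-scoring token, and top-p then keeps only the two most likely of the remaining three. Lower values of `top_k` or `top_p` make generation more conservative; higher values allow more varied output.
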
	## Contact us
	For any custom development please contact us at [email protected].

	## Links
	* [Official website](https://en.cedille.ai/)
	* [Blog](https://en.cedille.ai/blog)
	* [GitHub](https://github.com/coteries/cedille-ai)
	* [Twitter](https://twitter.com/CedilleAI)