---
license: llama3
base_model: catallama/CataLlama-v0.1-Base
tags:
- llama
- llama-3
- Catalan
model-index:
- name: CataLlama-v0.1-Instruct-SFT
results: []
datasets:
- catallama/Catalan-Instruct
language:
- ca
- en
pipeline_tag: text-generation
---
**CataLlama-v0.1-Instruct-SFT** is an instruct fine-tune of [catallama/CataLlama-v0.1-Base](https://huggingface.co/catallama/CataLlama-v0.1-Base) on the [catallama/Catalan-Instruct](https://huggingface.co/datasets/catallama/Catalan-Instruct) dataset.
The model shows improved proficiency with the Catalan language.
**This instruction fine-tuned model is proficient at the following tasks in Catalan** (an example prompt follows the list):
- Information extraction (suitable for RAG)
- Named Entity Recognition (NER)
- Translation from English to Catalan and Catalan to English
- Summarization - both short form and long form
- Chat
- Sentiment analysis
- Open question answering
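For instance, a translation request can be phrased as a plain user message in the chat format. The prompt below is hypothetical and only illustrates the phrasing; the full inference snippet further down shows how `messages` is actually used:

```python
# Hypothetical example of a translation instruction in the chat format
messages = [
    {"role": "user", "content": "Tradueix al català: The weather is nice today."},
]
```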
The model achieves a validation loss of 0.8528 after two epochs.
**Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/) based on Llama-3 from Meta.
**Model Architecture** CataLlama is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and direct preference optimisation (DPO) to align with human preferences for helpfulness and safety.
**License** The model uses the llama-3 license available at: [https://llama.meta.com/llama3/license](https://llama.meta.com/llama3/license)
### Use with transformers
See the snippet below for usage with Transformers:
**The model follows the same prompt template as Llama-3 Instruct**
```python
import transformers
import torch

model_id = "catallama/CataLlama-v0.1-Instruct-SFT"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Ei com estàs avui?"},
]

# Render the conversation with the Llama-3 Instruct chat template
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

outputs = pipeline(
    prompt,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Strip the prompt from the returned text and print only the completion
print(outputs[0]["generated_text"][len(prompt):])
```
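For reference, `apply_chat_template` with `add_generation_prompt=True` renders the conversation above into the standard Llama-3 Instruct format, roughly as follows:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Ei com estàs avui?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```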
## Training procedure
The model was trained **with the same prompt template as Llama-3 Instruct**.
The model was trained for two epochs on 6x A100 80GB GPUs using DeepSpeed ZeRO Stage 3 without CPU offloading.
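The exact DeepSpeed configuration is not published; a minimal ZeRO Stage 3 configuration without CPU offloading, matching the setup described above, might look like the sketch below (the specific values are assumptions):

```python
# Hypothetical DeepSpeed ZeRO Stage 3 config (no CPU offloading),
# written as a Python dict that can be passed to the HF Trainer
# via TrainingArguments(deepspeed=ds_config). Values are illustrative.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        # "auto" lets the HF integration size these thresholds from the model
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}
```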
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- distributed_type: multi-GPU
- num_devices: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
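As an illustration, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows; the batch size and the DeepSpeed config path are assumptions not stated in this card:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; per-device batch size and the
# deepspeed config path are assumptions, not values from the card.
training_args = TrainingArguments(
    output_dir="catallama-sft",
    learning_rate=1e-5,
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_steps=100,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,
    per_device_train_batch_size=4,     # assumption
    deepspeed="ds_zero3_config.json",  # assumption: ZeRO Stage 3 config
)
```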
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:----------------:|
| 1.0186 | 0.22 | 200 | 1.0209 |
| 0.9588 | 0.43 | 400 | 0.9489 |
| 0.9111 | 0.65 | 600 | 0.9086 |
| 0.8971 | 0.86 | 800 | 0.8886 |
| 0.8002 | 1.22 | 1000 | 0.8989 |
| 0.8068 | 1.43 | 1200 | 0.8835 |
| 0.7722 | 1.65 | 1400 | 0.8654 |
| 0.7805 | 1.86 | 1600 | 0.8528 |