	---
	license: apache-2.0
	datasets:
	- jan-hq/instruction-speech-v1
	language:
	- en
	tags:
	- sound language model
	---

	## Model Details

	We have developed and released the [Jan-Llama3](https://huggingface.co/collections/jan-hq/jan-llama3-668e4dad446c8736208dca4f) model family. Models in this family natively understand audio and text input.

	We extend [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with sound understanding capabilities by continuing its training on the 700M-token [Instruction Speech v1](https://huggingface.co/datasets/jan-hq/instruction-speech-v1) dataset.

	Model developers: Homebrew Research.

	Input: Text and sound.

	Output: Text.

	Model architecture: Llama-3.

	Language(s): English.

	## Intended Use

	Intended use cases: This family is primarily intended for research applications. This version aims to further improve the LLM's sound understanding capabilities.

	Out-of-scope: Using Llama-3-Sound in any manner that violates applicable laws or regulations is strictly prohibited.

	## How to Get Started with the Model

	> TODO

	## Training Process

	Training metrics: the image below is a snapshot of the training loss curve.

	![train_loss_curve/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/9bv-kpnqrTxaBhiYrVHN7.png)

	### Hardware

	GPU Configuration: Cluster of 8x NVIDIA H100-SXM-80GB.

	GPU Usage:
	- Continual Training: 8 hours.
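
As a rough back-of-the-envelope check, the hardware figures above can be combined into a compute budget and an approximate throughput. This is only a sketch: it assumes the 8 hours is wall-clock time across all 8 GPUs and that the run made a single pass over the full 700M-token dataset (the token count and epoch count are taken from elsewhere in this card). The variable names are illustrative.

```python
# Back-of-the-envelope compute estimate (assumptions noted inline).
NUM_GPUS = 8                   # 8x NVIDIA H100-SXM-80GB
TRAIN_HOURS = 8                # assumed to be wall-clock hours, not GPU-hours
DATASET_TOKENS = 700_000_000   # 700M tokens in Instruction Speech v1
EPOCHS = 1                     # from the training arguments

# Total compute budget in GPU-hours.
gpu_hours = NUM_GPUS * TRAIN_HOURS

# Approximate throughput, assuming every token is seen exactly once.
tokens_per_second = DATASET_TOKENS * EPOCHS / (TRAIN_HOURS * 3600)
tokens_per_gpu_second = tokens_per_second / NUM_GPUS

print(f"GPU-hours: {gpu_hours}")
print(f"Tokens/s (cluster): {tokens_per_second:,.0f}")
print(f"Tokens/s per GPU:   {tokens_per_gpu_second:,.0f}")
```

Under these assumptions the run costs 64 GPU-hours and processes roughly 3,000 tokens per second per GPU. The real figure depends on padding, sequence packing, and any evaluation time included in the 8 hours.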

	### Training Arguments

	| Parameter | Continual Training |
	|----------------------------|-------------------------|
	| Epoch | 1 |
	| Global batch size | 128 |
	| Learning Rate | 5e-5 |
	| Learning Rate Scheduler | Cosine with warmup |
	| Optimizer | [Adam-mini](https://arxiv.org/abs/2406.16793) |
	| Warmup Ratio | 0.1 |
	| Weight Decay | 0.01 |
	| beta1 | 0.9 |
	| beta2 | 0.98 |
	| epsilon | 1e-6 |
	| Gradient Clipping | 1.0 |
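
To make the "Cosine with warmup" schedule concrete, here is a minimal pure-Python sketch of how the learning rate might evolve with the peak rate and warmup ratio from the table. It assumes a linear warmup from zero followed by a half-cosine decay to zero, which is a common convention but is not confirmed by this card. `TOTAL_STEPS` is a hypothetical value, since the actual step count is not stated here.

```python
import math

PEAK_LR = 5e-5       # "Learning Rate" from the table
WARMUP_RATIO = 0.1   # "Warmup Ratio" from the table
TOTAL_STEPS = 1000   # hypothetical; the real step count is not given


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak down to 0 (assumed final value).
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 50, 100, 550, 1000):
    print(s, lr_at(s))
```

With these settings the rate peaks at step 100 (10% of the run) and is back down to half the peak at step 550, the midpoint of the decay phase.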

	### Accelerate FSDP Config

	```
	compute_environment: LOCAL_MACHINE
	debug: false
	distributed_type: FSDP
	downcast_bf16: 'no'
	enable_cpu_affinity: true
	fsdp_config:
	  fsdp_activation_checkpointing: true
	  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
	  fsdp_backward_prefetch: BACKWARD_PRE
	  fsdp_cpu_ram_efficient_loading: true
	  fsdp_forward_prefetch: false
	  fsdp_offload_params: false
	  fsdp_sharding_strategy: FULL_SHARD
	  fsdp_state_dict_type: SHARDED_STATE_DICT
	  fsdp_sync_module_states: true
	  fsdp_use_orig_params: false
	machine_rank: 0
	main_training_function: main
	mixed_precision: bf16
	num_machines: 1
	num_processes: 8
	rdzv_backend: static
	same_network: true
	tpu_env: []
	tpu_use_cluster: false
	tpu_use_sudo: false
	use_cpu: false
	```

	## Examples

	1. Good example:
	```
	# Question:
	Does the following review have a positive or negative opinion of the movie?

	i thought the relationships were wonderful ,
	Options: A). negative; B). positive;

	# Expected Output:
	The review has a B). positive opinion about the movie because they think the relationships were wonderful.

	# Model's Output:
	The review is positive.
	```

	2. Misunderstanding example:
	```
	# Question:
	Translate the following sentence to Russian:
	6 Work in JOBS and careers after 40

	Russian:

	# Expected Output:
	6 Работать в РАБОТАХ и карьере после 40

	Russian (Cyrillic): 6 Работать в РАБОТАХ и карьере после 40

	Let me explain this to you like you're five years old: This sentence means that people can still work and have careers, even if they're older than 40 years old. The words are in a different language, called Russian.
	# Model's Output:
	В случае, когда вы хотите перевести фразу "The first step is to identify the language of the text." на русский язык, вы можете использовать следующую фразу: "Первый шаг - определить язык текста."
	```

	3. Off-track example:
	```
	# Question:
	Answer the following question: - The roots of the growing plants start to break up the rock - The plant acids dissolve the rock - The rock is broken down into smaller pieces - Erosion begins. What might be the first step of the process?
	Answer:

	# Expected Output:
	The first step of the process might be: The roots of the growing plants start to break up the rock.

	# Model's Output:
	Step 1: Identify the key elements in the premise and hypothesis.

	Premise: "A man in a blue shirt and a red hat is holding a sign."
	Key elements: man, blue shirt, red hat, holding, sign

	Hypothesis: "A man is holding a sign."
	Key elements: man, holding, sign

	Step 2: Compare the key elements in the premise and hypothesis.

	In the premise, we have a man wearing a blue shirt and a red hat, while in the hypothesis, we only have a man. The premise also mentions that the man is holding a sign, which is also mentioned in the hypothesis.

	Step 3: Determine if the hypothesis is entailed by the premise.

	Since the hypothesis states that a man is holding a sign, and the premise confirms that a man is holding a sign, we can conclude that the hypothesis is entailed by the premise. The additional information about the man's clothing in the premise does not contradict or negate the
	```

	Despite being undertrained, the model demonstrates an emerging grasp of sound-text semantics.


	## Citation Information

	BibTeX:

	```
	@article{llama3sound2024,
	  title={Llama-3-Sound: Sound Instruction LLM},
	  author={Homebrew Research},
	  year={2024},
	  month={July},
	  url={https://huggingface.co/jan-hq/Jan-Llama3-0708}
	}
	```

	## Acknowledgement

	- [WhisperSpeech](https://github.com/collabora/WhisperSpeech)

	- [Encodec](https://github.com/facebookresearch/encodec)

	- [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)