Fine-Tuning LLaMA-2 Chat Model with Medical QnA Dataset using QLoRA

This repository contains the code and configuration for fine-tuning the LLaMA-2 chat model on the Medical QnA dataset with the QLoRA technique. Only 2k examples were used for training because of limited GPU resources.

Model and Dataset

  • Pre-trained Model: NousResearch/Llama-2-7b-chat-hf
  • Dataset for Fine-Tuning: randomani/MedicalQnA-llama2
  • Fine-Tuned Model Name: Llama-2-7b-Medchat-finetune

QLoRA Parameters

  • LoRA Attention Dimension (lora_r): 64
  • LoRA Scaling Alpha (lora_alpha): 16
  • LoRA Dropout Probability (lora_dropout): 0.1
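
As a rough illustration, the three values above could be passed to the `LoraConfig` class from the `peft` library as sketched below. The `bias` and `task_type` entries are assumptions that are not listed in this README, and `build_lora_config` is a hypothetical helper name. The `peft` import is deferred into the function so the constants can be inspected without the library installed.

```python
# Values copied from the "QLoRA Parameters" list.
LORA_KWARGS = {
    "r": 64,                   # lora_r: rank of the low-rank update matrices
    "lora_alpha": 16,          # lora_alpha: scaling numerator
    "lora_dropout": 0.1,       # lora_dropout: dropout on the LoRA layers
    "bias": "none",            # assumption: not stated in this README
    "task_type": "CAUSAL_LM",  # assumption: not stated in this README
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Default LoRA scaling factor applied to the low-rank update: alpha / r."""
    return lora_alpha / r


def build_lora_config():
    """Hypothetical helper: build a peft LoraConfig from the values above."""
    from peft import LoraConfig  # imported lazily; requires `pip install peft`

    return LoraConfig(**LORA_KWARGS)
```

With `lora_r = 64` and `lora_alpha = 16`, `lora_scaling(64, 16)` gives 0.25, so the adapter update is scaled down to a quarter before being added to the frozen weights.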

bitsandbytes Parameters

  • Use 4-bit Precision (use_4bit): True
  • 4-bit Compute Dtype (bnb_4bit_compute_dtype): float16
  • 4-bit Quantization Type (bnb_4bit_quant_type): nf4
  • Use Nested Quantization (use_nested_quant): False
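
A minimal sketch of how these settings might map onto `BitsAndBytesConfig` from `transformers` follows. It assumes that `use_4bit` corresponds to `load_in_4bit` and `use_nested_quant` to `bnb_4bit_use_double_quant`; that mapping is a guess at what the training script does and is not confirmed by this README. `device_map="auto"` and the helper names are also assumptions.

```python
MODEL_NAME = "NousResearch/Llama-2-7b-chat-hf"

# Values copied from the "bitsandbytes Parameters" list.
BNB_KWARGS = {
    "load_in_4bit": True,                # use_4bit (assumed mapping)
    "bnb_4bit_quant_type": "nf4",        # bnb_4bit_quant_type
    "bnb_4bit_use_double_quant": False,  # use_nested_quant (assumed mapping)
}
COMPUTE_DTYPE_NAME = "float16"           # bnb_4bit_compute_dtype


def build_bnb_config():
    """Hypothetical helper: build the 4-bit quantization config."""
    import torch  # imported lazily; requires torch and bitsandbytes
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        bnb_4bit_compute_dtype=getattr(torch, COMPUTE_DTYPE_NAME),
        **BNB_KWARGS,
    )


def load_quantized_model():
    """Hypothetical helper: load the base model with 4-bit weights."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=build_bnb_config(),
        device_map="auto",  # assumption: let accelerate place the layers
    )
```

Calling `load_quantized_model()` downloads the base checkpoint and needs a CUDA GPU with `bitsandbytes` installed, so it is best run only in the training environment.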

Training Arguments

  • Number of Training Epochs (num_train_epochs): 1
  • Use fp16 (fp16): False
  • Use bf16 (bf16): False
  • Training Batch Size per GPU (per_device_train_batch_size): 4
  • Evaluation Batch Size per GPU (per_device_eval_batch_size): 4
  • Gradient Accumulation Steps (gradient_accumulation_steps): 1
  • Enable Gradient Checkpointing (gradient_checkpointing): True
  • Maximum Gradient Norm (max_grad_norm): 0.3
  • Initial Learning Rate (learning_rate): 2e-4
  • Weight Decay (weight_decay): 0.001
  • Optimizer (optim): paged_adamw_32bit
  • Learning Rate Scheduler Type (lr_scheduler_type): cosine
  • Maximum Training Steps (max_steps): -1
  • Warmup Ratio (warmup_ratio): 0.03
  • Group Sequences by Length (group_by_length): True
  • Save Checkpoints Every X Steps (save_steps): 0
  • Logging Steps (logging_steps): 25
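
The arguments above could be collected for `transformers.TrainingArguments` roughly as follows. This is a sketch under assumptions: `output_dir="./results"` is a hypothetical path, the helper names are illustrative, and the step arithmetic assumes a single GPU and roughly 2,000 training examples. Argument names follow the `transformers` releases from around the Llama 2 launch and may differ in newer versions.

```python
import math

# Values copied from the "Training Arguments" list.
TRAINING_KWARGS = {
    "output_dir": "./results",  # assumption: hypothetical output path
    "num_train_epochs": 1,
    "fp16": False,
    "bf16": False,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": True,
    "max_grad_norm": 0.3,
    "learning_rate": 2e-4,
    "weight_decay": 0.001,
    "optim": "paged_adamw_32bit",
    "lr_scheduler_type": "cosine",
    "max_steps": -1,            # -1 means num_train_epochs decides the length
    "warmup_ratio": 0.03,
    "group_by_length": True,
    "save_steps": 0,
    "logging_steps": 25,
}


def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              grad_accum_steps, num_gpus=1):
    """Approximate number of optimizer updates in one epoch."""
    batches = math.ceil(num_examples / (per_device_batch_size * num_gpus))
    return math.ceil(batches / grad_accum_steps)


def warmup_steps(total_steps, warmup_ratio):
    """Number of warmup steps implied by a warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def build_training_arguments():
    """Hypothetical helper: build TrainingArguments from the values above."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(**TRAINING_KWARGS)
```

Under those assumptions, 2,000 examples at a batch size of 4 with no gradient accumulation give about 500 optimizer steps for the single epoch, of which `warmup_steps(500, 0.03)` assigns 15 to learning-rate warmup before the cosine decay begins.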

Supervised Fine-Tuning (SFT) Parameters

  • Maximum Sequence Length (max_seq_length): None
  • Packing Multiple Short Examples (packing): False
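
For orientation, here is a hedged sketch of how the pieces might be wired into `SFTTrainer` from the `trl` library. It follows the keyword arguments accepted by older `trl` releases (around 0.7); later releases moved `max_seq_length`, `packing`, and `dataset_text_field` into a separate config object, so adjust to the installed version. The `"text"` column name, the `"train"` split, and the `build_trainer` helper are assumptions not stated in this README.

```python
DATASET_NAME = "randomani/MedicalQnA-llama2"
NEW_MODEL_NAME = "Llama-2-7b-Medchat-finetune"

# Values copied from the "Supervised Fine-Tuning (SFT) Parameters" list.
SFT_KWARGS = {
    "max_seq_length": None,        # max_seq_length: use the library default
    "packing": False,              # packing: one example per sequence
    "dataset_text_field": "text",  # assumption: dataset column holding prompts
}


def build_trainer(model, tokenizer, peft_config, training_args):
    """Hypothetical helper: assemble an SFTTrainer (older trl API)."""
    from datasets import load_dataset  # imported lazily
    from trl import SFTTrainer

    # Assumption: the dataset exposes a "train" split.
    dataset = load_dataset(DATASET_NAME, split="train")

    return SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=peft_config,
        tokenizer=tokenizer,
        args=training_args,
        **SFT_KWARGS,
    )
```

After `trainer = build_trainer(...)`, calling `trainer.train()` followed by `trainer.model.save_pretrained(NEW_MODEL_NAME)` would run the fine-tuning and save the adapter under the fine-tuned model name listed above.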


For more details and access to the dataset, visit the Hugging Face dataset page for randomani/MedicalQnA-llama2.

