
The base model is Llama 3.1 8B.

Modifications:

  1. Quantization to INT4 so that training fits on a Colab A100 GPU with 40 GB of VRAM.
  2. LoRA for parameter-efficient fine-tuning, which allowed attaching an adapter customized for the specific task (see the sketch after this list).
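
A minimal sketch of this setup, assuming the Hugging Face `transformers`, `bitsandbytes`, and `peft` libraries; the model identifier, LoRA rank, and target modules below are illustrative assumptions, not the exact values used for this adapter.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# INT4 (NF4) quantization so the 8B model fits on a single 40 GB A100 (assumed config)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "meta-llama/Llama-3.1-8B"  # assumed base model identifier
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach a LoRA adapter for parameter-efficient fine-tuning (hyperparameters are examples)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```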

Observations:

  1. The initial model does not have enough predictive power to distinguish the entries passed to it during inference.
  2. Adapters do adapt the model to the specific task: after fine-tuning, the model shifted its predictions towards the majority class instead of predicting at random during inference (a loading sketch follows this list).
  3. The requirement is simple: adapt the model and the data passed to it so that it gains some predictive power.
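
For reference, loading the trained adapter back onto the quantized base model for inference could look like the sketch below; the adapter path and prompt are placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "meta-llama/Llama-3.1-8B"   # assumed base model identifier
adapter_path = "path/to/lora-adapter"       # hypothetical adapter location

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_path)  # attach the trained adapter

prompt = "Classify the following entry: ..."  # example prompt, not from the training data
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```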

Actions:

  • Use a 70B model