# Base model: Llama 3.1 8B

## Modifications:

1. Quantization to INT4 so the model fits for training on a Colab A100 GPU with 40 GB of VRAM.

2. LoRA for parameter-efficient fine-tuning, which allowed attaching an adapter customized for the specific task (a minimal loading sketch follows this list).
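The two modifications can be wired together in a few lines. The sketch below shows one plausible setup with Hugging Face `transformers`, `bitsandbytes`, and `peft`; the checkpoint id, LoRA rank/alpha, and target modules are illustrative assumptions, not the exact values used in this run.

```python
# Minimal sketch: load Llama 3.1 8B in 4-bit (NF4) and attach a LoRA adapter.
# Checkpoint id, LoRA hyperparameters, and target modules are assumed for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.1-8B"  # assumed base checkpoint

# INT4 quantization so the 8B base fits comfortably on a 40 GB A100
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter: only the low-rank matrices are trained, the 4-bit base stays frozen
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With the base weights frozen in 4-bit and only the adapter matrices trainable, the memory footprint stays well within the 40 GB budget of the Colab A100.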
|

## Observations:
|
1. The initial (base) model does not have enough predictive power to distinguish between the entries passed to it during inference.
|
2. Adapters do adapt the model to the specific task: this was evident when the model shifted its predictions toward the majority class instead of predicting essentially at random during inference.
|
3. The requirement is simple: adapt the model and the data passed to it so that the model gains some predictive power (a quick evaluation sketch follows this list).
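Observations 2 and 3 are easier to see with a quick look at how the predictions are distributed. Below is a minimal, hypothetical evaluation helper (the `predictions`/`labels` inputs and class names are placeholders) that compares model accuracy against the always-predict-the-majority-class baseline.

```python
# Minimal sketch: check whether predictions have collapsed onto the majority class
# or actually carry signal. `predictions` and `labels` are hypothetical lists of
# class names produced by some inference loop over an evaluation set.
from collections import Counter

def summarize_predictions(predictions, labels):
    pred_counts = Counter(predictions)
    label_counts = Counter(labels)

    majority_class, majority_count = label_counts.most_common(1)[0]
    majority_baseline = majority_count / len(labels)  # accuracy of always guessing the majority class
    accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)

    print("prediction distribution:", dict(pred_counts))
    print("label distribution:     ", dict(label_counts))
    print(f"majority-class baseline: {majority_baseline:.3f} (always '{majority_class}')")
    print(f"model accuracy:          {accuracy:.3f}")

# Example usage with made-up values:
# summarize_predictions(["spam", "spam", "spam"], ["spam", "ham", "spam"])
```

If model accuracy tracks the majority-class baseline and the prediction distribution is concentrated on one class, the adapter has learned the label prior rather than a real decision boundary.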
|

## Actions:
|
- Use the 70B model instead of the 8B base (see the sketch below).
|
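Switching to the 70B base would reuse the same 4-bit setup with only the checkpoint id changed; a minimal sketch, assuming the same NF4 configuration as in the loading example above and an assumed checkpoint name.

```python
# Hypothetical swap to the 70B base: same NF4 quantization, only the model id changes.
# Rough arithmetic: 70B parameters at ~0.5 bytes each is about 35 GB of weights alone,
# so a 40 GB A100 leaves little headroom for activations and the LoRA adapter states.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_70b = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",   # assumed checkpoint id
    quantization_config=bnb_config,
    device_map="auto",
)
```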