SnakyMcSnekFace committed
Commit f19846b
1 Parent(s): 4a7d23e

Updated Readme.txt

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -52,20 +52,21 @@ By design, this model has a strong vorny bias. It's not intended for use by anyo
 
 The model was fine-tuned using a [rank-stabilized](https://arxiv.org/abs/2312.03732) [QLoRA adapter](https://arxiv.org/abs/2305.14314). Training was performed using [Unsloth AI](https://github.com/unslothai/unsloth) library on `Ubuntu 22.04.4 LTS` with `CUDA 12.1` and `Pytorch 2.3.0`.
 
-The total training time on NVIDIA GeForce RTX 4060 Ti is about 24 hours.
+The total training time on NVIDIA GeForce RTX 4060 Ti is about 26 hours.
 
 After training, the adapter weights were merged into the dequantized model as described in [ChrisHayduk's GitHub gist](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930).
 
 The quantized version of the model was prepared using [llama.cpp](https://github.com/ggerganov/llama.cpp).
 
-### LoRa adapter configuration
+### QLoRA adapter configuration
 
 - Rank: 64
 - Alpha: 16
 - Dropout rate: 0.1
-- Target weights: `["q_proj", "k_proj", "o_proj", "gate_proj", "up_proj"]`
+- Target weights: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`
 - `use_rslora=True`
 
+Targeting all projections with the QLoRA adapter resulted in the smallest loss compared to other combinations, even compared to higher-rank adapters.
 
 ### Domain adaptation
 
 
@@ -91,6 +92,7 @@ The raw-text stories in dataset were edited as follows:
 - Batch size: 1
 - Gradient accumulation steps: 1
 
+The training takes ~24 hours on NVIDIA GeForce RTX 4060 Ti.
 
 #### Plots
 
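
For reference, the adapter hyperparameters listed in the updated README correspond roughly to the following Unsloth setup. This is a minimal sketch rather than the repository's actual training script: the base model name and maximum sequence length are placeholders, and only the options named in the README are shown.

```python
# Hypothetical sketch of the QLoRA adapter configuration described in the README.
# "base-model-id" and max_seq_length are placeholders, not values from the repo.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="base-model-id",  # placeholder for the actual base model
    max_seq_length=4096,         # placeholder value
    load_in_4bit=True,           # QLoRA: train on top of a 4-bit quantized base
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,                        # Rank: 64
    lora_alpha=16,               # Alpha: 16
    lora_dropout=0.1,            # Dropout rate: 0.1
    target_modules=[             # all attention and MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_rslora=True,             # rank-stabilized LoRA scaling
)
```

With `use_rslora=True`, the adapter output is scaled by `alpha / sqrt(rank)` rather than `alpha / rank`, per the rank-stabilized LoRA paper linked in the README.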
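The adapter-merging step mentioned above can be sketched with PEFT along these lines. Note that the linked gist merges into a dequantized copy of the 4-bit base model, whereas this simplified sketch assumes a half-precision base; the model id and paths are hypothetical.

```python
# Simplified, hypothetical sketch of merging the trained LoRA adapter into the
# base model. The gist linked in the README additionally dequantizes the 4-bit
# base weights to 16-bit before merging. Model id and paths are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "base-model-id"      # placeholder
ADAPTER_DIR = "./qlora-adapter"   # placeholder: output of the training run
OUTPUT_DIR = "./merged-model"     # placeholder: merged fp16 model

# Load the base model in half precision (stand-in for the gist's dequantized model).
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)

# Attach the trained adapter and fold its weights into the base layers.
merged = PeftModel.from_pretrained(base, ADAPTER_DIR).merge_and_unload()

# Save the merged model; this is what later gets converted and quantized with llama.cpp.
merged.save_pretrained(OUTPUT_DIR)
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUTPUT_DIR)
```

The merged full-precision model is then converted to GGUF and quantized with llama.cpp tooling, producing the quantized version mentioned in the README.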