Uploaded model
- Developed by: alnrg2arg
- Finetuned from model: alnrg2arg/blockchainlabs_7B_merged_test2_4
This model comes from the merged blockchainlabs test 2.4 model, alnrg2arg/blockchainlabs_7B_merged_test2_4.
The goal of the project is to build a small LLM for on-device use.
The overall pipeline for this iteration is:
1. Merge models to make a base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. Use DPO for the recovery phase after pruning.
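To make "50% sparsity" concrete, here is a minimal sketch of magnitude pruning on a flat list of weights. It is only an illustration: this card does not say which pruning method the project uses, and the function name `magnitude_prune` and the sample weights are made up for the example.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    Assumption: plain magnitude pruning; the actual method used in this
    project is not stated in the model card.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights, for illustration only
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
zero_fraction = pruned.count(0.0) / len(pruned)
print(pruned)         # half of the entries are now zero
print(zero_fraction)  # 0.5
```

At 50% sparsity half of the weights are zeroed, which is why a recovery phase is needed afterwards to regain quality.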
This model is not pruned; it is intended as a baseline for comparison with the pruned model.
The DPO stage consists of two parts, SFT and DPO. This model is the intermediate result (after SFT, before DPO).
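For readers unfamiliar with DPO, the following is a small sketch of the standard DPO loss for a single preference pair, written from the published formulation rather than from this project's training code. The function name `dpo_loss`, the default `beta = 0.1`, and the sample log-probabilities are assumptions for illustration. In the usual setup the reference model is the SFT checkpoint, which is why SFT comes first.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    Assumption: beta=0.1 is a placeholder, not a value from this project.
    """
    # How much more the policy likes each answer than the reference does
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: loss equals log(2)
loss_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy has shifted probability toward the chosen answer: loss is lower
loss_better = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(loss_start, loss_better)
```

The loss falls as the policy raises the chosen answer and lowers the rejected one relative to the reference.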
This is the code and the parameters I chose for this model (SFT).
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset
from unsloth import FastLanguageModel, FastMistralModel
max_seq_length = 2048 # Supports automatic RoPE Scaling, so choose any number
# Load model
model, tokenizer = FastMistralModel.from_pretrained(
    model_name = "alnrg2arg/blockchainlabs_7B_merged_test2_4",
    max_seq_length = max_seq_length,
    dtype = None, # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
    load_in_4bit = True, # Use 4bit quantization to reduce memory usage. Can be False
    # device_map = "balanced"
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
# Attach LoRA adapters to the attention and MLP projections
model = FastMistralModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Dropout = 0 is currently optimized
    bias = "none", # Bias = "none" is currently optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
)
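As a rough guide to what `r = 16` and `lora_alpha = 16` mean in practice, the sketch below estimates the number of trainable LoRA parameters for the seven target modules. It assumes the merged model keeps the standard Mistral-7B layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); that is an assumption, not something confirmed by this card.

```python
# Assumed Mistral-7B shapes (not confirmed for this merged model)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32

R = 16               # LoRA rank, from the config above
LORA_ALPHA = 16      # LoRA alpha, from the config above

# (in_features, out_features) of each targeted linear layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    # LoRA adds A (r x in_features) and B (out_features x r)
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(per_layer)  # 1310720
print(total)      # 41943040
print(scaling)    # 1.0
```

Under these assumptions roughly 42M parameters are trained, well under 1% of the 7B base model, and the low-rank update is applied with a scaling factor of 1.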
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
Model tree for alnrg2arg/blockchainlabs_7B_merged_test2_4_sft_4bit
Base model
alnrg2arg/blockchainlabs_7B_merged_test2_4