---
library_name: transformers
license: cc-by-4.0
datasets:
- hendrycks/ethics
---
# Model Card for fc91/phi3-mini-instruct-full_ethics-lora_v2.5

A fine-tuned (LoRA) version of microsoft/Phi-3-mini-4k-instruct, trained on subsets of the hendrycks/ethics dataset.
## How to Get Started with the Model

Use the code below to get started with the model.

Install the latest versions of the following Python libraries (for example, `pip install -U transformers torch accelerate peft bitsandbytes`):

- transformers
- torch
- accelerate
- peft
- bitsandbytes
### Run the model

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

peft_model_id = "fc91/phi3-mini-instruct-full_ethics-lora_v2.5"
model = PeftModel.from_pretrained(base_model, peft_model_id)
```
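Optionally, the adapter can be folded into the base weights with PEFT's `merge_and_unload()`, which returns a plain transformers model. A minimal sketch; the save path is illustrative, not part of this repository:

```python
# Merge the LoRA weights into the base model so it can be used and saved
# without PEFT. The output path below is illustrative.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("phi3-mini-ethics-merged")
```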
### Run the model with a quantization configuration

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from peft import PeftModel

# Set up a 4-bit NF4 quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=quantization_config,
    device_map="auto",
    attn_implementation="eager",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Attach the fine-tuned LoRA adapter
peft_model_id = "fc91/phi3-mini-instruct-full_ethics-lora_v2.5"
model = PeftModel.from_pretrained(base_model, peft_model_id)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Replace the bracketed placeholders with your ethical theory and user content.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant sensitive to ethical concerns. Carefully read and interpret the user prompt under a [SPECIFY ETHICAL THEORY] perspective. Does it represent an 'ethical' or an 'unethical' [SPECIFY ETHICAL THEORY] reply? Respond ONLY with 'ethical' or 'unethical'."},
    {"role": "user", "content": "[PROVIDE USER CONTENT]"},
    {"role": "assistant", "content": "The user reply is..."},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 1000,
    "return_full_text": False,
    "temperature": 0.5,  # ignored when do_sample=False (greedy decoding)
    "do_sample": False,
}

# Run inference
output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
```
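Since the system prompt constrains the reply to a single label, the raw generation can be normalized with a little post-processing. A minimal sketch, assuming the model follows the one-word instruction:

```python
# Normalize the generated text to one of the two labels. Note that
# "unethical" contains "ethical" as a substring, so it must be checked first.
label = output[0]["generated_text"].strip().lower()
print("unethical" if "unethical" in label else "ethical")
```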
## Training Details

### Training Data

The following subsets of the hendrycks/ethics dataset were used:

- commonsense/train (13.9k random samples)
- commonsense/validation (3.6k random samples)
- deontology/train (18.2k random samples)
- deontology/validation (2.8k random samples)
- justice/train (21k random samples)
- utilitarianism/train (21k random samples)
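For reference, subsets like these can be drawn with the `datasets` library. A minimal sketch for the commonsense/train split; the shuffle seed is an assumption, and the exact sampling used for training is not documented beyond the counts above:

```python
from datasets import load_dataset

# Illustrative only: draw 13.9k random samples from commonsense/train.
commonsense_train = (
    load_dataset("hendrycks/ethics", "commonsense", split="train")
    .shuffle(seed=42)  # seed is an assumption
    .select(range(13_900))
)
```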
### Training Procedure

#### Training Hyperparameters

- per_device_train_batch_size=64
- per_device_eval_batch_size=64
- gradient_accumulation_steps=2
- gradient_checkpointing=True
- warmup_steps=100
- num_train_epochs=1
- learning_rate=0.00005
- weight_decay=0.01
- optim="adamw_hf"
- fp16=True
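These names map directly onto `transformers.TrainingArguments`. A minimal sketch; the output directory and anything not listed above are assumptions:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is illustrative,
# not taken from the original run.
training_args = TrainingArguments(
    output_dir="phi3-mini-ethics-lora",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    warmup_steps=100,
    num_train_epochs=1,
    learning_rate=5e-5,
    weight_decay=0.01,
    optim="adamw_hf",
    fp16=True,
)
```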
#### Speeds, Sizes, Times

The overall training took 5 hours and 24 minutes on the hardware listed below (roughly 32 GPU-hours across the six A100s). With 6 GPUs, a per-device batch size of 64, and 2 gradient accumulation steps, the effective global batch size was 768.
## Evaluation

- Training loss: 0.210800
- Validation loss: 0.234834
### Testing Data, Factors & Metrics

#### Testing Data

The following subsets of the hendrycks/ethics dataset were used:

- commonsense/test (2.5k random samples)
- deontology/test (2.5k random samples)
- justice/test (2.5k random samples)
- utilitarianism/test (2.5k random samples)
## Hardware

6× NVIDIA A100-SXM4-40GB