This repository hosts the model weights of MUFFIN-T5-3B (MUFFIN: Multi-Faceted Instructions).

We fine-tune the T5-3B model on our MUFFIN dataset.

We released both 3B and 11B models:

| Model | Number of parameters |
|---|---|
| MUFFIN-T5-3B | 3 billion |
| MUFFIN-T5-11B | 11 billion |

You can also find the Llama2-based model weights here.

Prompt Template

Please use the following prompt template when using the models for inference (including the evaluations on SuperNI-Test, T0-Eval, and BBH):

prompt = "### Input:\n{input}"
prompt += "\n\n"
prompt += "### Instruction:\n{instruction}"
prompt += "\n\n"
prompt += "### Output:\n"

print(prompt)

Please use the prompt below when testing the models on classification tasks (e.g., MMLU):

prompt = "### Input:\n{input}"
prompt += "\n\n"
prompt += "### Instruction:\n{instruction}\n"
prompt += "(A): {option1}\n(B): {option2}\n(C): {option3}\n(D): {option4}\nAvoid answers outside of (A, B, C, D)."  # one extra sentence that tells the model the output space
prompt += "\n\n"
prompt += "### Output:\n"

print(prompt)
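
The two templates above differ only in the option lines and the closing sentence, so they can be produced by one function. The following is a minimal sketch, not part of the released code: the name `build_prompt` and its parameters are hypothetical, and it assumes classification tasks always have exactly four options, as in the template above.

```python
from typing import Optional, Sequence


def build_prompt(input_text: str, instruction: str,
                 options: Optional[Sequence[str]] = None) -> str:
    """Fill the prompt template; pass four options for classification tasks.

    Hypothetical helper: only the template strings come from the model card.
    """
    prompt = f"### Input:\n{input_text}\n\n### Instruction:\n{instruction}"
    if options is not None:
        if len(options) != 4:
            raise ValueError("the classification template expects exactly four options")
        letters = ("A", "B", "C", "D")
        option_lines = "\n".join(
            f"({letter}): {option}" for letter, option in zip(letters, options)
        )
        # The extra sentence restricts the model to the listed output space.
        prompt += f"\n{option_lines}\nAvoid answers outside of (A, B, C, D)."
    prompt += "\n\n### Output:\n"
    return prompt


# Generic template (no options) and classification template (four options).
print(build_prompt("2 + 2", "Compute the sum."))
print(build_prompt("2 + 2", "Pick the correct sum.", ["3", "4", "5", "6"]))
```

Building the string directly from the arguments keeps the template in one place, which may help when the same evaluation script covers both generation and classification tasks.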

Model Usage

Download our model weights through HuggingFace transformers 🤗:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

## Download
tokenizer = AutoTokenizer.from_pretrained("Reza8848/MUFFIN-T5-3B")
model = AutoModelForSeq2SeqLM.from_pretrained("Reza8848/MUFFIN-T5-3B")


## Inference
#### Please prepare your testing instance (as shown below)
value_dict = {
  "input": "Drink more wine when you feel thirsty.\nDrink more water when you feel thirsty",
  "instruction": "In this task, you are given two unconventional instructions for quenching thirst. Your goal is to identify which instruction is more likely to be followed by a person who wants to try something new or different. Answer \"Wine\" if the person is more likely to drink wine when thirsty, and \"Water\" if they are more likely to drink water."
}

#### Please use the prompt template mentioned before
input_sequence = prompt.format_map(value_dict)

input_ids = tokenizer(input_sequence, return_tensors="pt").input_ids
raw_outputs = model.generate(input_ids)  # set the generation arguments according to your needs (e.g., `do_sample`, `num_beams`)
outputs = tokenizer.decode(raw_outputs[0], skip_special_tokens=True)
print(outputs)

Zero-Shot Evaluation Performances

Our training and inference code is based on Tk-Instruct, including the metric calculation scripts (i.e., ROUGE-L and Exact-Match).
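
For readers unfamiliar with the two metrics, the sketch below illustrates what Exact-Match and ROUGE-L measure. It is a simplified illustration, not the Tk-Instruct scripts: the function names are hypothetical, the normalization (lowercasing and whitespace tokenization) is an assumption, and the actual scripts may normalize and tokenize differently, so scores from this sketch should not be compared with reported numbers.

```python
from typing import List


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def _lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (assumed, simplified tokenization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = _lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match(" Water ", "water"))
print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```

Exact-Match suits classification-style outputs such as "Wine" or "Water", while ROUGE-L gives partial credit to free-form generations that share a long common subsequence with the reference.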

[Figure: performances.png (zero-shot evaluation performances)]

🥳 Citation

Please kindly cite our paper if you use any resources in this repository:

@inproceedings{Lou2023MUFFIN,
   title={{MUFFIN}: Curating Multi-Faceted Instructions for Improving Instruction Following},
   author={Renze Lou and Kai Zhang and Jian Xie and Yuxuan Sun and Janice Ahn and Hanzi Xu and Yu Su and Wenpeng Yin},
   booktitle={The Twelfth International Conference on Learning Representations},
   year={2024},
   url={https://openreview.net/forum?id=1vrS1zwekw}
}