---
license: mit
library_name: transformers
model-index:
- name: caliburn-12b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 35.76
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xclbr7/caliburn-12b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 35.64
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xclbr7/caliburn-12b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 9.67
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xclbr7/caliburn-12b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 11.52
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xclbr7/caliburn-12b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 13.78
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xclbr7/caliburn-12b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 29.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Xclbr7/caliburn-12b
      name: Open LLM Leaderboard
---
# caliburn-12b

This model is a 12-billion-parameter language model created by merging several existing models with the MergeKit library. It is designed for general text generation tasks.
## Model Details

### Model Description

This is a large language model with 12 billion parameters, created by merging multiple pre-existing models with the MergeKit library. It is based on the transformer architecture and is intended for general text generation tasks.
- **Developed by:** Xclbr7
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Multiple source models merged using MergeKit
### Model Sources

- **Repository:** [More Information Needed]
- **Paper:** N/A
- **Demo:** N/A
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Xclbr7__caliburn-12b).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 22.68 |
| IFEval (0-Shot)     | 35.76 |
| BBH (3-Shot)        | 35.64 |
| MATH Lvl 5 (4-Shot) |  9.67 |
| GPQA (0-shot)       | 11.52 |
| MuSR (0-shot)       | 13.78 |
| MMLU-PRO (5-shot)   | 29.72 |
## Uses

### Direct Use

This model can be used for various natural language processing tasks, including the following (a minimal usage sketch appears after the list):

- Text generation
- Code completion
- Question answering
- Summarization

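A minimal sketch of direct use through the `transformers` pipeline API. The repository id `Xclbr7/caliburn-12b` is inferred from the leaderboard links in this card; point it at a local merge directory if the weights are not on the Hub. `device_map="auto"` assumes the `accelerate` package is installed.

```python
# Minimal text-generation sketch using the transformers pipeline API.
# Repo id inferred from the leaderboard links above; adjust if loading locally.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Xclbr7/caliburn-12b",
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)

output = generator("Explain what model merging is in one paragraph.", max_new_tokens=120)
print(output[0]["generated_text"])
```
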
### Downstream Use

The model can be fine-tuned for specific tasks or domains to improve performance on targeted applications.

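A minimal parameter-efficient fine-tuning sketch, assuming the `peft` library is available; the adapter hyperparameters and target modules below are illustrative placeholders, not tuned values for this model.

```python
# Illustrative LoRA setup with peft; hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Xclbr7/caliburn-12b",  # repo id inferred from the leaderboard links above
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # verify module names for this architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# The wrapped model can then be trained with transformers.Trainer or trl's SFTTrainer.
```
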
### Out-of-Scope Use

This model should not be used for generating harmful, biased, or unethical content. It should not be relied upon for critical decision-making without human oversight.
## Bias, Risks, and Limitations

- The model may inherit biases present in its training data or source models.
- It may generate incorrect or nonsensical information.
- The model's outputs should be carefully reviewed and fact-checked.
### Recommendations

Users should be aware of the model's limitations and potential biases. It's recommended to use the model with appropriate content filtering and human oversight, especially for public-facing applications.
## How to Get Started with the Model

Use the code below to load the model and generate text:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model in half precision on a CUDA GPU.
# "Xclbr7/caliburn-12b" is the Hub id used by the leaderboard links above;
# replace it with a local path (e.g. a MergeKit output directory) if needed.
tokenizer = AutoTokenizer.from_pretrained("Xclbr7/caliburn-12b")
model = AutoModelForCausalLM.from_pretrained(
    "Xclbr7/caliburn-12b", torch_dtype=torch.float16
).to("cuda")

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
result = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(result)
```

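Optionally, if GPU memory is limited, the model can be loaded in 4-bit precision. This is a sketch assuming the `bitsandbytes` and `accelerate` packages are installed:

```python
# Optional: 4-bit quantized loading to reduce GPU memory use.
# Assumes bitsandbytes and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Xclbr7/caliburn-12b")
model = AutoModelForCausalLM.from_pretrained(
    "Xclbr7/caliburn-12b",
    quantization_config=bnb_config,
    device_map="auto",
)
```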