---
library_name: transformers
license: gemma
language:
- en
base_model:
- google/gemma-2-9b-it
---
Yippie!! It's my birthday in 2 days! So I'm gonna drop this model that I made at 2AM.
It's quite a bit different from what I've been doing recently, but it was pretty fun to work on. :D
Used a different merging technique that I quickly designed and crafted, focused on math and logical reasoning.
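I'm not sharing the exact merge recipe, but for the curious, here's a minimal sketch of the general idea behind weight merging, assuming plain linear interpolation between two Gemma 2 checkpoints. The second model name is a placeholder and the blend ratio is arbitrary; my actual technique is different from this.

```python
# Minimal sketch: linear weight interpolation between two checkpoints.
# NOT the actual merge used here; "some-org/math-tuned-gemma-2-9b" is a placeholder.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it", torch_dtype=torch.bfloat16
)
donor = AutoModelForCausalLM.from_pretrained(
    "some-org/math-tuned-gemma-2-9b", torch_dtype=torch.bfloat16  # placeholder
)

alpha = 0.5  # blend ratio, chosen arbitrarily for illustration
base_sd, donor_sd = base.state_dict(), donor.state_dict()
merged = {k: alpha * base_sd[k] + (1 - alpha) * donor_sd[k] for k in base_sd}
base.load_state_dict(merged)
base.save_pretrained("merged-model")
```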
Also fine-tuned it with self-play RL algorithms.
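For context, self-play here means the model generates its own training signal. Below is a heavily simplified, hypothetical sketch of one self-play round that builds preference pairs from the model's own samples; the "prefer the shorter answer" rule is a toy stand-in for a real reward, and none of this is my actual training code.

```python
# Hypothetical sketch of one self-play round: sample two answers per prompt,
# pick a "chosen" and a "rejected" one, and collect preference pairs that
# could feed a DPO-style trainer. Purely illustrative.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="minchyeom/birthday-llm",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

prompts = ["What is 17 * 24?"]  # toy prompt set

def answer(prompt: str) -> str:
    out = pipe(
        [{"role": "user", "content": prompt}],
        max_new_tokens=64,
        do_sample=True,
        temperature=0.9,
    )
    return out[0]["generated_text"][-1]["content"]

pairs = []
for p in prompts:
    a, b = answer(p), answer(p)  # two sampled answers per prompt
    # Toy preference rule (shorter wins); a real setup would use a reward model.
    chosen, rejected = (a, b) if len(a) < len(b) else (b, a)
    pairs.append({"prompt": p, "chosen": chosen, "rejected": rejected})
# `pairs` could then be passed to a preference-optimization trainer
# (e.g. trl's DPOTrainer) before the next round of self-play.
```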
Not the best LLM out there but it was pretty fun making this, and coming up with something different.
Borrowed some ideas from my other "distillation" technique, which significantly reduces the number of layers while aiming to retain output quality.
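Since I can't share the full recipe, here's a hypothetical sketch of the layer-reduction idea in its crudest form: dropping decoder layers from the stack. Keeping every other layer is arbitrary, and deciding which layers to drop (and how to recover quality afterwards) is where the real work is; this just shows the mechanics on a Gemma 2 model.

```python
# Hypothetical sketch: crude layer pruning on a Gemma 2 checkpoint.
# The every-other-layer rule is arbitrary; it only demonstrates the mechanics.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it", torch_dtype=torch.bfloat16
)

kept = nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i % 2 == 0
)
model.model.layers = kept
model.config.num_hidden_layers = len(kept)  # keep the config consistent
model.save_pretrained("pruned-model")
```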
Thanks and have fun!!
Also happy birthday to myself :)
Usage (adapted from the official Gemma 2 9B repo):
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="minchyeom/birthday-llm",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "How many r's are there in the word strawberry?"},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
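Passing a `messages` list like this makes the pipeline apply Gemma's chat template automatically, so there's no need to format the prompt by hand.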