---
language:
- en
license: mit
tags:
- generated_from_trainer
- text generation
- pytorch
- causal-lm
metrics:
- accuracy
base_model: EleutherAI/gpt-neo-125M
model-index:
- name: openchatgpt-neo-r1
  results: []
---

--- Disclaimer ---
"Neo is an incredibly cursed codebase, it should not be used by anyone" (C) co-founder of EleutherAI - Connor Leahy
!!! USE openchatgpt-neox-125m INSTEAD !!!
--- Archived ---
# openchatgpt-neo-r1
This model is a fine-tuned version of EleutherAI/gpt-neo-125M on the openchatgpt safe-r1 dataset. It achieves the following results on the evaluation set:
- Loss: 3.2156
- Accuracy: 0.8338
## Model description
A finetune based on the inner workings of ChatGPT. I won't elaborate on that. You need at least a faint idea of how the prompt is constructed for the model to produce anything that isn't a garbled mess.
This is effectively a schizophrenic idea that saw the light of day. Practically a collab of 3 students in a virtual shed.
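For reference, here is a minimal loading-and-sampling sketch using the standard Transformers API; the Hub id and the prompt string are placeholders and assumptions, since the actual prompt format isn't spelled out in this card.

```python
# Minimal sketch, assuming the standard Transformers text-generation API.
# The Hub id and the prompt are placeholders; the real prompt template used
# during finetuning is intentionally not documented in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchatgpt-neo-r1"  # prefix with the owner namespace if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "How do I reverse a list in Python?"  # placeholder, not the real template
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```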
## Intended uses & limitations
Intended uses & limitations fall in line with OpenAI's. The dataset used consists of safe texts (i.e. not highly sexual/erotica-type content). An NSFW version of the dataset is not planned at the moment.
Keep in mind that this is the 125M version of GPT-Neo. My 1050 Ti Mobile couldn't even handle that without gradient thingamabobs. If anyone knows how to effectively fine-tune larger models on free Colabs, feel free to let me know. The Pile tokenizer also has one downside compared to the native GPT-2/3 tokenizer: `Assistant`.
## Training and evaluation data
Data was split in a 95%/5% ratio. Preprocessing included removing mentions of OpenAI wherever they were not deemed appropriate (GPT-2 has one of the appropriate mentions). The whole dataset consists of just shy of 3k input-output pairs. One input can have multiple outputs (read as: one message has multiple variants of an answer). Far less than 1% (3 total) are curated lines (i.e. a huge mistake was spotted that needed correcting).
There is a heavy bias toward IT topics.
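For illustration only, the 95%/5% split could be reproduced with the `datasets` library roughly like this; the file name is an assumption, since the safe-r1 dataset itself isn't published here.

```python
# Rough sketch of a 95%/5% train/validation split using the datasets library.
# The JSON file name is an assumption about the (unpublished) openchatgpt
# safe-r1 dataset; adjust the loader to the actual format.
from datasets import load_dataset

dataset = load_dataset("json", data_files="openchatgpt-safe-r1.json", split="train")
split = dataset.train_test_split(test_size=0.05, seed=42)
train_data, eval_data = split["train"], split["test"]
print(f"train: {len(train_data)}  eval: {len(eval_data)}")
```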
## Training procedure
Input and output were simply concatenated, due to the nature of how ChatGPT works. The padding token was chosen to be the same as the separator token; if that's not effective, please let me know, as I am new to this stuff.
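A rough sketch of that preprocessing follows; the choice of the EOS token as the separator/pad token and the example strings are assumptions, not the exact recipe.

```python
# Sketch of the concatenation described above: prompt and answer are joined by a
# separator token, and the pad token is set to that same separator.
# Using the tokenizer's EOS token as the separator is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
separator = tokenizer.eos_token   # assumed separator choice
tokenizer.pad_token = separator   # pad token == separator token, as described above

def build_example(prompt: str, answer: str, max_length: int = 512):
    text = prompt + separator + answer + separator
    return tokenizer(text, truncation=True, max_length=max_length, padding="max_length")

example = build_example("How do I sort a dict by value?", "Use sorted() with a key function.")
print(len(example["input_ids"]))  # == max_length after padding
```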
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 5
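The list above maps onto a `TrainingArguments` setup roughly as follows (a sketch only: the output directory, eval strategy, and the surrounding `Trainer` objects are placeholders, and the Adam betas/epsilon listed above are simply the Trainer defaults):

```python
# Sketch mapping the hyperparameters listed above onto transformers' Trainer API.
# model, train_data and eval_data are assumed to exist; the Adam settings above
# are the Trainer defaults, so no explicit optimizer configuration is needed.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="openchatgpt-neo-r1",   # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=5,
    evaluation_strategy="epoch",       # matches the per-epoch rows in the results table
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_data, eval_dataset=eval_data)
# trainer.train()
```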
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 4.9203        | 1.0   | 1378 | 5.1668          | 0.7274   |
| 4.1368        | 2.0   | 2756 | 4.3841          | 0.7563   |
| 3.4554        | 3.0   | 4134 | 3.8068          | 0.7875   |
| 2.7598        | 4.0   | 5512 | 3.3097          | 0.8303   |
| 2.5879        | 5.0   | 6890 | 3.2156          | 0.8338   |
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.0+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2