---
license: mit
language: de
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in einem abgelegenen, zuvor unerforschten Tal in den Anden lebten."
---

# Replication of [gpt2-wechsel-german](https://huggingface.co/benjamin/gpt2-wechsel-german)

- trained with [BigScience's DeepSpeed-Megatron-LM code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- 22 hours on 4x A100 GPUs (~80 TFLOPs per GPU)
- stopped after 100k steps
- less than a single epoch on `oscar_unshuffled_deduplicated_de` (excluding the validation set; the original model was trained for 75 epochs on less data)
- bf16 precision
- ZeRO stage 1
- TP/PP = 1 (no tensor or pipeline parallelism)
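
The precision and ZeRO settings listed above might correspond to a DeepSpeed configuration along the following lines. This is only a sketch: the `bf16` and `zero_optimization` keys are standard DeepSpeed config fields, but the actual config file used for this run is not part of this card. Batch size, optimizer, and learning-rate schedule are not stated here and are deliberately left out. TP/PP = 1 is set through the training script's arguments rather than in this file.

```python
import json

# Sketch of a DeepSpeed config reflecting the settings listed in this card.
# Assumption: only bf16 and the ZeRO stage are known; every other field
# (batch size, optimizer, scheduler, ...) is unknown and therefore omitted.
ds_config = {
    "bf16": {"enabled": True},          # train in bfloat16
    "zero_optimization": {"stage": 1},  # ZeRO stage 1: shard optimizer states
}

# Serialize to the JSON form that DeepSpeed reads from a config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

ZeRO stage 1 partitions only the optimizer states across the data-parallel ranks, which is generally sufficient for a GPT-2-sized model that fits on a single GPU without tensor or pipeline parallelism.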

## Evaluation

| Model | PPL |
|---|---|
| `gpt2-wechsel-german-ds-meg` | **26.4** |
| `gpt2-wechsel-german` | 26.8 |
| `gpt2` (retrained from scratch) | 27.63 |
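
This card does not state how the PPL values were computed. For reference, perplexity is conventionally the exponential of the mean per-token negative log-likelihood (lower is better). A minimal sketch of that definition follows; the `perplexity` helper and the toy inputs are illustrative and not taken from the evaluation code behind the table.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood), with NLLs in nats."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy check: a model that assigns probability 1/4 to every token has a
# per-token NLL of ln(4), so its perplexity is approximately 4.
toy_nlls = [math.log(4)] * 10
toy_ppl = perplexity(toy_nlls)
print(f"{toy_ppl:.2f}")
```

Read this way, a PPL of 26.4 means the model is, on average, about as uncertain as a uniform choice among roughly 26 tokens at each position.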

## License

MIT