calum committed on
Commit
83fd442
1 Parent(s): 5133c7a

Update README.md

Files changed (1)
  1. README.md +42 -7
README.md CHANGED
@@ -7,6 +7,8 @@ model-index:
  datasets:
  - roneneldan/TinyStories
  pipeline_tag: text-generation
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -18,22 +20,58 @@ This model is a tiny (3M trainable parameters) GPT-2 model pre-trained for 3 epo

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure
- Trained for 400k steps (~7 hours) on 2xH100 80GB PCIe with 32vCPU and 500GB RAM on Runpod.

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 5e-05
  - train_batch_size: 16
  - eval_batch_size: 16
@@ -42,9 +80,6 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 3.0

- ### Training results
-
-

  ### Framework versions
 
  datasets:
  - roneneldan/TinyStories
  pipeline_tag: text-generation
+ language:
+ - en
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You

  ## Model description

+ TinyStories-GPT2-3M is a replication of the TinyStories model, using a GPT-2 architecture in place of GPT-Neo. This was a
+ deliberate choice made to accelerate research, as the GPT-2 architecture is more widely supported across tooling. We do not
+ contribute any performance improvements of note, though, as with the original model, we find a surprising degree of coherence
+ in the model given its size.

  ## Intended uses & limitations

+ Research use only - NOT suitable for commercial use, per OpenAI's terms of service regarding use of their APIs to source training data.
+
+ Note that the vocabulary this model was trained on is quite minimal. Out-of-distribution inputs will not work as well as
+ they would with a larger, more general-purpose model. To observe this behaviour, try generating a few tokens after a non-trivial
+ word like "Biology"; the model typically treats words that did not appear frequently in training as character names in a story.
+
+ All training data is English. As such, input in other languages is out of distribution and will result in the model treating
+ the preceding input as character names, ignoring it entirely, or generating meaningless tokens.
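As an illustration of the behaviour described above, here is a minimal sketch using the `transformers` text-generation pipeline; the Hub repo id below is a placeholder, not necessarily this checkpoint's actual id:

```python
# Minimal sketch: compare an in-distribution prompt with an out-of-distribution one.
# The repo id is a placeholder (hypothetical); substitute the actual Hub id of this checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="calum/tinystories-gpt2-3M")

# A story-like prompt stays close to the training distribution.
print(generator("Once upon a time there was a little dog named",
                max_new_tokens=40)[0]["generated_text"])

# A rare word such as "Biology" is typically treated as a character's name.
print(generator("Biology", max_new_tokens=40)[0]["generated_text"])
```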

  ## Training and evaluation data

+ Trained for 3 epochs on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) V2 dataset, produced by GPT-4.
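The dataset can be inspected directly with the `datasets` library; a small sketch, assuming the default configuration and a `text` column:

```python
# Sketch: peek at the TinyStories data. Assumes the default configuration of
# the roneneldan/TinyStories dataset and a "text" column.
from datasets import load_dataset

stories = load_dataset("roneneldan/TinyStories")
print(stories)                            # available splits and their sizes
print(stories["train"][0]["text"][:300])  # start of the first story
```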

  ## Training procedure
+ Trained for 400k steps (~7 hours) on 2x H100 80GB PCIe with 32 vCPUs and 500GB RAM on Runpod.
+
+ To replicate, download the GPT-4 V2 version of the TinyStories dataset alongside HuggingFace's `train_clm.py` script, then run the following:
+ ```bash
+ #! /bin/bash
+
+ python train_clm.py \
+ --model_type=gpt2 \
+ --config_overrides=n_embd=64,n_layer=8,n_head=16 \
+ --tokenizer_name=gpt2 \
+ --train_file="data/TinyStoriesV2-GPT4-train.txt" \
+ --validation_file="data/TinyStoriesV2-GPT4-valid.txt" \
+ --block_size=256 \
+ --preprocessing_num_workers=8 \
+ --output_dir="out" \
+ --logging_dir="./log" \
+ --logging_steps=100 \
+ --logging_strategy=steps \
+ --save_steps=5000 \
+ --save_total_limit=10 \
+ --do_train
+ ```
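The two text files referenced by `--train_file` and `--validation_file` can be pulled from the dataset repository with `huggingface_hub`; a sketch, assuming the file names in the repository match those used in the command above:

```python
# Sketch: fetch the GPT-4 V2 text files into data/ so that the paths in the
# training command above resolve. Assumes these exact file names exist in the
# roneneldan/TinyStories dataset repository.
from huggingface_hub import hf_hub_download

for name in ["TinyStoriesV2-GPT4-train.txt", "TinyStoriesV2-GPT4-valid.txt"]:
    path = hf_hub_download(
        repo_id="roneneldan/TinyStories",
        filename=name,
        repo_type="dataset",
        local_dir="data",
    )
    print(path)
```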

  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - n_embd: 64
+ - n_layer: 8
+ - n_head: 16
  - learning_rate: 5e-05
  - train_batch_size: 16
  - eval_batch_size: 16

  - lr_scheduler_type: linear
  - num_epochs: 3.0
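The n_embd, n_layer, and n_head values above correspond to the `--config_overrides` flag; a sketch of the configuration they imply, with every other field left at the `transformers` GPT-2 defaults:

```python
# Sketch: the GPT-2 configuration implied by --config_overrides, with all other
# fields at their transformers defaults (vocab_size=50257, n_positions=1024, ...).
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_embd=64, n_layer=8, n_head=16)
model = GPT2LMHeadModel(config)

# Total trainable parameters; at this width the 50257 x 64 token embedding
# accounts for most of them.
print(f"{model.num_parameters():,} parameters")
```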

  ### Framework versions