Update README.md
GPT-Neo-125M-Code-Clippy-Dedup is a [GPT-Neo-125M model](https://huggingface.co/EleutherAI/gpt-neo-125M) fine-tuned on our deduplicated version of the Code Clippy dataset.

To stabilize training, we limited the training files to those whose extensions belong to popular programming languages, since our dataset also contains other file types, such as `.txt` files and project configuration files. We filtered by a whitelist of such extensions (a sketch of this style of filtering is shown below).
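
As a loose illustration of this kind of extension filtering (the real logic presumably lives in the `code_clippy_filter.py` dataset script passed to `--dataset_name` below; the whitelist and the `file_name` field here are placeholders, not the project's actual values):

```python
# Illustrative sketch only -- not the project's actual filtering code.
from datasets import load_dataset

# Hypothetical whitelist of popular-language extensions.
CODE_EXTENSIONS = (".py", ".js", ".ts", ".java", ".c", ".cpp", ".go", ".rs", ".rb")

def is_code_file(example):
    """Keep an example only if its file name ends in a whitelisted extension,
    dropping e.g. `.txt` files and project configuration files."""
    return example["file_name"].endswith(CODE_EXTENSIONS)

# Streaming keeps memory flat on a large corpus, matching the streaming
# training script referenced below.
dataset = load_dataset("json", data_files="code_clippy_dedup_data/*.jsonl",
                       split="train", streaming=True)
dataset = dataset.filter(is_code_file)
```
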
The training script used to train this model can be found [here](https://github.com/ncoop57/gpt-code-clippy/blob/camera-ready/training/run_clm_streaming_filter_flax.py).
```bash
./run_clm_streaming_filter_flax.py \
    --output_dir $HOME/gpt-neo-125M-code-clippy-dedup \
    --model_name_or_path="EleutherAI/gpt-neo-125M" \
    --dataset_name $HOME/gpt-code-clippy/code_clippy_filter.py \
    --data_dir $HOME/code_clippy_data/code_clippy_dedup_data \
    --text_column_name="text" \
    --do_train --do_eval \
    --block_size="2048" \
    ...
    --preprocessing_num_workers="8" \
    --learning_rate="1e-4" \
    --max_steps 100000 \
    --warmup_steps 2000 \
    --decay_steps 30000 \
    --adam_beta1="0.9" \
    --adam_beta2="0.95" \
    --weight_decay="0.1" \
    --overwrite_output_dir \
    --logging_steps="25" \
    --eval_steps="500" \
    --push_to_hub="False" \
    --report_to="all" \
    ...
    --save_total_limit 10 \
    --gradient_accumulation_steps 16 \
    --report_to="wandb" \
    --run_name="gpt-neo-125M-code-clippy-dedup-filtered-no-resize-2048bs" \
    --max_eval_samples 2000 \
    --save_optimizer true
```
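
For context on the schedule flags: `--warmup_steps 2000` and `--decay_steps 30000` describe a warmup-then-decay learning-rate schedule peaking at `--learning_rate`. A minimal sketch of what that could look like in optax (the script's actual schedule shape is not shown in this README; cosine decay is an assumption):

```python
# Sketch only: maps the command's schedule/optimizer flags onto optax.
# The real script may build its schedule differently (e.g. linear decay).
import optax

schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,       # ramp up from zero...
    peak_value=1e-4,      # ...to --learning_rate over the warmup
    warmup_steps=2_000,   # --warmup_steps
    decay_steps=30_000,   # --decay_steps
)

optimizer = optax.adamw(
    learning_rate=schedule,
    b1=0.9,               # --adam_beta1
    b2=0.95,              # --adam_beta2
    weight_decay=0.1,     # --weight_decay
)
```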
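
Once trained, the checkpoint loads like any other causal language model. The hub id below is a guess based on the model name, not something this README pins down:

```python
# Generation example; the model id is an assumption, adjust to the actual repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flax-community/gpt-neo-125M-code-clippy-dedup"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```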