Commit ebc6ad5 by sanchit-gandhi
Parent(s): 4ea2eae

Training in progress, step 1000
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d7f84b30ad1e26b72493f2e487a84b8fb077327a611d56fcd0605d78146fa822
+ oid sha256:9a7667b4f7fb01bc0f00d3a07e35846f784c9d9bf8171e61c789861b33c4987f
  size 3141646744
runs/Apr24_16-42-31_ip-26-0-162-233/events.out.tfevents.1713977002.ip-26-0-162-233.1854033.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d99d51bb7fcd506f76273107b7f54b0a07695a2cf620317840bf9823aa458c38
- size 9086
+ oid sha256:283b4ce5861b6a32871a30e6646ee3c820d4f3b45c93457f52525e6cba33a9a4
+ size 13306
wandb/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
wandb/run-20240424_164324-xfbnm7qo/files/output.log CHANGED
@@ -520,3 +520,482 @@
+  3%|██        | 550/20000 [17:12<9:50:23, 1.82s/it]
+  3%|██▏       | 575/20000 [17:57<9:51:03, 1.83s/it]
+  3%|██▎       | 600/20000 [18:43<9:49:40, 1.82s/it]
+  3%|██▍       | 625/20000 [19:28<9:46:32, 1.82s/it]
+  3%|██▌       | 650/20000 [20:14<9:44:36, 1.81s/it]
+  3%|██▌       | 675/20000 [20:59<9:45:00, 1.82s/it]
+  4%|██▋       | 700/20000 [21:45<9:46:59, 1.82s/it]
+  4%|██▊       | 724/20000 [22:28<9:44:44, 1.82s/it]
+  4%|██▉       | 750/20000 [23:16<9:44:11, 1.82s/it]
+  4%|██▉       | 775/20000 [24:01<9:42:52, 1.82s/it]
+  4%|███       | 800/20000 [24:47<10:00:33, 1.88s/it]
+  4%|███▏      | 824/20000 [25:30<9:42:02, 1.82s/it]
+  4%|███▎      | 850/20000 [26:18<9:41:27, 1.82s/it]
+  4%|███▎      | 875/20000 [27:03<9:40:29, 1.82s/it]
+  4%|███▍      | 900/20000 [27:49<9:39:54, 1.82s/it]
+  5%|███▌      | 924/20000 [28:32<9:40:41, 1.83s/it]
+  5%|███▋      | 950/20000 [29:20<9:37:26, 1.82s/it]
+  5%|███▊      | 975/20000 [30:05<9:34:44, 1.81s/it]
+  5%|███▊      | 1000/20000 [30:51<9:35:10, 1.82s/it][INFO|trainer.py:3304] 2024-04-24 17:14:21,632 >> Saving model checkpoint to ./checkpoint-1000
+ [INFO|configuration_utils.py:471] 2024-04-24 17:14:21,635 >> Configuration saved in ./checkpoint-1000/config.json
+ [INFO|configuration_utils.py:697] 2024-04-24 17:14:21,638 >> Configuration saved in ./checkpoint-1000/generation_config.json
+ {'loss': 1.5546, 'grad_norm': 1.2265625, 'learning_rate': 9.743589743589744e-05, 'epoch': 0.25}
+ [INFO|modeling_utils.py:2590] 2024-04-24 17:14:25,965 >> Model weights saved in ./checkpoint-1000/model.safetensors
+ [INFO|tokenization_utils_base.py:2488] 2024-04-24 17:14:25,975 >> tokenizer config file saved in ./checkpoint-1000/tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2497] 2024-04-24 17:14:25,977 >> Special tokens file saved in ./checkpoint-1000/special_tokens_map.json
+ [INFO|tokenization_utils_base.py:2488] 2024-04-24 17:14:35,995 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2497] 2024-04-24 17:14:35,997 >> Special tokens file saved in ./special_tokens_map.json
+ /fsx/sanchit/miniconda3/envs/venv/lib/python3.11/site-packages/torch/utils/checkpoint.py:144: UserWarning: Tensor arguments, excluding CPU tensors, are detected on at least two types of devices. Device state will only be saved for devices of a single device type, and the remaining devices will be ignored. Consequently, if any checkpointed functions involve randomness, this may result in incorrect gradients. (Note that if CUDA devices are among the devices detected, it will be prioritized; otherwise, the first device encountered will be selected.)
+ warnings.warn(
+  5%|███▉      | 1026/20000 [31:52<9:35:24, 1.82s/it]
wandb/run-20240424_164324-xfbnm7qo/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 2.0215, "train/grad_norm": 4.125, "train/learning_rate": 9.987179487179488e-05, "train/epoch": 0.13, "train/global_step": 525, "_timestamp": 1713977997.0387745, "_runtime": 992.4967684745789, "_step": 21}
+ {"train/loss": 1.5537, "train/grad_norm": 1.171875, "train/learning_rate": 9.730769230769232e-05, "train/epoch": 0.25, "train/global_step": 1025, "_timestamp": 1713978921.4080203, "_runtime": 1916.8660142421722, "_step": 41}
wandb/run-20240424_164324-xfbnm7qo/logs/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb CHANGED
Binary files a/wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb and b/wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb differ