Commit ebc6ad5
1 Parent(s): 4ea2eae
Training in progress, step 1000
- model.safetensors +1 -1
- runs/Apr24_16-42-31_ip-26-0-162-233/events.out.tfevents.1713977002.ip-26-0-162-233.1854033.0 +2 -2
- wandb/debug-internal.log +0 -0
- wandb/run-20240424_164324-xfbnm7qo/files/output.log +479 -0
- wandb/run-20240424_164324-xfbnm7qo/files/wandb-summary.json +1 -1
- wandb/run-20240424_164324-xfbnm7qo/logs/debug-internal.log +0 -0
- wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb +0 -0
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9a7667b4f7fb01bc0f00d3a07e35846f784c9d9bf8171e61c789861b33c4987f
 size 3141646744
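The three lines in the diff above are a Git LFS pointer file: a spec version, a SHA-256 object ID, and the byte size of the real file. As a minimal sketch, such a pointer could be read as below. The helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # Split "sha256:<hex>" into algorithm and digest when an oid is present.
    algo, _, digest = fields.get("oid", "").partition(":")
    return {
        "version": fields.get("version"),
        "oid_algo": algo or None,
        "oid": digest or None,
        "size": int(fields["size"]) if fields.get("size") else None,
    }


# Pointer contents copied from the added side of the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9a7667b4f7fb01bc0f00d3a07e35846f784c9d9bf8171e61c789861b33c4987f
size 3141646744
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["oid"], info["size"])
```

The size is identical on both sides of the diff while the object ID changes, which is what one would expect when a checkpoint of the same shape overwrites the previous weights.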
runs/Apr24_16-42-31_ip-26-0-162-233/events.out.tfevents.1713977002.ip-26-0-162-233.1854033.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:283b4ce5861b6a32871a30e6646ee3c820d4f3b45c93457f52525e6cba33a9a4
+size 13306
wandb/debug-internal.log
CHANGED
The diff for this file is too large to render.
wandb/run-20240424_164324-xfbnm7qo/files/output.log
CHANGED
@@ -520,3 +520,482 @@
 
 
 
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  3%|██ | 550/20000 [17:12<9:50:23, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  3%|███ | 575/20000 [17:57<9:51:03, 1.83s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  3%|███ | 600/20000 [18:43<9:49:40, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  3%|███ | 625/20000 [19:28<9:46:32, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  3%|███ | 650/20000 [20:14<9:44:36, 1.81s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  3%|███ | 675/20000 [20:59<9:45:00, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|███ | 700/20000 [21:45<9:46:59, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|███ | 724/20000 [22:28<9:44:44, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|███ | 750/20000 [23:16<9:44:11, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|███ | 775/20000 [24:01<9:42:52, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|███ | 800/20000 [24:47<10:00:33, 1.88s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|████ | 824/20000 [25:30<9:42:02, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|████ | 850/20000 [26:18<9:41:27, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|████ | 875/20000 [27:03<9:40:29, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  4%|████ | 900/20000 [27:49<9:39:54, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  5%|████ | 924/20000 [28:32<9:40:41, 1.83s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  5%|████ | 950/20000 [29:20<9:37:26, 1.82s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  5%|████ | 975/20000 [30:05<9:34:44, 1.81s/it]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  5%|████ | 1000/20000 [30:51<9:35:10, 1.82s/it][INFO|trainer.py:3304] 2024-04-24 17:14:21,632 >> Saving model checkpoint to ./checkpoint-1000
+[INFO|configuration_utils.py:471] 2024-04-24 17:14:21,635 >> Configuration saved in ./checkpoint-1000/config.json
+[INFO|configuration_utils.py:697] 2024-04-24 17:14:21,638 >> Configuration saved in ./checkpoint-1000/generation_config.json
+{'loss': 1.5546, 'grad_norm': 1.2265625, 'learning_rate': 9.743589743589744e-05, 'epoch': 0.25}
+[INFO|modeling_utils.py:2590] 2024-04-24 17:14:25,965 >> Model weights saved in ./checkpoint-1000/model.safetensors
+[INFO|tokenization_utils_base.py:2488] 2024-04-24 17:14:25,975 >> tokenizer config file saved in ./checkpoint-1000/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2497] 2024-04-24 17:14:25,977 >> Special tokens file saved in ./checkpoint-1000/special_tokens_map.json
+[INFO|tokenization_utils_base.py:2488] 2024-04-24 17:14:35,995 >> tokenizer config file saved in ./tokenizer_config.json
+[INFO|tokenization_utils_base.py:2497] 2024-04-24 17:14:35,997 >> Special tokens file saved in ./special_tokens_map.json
+/fsx/sanchit/miniconda3/envs/venv/lib/python3.11/site-packages/torch/utils/checkpoint.py:144: UserWarning: Tensor arguments, excluding CPU tensors, are detected on at least two types of devices. Device state will only be saved for devices of a single device type, and the remaining devices will be ignored. Consequently, if any checkpointed functions involve randomness, this may result in incorrect gradients. (Note that if CUDA devices are among the devices detected, it will be prioritized; otherwise, the first device encountered will be selected.)
+  warnings.warn(
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+  5%|████ | 1026/20000 [31:52<9:35:24, 1.82s/it]
+
+
+
+
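Each progress line in the log above has the shape `step/total [elapsed<remaining, s/it]`. As a rough sketch, the remaining time can be cross-checked by multiplying the steps left by the per-step time. The helper names `to_seconds` and `parse_progress` are hypothetical, and the naive product will not match the logged estimate exactly because the displayed rate is rounded.

```python
import re

# Matches the tail of a progress line such as "550/20000 [17:12<9:50:23, 1.82s/it]".
PROGRESS_RE = re.compile(
    r"(?P<step>\d+)/(?P<total>\d+) "
    r"\[(?P<elapsed>[\d:]+)<(?P<remaining>[\d:]+),\s*(?P<rate>[\d.]+)s/it\]"
)


def to_seconds(clock: str) -> int:
    """Convert "MM:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_progress(line: str):
    """Extract step counts and timings from one progress line, or None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    step, total = int(match["step"]), int(match["total"])
    rate = float(match["rate"])
    return {
        "step": step,
        "total": total,
        "elapsed_s": to_seconds(match["elapsed"]),
        "remaining_s": to_seconds(match["remaining"]),
        # Naive estimate: steps left times the rounded seconds per step.
        "naive_remaining_s": (total - step) * rate,
    }


# Timing fields copied from the first progress line in the log (bar glyphs omitted).
sample = "3%| | 550/20000 [17:12<9:50:23, 1.82s/it]"
info = parse_progress(sample)
print(info)
```

For the sample line the naive estimate and the logged remaining time differ by well under a minute, which is consistent with a steady rate of roughly 1.82 seconds per step.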
wandb/run-20240424_164324-xfbnm7qo/files/wandb-summary.json
CHANGED
@@ -1 +1 @@
-{"train/loss":
+{"train/loss": 1.5537, "train/grad_norm": 1.171875, "train/learning_rate": 9.730769230769232e-05, "train/epoch": 0.25, "train/global_step": 1025, "_timestamp": 1713978921.4080203, "_runtime": 1916.8660142421722, "_step": 41}
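The two learning rates recorded in this commit (9.743589743589744e-05 at step 1000 in output.log and 9.730769230769232e-05 at global step 1025 in wandb-summary.json) are consistent with linear warmup followed by linear decay to zero over the 20000 steps shown in the progress bar. The sketch below checks that arithmetic. The peak rate of 1e-4 and the 500 warmup steps are assumptions inferred from those two values; they are not stated anywhere in the commit, and `linear_schedule_lr` is a hypothetical helper rather than the trainer's own code.

```python
def linear_schedule_lr(step, peak_lr=1e-4, warmup_steps=500, total_steps=20_000):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps.

    peak_lr and warmup_steps are assumed values, chosen to fit the logged rates.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


# (step, logged learning rate) pairs copied from the two files in this commit.
logged = [(1000, 9.743589743589744e-05), (1025, 9.730769230769232e-05)]
for step, lr in logged:
    print(step, linear_schedule_lr(step), lr)
```

Under these assumptions the rate at step 1000 is 1e-4 × 19000 / 19500 and at step 1025 it is 1e-4 × 18975 / 19500, which agree with the logged values to within floating-point rounding.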
wandb/run-20240424_164324-xfbnm7qo/logs/debug-internal.log
CHANGED
The diff for this file is too large to render.
wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb
CHANGED
Binary files a/wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb and b/wandb/run-20240424_164324-xfbnm7qo/run-xfbnm7qo.wandb differ