ccrains committed on
Commit c85c959
1 Parent(s): 63d2d58

update files

Files changed (1)
  1. README.md +16 -48
README.md CHANGED
@@ -1,59 +1,27 @@
  ---
- license: other
- base_model: /apdcephfs_cq10/share_919031/larsonwang/LLaMA-Factory/save_model/train_lora_1709305042/
- tags:
- - llama-factory
- - full
- - generated_from_trainer
- model-index:
- - name: train_lora_1709346779
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # train_lora_1709346779

- This model is a fine-tuned version of [/apdcephfs_cq10/share_919031/larsonwang/LLaMA-Factory/save_model/train_lora_1709305042/](https://huggingface.co//apdcephfs_cq10/share_919031/larsonwang/LLaMA-Factory/save_model/train_lora_1709305042/) on the sft_sample dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - num_epochs: 5.0
- - mixed_precision_training: Native AMP
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.38.0
- - Pytorch 2.0.1+cu118
- - Datasets 2.17.1
- - Tokenizers 0.15.2

  ---
+ license: apache-2.0
  ---

+ base_model: https://huggingface.co/google/gemma-2b
 
+ Model languages: Chinese and English

+ The following describes the process of training gemma-2b (a language model that only supports English) into a model that supports both Chinese and English.

+ step 1:
+ Use SentencePiece (BPE) to train a tokenizer on a Chinese corpus, producing tokenizer.model and tokenizer.vocab.
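A minimal sketch of step 1 with the SentencePiece Python API; the corpus path, vocabulary size, and character coverage below are illustrative placeholders, not the values used for this model.

```python
import sentencepiece as spm

# Train a BPE tokenizer on a Chinese plain-text corpus (one sentence per line).
# "zh_corpus.txt" and the vocab size are placeholders.
spm.SentencePieceTrainer.train(
    input="zh_corpus.txt",
    model_prefix="zh_bpe",        # writes zh_bpe.model and zh_bpe.vocab
    model_type="bpe",
    vocab_size=20000,             # illustrative size for the Chinese-only vocabulary
    character_coverage=0.9995,    # common setting for CJK text
)
```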

+ step 2:
+ Merge the Chinese tokenizer.model with the original model's tokenizer.model.
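One way to do the merge, as a sketch: it assumes both files are SentencePiece models and follows the common "append unseen pieces" approach; file paths are placeholders, and the exact merge script used for this model is not specified here.

```python
from sentencepiece import sentencepiece_model_pb2 as sp_pb2  # requires the protobuf package

def load_proto(path):
    proto = sp_pb2.ModelProto()
    with open(path, "rb") as f:
        proto.ParseFromString(f.read())
    return proto

orig = load_proto("gemma-2b/tokenizer.model")   # original gemma-2b tokenizer
zh = load_proto("zh_bpe.model")                 # Chinese tokenizer from step 1

# Append every Chinese piece that the original vocabulary does not contain.
existing = {p.piece for p in orig.pieces}
for p in zh.pieces:
    if p.piece not in existing:
        new_piece = sp_pb2.ModelProto.SentencePiece()
        new_piece.piece = p.piece
        new_piece.score = 0.0
        orig.pieces.append(new_piece)

with open("merged_tokenizer.model", "wb") as f:
    f.write(orig.SerializeToString())
print("merged vocabulary size:", len(orig.pieces))
```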

+ step 3:
+ Use the merged special_tokens_map.json, tokenizer.model, and tokenizer_config.json to replace the corresponding files of the original model (such as gemma-2b).
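A sketch of the file replacement, assuming a local snapshot of the base model and a directory holding the merged tokenizer files (both directory names are placeholders).

```python
import shutil

base_model_dir = "gemma-2b"        # local copy of the original model
merged_dir = "merged_tokenizer"    # directory holding the files produced after step 2

# Overwrite the original tokenizer files with the merged ones.
for name in ("tokenizer.model", "tokenizer_config.json", "special_tokens_map.json"):
    shutil.copy(f"{merged_dir}/{name}", f"{base_model_dir}/{name}")
```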

+ step 4:
+ Use LLaMA-Factory for pre-training. Pay attention to the pre-training parameters: because the vocabulary has grown, resizing the vocab and resizing the embeddings are required.
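The pre-training itself was run through LLaMA-Factory; the sketch below only illustrates what "resize vocab / resize embedding" has to do, using plain transformers calls (directory names are placeholders, and the new embedding rows are randomly initialized until pre-training updates them).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer from the directory prepared in step 3 (merged vocabulary)
# and the original weights of the base model.
tokenizer = AutoTokenizer.from_pretrained("gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

# Grow the input and output embedding matrices to cover the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))

model.save_pretrained("gemma-2b-zh-init")
tokenizer.save_pretrained("gemma-2b-zh-init")
```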

+ step 5:
+ Based on the model pre-trained in step 4, perform instruction fine-tuning, which significantly improves the model's ability to understand and execute instructions.
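The instruction dataset used here is not listed in this card; the record below is a purely illustrative example in the alpaca-style instruction/input/output format that LLaMA-Factory accepts for SFT.

```python
import json

# One illustrative bilingual instruction-following record (not from the real dataset).
examples = [
    {
        "instruction": "Translate the following sentence into English.",
        "input": "今天天气很好。",
        "output": "The weather is nice today.",
    }
]

with open("instruction_data.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```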

+ step 6:
+ Based on the instruction fine-tuned model, SFT training can be run for different specific tasks so that the model performs better on those tasks.