Minami-su committed on
Commit
f13ee8a
1 Parent(s): 0a60f61

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -15,11 +15,22 @@ tags:
15
  - qwen1.5
16
  - qwen2
17
  ---
18
- This is the Mistral version of [Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) model by Alibaba Cloud.
19
  The original codebase can be found at: (https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py).
20
  I have made modifications to make it compatible with qwen1.5.
21
  This model is converted with https://github.com/Minami-su/character_AI_open/blob/main/mistral_qwen2.py
22
 
23
  Usage:
24
 
25
  ```python
 
15
  - qwen1.5
16
  - qwen2
17
  ---
18
+ This is the Mistral version of [Qwen1.5-0.5B-Chat](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat) model by Alibaba Cloud.
19
  The original codebase can be found at: (https://github.com/hiyouga/LLaMA-Factory/blob/main/tests/llamafy_qwen.py).
20
  I have made modifications to make it compatible with qwen1.5.
21
  This model is converted with https://github.com/Minami-su/character_AI_open/blob/main/mistral_qwen2.py
22
 
23
+ ## Special setup
24
+
25
+ Before using this model, you need to modify `modeling_mistral.py` in your installed transformers library, for example:
26
+ vim /root/anaconda3/envs/train/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py
27
+ Find the `MistralAttention` class
28
+ and change `bias=False` to `bias=config.attention_bias` in its q, k, v and o projection layers.
29
+ Before:
30
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62d7f90b102d144db4b4245b/AKj_fwEoLUKWZ4mViYW-q.png)
31
+ After:
32
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62d7f90b102d144db4b4245b/A2gSwq9l6Zx8X1qegtgvE.png)
33
+
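The manual edit described in the README section above can also be expressed as a small text rewrite. The following is only a sketch: `patch_attention_bias` is a hypothetical helper (it is not part of transformers), and `SAMPLE` assumes the projection layers are declared as `self.q_proj = nn.Linear(..., bias=False)` and so on, which may differ between transformers versions. It also assumes the model's config supplies an `attention_bias` field, as the README's instruction implies.

```python
import re
from typing import Tuple

# Matches only the four attention projections (q_proj, k_proj, v_proj, o_proj)
# when they are created with a hard-coded bias=False on a single line.
PROJ_PATTERN = re.compile(
    r"^([ \t]*self\.(?:q|k|v|o)_proj[ \t]*=[ \t]*nn\.Linear\(.*?)"
    r"bias[ \t]*=[ \t]*False([ \t]*\)[ \t]*)$",
    re.MULTILINE,
)


def patch_attention_bias(source: str) -> Tuple[str, int]:
    """Hypothetical helper: return (patched_source, number_of_lines_changed)."""
    return PROJ_PATTERN.subn(r"\1bias=config.attention_bias\2", source)


# Assumed shape of the relevant lines in MistralAttention.__init__;
# check your installed modeling_mistral.py before relying on this.
SAMPLE = """\
class MistralAttention(nn.Module):
    def __init__(self, config):
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
"""

patched, n_changed = patch_attention_bias(SAMPLE)
print(n_changed)
print(patched)
```

To apply the same rewrite to a real install, you would read the file's text, pass it through the helper, and write it back only if the change count is the expected four; keeping a backup copy of the original file is advisable, since this edits a library in place.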
34
  Usage:
35
 
36
  ```python