Upload /README.md with huggingface_hub
README.md
CHANGED
@@ -141,20 +141,17 @@ You can run the models on [dreamgen.com](https://dreamgen.com) for free — you
 - **LM Studio**
 - [Config](configs/lmstudio/preset.json)
 - Just like ChatML, just changed "assistant" to "text" role.
+- **There's a bug** in LM Studio if you delete a message or click "Continue", [see here for details](https://discord.com/channels/1110598183144399058/1212665261128417280/1212665261128417280).
 - **HuggingFace**
 - [Chat template](tokenizer_config.json#L51)
 - Just like ChatML, just changed "assistant" to "text" role.

 ## Known Issues

-- **34B tokenization**:
-- There seems to be a mismatch between the tokenizer of the base and fine-tuned model. It's unclear whether this also affected training, or whether it's just incorrectly saved tokenizer (you can see `tokenizer.json` was not saved ([bug report](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1322))).
-- This affects BOS and EOS (which aren't really used by Yi) and the tokenization of the first input token.
-- Overall impact should be minor.
 - **34B repetition**:
 - The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
-- **GGUF
-- The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens.
+- **GGUF**:
+- The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens. Also llama.cpp may not tokenize correctly (the Yi tokenizer is subtly different from the Llama 2 tokenizer).

 ## License
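The template note in this change (ChatML with the "assistant" role renamed to "text") can be sketched as a small prompt builder. This is an illustration only: `format_prompt` is a hypothetical helper, and the newline placement is assumed from the usual ChatML layout rather than read from `tokenizer_config.json`, which remains the authoritative template.

```python
from typing import Dict, List


def format_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in a ChatML-style layout, using "text" where ChatML uses "assistant".

    Assumption: each turn is `<|im_start|>{role}\n{content}<|im_end|>\n`.
    Check the model's own chat template before relying on this.
    """
    parts = []
    for message in messages:
        # The only change relative to plain ChatML: the "assistant" role is called "text".
        role = "text" if message["role"] == "assistant" else message["role"]
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open a "text" turn so the model continues from here.
        parts.append("<|im_start|>text\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a storyteller."},
        {"role": "user", "content": "Begin the story."},
    ]
    print(format_prompt(demo))
```

Mapping "assistant" to "text" inside the helper lets callers keep passing conventional role names while the rendered prompt follows the README's description.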
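The GGUF note says some users saw `<|im_start|>` and `<|im_end|>` split into multiple tokens. A rough way to check a given tokenizer is to encode each marker on its own and count the ids. In this sketch `check_markers` is a hypothetical helper, and `encode` is any callable you supply that maps a string to a list of token ids without adding BOS/EOS; how to obtain such a callable depends on your runtime and is not shown here.

```python
from typing import Callable, Dict, List

# The two markers named in the Known Issues entry.
MARKERS = ["<|im_start|>", "<|im_end|>"]


def check_markers(encode: Callable[[str], List[int]]) -> Dict[str, bool]:
    """Return, per marker, whether `encode` maps it to exactly one token id.

    Assumption: `encode` adds no BOS/EOS tokens; otherwise the counts are inflated.
    """
    return {marker: len(encode(marker)) == 1 for marker in MARKERS}


if __name__ == "__main__":
    # Toy stand-in encoder that splits every character, so both checks fail.
    def char_level(text: str) -> List[int]:
        return [ord(ch) for ch in text]

    for marker, ok in check_markers(char_level).items():
        print(marker, "single token" if ok else "split into multiple tokens")
```

A `False` entry only shows that the marker was split by that particular encoder; it does not by itself say whether the GGUF file or the runtime is at fault.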