dreamgen/opus-v1-34b

@@ -141,20 +141,17 @@ You can run the models on [dreamgen.com](https://dreamgen.com) for free — you
 - **LM Studio**
   - [Config](configs/lmstudio/preset.json)
   - Just like ChatML, just changed "assistant" to "text" role.
 - **HuggingFace**
   - [Chat template](tokenizer_config.json#L51)
   - Just like ChatML, just changed "assistant" to "text" role.
 ## Known Issues
-- **34B tokenization**:
-  - There seems to be a mismatch between the tokenizers of the base and fine-tuned models. It's unclear whether this also affected training, or whether the tokenizer was just saved incorrectly (you can see that `tokenizer.json` was not saved; see the [bug report](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1322)).
-  - This affects BOS and EOS (which aren't really used by Yi) and the tokenization of the first input token.
-  - Overall impact should be minor.
 - **34B repetition**:
   - The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
-- **GGUF** / **Ooba**:
-  - The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens.
 ## License

 - **LM Studio**
   - [Config](configs/lmstudio/preset.json)
   - Just like ChatML, just changed "assistant" to "text" role.
+  - **There's a bug** in LM Studio that is triggered when you delete a message or click "Continue". [See here for details](https://discord.com/channels/1110598183144399058/1212665261128417280/1212665261128417280).
 - **HuggingFace**
   - [Chat template](tokenizer_config.json#L51)
   - Just like ChatML, just changed "assistant" to "text" role.
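
The "ChatML with a `text` role" note above can be illustrated with a minimal sketch. It assumes the standard ChatML turn layout (`<|im_start|>role`, newline, content, `<|im_end|>`, newline) and simply renames `assistant` to `text`. The helper `build_prompt` is hypothetical and not part of this repository; the linked chat template and preset remain the authoritative definition and may contain details this sketch omits.

```python
from typing import Dict, List


def build_prompt(messages: List[Dict[str, str]], assistant_role: str = "text") -> str:
    """Render messages as ChatML, renaming the "assistant" role.

    Hypothetical helper for illustration only; assumes plain ChatML framing.
    """
    parts = []
    for message in messages:
        # The only deviation from ChatML: "assistant" becomes "text".
        role = assistant_role if message["role"] == "assistant" else message["role"]
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    # Open a new turn for the model to complete.
    parts.append(f"<|im_start|>{assistant_role}\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([
        {"role": "system", "content": "You are a storyteller."},
        {"role": "user", "content": "Continue the story."},
    ]))
```

Comparing the output of a sketch like this with what a frontend actually sends is one way to spot a misconfigured preset.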
 ## Known Issues
 - **34B repetition**:
   - The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
+- **GGUF**:
+  - Tokenization may be broken. Some users reported that `<|im_start|>` and `<|im_end|>` are split into multiple tokens. llama.cpp may also tokenize incorrectly, because the Yi tokenizer is subtly different from the Llama 2 tokenizer.
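
One way to check whether a given setup is affected by the special-token issue above is to encode each marker on its own and count the resulting ids. The sketch below is runtime-agnostic: `find_split_tokens` is a hypothetical helper, and `encode` is any callable you supply that maps a string to a list of token ids without adding BOS/EOS.

```python
from typing import Callable, Dict, List, Sequence


def find_split_tokens(
    encode: Callable[[str], List[int]],
    special_tokens: Sequence[str] = ("<|im_start|>", "<|im_end|>"),
) -> Dict[str, List[int]]:
    """Return the special tokens that do not encode to exactly one id.

    Hypothetical helper: `encode` must not add BOS/EOS, otherwise every
    token would look split.
    """
    split = {}
    for token in special_tokens:
        ids = list(encode(token))
        if len(ids) != 1:
            split[token] = ids
    return split


if __name__ == "__main__":
    # Stand-in encoder that splits everything into characters,
    # mimicking a tokenizer that does not know the special tokens.
    broken = lambda text: [ord(ch) for ch in text]
    print(find_split_tokens(broken))
```

With HuggingFace `transformers`, `encode` could be `lambda s: tokenizer.encode(s, add_special_tokens=False)`. For a GGUF runtime, wrap that runtime's own tokenize call instead. An empty result means both markers map to single tokens.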
 ## License