UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 394: character maps to <undefined>
On two separate local installs, I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 394: character maps to <undefined>
Full dump:
C:\Users\OneDrive\Desktop\E2-F5-TTS>python app_local.py
WARNING: You are running this unofficial E2/F5 TTS demo locally, it may not be as up-to-date as the hosted version (https://huggingface.co/spaces/mrfakename/E2-F5-TTS)
config.json: 100%|████████████████████████████████████████| 1.26k/1.26k [00:00<?, ?B/s]
C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:147: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\jorda.cache\huggingface\hub\models--openai--whisper-large-v3-turbo. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
model.safetensors: 100%|████████████████████████████████████████| 1.62G/1.62G [01:00<00:00, 26.9MB/s]
generation_config.json: 100%|████████████████████████████████████████| 3.77k/3.77k [00:00<?, ?B/s]
tokenizer_config.json: 100%|████████████████████████████████████████| 283k/283k [00:00<00:00, 1.93MB/s]
vocab.json: 100%|████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 3.45MB/s]
tokenizer.json: 100%|████████████████████████████████████████| 2.71M/2.71M [00:00<00:00, 5.89MB/s]
merges.txt: 100%|████████████████████████████████████████| 494k/494k [00:00<00:00, 2.23MB/s]
normalizer.json: 100%|████████████████████████████████████████| 52.7k/52.7k [00:00<?, ?B/s]
added_tokens.json: 100%|████████████████████████████████████████| 34.6k/34.6k [00:00<?, ?B/s]
special_tokens_map.json: 100%|████████████████████████████████████████| 2.19k/2.19k [00:00<?, ?B/s]
preprocessor_config.json: 100%|████████████████████████████████████████| 340/340 [00:00<?, ?B/s]
model_1200000.pt: 100%|████████████████████████████████████████| 1.35G/1.35G [00:50<00:00, 26.9MB/s]
C:\Users\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:147: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users.cache\huggingface\hub\models--SWivid--F5-TTS. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
C:\Users\OneDrive\Desktop\E2-F5-TTS\app_local.py:49: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(str(cached_path(f"hf://SWivid/F5-TTS/{exp_name}/model_{ckpt_step}.pt")), map_location=device)
Traceback (most recent call last):
File "C:\Users\OneDrive\Desktop\E2-F5-TTS\app_local.py", line 78, in <module>
F5TTS_ema_model, F5TTS_base_model = load_model("F5TTS_Base", DiT, F5TTS_model_cfg, 1200000)
File "C:\Users\OneDrive\Desktop\E2-F5-TTS\app_local.py", line 50, in load_model
vocab_char_map, vocab_size = get_tokenizer("Emilia_ZH_EN", "pinyin")
File "C:\Users\OneDrive\Desktop\E2-F5-TTS\model\utils.py", line 139, in get_tokenizer
for i, char in enumerate(f):
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 394: character maps to <undefined>
I imagine this is a user error, apologies!
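For context, the last frame of the traceback (encodings\cp1252.py) shows where it breaks: on this Windows setup, open() without an encoding argument decodes with the locale's preferred encoding, here cp1252, and 0x81 is one of the few byte values cp1252 leaves undefined. A UTF-8 vocab file with CJK characters can easily contain that byte. A minimal reproduction, assuming vocab.txt is UTF-8 encoded; the character 丁 is an arbitrary illustrative pick whose UTF-8 bytes end in 0x81, not necessarily what sits at position 394 of the real file:

```python
import locale

# Encoding that open() falls back to when no encoding= is passed
# (machine-dependent; cp1252 on the Windows setup in the traceback).
print(locale.getpreferredencoding(False))

# Illustrative choice: the UTF-8 encoding of "丁" (U+4E01) ends in byte 0x81.
raw = "丁".encode("utf-8")
print(raw)  # b'\xe4\xb8\x81'


def decodes(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding`, else print the error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError as exc:
        print(exc)
        return False


# cp1252 has no character for 0x81, so this fails the same way vocab.txt did.
print(decodes(raw, "cp1252"))
# prints: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>
# then:   False
print(decodes(raw, "utf-8"))  # True
```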
The solution is to change line 137 of model/utils.py (the open() call feeding the loop on line 139) so it reads the file as UTF-8:
with open(f"data/{dataset_name}_{tokenizer}/vocab.txt", "r", encoding="utf-8") as f:
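A minimal sketch of the patched read, assuming vocab.txt holds one token per line and that get_tokenizer() maps each line (without its newline) to its line index, which is what the enumerate(f) loop in the traceback suggests. The function name load_vocab and the sample file contents are hypothetical, not the actual F5-TTS code:

```python
import os
import tempfile


def load_vocab(path: str) -> dict[str, int]:
    """Hypothetical stand-in for the vocab loop in get_tokenizer():
    map each line of vocab.txt (minus its newline) to its 0-based line index."""
    vocab_char_map: dict[str, int] = {}
    # encoding="utf-8" is the fix: without it, open() uses the locale
    # encoding (cp1252 on the affected Windows machines) and fails on 0x81.
    with open(path, "r", encoding="utf-8") as f:
        for i, char in enumerate(f):
            vocab_char_map[char.rstrip("\n")] = i
    return vocab_char_map


# Usage with a tiny made-up vocab file: a space, a CJK character, a letter.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(" \n丁\na\n")
    vocab = load_vocab(path)

print(vocab)  # {' ': 0, '丁': 1, 'a': 2}
```

rstrip("\n") is used rather than slicing off the last character so a final line without a trailing newline is not truncated. If editing the code is not an option, Python's UTF-8 mode (python -X utf8 app_local.py, or setting PYTHONUTF8=1) makes open() default to UTF-8 and should avoid the same error.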