---
library_name: transformers
datasets:
- reazon-research/reazonspeech
- joujiboi/japanese-anime-speech
language:
- ja
- en
metrics:
- cer
pipeline_tag: automatic-speech-recognition
---

# Model Card for Visual-novel-transcriptor

![image](./cover_image.jpeg)

Fine-tuned ASR model based on [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2).

This model is intended to transcribe Japanese audio, especially audio from visual novels.

# WaifuModel Collections

- [TTS](https://huggingface.co/spow12/visual_novel_tts)
- [Chat](https://huggingface.co/spow12/ChatWaifu_v1.2.1)
- [ASR](https://huggingface.co/spow12/Visual-novel-transcriptor)

# Unified Demo

[WaifuAssistant](https://github.com/yw0nam/WaifuAssistant)

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** spow12(yw_nam)
- **Shared by:** spow12(yw_nam)
- **Model type:** Seq2Seq
- **Language(s) (NLP):** Japanese
- **Finetuned from model:** [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)

## Uses

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa

# Load the processor and model (as written, this requires a CUDA-capable GPU).
processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
# Force Japanese transcription when decoding.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")

# Placeholder path: replace with your own audio file.
wav_path = "path/to/audio.wav"
# Resample to 16 kHz, the sampling rate passed to the processor below.
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```

## Bias, Risks, and Limitations

This model was trained on Japanese datasets, including visual novel data that contains NSFW content.

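## Evaluation

The card's metadata lists character error rate (CER) as the metric. As a rough illustration of what that metric measures, the sketch below computes CER for a single reference/hypothesis pair in plain Python. The function names `edit_distance` and `cer` and the sample strings are hypothetical, chosen for this example only; this is not the author's evaluation script, and the card does not say which text normalization (punctuation, whitespace) was applied before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only.
reference = "今日はいい天気ですね"
hypothesis = "今日はいい天気です"
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In practice, `hypothesis` would be the `transcription[0]` string produced by the snippet in the Uses section, and the score would be averaged over a held-out set of utterances.
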
## Use & Credit

This model is currently available for non-commercial use only. Also, since I'm not well versed in licensing, I hope you use it responsibly.

By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and anime fans).

## Citation

```bibtex
@misc {Visual-novel-transcriptor,
    author       = { YoungWoo Nam },
    title        = { Visual-novel-transcriptor },
    year         = 2024,
    url          = { https://huggingface.co/spow12/Visual-novel-transcriptor },
    publisher    = { Hugging Face }
}
```