# FunASR Command Line Interface

This tool provides a command-line interface for separating vocals from instrumental tracks, converting videos to audio, and performing speech-to-text transcription on the resulting audio files.

## Requirements

- Python >= 3.10
- PyTorch <= 2.3.1
- `ffmpeg`, `pydub`, `audio-separator[gpu]`

## Installation

Install the required packages:

```bash
# Quote the extras so shells such as zsh do not treat the brackets as a glob pattern
pip install -e ".[stable]"
```

Make sure you have `ffmpeg` installed and available in your `PATH`.
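
One quick way to confirm that `ffmpeg` is reachable is to look it up on `PATH` from Python. This is a minimal sketch; the helper name `find_ffmpeg` is illustrative and not part of the tool:

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it before running the tool.")
    else:
        print(f"ffmpeg found at {path}")
```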

## Usage

### Basic Usage

To run the tool with default settings:

```bash
python tools/sensevoice/fun_asr.py --audio-dir <audio_directory> --save-dir <output_directory>
```

## Options

|          Option           |                                  Description                                  |
| :-----------------------: | :---------------------------------------------------------------------------: |
|        --audio-dir        |                  Directory containing audio or video files.                   |
|        --save-dir         |                   Directory to save processed audio files.                    |
|         --device          |         Device to use for processing. Options: cuda (default) or cpu.         |
|        --language         |                Language of the transcription. Default is auto.                |
| --max_single_segment_time | Maximum duration of a single audio segment in milliseconds. Default is 20000. |
|          --punc           |                        Enable punctuation prediction.                         |
|         --denoise         |                  Enable noise reduction (vocal separation).                   |
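
To make the defaults in the table concrete, the sketch below mirrors the documented options with `argparse`. It is a hypothetical reconstruction for illustration only: the actual script may use a different parsing library, and marking `--audio-dir` and `--save-dir` as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the options table; not the script's own code.
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("--audio-dir", required=True,  # assumed to be required
                        help="Directory containing audio or video files")
    parser.add_argument("--save-dir", required=True,   # assumed to be required
                        help="Directory to save processed audio files")
    parser.add_argument("--device", choices=["cuda", "cpu"], default="cuda")
    parser.add_argument("--language", default="auto")
    parser.add_argument("--max_single_segment_time", type=int, default=20000,
                        help="Maximum segment duration in milliseconds")
    parser.add_argument("--punc", action="store_true")
    parser.add_argument("--denoise", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--audio-dir", "path/to/audio", "--save-dir", "path/to/output", "--punc"]
)
print(args.device, args.max_single_segment_time, args.punc, args.denoise)
# prints: cuda 20000 True False
```

Options that are not passed fall back to the defaults listed in the table, so only `--punc` changes in this invocation.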

## Example

To process audio files in the directory `path/to/audio` and save the output to `path/to/output`, with punctuation and noise reduction enabled:

```bash
python tools/sensevoice/fun_asr.py --audio-dir path/to/audio --save-dir path/to/output --punc --denoise
```

## Additional Notes

- The tool supports both audio and video files. Videos are converted to audio automatically.
- With the `--denoise` option, the tool performs vocal separation to isolate the vocals from the instrumental tracks.
- The script automatically creates any necessary directories under `--save-dir`.
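
The video-to-audio step and the directory creation described in these notes can be pictured roughly as follows. This is a sketch under stated assumptions rather than the tool's actual implementation: the function names, the `.wav` output format, and the choice to call `ffmpeg` through `subprocess` are all illustrative.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_audio_cmd(video: Path, out_dir: Path) -> list[str]:
    """Build an ffmpeg command that writes the audio track of `video` into `out_dir`."""
    target = out_dir / (video.stem + ".wav")  # output format is an assumption
    # -y overwrites an existing file, -i names the input, -vn drops the video stream
    return ["ffmpeg", "-y", "-i", str(video), "-vn", str(target)]


def extract_audio(video: Path, out_dir: Path) -> None:
    """Create the output directory if needed, then run ffmpeg (requires ffmpeg on PATH)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_extract_audio_cmd(video, out_dir), check=True)


# Building the command has no side effects, so it can be inspected before running it.
print(ffmpeg_extract_audio_cmd(Path("path/to/audio/clip.mp4"), Path("path/to/output")))
```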

## Troubleshooting

If you encounter issues, first confirm that every dependency listed under Requirements is installed at a supported version and that `ffmpeg` is available in your `PATH`. For problems specific to one dependency, refer to that dependency's documentation.