metadata

title: Audio Summarizer
emoji: 🔥
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
short_description: Transcribes an audio file and creates a summary

Limitations

I have tested the application with audio files of varying lengths. Initially, I tried to process recordings of 1 to 2 hours, but because of hardware constraints my PC could not handle files of that length effectively.

After testing, I found that the application works best with audio files under 20 minutes. Treat 20 minutes as the longest length I would recommend, since the app handles shorter audio much more effectively. For example, a stereo audio file of about 20 minutes usually takes 15 to 18 minutes to process, although this time varies with your PC's hardware.
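To plan around these numbers, the observed ratio (about 15 to 18 minutes of processing per 20 minutes of audio) can be turned into a rough estimate. This is a minimal sketch that assumes processing time scales linearly with audio length on the same hardware; that assumption is not verified here and may not hold for much longer files or other machines. The function name is hypothetical and not part of app.py.

```python
from __future__ import annotations


def estimate_processing_minutes(audio_minutes: float) -> tuple[float, float]:
    """Return a rough (low, high) processing-time range in minutes.

    Extrapolated from one observation: about 20 minutes of stereo audio took
    15 to 18 minutes to process. Assumes linear scaling, which is only an
    approximation and depends heavily on the hardware.
    """
    low = audio_minutes * 15 / 20
    high = audio_minutes * 18 / 20
    return low, high


print(estimate_processing_minutes(10))
```

For a 10-minute file this gives roughly 7.5 to 9 minutes.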

Users with high-performance computers may be able to process longer audio files. However, for consistent and reliable results, I recommend audio files of about 10 to 15 minutes.

Main Use

The primary purpose of this application is to speed up creating summaries of videos by processing their audio. This addresses a common problem for people who find taking notes from videos slow and tedious.

It can also give you a general understanding of a video's content if you have downloaded its audio. Users can then use the summary as a starting point for their own notes.

The app first converts the audio to text and then summarizes that text.
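A minimal sketch of the two-stage process (speech to text with Wav2Vec, then text to summary with BART) using Hugging Face `transformers` pipelines. The README does not say which checkpoints or settings app.py uses, so `facebook/wav2vec2-base-960h`, `facebook/bart-large-cnn`, the 30-second audio windows, and the 400-word transcript chunks are all assumptions, as are the helper names `chunk_words` and `summarize_audio`. Splitting the transcript is one common way to stay within BART's 1024-token input limit; it is not necessarily what app.py does.

```python
from __future__ import annotations


def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_audio(audio_path: str, max_words_per_chunk: int = 400) -> str:
    """Transcribe an audio file with Wav2Vec2, then summarize the transcript with BART.

    Checkpoint names and chunk sizes are assumptions, not taken from app.py.
    """
    # Imported here so chunk_words can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # Stage 1: audio -> text. chunk_length_s lets the CTC model read long files in windows.
    transcript = asr(audio_path, chunk_length_s=30)["text"]

    # Stage 2: text -> summary. BART reads at most 1024 tokens, so the transcript is
    # summarized piece by piece (400 words is an assumed, conservative margin).
    parts = []
    for chunk in chunk_words(transcript, max_words_per_chunk):
        result = summarizer(chunk, max_length=130, min_length=30, do_sample=False)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Usage would look like `summarize_audio("lecture.wav")`, where `lecture.wav` is a hypothetical file; passing a file path to the speech recognition pipeline requires ffmpeg to be installed.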

Models

The application uses two primary models: Facebook BART and Facebook Wav2Vec, which were selected after extensive experimentation with various alternatives. Other models, such as Google T5, were tested but did not yield comparable performance or accuracy for this specific use case.

Facebook BART: A transformer-based model that performs well on text summarization tasks.
Facebook Wav2Vec: A speech recognition model that efficiently converts audio into accurate text transcriptions.
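Wav2Vec2 checkpoints such as `facebook/wav2vec2-base-960h` were trained on 16 kHz mono audio, so a stereo recording like the 20-minute example in the Limitations section has to be downmixed and resampled before transcription. This sketch assumes the audio is already loaded as a NumPy array of shape `(samples, channels)`; the helper names are hypothetical, and the linear-interpolation resampler is deliberately naive (no anti-aliasing filter), so a dedicated resampler such as `librosa.resample` is the better choice in practice.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate expected by common Wav2Vec2 checkpoints


def to_mono(samples: np.ndarray) -> np.ndarray:
    """Downmix (num_samples, num_channels) audio to 1-D mono by averaging the channels."""
    if samples.ndim == 1:
        return samples.astype(np.float32)
    return samples.mean(axis=1).astype(np.float32)


def resample_linear(samples: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Naively resample 1-D audio by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return samples.astype(np.float32)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    old_t = np.arange(len(samples)) / orig_sr  # original sample times in seconds
    new_t = np.arange(n_out) / target_sr  # target sample times in seconds
    return np.interp(new_t, old_t, samples).astype(np.float32)
```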