metadata

title: Audio Summarizer
emoji: 📚
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
short_description: Transcribes an audio file and creates a summary

Limitations

I have tested the application with audio files of varying lengths. I first tried audio files of 1 to 2 hours, but my PC's hardware could not handle files of that size. After further testing, I found that the application works best with audio files under 15 minutes. Treat 15 minutes as the longest length I would recommend, because the app handles shorter audio much more effectively. For example, a stereo audio file of around 15 minutes usually takes about 6 to 8 minutes to process, so I would not recommend using this model for files of that length. Processing time also varies with the capabilities of your PC.
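If your recordings are WAV files, you can check one against the 15-minute recommendation with Python's standard-library `wave` module before uploading it. This is a minimal sketch, assuming uncompressed PCM WAV input (`wave` cannot read MP3 or other compressed formats). The function names and the constant are hypothetical and are not part of app.py.

```python
import wave
from typing import BinaryIO, Union

# Hypothetical constant: the 15-minute ceiling recommended above, in seconds
RECOMMENDED_MAX_SECONDS = 15 * 60


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the duration of a PCM WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        # A frame holds one sample per channel, so this works for mono and stereo alike
        return wav.getnframes() / wav.getframerate()


def is_within_recommended_length(source: Union[str, BinaryIO]) -> bool:
    """Return True if the recording is no longer than the recommended 15 minutes."""
    return wav_duration_seconds(source) <= RECOMMENDED_MAX_SECONDS
```

For example, `is_within_recommended_length("lecture.wav")` (a hypothetical file) returns `False` for a 20-minute recording.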

Estimated times

5-minute file: 80 to 90 seconds
10-minute file: 190 to 230 seconds
15-minute file: 350 to 400 seconds
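Because the measurements only cover 5, 10, and 15 minutes of audio, you can get a rough estimate for other lengths in that range by linearly interpolating between them. This is a minimal sketch that uses only the numbers listed above, and `estimate_seconds` is a hypothetical helper. Processing time grows faster than linearly with length (about 17 seconds per audio minute at 5 minutes, about 25 at 15 minutes), so treat the result as a ballpark figure for hardware similar to the author's.

```python
from bisect import bisect_left
from typing import Tuple

# (audio length in minutes, fastest seconds, slowest seconds) from the Estimated times list
MEASURED = [(5, 80, 90), (10, 190, 230), (15, 350, 400)]


def estimate_seconds(minutes: float) -> Tuple[float, float]:
    """Linearly interpolate the measured (fastest, slowest) processing times."""
    lengths = [m for m, _, _ in MEASURED]
    if not lengths[0] <= minutes <= lengths[-1]:
        raise ValueError("estimates only cover 5 to 15 minutes of audio")
    # Pick the measured segment [m0, m1] that contains `minutes`
    i = max(1, bisect_left(lengths, minutes))
    (m0, lo0, hi0), (m1, lo1, hi1) = MEASURED[i - 1], MEASURED[i]
    t = (minutes - m0) / (m1 - m0)  # position inside the segment, 0.0 to 1.0
    return lo0 + t * (lo1 - lo0), hi0 + t * (hi1 - hi0)
```

For example, `estimate_seconds(7.5)` returns `(135.0, 160.0)`.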

Main Use

The primary purpose of this application is to make it easier to create summaries of videos by processing their audio. This addresses a common problem for people who find taking notes from videos slow and tedious.

It can also help you get a general understanding of a video's content if you have downloaded its audio. You can then use the summary as a starting point for your own notes.

To create the summary, the app first converts the audio to text and then summarizes that text.
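The two stages (transcription, then summarization) can be sketched as plain functions that take the transcriber and summarizer as arguments. This is a minimal sketch, not the code in app.py: `summarize_audio`, `chunk_words`, and the 400-word default are hypothetical. Splitting the transcript into chunks is also an assumption, motivated by the fact that BART checkpoints accept at most 1024 input tokens, so a long transcript cannot be summarized in one pass without truncation.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split a transcript into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_audio(
    audio_path: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Stage 1 turns audio into text; stage 2 turns that text into a summary."""
    transcript = transcribe(audio_path)
    # Summarize each chunk on its own, then join the partial summaries in order
    partial = [summarize(chunk) for chunk in chunk_words(transcript, max_words)]
    return " ".join(partial)
```

Passing the two stages in as functions keeps the flow testable without loading any model: stand-in lambdas can replace the real transcriber and summarizer.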

Models

The application uses two primary models: Facebook BART and Facebook Wav2Vec, which were selected after extensive experimentation with various alternatives. Other models, such as Google T5, were tested but did not yield comparable performance or accuracy for this specific use case.

Facebook BART: A transformer-based model that performs well on text summarization tasks.
Facebook Wav2Vec: A speech recognition model that efficiently converts audio into accurate text transcriptions.
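The two models could be loaded through the Hugging Face `transformers` `pipeline` API roughly as follows. The checkpoint names `facebook/wav2vec2-base-960h` (English-only) and `facebook/bart-large-cnn` are assumptions, because this README does not say which variants app.py uses. `load_models`, the 10-second chunk length, and the summary lengths are also hypothetical. Reading audio from a file path requires ffmpeg to be installed.

```python
from typing import Callable, Tuple


def load_models(
    asr_model: str = "facebook/wav2vec2-base-960h",  # assumed Wav2Vec checkpoint
    summary_model: str = "facebook/bart-large-cnn",  # assumed BART checkpoint
) -> Tuple[Callable[[str], str], Callable[[str], str]]:
    """Return (transcribe, summarize) callables backed by transformers pipelines."""
    # Imported lazily so this module can be imported without transformers installed
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=asr_model)
    summarizer = pipeline("summarization", model=summary_model)

    def transcribe(audio_path: str) -> str:
        # chunk_length_s processes long audio in overlapping windows to bound memory use
        return asr(audio_path, chunk_length_s=10)["text"]

    def summarize(text: str) -> str:
        # truncation=True cuts inputs longer than the model's token limit instead of failing
        result = summarizer(text, max_length=130, min_length=30, do_sample=False, truncation=True)
        return result[0]["summary_text"]

    return transcribe, summarize
```

Usage, with a hypothetical file: `transcribe, summarize = load_models()` followed by `print(summarize(transcribe("lecture.wav")))`. Loading both pipelines once and reusing them avoids reloading the model weights for every file.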