MatsRooth/wav2vec2-base_down_on

	---
	license: apache-2.0
	base_model: facebook/wav2vec2-base
	tags:
	- audio-classification
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: wav2vec2-base_down_on
	  results: []
	---


	# wav2vec2-base_down_on

	This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the MatsRooth/down_on dataset.
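	It achieves the following results on the evaluation set:
	- Loss: 0.1385
	- Accuracy: 0.9962

	These are the epoch 1 figures from the training results table, consistent with `--load_best_model_at_end True` and `--metric_for_best_model accuracy` in the training command: the highest accuracy was first reached at epoch 1, although validation loss kept falling through epoch 5.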
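
For readers who want to try the checkpoint, the following is a minimal sketch rather than a documented usage recipe. It assumes the model is available on the Hub as `MatsRooth/wav2vec2-base_down_on`, that `transformers` and an audio decoding backend are installed, and that `wav_path` (a hypothetical argument) points to a short recording. The helper `best_prediction` is illustrative and not part of any library.

```python
def best_prediction(predictions):
    """Return the label of the highest-scoring {"label", "score"} entry."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(wav_path, model_id="MatsRooth/wav2vec2-base_down_on"):
    """Classify one audio file; the checkpoint is downloaded on first use."""
    # Imported lazily so that best_prediction has no third-party dependency.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return best_prediction(classifier(wav_path))
```

Calling `classify("some_clip.wav")` should return one of the two label strings stored in the model config; the exact label spellings are not stated in this card.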

	MatsRooth/down_on is the subset of [superb ks](https://huggingface.co/datasets/superb)
	containing only the examples labeled down and on.
	Superb ks is in turn derived from the [Speech Commands dataset v1.0](https://www.tensorflow.org/datasets/catalog/speech_commands).
	The train/validation/test splits are the same as in superb ks.
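
The subsetting described above can be sketched in plain Python. This is an illustration of the idea, not the script used to build MatsRooth/down_on: the function `select_labels`, the toy `label_names` list, and the toy records are all assumptions made for the example.

```python
def select_labels(records, label_names, keep):
    """Keep records whose label name is in `keep`, renumbering labels from 0."""
    new_id = {name: i for i, name in enumerate(keep)}
    selected = []
    for rec in records:
        name = label_names[rec["label"]]
        if name in new_id:
            # Copy the record and replace the old label id with the new one.
            selected.append({**rec, "label": new_id[name]})
    return selected


# Toy stand-in for a larger keyword-spotting dataset (illustrative values only).
label_names = ["yes", "no", "down", "on"]
records = [
    {"file": "a.wav", "label": 2},  # down
    {"file": "b.wav", "label": 0},  # yes, dropped
    {"file": "c.wav", "label": 3},  # on
]
subset = select_labels(records, label_names, keep=["down", "on"])
```

Applying the same filter to each split separately keeps the original train/validation/test partition intact.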

	## Intended uses

	MatsRooth/down_on and this model exercise a methodology for creating an audio classification dataset from
	local directory structures and audio files. They also check whether fine-tuning wav2vec2 for classification
	with two labels works well.
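
One way to read "local directory structures" is a layout with one subdirectory per label. The sketch below indexes such a tree using only the standard library. The layout, the function name `index_audio_tree`, and the `.wav` filter are assumptions for illustration; the card does not say how the dataset was actually assembled.

```python
from pathlib import Path


def index_audio_tree(root, extensions=(".wav",)):
    """Map <root>/<label>/<file> to a sorted label list and labeled examples."""
    root = Path(root)
    # Each immediate subdirectory name is treated as a class label.
    label_names = sorted(d.name for d in root.iterdir() if d.is_dir())
    examples = []
    for label_id, name in enumerate(label_names):
        for path in sorted((root / name).rglob("*")):
            if path.suffix.lower() in extensions:
                examples.append({"file": str(path), "label": label_id})
    return label_names, examples
```

The `datasets` library also ships an `audiofolder` loader (`load_dataset("audiofolder", data_dir=...)`) that infers labels from directory names in a similar way; whether it was used here is not stated.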

	## Training procedure
	Training ran as a Slurm batch job submitted with `sbatch`, using the example script [run_audio_classification.py](https://github.com/huggingface/transformers) from the transformers repository.
	The job script `down_on.sub` is shown below; submit it with `sbatch down_on.sub`.

	```bash
	#!/bin/bash
	#SBATCH -J down_on # Job name
	#SBATCH -o down_on_%j.out # Name of stdout output log file (%j expands to jobID)
	#SBATCH -e down_on_%j.err # Name of stderr output log file (%j expands to jobID)
	#SBATCH -N 1 # Total number of nodes requested
	#SBATCH -n 1 # Total number of cores requested
	#SBATCH --mem=5000 # Total amount of (real) memory requested (per node)
	#SBATCH -t 10:00:00 # Time limit (hh:mm:ss)
	#SBATCH --partition=gpu # Request partition for resource allocation
	#SBATCH --gres=gpu:1 # Specify a list of generic consumable resources (per node)

	cd ~/ac_h
	/home/mr249/env/hugh/bin/python run_audio_classification.py \
	  --model_name_or_path facebook/wav2vec2-base \
	  --dataset_name MatsRooth/down_on \
	  --output_dir wav2vec2-base_down_on \
	  --overwrite_output_dir \
	  --remove_unused_columns False \
	  --do_train \
	  --do_eval \
	  --fp16 \
	  --learning_rate 3e-5 \
	  --max_length_seconds 1 \
	  --attention_mask False \
	  --warmup_ratio 0.1 \
	  --num_train_epochs 5 \
	  --per_device_train_batch_size 32 \
	  --gradient_accumulation_steps 4 \
	  --per_device_eval_batch_size 32 \
	  --dataloader_num_workers 1 \
	  --logging_strategy steps \
	  --logging_steps 10 \
	  --evaluation_strategy epoch \
	  --save_strategy epoch \
	  --load_best_model_at_end True \
	  --metric_for_best_model accuracy \
	  --save_total_limit 3 \
	  --seed 0
	```

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 0
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5.0
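
Two of the values above follow from simple arithmetic, sketched here under stated assumptions. The effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices (one GPU, per `--gres=gpu:1`). The learning-rate function re-implements a linear warmup followed by linear decay to zero; it is an illustration, not the Trainer's own code, and rounding the warmup steps up is an assumption about how the ratio is converted to steps. The total of 145 steps is taken from the training results table.

```python
import math


def total_train_batch_size(per_device, grad_accum, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


batch = total_train_batch_size(per_device=32, grad_accum=4)
peak = linear_lr(step=15, total_steps=145)
```

Under these assumptions the learning rate peaks at step 15 of 145, that is, about halfway through the first epoch.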

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 0.6089 | 1.0 | 29 | 0.1385 | 0.9962 |
	| 0.1297 | 2.0 | 58 | 0.0513 | 0.9962 |
	| 0.0835 | 3.0 | 87 | 0.0389 | 0.9885 |
	| 0.058 | 4.0 | 116 | 0.0302 | 0.9923 |
	| 0.0481 | 5.0 | 145 | 0.0245 | 0.9942 |
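
The evaluation loss reported at the top of this card (0.1385) is the epoch 1 value rather than the lowest in the table. A plausible explanation, sketched below, is that the best checkpoint is chosen by accuracy and is replaced only by a strictly higher value, so a later tie does not displace it. The function `pick_best` is illustrative, and the strict comparison is an assumption about the selection rule.

```python
# (epoch, validation_loss, accuracy) copied from the table above.
rows = [
    (1, 0.1385, 0.9962),
    (2, 0.0513, 0.9962),
    (3, 0.0389, 0.9885),
    (4, 0.0302, 0.9923),
    (5, 0.0245, 0.9942),
]


def pick_best(rows):
    """Return the first row with the highest accuracy (ties keep the earlier row)."""
    best = None
    for row in rows:
        if best is None or row[2] > best[2]:  # strict '>' means ties do not replace
            best = row
    return best


best_epoch, best_loss, best_acc = pick_best(rows)
```

With `--metric_for_best_model eval_loss` instead, a rule of this kind would presumably have kept the epoch 5 checkpoint, which has the lowest validation loss.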


	### Framework versions

	- Transformers 4.31.0.dev0
	- PyTorch 2.0.1+cu117
	- Datasets 2.13.1
	- Tokenizers 0.13.3