---
license: apache-2.0
base_model: facebook/wav2vec2-base
tags:
- audio-classification
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: wav2vec2-base_down_on
  results: []
---


# wav2vec2-base_down_on

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the MatsRooth/down_on dataset.
It achieves the following results on the evaluation set (the values match the epoch 1 row of the training results table):
- Loss: 0.1385
- Accuracy: 0.9962

MatsRooth/down_on is the subset of [superb ks](https://huggingface.co/datasets/superb)
with the labels *down* and *on*.
Superb ks is in turn derived from the [Speech Commands dataset v1.0](https://www.tensorflow.org/datasets/catalog/speech_commands).
The train/validation/test splits are the same as in superb ks.
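The relationship between superb ks and the two-label subset can be sketched as a label filter. This is a minimal sketch under stated assumptions, not the script used to build MatsRooth/down_on: `keep_labels` and `load_down_on_subset` are hypothetical helpers, and the `load_dataset("superb", "ks", ...)` call assumes a `datasets` version that can still load that configuration.

```python
from typing import Dict, Iterable, List, Set


def keep_labels(examples: Iterable[Dict], label_names: List[str], keep: Set[str]) -> List[Dict]:
    """Return only the examples whose integer label maps to a name in `keep`."""
    keep_ids = {i for i, name in enumerate(label_names) if name in keep}
    return [ex for ex in examples if ex["label"] in keep_ids]


def load_down_on_subset(split: str = "train"):
    """Hypothetical helper: filter superb ks down to the labels 'down' and 'on'.

    Not called here because it downloads data. Label ids are left as in
    superb ks; a real two-label dataset would remap them to 0 and 1.
    """
    from datasets import load_dataset  # lazy import; needs the `datasets` package

    ks = load_dataset("superb", "ks", split=split)
    names = ks.features["label"].names
    keep_ids = {i for i, name in enumerate(names) if name in {"down", "on"}}
    return ks.filter(lambda ex: ex["label"] in keep_ids)


# Toy records standing in for dataset rows (illustrative, not real data).
toy_names = ["yes", "down", "on"]
toy_rows = [
    {"file": "a.wav", "label": 0},
    {"file": "b.wav", "label": 1},
    {"file": "c.wav", "label": 2},
]
subset = keep_labels(toy_rows, toy_names, {"down", "on"})
print([row["file"] for row in subset])
```

The toy rows show only the filtering logic; the split boundaries are untouched because each split would be filtered separately.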

## Intended uses

MatsRooth/down_on and this model exercise a methodology for creating an audio classification dataset from
local directory structures and audio files, and check whether fine-tuning wav2vec2 for classification with two labels works
well.
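As an illustration of deriving labels from a local directory structure, the sketch below pairs each audio file with the name of its parent directory. The layout `root/<label>/<file>.wav` and the helper `list_labeled_audio` are assumptions made for this example; the card does not state the exact layout or tooling that was used.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_labeled_audio(root: str, extensions: Tuple[str, ...] = (".wav",)) -> List[Tuple[str, str]]:
    """Pair every audio file in root/<label>/ with its label (the directory name)."""
    pairs = []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for audio in sorted(label_dir.iterdir()):
            if audio.is_file() and audio.suffix.lower() in extensions:
                pairs.append((audio.name, label_dir.name))
    return pairs


# Build a throwaway layout with empty placeholder files (not real audio).
with tempfile.TemporaryDirectory() as root:
    for label, name in [("down", "a.wav"), ("on", "b.wav"), ("on", "notes.txt")]:
        (Path(root) / label).mkdir(exist_ok=True)
        (Path(root) / label / name).touch()
    demo_pairs = list_labeled_audio(root)

print(demo_pairs)
```

With the `datasets` library, a layout like this can typically be loaded with `load_dataset("audiofolder", data_dir=root)`, which infers labels from directory names; whether that loader was used for MatsRooth/down_on is not stated here.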

## Training procedure

Training used `sbatch` on a cluster with the program [run_audio_classification.py](https://github.com/huggingface/transformers).
The job script `down_on.sub` is shown below; start it with `sbatch down_on.sub`.

```bash
#!/bin/bash
#SBATCH -J down_on              # Job name
#SBATCH -o down_on_%j.out       # Name of stdout output log file (%j expands to jobID)
#SBATCH -e down_on_%j.err       # Name of stderr output log file (%j expands to jobID)
#SBATCH -N 1                    # Total number of nodes requested
#SBATCH -n 1                    # Total number of cores requested
#SBATCH --mem=5000              # Total amount of (real) memory requested (per node)
#SBATCH -t 10:00:00             # Time limit (hh:mm:ss)
#SBATCH --partition=gpu         # Request partition for resource allocation
#SBATCH --gres=gpu:1            # Specify a list of generic consumable resources (per node)

cd ~/ac_h
/home/mr249/env/hugh/bin/python run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --dataset_name MatsRooth/down_on \
    --output_dir wav2vec2-base_down_on \
    --overwrite_output_dir \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --fp16 \
    --learning_rate 3e-5 \
    --max_length_seconds 1 \
    --attention_mask False \
    --warmup_ratio 0.1 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 32 \
    --gradient_accumulation_steps 4 \
    --per_device_eval_batch_size 32 \
    --dataloader_num_workers 1 \
    --logging_strategy steps \
    --logging_steps 10 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --load_best_model_at_end True \
    --metric_for_best_model accuracy \
    --save_total_limit 3 \
    --seed 0
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
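
The derived quantities in this list can be checked with a little arithmetic. The sketch below assumes a single GPU (as requested by `--gres=gpu:1`), takes the total of 145 optimizer steps from the training results table, and assumes that the warmup step count is the ratio times the total steps, rounded up. `effective_batch_size` and `linear_warmup_lr` are hypothetical helper names that mirror a linear warmup followed by linear decay; they are not the Trainer's own code.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update."""
    return per_device * grad_accum * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total_steps = 5 * 29                         # 5 epochs x 29 steps per epoch
warmup_steps = math.ceil(0.1 * total_steps)  # warmup ratio 0.1, rounded up (assumption)

print(effective_batch_size(32, 4))           # 128, matching total_train_batch_size
print(total_steps, warmup_steps)             # 145 15
print(linear_warmup_lr(warmup_steps, 3e-5, warmup_steps, total_steps))  # peak rate, 3e-05
```

Under these assumptions the learning rate reaches its peak of 3e-05 about halfway through the first epoch and decays to zero at step 145.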

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.6089        | 1.0   | 29   | 0.1385          | 0.9962   |
| 0.1297        | 2.0   | 58   | 0.0513          | 0.9962   |
| 0.0835        | 3.0   | 87   | 0.0389          | 0.9885   |
| 0.058         | 4.0   | 116  | 0.0302          | 0.9923   |
| 0.0481        | 5.0   | 145  | 0.0245          | 0.9942   |
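
The table also explains why the headline evaluation loss (0.1385) is higher than the loss at later epochs: the training command selects the best checkpoint by accuracy, not by loss. The sketch below replays that selection over the rows above. It assumes a later checkpoint replaces the current best only when its accuracy is strictly greater, so the tie between epochs 1 and 2 resolves to epoch 1; `pick_best_by_accuracy` is a hypothetical helper, not the Trainer's implementation.

```python
from typing import List, Optional, Tuple

# (epoch, step, validation_loss, accuracy), copied from the table above.
Row = Tuple[float, int, float, float]
rows: List[Row] = [
    (1.0, 29, 0.1385, 0.9962),
    (2.0, 58, 0.0513, 0.9962),
    (3.0, 87, 0.0389, 0.9885),
    (4.0, 116, 0.0302, 0.9923),
    (5.0, 145, 0.0245, 0.9942),
]


def pick_best_by_accuracy(results: List[Row]) -> Optional[Row]:
    """Keep the first row with the highest accuracy (strictly-greater update rule)."""
    best: Optional[Row] = None
    for row in results:
        if best is None or row[3] > best[3]:
            best = row
    return best


best = pick_best_by_accuracy(rows)
lowest_loss = min(rows, key=lambda r: r[2])
print("best by accuracy:", best)
print("lowest validation loss:", lowest_loss)
```

Selecting by validation loss instead would pick the epoch 5 checkpoint, whose accuracy (0.9942) is slightly lower than that of epoch 1.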


### Framework versions

- Transformers 4.31.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3