metadata

license: apache-2.0
base_model: facebook/wav2vec2-base-960h
tags:
  - generated_from_trainer
datasets:
  - ami
metrics:
  - wer
model-index:
  - name: 6e-5_4000eval
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: ami
          type: ami
          config: ihm
          split: None
          args: ihm
        metrics:
          - name: Wer
            type: wer
            value: 0.2470857142857143

6e-5_4000eval

This model is a fine-tuned version of facebook/wav2vec2-base-960h on the AMI dataset (ihm configuration). After 4,000 training steps it achieves the following loss and word error rate (WER) on the evaluation set:

Loss: 0.8508
Wer: 0.2471

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 6e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
training_steps: 4000
mixed_precision_training: Native AMP
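
To make the interaction between learning_rate, lr_scheduler_warmup_steps, and training_steps concrete, the sketch below computes the learning rate at a given step. It assumes the linear scheduler has the usual shape: a linear ramp from 0 to the peak rate over the warmup steps, followed by a linear decay to 0 at the final step. The function name `lr_at_step` is hypothetical, and this is a standalone reimplementation for illustration, not the code used in training.

```python
PEAK_LR = 6e-05       # learning_rate
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps
TOTAL_STEPS = 4000    # training_steps


def lr_at_step(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate under linear warmup followed by linear decay (assumed shape)."""
    if step < warmup:
        # Ramp up linearly from 0 to peak_lr.
        scale = step / max(1, warmup)
    else:
        # Decay linearly from peak_lr at `warmup` to 0 at `total`.
        scale = max(0.0, (total - step) / max(1, total - warmup))
    return peak_lr * scale


for step in (0, 500, 1000, 2500, 4000):
    print(step, lr_at_step(step))
```

Under this assumption the rate peaks at step 1000, which matches the point in the training results where validation WER is still 0.9995 and only afterwards begins to drop.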

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
No log	7.5758	250	5.4590	0.9995
9.3761	15.1515	500	3.7020	0.9995
9.3761	22.7273	750	3.0706	0.9995
3.2176	30.3030	1000	3.0517	0.9995
3.2176	37.8788	1250	1.8920	0.7721
2.0444	45.4545	1500	1.3641	0.3488
2.0444	53.0303	1750	1.1031	0.2779
0.8363	60.6061	2000	1.1269	0.2679
0.8363	68.1818	2250	1.0291	0.2656
0.6824	75.7576	2500	0.9712	0.2629
0.6824	83.3333	2750	0.8902	0.2619
0.5956	90.9091	3000	0.8432	0.2441
0.5956	98.4848	3250	0.8714	0.2485
0.4071	106.0606	3500	0.8222	0.2478
0.4071	113.6364	3750	0.8398	0.2501
0.4479	121.2121	4000	0.8508	0.2471
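
The table can be read programmatically to check which evaluation step scored best and how many optimizer steps make up one epoch. The sketch below copies the Step, Epoch, Validation Loss, and Wer columns from the table above; the variable and function names are illustrative. The training-set size estimate in the last comment additionally assumes a single device and no gradient accumulation, neither of which the card states.

```python
# (step, epoch, validation_loss, wer) rows copied from the table above.
ROWS = [
    (250, 7.5758, 5.4590, 0.9995),
    (500, 15.1515, 3.7020, 0.9995),
    (750, 22.7273, 3.0706, 0.9995),
    (1000, 30.3030, 3.0517, 0.9995),
    (1250, 37.8788, 1.8920, 0.7721),
    (1500, 45.4545, 1.3641, 0.3488),
    (1750, 53.0303, 1.1031, 0.2779),
    (2000, 60.6061, 1.1269, 0.2679),
    (2250, 68.1818, 1.0291, 0.2656),
    (2500, 75.7576, 0.9712, 0.2629),
    (2750, 83.3333, 0.8902, 0.2619),
    (3000, 90.9091, 0.8432, 0.2441),
    (3250, 98.4848, 0.8714, 0.2485),
    (3500, 106.0606, 0.8222, 0.2478),
    (3750, 113.6364, 0.8398, 0.2501),
    (4000, 121.2121, 0.8508, 0.2471),
]

STEP, EPOCH, VAL_LOSS, WER = range(4)


def best_row(rows, column):
    """Return the row with the smallest value in the given column."""
    return min(rows, key=lambda row: row[column])


best_wer = best_row(ROWS, WER)
best_loss = best_row(ROWS, VAL_LOSS)
final = ROWS[-1]

# Optimizer steps per epoch, recovered from the final row's step/epoch ratio.
steps_per_epoch = round(final[STEP] / final[EPOCH])

print("lowest WER:", best_wer[WER], "at step", best_wer[STEP])
print("lowest validation loss:", best_loss[VAL_LOSS], "at step", best_loss[STEP])
print("final WER:", final[WER], "at step", final[STEP])
print("steps per epoch:", steps_per_epoch)
# With train_batch_size 32, this implies at most steps_per_epoch * 32 training
# examples per epoch (assumption: one device, no gradient accumulation).
print("upper bound on training examples:", steps_per_epoch * 32)
```

By these numbers the reported final checkpoint (step 4000) is close to, but not exactly, the best evaluation point in the run, and the very high epoch count reflects a small number of steps per epoch.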

Framework versions

Transformers 4.42.4
Pytorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1