Wav2Vec2 Adult/Child Speech Classifier
Wav2Vec2 Adult/Child Speech Classifier is an audio classification model based on the wav2vec 2.0 architecture. This model is a fine-tuned version of wav2vec2-base on a private adult/child speech classification dataset.
This model was trained using Hugging Face's Transformers library with PyTorch. All training was done on a Tesla P100 GPU provided by Kaggle. Training metrics were logged via TensorBoard.
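As a rough illustration of how a fine-tuned audio classifier like this one could be used, the sketch below wraps the Transformers `audio-classification` pipeline. The `MODEL_ID` value is a placeholder assumption (the card does not state the full Hub repository id), and the helper names `top_prediction` and `classify` are hypothetical. The label names returned depend on the model's configuration and are not assumed here.

```python
from typing import Dict, List

# Assumption: placeholder id. Replace with the full Hub repository id
# or a local path to the fine-tuned checkpoint.
MODEL_ID = "wav2vec2-adult-child-cls"


def top_prediction(predictions: List[Dict]) -> Dict:
    """Return the entry with the highest score from a pipeline result."""
    return max(predictions, key=lambda p: p["score"])


def classify(audio_path: str, model_id: str = MODEL_ID) -> Dict:
    """Classify one audio file and return the top label with its score."""
    # Imported lazily so top_prediction() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return top_prediction(classifier(audio_path))
```

Calling `classify("sample.wav")` with a real model id would download the checkpoint on first use. wav2vec 2.0 base models expect 16 kHz audio, so recordings at other sampling rates need resampling before inference.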
Model
Model | #params | Arch. | Training/Validation data (text) |
---|---|---|---|
wav2vec2-adult-child-cls | 91M | wav2vec 2.0 | Adult/Child Speech Classification Dataset |
Evaluation Results
The model achieves the following results on evaluation:
Dataset | Loss | Accuracy | F1 |
---|---|---|---|
Adult/Child Speech Classification | 0.1682 | 95.80% | 0.9618 |
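For readers unfamiliar with how accuracy and F1 relate, here is a minimal sketch that computes both from confusion-matrix counts. It assumes binary F1 with one class treated as positive; the card does not say which averaging was used, and the counts in the example are hypothetical, not taken from the evaluation set.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)
```

Because F1 ignores true negatives while accuracy counts them, the two numbers can differ, as they do in the table above.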
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
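To make the scheduler settings concrete, the following sketch computes the learning rate at a given step for a linear schedule with warmup. It assumes the common shape (linear ramp from 0 to the peak rate, then linear decay to 0) and takes the total of 1920 steps from the training results table, so a warmup ratio of 0.1 corresponds to 192 warmup steps. The function name `linear_schedule_lr` is hypothetical, and this is not the training code used for the model.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 3e-5,
    total_steps: int = 1920,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the end of each epoch (384 steps per epoch in the table).
for step in (0, 192, 384, 768, 1152, 1536, 1920):
    print(step, linear_schedule_lr(step))
```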
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
---|---|---|---|---|---|
0.2709 | 1.0 | 384 | 0.2616 | 0.9104 | 0.9142 |
0.2112 | 2.0 | 768 | 0.1826 | 0.9386 | 0.9421 |
0.1755 | 3.0 | 1152 | 0.1898 | 0.9354 | 0.9428 |
0.0915 | 4.0 | 1536 | 0.1682 | 0.9580 | 0.9618 |
0.1042 | 5.0 | 1920 | 0.1717 | 0.9511 | 0.9554 |
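The reported evaluation results match the epoch 4 row, which has the lowest validation loss. A small sketch of that kind of checkpoint selection, using the numbers from the table above, might look like the following. The `EpochResult` type is a hypothetical helper, and selecting by validation loss is an assumption about how the final checkpoint was chosen.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    step: int
    train_loss: float
    val_loss: float
    accuracy: float
    f1: float


# Values copied from the training results table.
results = [
    EpochResult(1, 384, 0.2709, 0.2616, 0.9104, 0.9142),
    EpochResult(2, 768, 0.2112, 0.1826, 0.9386, 0.9421),
    EpochResult(3, 1152, 0.1755, 0.1898, 0.9354, 0.9428),
    EpochResult(4, 1536, 0.0915, 0.1682, 0.9580, 0.9618),
    EpochResult(5, 1920, 0.1042, 0.1717, 0.9511, 0.9554),
]

# Pick the checkpoint with the lowest validation loss.
best = min(results, key=lambda r: r.val_loss)
print(best)
```

In this run, selecting by highest F1 or highest accuracy would pick the same epoch.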
Disclaimer
Consider that biases from the pre-training datasets may carry over into the results of this model.
Authors
Wav2Vec2 Adult/Child Speech Classifier was trained and evaluated by Wilson Wongso. All computation and development were done on Kaggle.
Framework versions
- Transformers 4.16.2
- PyTorch 1.10.2+cu102
- Datasets 1.18.3
- Tokenizers 0.10.3