bert-base-cased-finetuned-Stromberg_NLP_Twitter-PoS_v2

This model is a fine-tuned version of bert-base-cased on the twitter_pos_vcb dataset. It achieves the following results on the evaluation set:

Loss: 0.0502

Tag	Precision	Recall	F1-Score	Support
$	0.0	0.0	0.0	3
''	0.9312320916905444	0.9530791788856305	0.9420289855072465	341
(	0.9791666666666666	0.9591836734693877	0.9690721649484536	196
)	0.960167714884696	0.9703389830508474	0.9652265542676501	472
,	0.9988979501873485	0.9993384785005512	0.9991181657848325	4535
.	0.9839189708141322	0.9894762249577601	0.9866897730281368	20715
:	0.9926405887528997	0.9971072719967858	0.9948689168604183	12445
Cc	0.9991067440821796	0.9986607142857142	0.9988836793927215	4480
Cd	0.9903884661593912	0.9899919935948759	0.9901901901901902	2498
Dt	0.9981148589510537	0.9976446837146703	0.9978797159492478	14860
Ex	0.9142857142857143	0.9846153846153847	0.9481481481481482	65
Fw	1.0	0.1	0.18181818181818182	10
Ht	0.999877541023757	0.9997551120362435	0.9998163227820978	8167
In	0.9960399353003514	0.9954846981437092	0.9957622393219583	17939
Jj	0.9812470698546648	0.9834756049808129	0.9823600735322877	12769
Jjr	0.9304511278195489	0.9686888454011742	0.9491850431447747	511
Jjs	0.9578414839797639	0.9726027397260274	0.9651656754460493	584
Md	0.9901398761751892	0.9908214777420835	0.990480559697213	4358
Nn	0.9810285563194078	0.9819697621331922	0.9814989335846437	30227
Nnp	0.9609722697706266	0.9467116357504216	0.9537886510363575	8895
Nnps	1.0	0.037037037037037035	0.07142857142857142	27
Nns	0.9697771061579146	0.9776564681985528	0.9737008471361739	7877
Pos	0.9977272727272727	0.984304932735426	0.9909706546275394	446
Prp	0.9983503349829983	0.9985184187487373	0.9984343697917544	29698
Prp$	0.9974262182566919	0.9974262182566919	0.9974262182566919	5828
Rb	0.9939770374552983	0.9929802569727358	0.9934783971906942	15955
Rbr	0.9058823529411765	0.8191489361702128	0.8603351955307263	94
Rbs	0.92	1.0	0.9583333333333334	69
Rp	0.9802197802197802	0.9903774981495189	0.9852724594992636	1351
Rt	0.9995065383666419	0.9996298581122763	0.9995681944358769	8105
Sym	0.0	0.0	0.0	9
To	0.9984649496844619	0.9989761092150171	0.9987204640450398	5860
Uh	0.9614460148062687	0.9507510933637574	0.9560686457287633	10518
Url	1.0	0.9997242900468707	0.9998621260168207	3627
Usr	0.9999025388626285	1.0	0.9999512670565303	20519
Vb	0.9619302598929085	0.9570556133056133	0.9594867452615125	15392
Vbd	0.9592894152479645	0.9548719837907533	0.9570756023262255	5429
Vbg	0.9848831077518018	0.984191111891797	0.9845369882270251	5693
Vbn	0.9053408597481546	0.9164835164835164	0.910878112712975	2275
Vbp	0.963605718209626	0.9666228317364894	0.9651119169688633	15969
Vbz	0.9881780250347705	0.9861207494795281	0.9871483153872872	5764
Wdt	0.8666666666666667	0.9285714285714286	0.896551724137931	14
Wp	0.99125	0.993734335839599	0.9924906132665832	1596
Wrb	0.9963488843813387	0.9979683055668428	0.9971579374746244	2461
``	0.9481865284974094	0.9786096256684492	0.963157894736842	187
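Each row gives token-level metrics for one tag:
- precision is correct predictions of the tag divided by all predictions of it.
- recall is correct predictions divided by the gold count, which is the Support column.
- F1 is the harmonic mean of precision and recall.

The sketch below implements those definitions in plain Python. It is not the notebook's evaluation code. per_tag_report is a hypothetical helper, and it assumes gold and predicted tags are already aligned one label per token. Ratios with a zero denominator are reported as 0.0.

```python
from collections import Counter


def per_tag_report(gold, pred):
    """Return {tag: (precision, recall, f1, support)} for aligned tag lists.

    Hypothetical helper: precision = TP / times the tag was predicted,
    recall = TP / times the tag occurs in gold (its support),
    F1 = harmonic mean of the two. Zero denominators give 0.0.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    tp = Counter(g for g, p in zip(gold, pred) if g == p)  # correct predictions per tag
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    report = {}
    for tag in sorted(set(gold_counts) | set(pred_counts)):
        p = tp[tag] / pred_counts[tag] if pred_counts[tag] else 0.0
        r = tp[tag] / gold_counts[tag] if gold_counts[tag] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[tag] = (p, r, f1, gold_counts[tag])
    return report


# Toy input: ten gold Fw tokens, only the first predicted as Fw.
gold = ["Fw"] * 10
pred = ["Fw"] + ["Nn"] * 9
for tag, (p, r, f1, n) in per_tag_report(gold, pred).items():
    print(f"{tag}\t{p:.4f}\t{r:.4f}\t{f1:.4f}\t{n}")
```

The toy input gives the same precision, recall and F1 as the Fw row. One correct prediction out of ten gold tokens yields precision 1.0, recall 0.1 and F1 of about 0.18. The harmonic mean is pulled toward the lower of the two values.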

Overall

Accuracy: 0.9853
Macro avg:
- Precision: 0.9296417163691048
- Recall: 0.8931046018294694
- F1-score: 0.8930917459781836
- Support: 308833
Weighted avg:
- Precision: 0.985306457604231
- Recall: 0.9853480683735223
- F1-score: 0.9852689858931941
- Support: 308833
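The two averages differ in how they weight each tag:
- The macro average is the plain mean over all 45 tags, so rare tags count as much as common ones.
- The weighted average weights each tag by its support.

The gap between macro F1 (0.8931) and weighted F1 (0.9853) comes mostly from a few rare tags that score near zero: $, Sym, Nnps and Fw. The sketch below recomputes both F1 averages from the table. The per-tag F1 values are rounded to four decimals, so the results should agree with the reported figures only to within about 1e-4.

```python
# Per-tag (F1, support) copied from the table above; F1 rounded to 4 decimals.
F1_SUPPORT = {
    "$": (0.0, 3), "''": (0.9420, 341), "(": (0.9691, 196),
    ")": (0.9652, 472), ",": (0.9991, 4535), ".": (0.9867, 20715),
    ":": (0.9949, 12445), "Cc": (0.9989, 4480), "Cd": (0.9902, 2498),
    "Dt": (0.9979, 14860), "Ex": (0.9481, 65), "Fw": (0.1818, 10),
    "Ht": (0.9998, 8167), "In": (0.9958, 17939), "Jj": (0.9824, 12769),
    "Jjr": (0.9492, 511), "Jjs": (0.9652, 584), "Md": (0.9905, 4358),
    "Nn": (0.9815, 30227), "Nnp": (0.9538, 8895), "Nnps": (0.0714, 27),
    "Nns": (0.9737, 7877), "Pos": (0.9910, 446), "Prp": (0.9984, 29698),
    "Prp$": (0.9974, 5828), "Rb": (0.9935, 15955), "Rbr": (0.8603, 94),
    "Rbs": (0.9583, 69), "Rp": (0.9853, 1351), "Rt": (0.9996, 8105),
    "Sym": (0.0, 9), "To": (0.9987, 5860), "Uh": (0.9561, 10518),
    "Url": (0.9999, 3627), "Usr": (1.0000, 20519), "Vb": (0.9595, 15392),
    "Vbd": (0.9571, 5429), "Vbg": (0.9845, 5693), "Vbn": (0.9109, 2275),
    "Vbp": (0.9651, 15969), "Vbz": (0.9871, 5764), "Wdt": (0.8966, 14),
    "Wp": (0.9925, 1596), "Wrb": (0.9972, 2461), "``": (0.9632, 187),
}


def macro_f1(rows):
    """Unweighted mean of per-tag F1: every tag counts equally."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_f1(rows):
    """Support-weighted mean of per-tag F1: frequent tags dominate."""
    total = sum(n for _, n in rows.values())
    return sum(f1 * n for f1, n in rows.values()) / total


total_support = sum(n for _, n in F1_SUPPORT.values())
print(f"support={total_support} "
      f"macro_f1={macro_f1(F1_SUPPORT):.4f} "
      f"weighted_f1={weighted_f1(F1_SUPPORT):.4f}")
```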

Model description

For details on how this model was created, see the notebook at: https://github.com/DunnBC22/NLP_Projects/blob/main/Token%20Classification/Monolingual/StrombergNLP-Twitter_pos_vcb/NER%20Project%20Using%20StrombergNLP%20Twitter_pos_vcb%20Dataset%20with%20PosEval.ipynb

Intended uses & limitations

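This model is intended to demonstrate my ability to solve a complex problem using technology. As the per-tag results above show, it performs poorly on rare tags: $ and Sym (F1 0.0, support 3 and 9), Nnps (F1 0.07, support 27) and Fw (F1 0.18, support 10).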

Training and evaluation data

Dataset Source: https://huggingface.co/datasets/strombergnlp/twitter_pos_vcb

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 2
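In Transformers, these settings presumably correspond to TrainingArguments fields such as learning_rate, seed, lr_scheduler_type and num_train_epochs.

The linear scheduler decays the learning rate from 2e-05 to 0 over the run. No warmup is listed, so the sketch below assumes zero warmup steps but keeps warmup as an option. The training-set size is not stated in this card, so the example uses a hypothetical 1,000 examples. The helper names total_training_steps and linear_lr are illustrative, not library functions. Both functions also assume a single device and no gradient accumulation.

```python
import math


def total_training_steps(num_train_examples, batch_size=16, num_epochs=2):
    """Optimizer steps for the run: batches per epoch times epochs.

    Assumes a single device, no gradient accumulation, and that the last
    partial batch is kept (so batches per epoch rounds up).
    """
    return num_epochs * math.ceil(num_train_examples / batch_size)


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at `step` under a linear schedule with optional warmup.

    Rises linearly from 0 to base_lr over warmup_steps, then decays
    linearly to 0 at total_steps (and stays at 0 afterwards).
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical training-set size; the card does not state the real one.
steps = total_training_steps(num_train_examples=1000)
for s in (0, steps // 2, steps):
    print(s, linear_lr(s, steps))
```

With 1,000 examples, the run would take 2 × ceil(1000 / 16) = 126 steps. Under zero warmup, the learning rate would be 2e-05 at step 0, 1e-05 halfway through, and 0 at the end.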

Training results

Framework versions

Transformers 4.28.1
Pytorch 2.0.0
Datasets 2.11.0
Tokenizers 0.13.3

DunnBC22/bert-base-cased-finetuned-Stromberg_NLP_Twitter-PoS_v2