This model is a fine-tuned version of bert-base-cased on the twitter_pos_vcb dataset. It achieves the following per-tag results on the evaluation set:
Token | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
$ | 0.0 | 0.0 | 0.0 | 3 |
'' | 0.9312320916905444 | 0.9530791788856305 | 0.9420289855072465 | 341 |
( | 0.9791666666666666 | 0.9591836734693877 | 0.9690721649484536 | 196 |
) | 0.960167714884696 | 0.9703389830508474 | 0.9652265542676501 | 472 |
, | 0.9988979501873485 | 0.9993384785005512 | 0.9991181657848325 | 4535 |
. | 0.9839189708141322 | 0.9894762249577601 | 0.9866897730281368 | 20715 |
: | 0.9926405887528997 | 0.9971072719967858 | 0.9948689168604183 | 12445 |
Cc | 0.9991067440821796 | 0.9986607142857142 | 0.9988836793927215 | 4480 |
Cd | 0.9903884661593912 | 0.9899919935948759 | 0.9901901901901902 | 2498 |
Dt | 0.9981148589510537 | 0.9976446837146703 | 0.9978797159492478 | 14860 |
Ex | 0.9142857142857143 | 0.9846153846153847 | 0.9481481481481482 | 65 |
Fw | 1.0 | 0.1 | 0.18181818181818182 | 10 |
Ht | 0.999877541023757 | 0.9997551120362435 | 0.9998163227820978 | 8167 |
In | 0.9960399353003514 | 0.9954846981437092 | 0.9957622393219583 | 17939 |
Jj | 0.9812470698546648 | 0.9834756049808129 | 0.9823600735322877 | 12769 |
Jjr | 0.9304511278195489 | 0.9686888454011742 | 0.9491850431447747 | 511 |
Jjs | 0.9578414839797639 | 0.9726027397260274 | 0.9651656754460493 | 584 |
Md | 0.9901398761751892 | 0.9908214777420835 | 0.990480559697213 | 4358 |
Nn | 0.9810285563194078 | 0.9819697621331922 | 0.9814989335846437 | 30227 |
Nnp | 0.9609722697706266 | 0.9467116357504216 | 0.9537886510363575 | 8895 |
Nnps | 1.0 | 0.037037037037037035 | 0.07142857142857142 | 27 |
Nns | 0.9697771061579146 | 0.9776564681985528 | 0.9737008471361739 | 7877 |
Pos | 0.9977272727272727 | 0.984304932735426 | 0.9909706546275394 | 446 |
Prp | 0.9983503349829983 | 0.9985184187487373 | 0.9984343697917544 | 29698 |
Prp$ | 0.9974262182566919 | 0.9974262182566919 | 0.9974262182566919 | 5828 |
Rb | 0.9939770374552983 | 0.9929802569727358 | 0.9934783971906942 | 15955 |
Rbr | 0.9058823529411765 | 0.8191489361702128 | 0.8603351955307263 | 94 |
Rbs | 0.92 | 1.0 | 0.9583333333333334 | 69 |
Rp | 0.9802197802197802 | 0.9903774981495189 | 0.9852724594992636 | 1351 |
Rt | 0.9995065383666419 | 0.9996298581122763 | 0.9995681944358769 | 8105 |
Sym | 0.0 | 0.0 | 0.0 | 9 |
To | 0.9984649496844619 | 0.9989761092150171 | 0.9987204640450398 | 5860 |
Uh | 0.9614460148062687 | 0.9507510933637574 | 0.9560686457287633 | 10518 |
Url | 1.0 | 0.9997242900468707 | 0.9998621260168207 | 3627 |
Usr | 0.9999025388626285 | 1.0 | 0.9999512670565303 | 20519 |
Vb | 0.9619302598929085 | 0.9570556133056133 | 0.9594867452615125 | 15392 |
Vbd | 0.9592894152479645 | 0.9548719837907533 | 0.9570756023262255 | 5429 |
Vbg | 0.9848831077518018 | 0.984191111891797 | 0.9845369882270251 | 5693 |
Vbn | 0.9053408597481546 | 0.9164835164835164 | 0.910878112712975 | 2275 |
Vbp | 0.963605718209626 | 0.9666228317364894 | 0.9651119169688633 | 15969 |
Vbz | 0.9881780250347705 | 0.9861207494795281 | 0.9871483153872872 | 5764 |
Wdt | 0.8666666666666667 | 0.9285714285714286 | 0.896551724137931 | 14 |
Wp | 0.99125 | 0.993734335839599 | 0.9924906132665832 | 1596 |
Wrb | 0.9963488843813387 | 0.9979683055668428 | 0.9971579374746244 | 2461 |
`` | 0.9481865284974094 | 0.9786096256684492 | 0.963157894736842 | 187 |
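
The F1-Score column is consistent with the standard harmonic mean of precision and recall, which explains why tags such as Fw and Nnps score poorly despite perfect precision: their recall is very low. The sketch below recomputes F1 for a few rows copied from the table. The helper name `f1_score` is illustrative and is not taken from the project notebook.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {
    "Fw": (1.0, 0.1),
    "Nnps": (1.0, 0.037037037037037035),
    "Rbs": (0.92, 1.0),
    "$": (0.0, 0.0),
}

for tag, (precision, recall) in rows.items():
    print(f"{tag}: F1 = {f1_score(precision, recall):.4f}")
```

Because these tags also have very small support (27 examples or fewer for Fw, Nnps, $ and Sym), their per-tag scores are much noisier than those of high-support tags such as Nn or Prp.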
Overall
For more information on how this model was created, see the project notebook: https://github.com/DunnBC22/NLP_Projects/blob/main/Token%20Classification/Monolingual/StrombergNLP-Twitter_pos_vcb/NER%20Project%20Using%20StrombergNLP%20Twitter_pos_vcb%20Dataset%20with%20PosEval.ipynb
This model is intended to demonstrate my ability to solve a complex problem using technology.
Dataset Source: https://huggingface.co/datasets/strombergnlp/twitter_pos_vcb
The following hyperparameters were used during training: