ilsilfverskiold committed on
Commit
3c122de
1 Parent(s): 45559ac

Update README.md

Files changed (1)
  1. README.md +42 -21
README.md CHANGED
@@ -15,43 +15,64 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # news_category_classification
 
- This model is a fine-tuned version of [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) on the None dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.8030
  - Accuracy: 0.7431
  - F1: 0.7474
  - Precision: 0.7695
  - Recall: 0.7431
- - Accuracy Label Arts, culture, entertainment and media: 0.6842
- - Accuracy Label Conflict, war and peace: 0.7351
- - Accuracy Label Crime, law and justice: 0.8918
- - Accuracy Label Disaster, accident, and emergency incident: 0.8699
- - Accuracy Label Economy, business, and finance: 0.6893
- - Accuracy Label Environment: 0.4483
- - Accuracy Label Health: 0.7222
- - Accuracy Label Human interest: 0.3182
- - Accuracy Label Labour: 0.5
- - Accuracy Label Lifestyle and leisure: 0.5556
- - Accuracy Label Politics: 0.7909
- - Accuracy Label Religion: 0.0
- - Accuracy Label Science and technology: 0.4583
- - Accuracy Label Society: 0.3538
- - Accuracy Label Sport: 0.9615
- - Accuracy Label Weather: 0.0
 
  ## Model description
 
- More information needed
 
  ## Intended uses & limitations
 
- More information needed
 
  ## Training and evaluation data
 
- More information needed
 
  ## Training procedure
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
+ # News Category Classification for IPTC NewsCodes
+
+ This model is a fine-tuned version of [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) on a private dataset.
+
+ It was built from a limited set of English, Swedish and Norwegian titles to classify news content into the 16 categories specified by the IPTC NewsCodes.
+
+ The fine-tuning dataset is heavily skewed across categories, although it has been lightly augmented to stabilize it.
+
+ ## Test examples
+
+ **Input:** Mann siktet for drapsforsøk på Slovakias statsministeren
+ **Output:** crime, law and justice
+
+ **Input:** Tre døde i kioskbrann i Tyskland
+ **Output:** disaster, accident, and emergency incident
+
+ **Input:** Kultfilm får Netflix-oppfølger. Kultfilmen «Happy Gilmore» fra 1996 får en oppfølger på Netflix. Det røper strømmetjenesten selv på X, tidligere Twitter. –Happy Gilmore er tilbake!
+ **Output:** arts, culture, entertainment and media
+
+ ## Performance
 
  It achieves the following results on the evaluation set:
  - Loss: 0.8030
  - Accuracy: 0.7431
  - F1: 0.7474
  - Precision: 0.7695
  - Recall: 0.7431
+
+ Accuracy for each label on the evaluation set:
+ - Arts, culture, entertainment and media: 0.6842
+ - Conflict, war and peace: 0.7351
+ - Crime, law and justice: 0.8918
+ - Disaster, accident, and emergency incident: 0.8699
+ - Economy, business, and finance: 0.6893
+ - Environment: 0.4483
+ - Health: 0.7222
+ - Human interest: 0.3182
+ - Labour: 0.5
+ - Lifestyle and leisure: 0.5556
+ - Politics: 0.7909
+ - Science and technology: 0.4583
+ - Society: 0.3538
+ - Sport: 0.9615
+ - Weather: 1.0
+ - Religion: 0.0
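
The card does not say how these per-label figures were computed. One plausible reading, assumed here, is that each value is the share of evaluation examples with a given true label that the model predicted correctly (per-class recall). A minimal sketch under that assumption, with a hypothetical helper name and toy data that are not taken from the model's evaluation set:

```python
from collections import defaultdict


def per_label_accuracy(y_true, y_pred):
    """Fraction of examples of each true label that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy data for illustration only.
y_true = ["sport", "sport", "weather", "religion"]
y_pred = ["sport", "politics", "weather", "society"]
scores = per_label_accuracy(y_true, y_pred)
```

With this definition, a label that has only a handful of evaluation examples can move between 0.0 and 1.0 on a single prediction, so scores for sparse categories are best read with caution.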
 
  ## Model description
 
+ The model is a test model for demonstration purposes, intended to categorize Norwegian, Swedish and English news content into the 16 categories listed above.
+ It needs more data in several categories to be fully useful, but it is expected to outperform Claude Haiku and GPT-3.5 on this use case.
 
  ## Intended uses & limitations
 
+ Use it to categorize news texts. Only assign a category when the model's score for that label is at least 60%; below that, treat the prediction as uncertain.
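
The 60% rule can be applied as a small post-processing step. The sketch below assumes predictions arrive as a list of `{"label", "score"}` dictionaries, the shape returned by a `transformers` text-classification pipeline. The function name `pick_category` and the sample scores are illustrative and not part of the model.

```python
from typing import Dict, List, Optional

THRESHOLD = 0.60  # minimum score, taken from the card's guidance


def pick_category(predictions: List[Dict], threshold: float = THRESHOLD) -> Optional[str]:
    """Return the top-scoring label if it meets the threshold, else None."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Hypothetical scores for illustration only.
confident = [{"label": "sport", "score": 0.91}, {"label": "politics", "score": 0.05}]
uncertain = [{"label": "society", "score": 0.41}, {"label": "health", "score": 0.38}]

print(pick_category(confident))  # sport
print(pick_category(uncertain))  # None
```

Returning `None` rather than the best guess leaves it to the caller to decide what to do with uncertain texts, for example leaving them uncategorized or sending them for manual review.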
 
  ## Training and evaluation data
 
+ Trained with the Hugging Face `Trainer`, using a learning rate of 2e-05 and a batch size of 16 for 3 epochs.
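
As a rough sketch, the stated hyperparameters could map onto `transformers.TrainingArguments` as shown below. Treating 16 as the per-device training batch size is an assumption, and the helper name and `output_dir` value are placeholders. Every other setting is left at the library default because the card does not mention it.

```python
# Hyperparameters stated in the card. Mapping the batch size of 16 to
# per_device_train_batch_size is an assumption.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 3,
}


def build_training_arguments(output_dir: str = "news_category_classification"):
    """Build TrainingArguments from the stated values (requires `transformers`)."""
    # Imported lazily so the constants above can be inspected without the library.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

The resulting object would then be passed as the `args` parameter of a `Trainer`, together with the model and datasets, none of which are described in the card.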
 
  ## Training procedure