kubota committed
Commit f41ae7b
1 parent: a80ae18

update model card README.md

  1. README.md +54 -36
README.md CHANGED
@@ -1,48 +1,66 @@
  ---
- license: cc-by-sa-4.0
- datasets:
- - kubota/defamation-japanese-twitter
- language:
- - ja
- pipeline_tag: text-classification
- widget:
- - text: お前のことを殺すぞ
- - text: 本当に不細工だなぁ
- - text: あの人は殺人を犯した犯罪者らしい
  ---

  # luke-large-defamation-detection-japanese
- # 日本語誹謗中傷検出器 (Japanese defamation detector)

- This model is a version of [studio-ousia/luke-japanese-large](https://huggingface.co/studio-ousia/luke-japanese-large) fine-tuned for automatic defamation detection in Japanese text.
 
- The foundation model was fine-tuned on a balanced dataset created by merging two datasets:
- - [![Generic badge](https://img.shields.io/badge/Dataset-DefamationJapaneseTwitter-red.svg)](https://huggingface.co/datasets/kubota/defamation-japanese-twitter)
- - `DefamationJapaneseYouTube`: TBA

- <b>Labels</b>:\
- 0 -> "中傷性のない発言" (non-defamatory statement)\
- 1 -> "脅迫的な発言" (threatening statement)\
- 2 -> "侮蔑的な発言" (insulting statement)\
- 3 -> "名誉を低下させる発言" (statement that damages someone's reputation)
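The label list in the removed card can be read as a small lookup table. The sketch below is illustrative only: the names `ID2LABEL`, `LABEL_GLOSS_EN`, and `describe` are hypothetical, the English glosses are informal translations that do not come from the card, and it is an assumption that the uploaded model config exposes exactly this mapping.

```python
# Hypothetical lookup table mirroring the label list in the old README.
ID2LABEL = {
    0: "中傷性のない発言",
    1: "脅迫的な発言",
    2: "侮蔑的な発言",
    3: "名誉を低下させる発言",
}

# Informal English glosses (an assumption, not taken from the model card).
LABEL_GLOSS_EN = {
    "中傷性のない発言": "non-defamatory statement",
    "脅迫的な発言": "threatening statement",
    "侮蔑的な発言": "insulting statement",
    "名誉を低下させる発言": "statement that damages someone's reputation",
}


def describe(prediction: dict) -> str:
    """Format one pipeline-style result dict with an English gloss."""
    label = prediction["label"]
    gloss = LABEL_GLOSS_EN.get(label, "unknown label")
    return f"{label} ({gloss}), score={prediction['score']:.3f}"


# The result shown in the old README's pipeline example.
example = {"label": "名誉を低下させる発言", "score": 0.8889994621276855}
print(describe(example))
```

Keeping the Japanese strings as dictionary keys means the helper works directly on the labels the pipeline example returns.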
 
- ## Example Pipeline
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kubotaissei/defamation_japanese_twitter/blob/master/notebooks/pipeline_example.ipynb)
- ```python
- # !pip install transformers==4.26 sentencepiece
- from transformers import pipeline
- pipe = pipeline(model="kubota/luke-large-defamation-detection-japanese")
- pipe("あの人は殺人を犯した犯罪者らしい")
- ```
- ```
- [{'label': '名誉を低下させる発言', 'score': 0.8889994621276855}]
- ```
- ## Training Scripts
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kubotaissei/defamation_japanese_twitter/blob/master/notebooks/train_example.ipynb)

 
- ## Licenses

- The fine-tuned model and all attached files are licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.

- <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>
  ---
+ license: apache-2.0
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ - f1
+ model-index:
+ - name: luke-large-defamation-detection-japanese
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
  # luke-large-defamation-detection-japanese
 
+ This model is a fine-tuned version of [studio-ousia/luke-japanese-large](https://huggingface.co/studio-ousia/luke-japanese-large) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4430
+ - Accuracy: 0.6616
+ - F1: 0.6381
+ - Auc: 0.8630
+
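The card reports accuracy and F1 without saying how they were computed. As a rough illustration of what such numbers mean for a multi-class classifier, here is a minimal pure-Python sketch. It assumes four classes (taken from the label list in the previous card version) and macro-averaged F1; the card states neither, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_labels):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in range(num_labels):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / num_labels


# Toy labels for illustration only; these are not the model's predictions.
y_true = [0, 1, 2, 3, 0, 2]
y_pred = [0, 1, 2, 0, 0, 1]
print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"macro_f1={macro_f1(y_true, y_pred, num_labels=4):.4f}")
```

Under macro averaging, a class the model never predicts correctly contributes an F1 of zero, so F1 can sit well below accuracy on imbalanced predictions.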
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 777
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - num_epochs: 4
+ - mixed_precision_training: Native AMP
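The hyperparameter list can be sketched as keyword arguments plus the learning-rate curve it implies. This is a sketch under stated assumptions, not the author's training script: the dictionary keys follow `transformers.TrainingArguments` parameter names, the run is assumed to use a single device (so `train_batch_size` maps to `per_device_train_batch_size`), warmup is assumed to be zero because the card lists none, and `cosine_lr` is a hypothetical helper that reproduces the usual half-cosine decay rather than calling the library scheduler.

```python
import math

# Values copied from the hyperparameter list; key names follow TrainingArguments.
# Assumption: single device, so the listed batch sizes are per-device sizes.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 777,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 4,
    "fp16": True,  # "Native AMP" mixed precision
}

TOTAL_STEPS = 7120  # final step in the card's training results table


def cosine_lr(step, base_lr=1e-5, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (assumed to be zero here) followed by half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at the start and at each epoch boundary.
for step in (0, 1780, 3560, 5340, 7120):
    print(step, f"{cosine_lr(step):.2e}")
```

With `transformers` installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir` is a placeholder and `fp16=True` requires a CUDA device.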
 
+ ### Training results

+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Auc |
+ |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:------:|
+ | 0.4219 | 1.0 | 1780 | 0.3979 | 0.6630 | 0.6084 | 0.8466 |
+ | 0.3375 | 2.0 | 3560 | 0.4050 | 0.6706 | 0.6242 | 0.8618 |
+ | 0.2716 | 3.0 | 5340 | 0.4362 | 0.6595 | 0.6370 | 0.8626 |
+ | 0.2331 | 4.0 | 7120 | 0.4430 | 0.6616 | 0.6381 | 0.8630 |

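In the table, training loss falls every epoch while validation loss rises after epoch 1, and the metrics disagree on which epoch is best. The short sketch below makes that comparison explicit. The rows are copied from the table; the `EpochResult` type and variable names are hypothetical, and the training-set size estimate assumes one device, no gradient accumulation, and the listed batch size of 4.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    train_loss: float
    epoch: int
    step: int
    val_loss: float
    accuracy: float
    f1: float
    auc: float


# Rows copied from the training results table.
results = [
    EpochResult(0.4219, 1, 1780, 0.3979, 0.6630, 0.6084, 0.8466),
    EpochResult(0.3375, 2, 3560, 0.4050, 0.6706, 0.6242, 0.8618),
    EpochResult(0.2716, 3, 5340, 0.4362, 0.6595, 0.6370, 0.8626),
    EpochResult(0.2331, 4, 7120, 0.4430, 0.6616, 0.6381, 0.8630),
]

best_val_loss = min(results, key=lambda r: r.val_loss)
best_accuracy = max(results, key=lambda r: r.accuracy)
best_f1 = max(results, key=lambda r: r.f1)

steps_per_epoch = results[0].step // results[0].epoch
# Assumption: one device, no gradient accumulation, batch size 4.
approx_train_examples = steps_per_epoch * 4

print("lowest validation loss: epoch", best_val_loss.epoch)
print("highest accuracy: epoch", best_accuracy.epoch)
print("highest F1: epoch", best_f1.epoch)
print("approximate training examples per epoch:", approx_train_examples)
```

The evaluation figures quoted at the top of the new card match the epoch 4 row, so the published checkpoint appears to be the final one rather than the one with the lowest validation loss.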
 
+ ### Framework versions

+ - Transformers 4.26.0
+ - Pytorch 1.13.1+cu116
+ - Datasets 2.8.0
+ - Tokenizers 0.13.2