Joelito kapllan committed on
Commit
27fdd55
1 Parent(s): 71effaa

Added some simple evaluation results (#4)

- Added some simple evaluation results (fb06c626b61beec453b3d19216e578a187ef8da1)


Co-authored-by: kapllan <[email protected]>

Files changed (1)
  1. README.md +22 -1
README.md CHANGED
@@ -93,9 +93,30 @@ For further details see [Niklaus et al. 2023](https://arxiv.org/abs/2306.02069?u
 
  ## Evaluation
 
  For further insights into the evaluation, we refer to the [trainer state](https://huggingface.co/joelito/legal-xlm-roberta-large/blob/main/last-checkpoint/trainer_state.json). Additional information is available in the [tensorboard](https://huggingface.co/joelito/legal-xlm-roberta-large/tensorboard).
 
- For performance on downstream tasks, such as [LEXTREME](https://huggingface.co/datasets/joelito/lextreme) ([Niklaus et al. 2023](https://arxiv.org/abs/2301.13126)) or [LEXGLUE](https://huggingface.co/datasets/lex_glue) ([Chalkidis et al. 2021](https://arxiv.org/abs/2110.00976)), we refer to the results presented in Niklaus et al. (2023) [1](https://arxiv.org/abs/2306.02069), [2](https://arxiv.org/abs/2306.09237).
 
  ### Model Architecture and Objective
 
 
  ## Evaluation
 
+ We compare joelito/legal-swiss-roberta-large with other multilingual models.
+ The results are based on the text classification tasks presented in [Niklaus et al. (2023)](https://arxiv.org/abs/2306.09237), which are part of [LEXTREME](https://huggingface.co/datasets/joelito/lextreme).
+ We report the arithmetic mean of the macro-F1 score on the test set over three seeds (1, 2, 3).
+ The highest value in each column is in bold.
+
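The aggregation described above (macro-F1 per seed, then the arithmetic mean over seeds) can be sketched as follows. This is a minimal illustration, not the evaluation code behind the reported numbers: the function names and the toy labels are hypothetical, and reporting the result as a percentage rounded to two decimals is an assumption based on how the scores are printed.

```python
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return mean(f1_scores)


def mean_macro_f1_over_seeds(y_true, predictions_by_seed):
    """Arithmetic mean of macro-F1 across seeds, as a percentage (assumed format)."""
    scores = [macro_f1(y_true, preds) for preds in predictions_by_seed.values()]
    return round(100 * mean(scores), 2)


# Hypothetical toy data: one gold label list, one prediction list per seed.
y_true = [0, 0, 1, 1]
predictions_by_seed = {
    1: [0, 0, 1, 1],
    2: [0, 1, 1, 1],
    3: [0, 0, 0, 1],
}
print(mean_macro_f1_over_seeds(y_true, predictions_by_seed))
```

Because macro-F1 weights every class equally, rare classes influence the score as much as frequent ones, which matters for imbalanced legal classification tasks.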
+
+ | Model | SCP-BC | SCP-BF | SCP-CC | SCP-CF | SJPXL-C | SJPXL-F | SLAP-SC | SLAP-SF |
+ |:--------------------------------------------------------------------------------------------------------|:----------|:----------|:----------|:----------|:----------|:----------|:----------|:----------|
+ | [ZurichNLP/swissbert-xlm-vocab](https://huggingface.co/ZurichNLP/swissbert-xlm-vocab) | 71.36 | 57.48 | 27.33 | 23.37 | 80.81 | 61.75 | 77.89 | 71.27 |
+ | [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased) | 66.56 | 56.58 | 22.67 | 21.31 | 77.26 | 60.79 | 73.54 | 72.24 |
+ | [facebook/xmod-base](https://huggingface.co/facebook/xmod-base) | 70.35 | 58.16 | 23.87 | 19.57 | 80.55 | 60.84 | 73.16 | 69.03 |
+ | [joelito/legal-swiss-longformer-base](https://huggingface.co/joelito/legal-swiss-longformer-base) | **73.25** | **60.06** | **28.68** | 24.39 | 87.46 | **65.23** | 83.84 | 77.96 |
+ | [joelito/legal-swiss-roberta-base](https://huggingface.co/joelito/legal-swiss-roberta-base) | 72.41 | 59.31 | 25.99 | 23.27 | 87.48 | 64.16 | **86.80** | **81.56** |
+ | [joelito/legal-swiss-roberta-large](https://huggingface.co/joelito/legal-swiss-roberta-large) | 70.95 | 57.59 | 27.86 | 23.48 | **88.33** | 62.92 | 82.10 | 78.62 |
+ | [microsoft/Multilingual-MiniLM-L12-H384](https://huggingface.co/microsoft/Multilingual-MiniLM-L12-H384) | 67.29 | 56.56 | 24.23 | 14.90 | 79.52 | 58.29 | 63.03 | 67.57 |
+ | [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) | 72.01 | 57.59 | 22.93 | **25.18** | 79.41 | 60.89 | 67.64 | 74.13 |
+ | [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) | 68.55 | 58.48 | 25.66 | 21.52 | 80.98 | 61.45 | 79.30 | 74.47 |
+ | [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) | 69.50 | 58.15 | 27.90 | 22.05 | 82.19 | 61.24 | 81.09 | 71.82 |
+
+
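The bolding rule for the table (highest score per task column) can be reproduced with a small helper. The sketch below is illustrative only: the function name is hypothetical, and it uses just the SLAP-SC and SLAP-SF scores of three rows from the table rather than the full results.

```python
def bold_column_max(scores):
    """Format scores as markdown cells, wrapping each column's maximum in **...**."""
    columns = list(zip(*scores.values()))
    best = [max(column) for column in columns]
    return {
        model: [
            f"**{value:.2f}**" if value == best[i] else f"{value:.2f}"
            for i, value in enumerate(row)
        ]
        for model, row in scores.items()
    }


# SLAP-SC and SLAP-SF scores for three of the models in the table.
scores = {
    "joelito/legal-swiss-roberta-base": [86.8, 81.56],
    "joelito/legal-swiss-roberta-large": [82.1, 78.62],
    "xlm-roberta-large": [81.09, 71.82],
}
cells = bold_column_max(scores)
for model, row in cells.items():
    print(f"| {model} | " + " | ".join(row) + " |")
```

Formatting with `:.2f` also keeps the number of decimals consistent across cells.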
+ For more detailed insights into the performance on downstream tasks, such as [LEXTREME](https://huggingface.co/datasets/joelito/lextreme) ([Niklaus et al. 2023](https://arxiv.org/abs/2301.13126)) or [LEXGLUE](https://huggingface.co/datasets/lex_glue) ([Chalkidis et al. 2021](https://arxiv.org/abs/2110.00976)), we refer to the results presented in Niklaus et al. (2023) [1](https://arxiv.org/abs/2306.02069), [2](https://arxiv.org/abs/2306.09237).
+
  For further insights into the evaluation, we refer to the [trainer state](https://huggingface.co/joelito/legal-xlm-roberta-large/blob/main/last-checkpoint/trainer_state.json). Additional information is available in the [tensorboard](https://huggingface.co/joelito/legal-xlm-roberta-large/tensorboard).
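A trainer state file written by the Hugging Face `Trainer` stores its logged metrics under `log_history`. The sketch below shows one way to pull the evaluation loss per step out of such a file once it has been downloaded. The helper name and the sample values are hypothetical, and whether `eval_loss` entries are present at all depends on how the training run was configured.

```python
import json


def eval_loss_by_step(trainer_state):
    """Collect (step, eval_loss) pairs from the log history of a trainer state."""
    return [
        (entry["step"], entry["eval_loss"])
        for entry in trainer_state.get("log_history", [])
        if "step" in entry and "eval_loss" in entry
    ]


# Hypothetical excerpt with made-up numbers; the real values are in the
# linked trainer_state.json.
sample_state = json.loads("""
{
  "global_step": 2000,
  "log_history": [
    {"step": 1000, "loss": 1.2, "learning_rate": 0.0001},
    {"step": 1000, "eval_loss": 1.1},
    {"step": 2000, "loss": 0.9, "learning_rate": 0.00005},
    {"step": 2000, "eval_loss": 0.8}
  ]
}
""")
print(eval_loss_by_step(sample_state))
```

For a local copy of the file, replace the inline sample with `json.load(open(path))`, where `path` points to the downloaded `trainer_state.json`.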
 
 
  ### Model Architecture and Objective