---
license: apache-2.0
tags:
- fill-mask
language:
- en
widget:
- text: "He carefully assessed the financial position of the <mask> disclosed within its accounts, including its pension scheme liabilities."
- text: "Moreover, she had chosen not to give <mask> and therefore had not provided any innocent explanation of her communications."
---
# Pre-trained Language Model for England and Wales Court of Appeal (Criminal Division) Decisions
## Introduction
Research into bias in criminal court decisions needs the support of natural language processing tools.
Pre-trained language models have greatly improved the accuracy of text mining on general-domain text, but there is still an urgent need for a pre-trained language model built specifically for the automatic processing of court decision texts.
We used text from the [Bailii website](https://www.bailii.org/ew/cases/EWCA/Crim/) as the training set. Building on the RoBERTa architecture, we pre-trained the bailii-roberta language model using [transformers/run_mlm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py) and [transformers/mlm_wwm](https://github.com/huggingface/transformers/tree/main/examples/research_projects/mlm_wwm).
## How to use
### Huggingface Transformers
The `from_pretrained` method of [Huggingface Transformers](https://github.com/huggingface/transformers) downloads and loads the bailii-roberta model directly from the Hub.
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("tsantosh7/bailii-roberta")
model = AutoModel.from_pretrained("tsantosh7/bailii-roberta")
```
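Because the model card is tagged `fill-mask`, a masked-token query is a natural first check. The following is a minimal sketch, assuming `transformers` and PyTorch are installed; the helper names `top_tokens` and `predict_mask` are hypothetical and not part of the project.

```python
from typing import Dict, List


def top_tokens(results: List[Dict], k: int = 3) -> List[str]:
    """Return the k highest-scoring candidate tokens from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["token_str"].strip() for r in ranked[:k]]


def predict_mask(text: str, k: int = 3) -> List[str]:
    """Fill the single <mask> in `text` using bailii-roberta.

    The model is downloaded from the Hub on first call.
    """
    # Imported lazily so top_tokens can be used without loading transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="tsantosh7/bailii-roberta")
    # For one <mask>, the pipeline returns a list of dicts with the keys
    # "score", "token", "token_str" and "sequence".
    return top_tokens(fill_mask(text, top_k=k), k=k)
```

For example, `predict_mask("Moreover, she had chosen not to give <mask> and therefore had not provided any innocent explanation of her communications.")` returns the three most likely fillers for the masked position.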
### Download Models
- The model is provided in `PyTorch` format.
### From Huggingface
- Download directly through Huggingface's official website.
- [tsantosh7/bailii-roberta](https://huggingface.co/tsantosh7/Bailii-Roberta/)
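
As an alternative to downloading through the website, the files can also be fetched programmatically. This is a sketch only, assuming the separate `huggingface_hub` package is installed; the function name `download_model` and the target directory are hypothetical.

```python
REPO_ID = "tsantosh7/Bailii-Roberta"  # repository name as it appears in the link above


def download_model(local_dir: str = "bailii-roberta") -> str:
    """Download all files of the model repository and return the local path."""
    # Imported lazily; requires the `huggingface_hub` package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

The returned path can then be passed to `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` in place of the model name.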
## Disclaimer
- The experimental results presented in the report only show performance under a specific dataset and hyperparameter combination, and do not characterize each model in general. Results may vary with the random seed and the computing equipment used.
- **Users may use the model freely within the scope of the license, but we are not responsible for any direct or indirect losses caused by using the content of the project.**
## Acknowledgment
- bailii-roberta was trained based on [roberta-base](https://arxiv.org/abs/1907.11692).