# LS-LLaMA: Label Supervised LLaMA Finetuning

<h2>📢: For convenience, we have built <a href='https://github.com/WhereIsAI/BiLLM'>BiLLM</a>, a bidirectional-LLM toolkit for language understanding. You are welcome to use it.</h2>

<p align="center">

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/label-supervised-llama-finetuning/named-entity-recognition-on-conll03-4)](https://paperswithcode.com/sota/named-entity-recognition-on-conll03-4?p=label-supervised-llama-finetuning)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/label-supervised-llama-finetuning/named-entity-recognition-on-ontonotes-5-0-1)](https://paperswithcode.com/sota/named-entity-recognition-on-ontonotes-5-0-1?p=label-supervised-llama-finetuning)
</p>

<p align='center'>
<img src='./docs/lsllama.png'/>
</p>

## Usage

Our implementation currently supports the following sequence classification benchmarks:
1. SST2 (2 classes) / SST5 (5 classes)
2. AGNews (4 classes)
3. Twitter Financial News Sentiment (twitterfin, 3 classes)

and token classification benchmarks for named entity recognition (NER): CoNLL2003 and OntonotesV5.
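
For orientation, one of these benchmarks can be pulled with the Hugging Face `datasets` library roughly as sketched below; the Hub id and field names are illustrative assumptions, and the training scripts may load the data differently.
```python
from datasets import load_dataset

# Illustrative only: fetch CoNLL2003 for NER token classification.
# The dataset id and field names are assumptions, not taken from the training scripts.
conll = load_dataset("conll2003")
example = conll["train"][0]
print(example["tokens"])    # word-level tokens of one sentence
print(example["ner_tags"])  # integer-encoded NER labels aligned with the tokens
```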

Commands for training LS-LLaMA and LS-unLLaMA on different tasks follow the template below:
```console
foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python file_name.py dataset_name model_size
```

`file_name.py` can be one of `unllama_seq_clf.py`, `unllama_token_clf.py`, `llama_seq_clf.py`, and `llama_token_clf.py`, for training LS-LLaMA and LS-unLLaMA on sequence- and token-level classification.

`dataset_name` can be one of `sst2`, `sst5`, `agnews`, `twitterfin`, `conll03`, and `ontonotesv5`.

`model_size` can be `7b` or `13b`, corresponding to LLaMA-2-7B and LLaMA-2-13B.

For example, the following command will train LS-unLLaMA based on LLaMA-2-7B on AGNews for sequence classification:
```console
foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python unllama_seq_clf.py agnews 7b
```
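
The same template covers the NER benchmarks; for instance, training LS-LLaMA based on LLaMA-2-7B on CoNLL2003 for token classification would look like:
```console
foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python llama_token_clf.py conll03 7b
```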

## Implementations

### Load Pretrained Models

```python
from transformers import AutoTokenizer
from modeling_llama import (
    LlamaForSequenceClassification, LlamaForTokenClassification,
    UnmaskingLlamaForSequenceClassification, UnmaskingLlamaForTokenClassification,
)


model_id = 'meta-llama/Llama-2-7b'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Each of the following loads LLaMA-2-7B with a different classification head;
# keep only the line that matches your task (each assignment overwrites `model`).
model = LlamaForSequenceClassification.from_pretrained(model_id).bfloat16()
model = LlamaForTokenClassification.from_pretrained(model_id).bfloat16()
model = UnmaskingLlamaForSequenceClassification.from_pretrained(model_id).bfloat16()
model = UnmaskingLlamaForTokenClassification.from_pretrained(model_id).bfloat16()
```
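
As a rough sketch of how one of these classes could be used after loading, the example below runs a single forward pass for sequence classification. It assumes the classes follow the standard `transformers` classification API (`num_labels` accepted at load time, a `logits` field on the output); the pad-token workaround, label count, and input text are illustrative assumptions.
```python
import torch
from transformers import AutoTokenizer
from modeling_llama import LlamaForSequenceClassification

model_id = 'meta-llama/Llama-2-7b'
tokenizer = AutoTokenizer.from_pretrained(model_id)
# LLaMA tokenizers ship without a pad token; reusing EOS is a common workaround.
tokenizer.pad_token = tokenizer.eos_token

# num_labels=4 is an assumption matching AGNews; set it to your dataset's class count.
model = LlamaForSequenceClassification.from_pretrained(model_id, num_labels=4).bfloat16()
model.config.pad_token_id = tokenizer.pad_token_id
model.eval()

inputs = tokenizer("Stocks rallied after the quarterly earnings report.", return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits   # shape: (batch_size, num_labels)
print(logits.argmax(dim=-1).item())   # predicted class index
```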

For more usage examples, please refer to `unllama_seq_clf.py`, `unllama_token_clf.py`, `llama_seq_clf.py`, and `llama_token_clf.py`.

## Citation

```bibtex
@article{li2023label,
  title={Label supervised llama finetuning},
  author={Li, Zongxi and Li, Xianming and Liu, Yuzhang and Xie, Haoran and Li, Jing and Wang, Fu-lee and Li, Qing and Zhong, Xiaoqin},
  journal={arXiv preprint arXiv:2310.01208},
  year={2023}
}
```
|