---
license: apache-2.0
datasets:
- humarin/chatgpt-paraphrases
language:
- en
tags:
- paraphrase
- similar text
---
This model further fine-tunes the [ChatGPT Paraphraser on T5 Base](https://huggingface.co/humarin/chatgpt_paraphraser_on_T5_base) on the Google PAWS dataset.

## Usage example
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSeq2SeqLM.from_pretrained("sharad/ParaphraseGPT").to(device)
tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
predict = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

def paraphrase(sentence):
    generated = predict(
        sentence,
        num_beams=3,
        num_beam_groups=3,        # diverse beam search: 3 groups of 1 beam each
        num_return_sequences=1,
        diversity_penalty=2.0,
        no_repeat_ngram_size=2,
        repetition_penalty=0.99,
        max_length=len(sentence),  # character count of the input, used as a rough token cap
    )
    return generated

output = paraphrase('My sentence to paraphrase...')
print(output[0]['generated_text'])
```
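One caveat in the call above: `generate`'s `max_length` counts tokens, while `len(sentence)` counts characters, so the cap is looser than it looks. If you want an explicit character-based token budget, a minimal sketch is below (the four-characters-per-token ratio and the `floor` default are assumptions, not values from the original card):

```python
def token_budget(sentence: str, floor: int = 16, chars_per_token: int = 4) -> int:
    """Rough token cap for a paraphrase of `sentence`.

    English text averages roughly four characters per token, so dividing
    the character count gives a tighter bound than the raw length; the
    floor keeps very short inputs from being cut off mid-sentence.
    """
    return floor + len(sentence) // chars_per_token

# "My sentence to paraphrase..." is 28 characters -> 16 + 28 // 4 = 23 tokens
print(token_budget("My sentence to paraphrase..."))
```

The result could then be passed as `max_length=token_budget(sentence)` in the `predict` call above.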

## Training parameters
```python
epochs = 4
max_length = 128
lr = 5e-5
```
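The training script itself is not part of this card. As a sketch only, the hyperparameters above would map onto `transformers`' `Seq2SeqTrainingArguments` roughly as follows (`output_dir` and the batch size are illustrative assumptions, not values from the original run):

```python
# Keyword arguments mirroring the hyperparameters listed above; they would
# be consumed as Seq2SeqTrainingArguments(**training_kwargs) and passed to
# a Seq2SeqTrainer together with the model, tokenizer, and dataset.
training_kwargs = dict(
    output_dir="paraphrase-gpt",     # assumption: any local checkpoint directory
    num_train_epochs=4,              # epochs = 4
    learning_rate=5e-5,              # lr = 5e-5
    generation_max_length=128,       # max_length = 128
    per_device_train_batch_size=16,  # assumption: not stated in the card
    predict_with_generate=True,      # evaluate with generation, as in the usage example
)
```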