---
license: apache-2.0
datasets:
- humarin/chatgpt-paraphrases
language:
- en
tags:
- paraphrase
- similar text
---
This model re-fine-tunes the [ChatGPT Paraphraser on T5 Base](https://huggingface.co/humarin/chatgpt_paraphraser_on_T5_base) with the addition of the Google PAWS dataset.
## Usage example
```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Use "cuda" for GPU, otherwise "cpu"
device = "cuda"

model = AutoModelForSeq2SeqLM.from_pretrained("sharad/ParaphraseGPT")
tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
# Passing device here places both the model and its inputs on the GPU
predict = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=device)

def paraphrase(sentence):
    return predict(
        sentence,
        num_beams=3,
        num_beam_groups=3,
        num_return_sequences=1,
        diversity_penalty=2.0,
        no_repeat_ngram_size=2,
        repetition_penalty=0.99,
        max_length=len(sentence),  # caps output length by the input's character count
    )

output = paraphrase("My sentence to paraphrase...")
print(output[0]["generated_text"])
```
## Training parameters
```python
epochs = 4
max_length = 128
lr = 5e-5
```
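These hyperparameters map naturally onto a Hugging Face `Seq2SeqTrainingArguments` configuration. The sketch below is illustrative only: the output directory, batch size, and the tokenization step are assumptions, not details of the original training script.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the listed hyperparameters;
# output_dir and batch size are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="paraphrase-gpt",      # hypothetical
    num_train_epochs=4,               # epochs = 4
    learning_rate=5e-5,               # lr = 5e-5
    per_device_train_batch_size=16,   # assumed, not from the original run
)

# max_length = 128 would apply at tokenization time, e.g.:
# tokenizer(batch["text"], truncation=True, max_length=128)
```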