---
language:
- ru
- ru-RU
tags:
- mbart
inference:
  parameters:
    no_repeat_ngram_size: 4
    num_beams: 5
datasets:
- IlyaGusev/gazeta
- samsum
- samsum (translated to RU)
widget:
- text: |
    Джефф: Могу ли я обучить модель 🤗 Transformers на Amazon SageMaker?
    Филипп: Конечно, вы можете использовать новый контейнер для глубокого обучения HuggingFace.
    Джефф: Хорошо.
    Джефф: и как я могу начать?
    Джефф: где я могу найти документацию?
    Филипп: ок, ок, здесь можно найти все: https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face
model-index:
- name: "mbart_ruDialogSum"
  results:
  - task:
      name: Abstractive Dialogue Summarization
      type: abstractive-text-summarization
    dataset:
      name: "SAMSum Corpus (translated to Russian)"
      type: samsum
    metrics:
    - name: Validation ROUGE-1
      type: rouge-1
      value: 34.5
    - name: Validation ROUGE-L
      type: rouge-l
      value: 33
    - name: Test ROUGE-1
      type: rouge-1
      value: 31
    - name: Test ROUGE-L
      type: rouge-l
      value: 28
---
### 📝 Description
An MBart model for Russian summarization, fine-tuned for **dialogue** summarization.
This model was first fine-tuned by [Ilya Gusev](https://hf.co/IlyaGusev) on the [Gazeta dataset](https://huggingface.co/datasets/IlyaGusev/gazeta). We then **fine-tuned** that model on the [SAMSum dataset]() **translated to Russian** with the Google Translate API.
🤗 Moreover, we have built a **Telegram bot, [@summarization_bot](https://t.me/summarization_bot)**, that runs inference with this model. Add it to a group chat and get summaries instead of dozens of spam messages! 🤗
### ❓ How to use with code
```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Download the model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

article_text = "..."

# Tokenize the dialogue; keep the attention mask so padding tokens are ignored
inputs = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

# Beam search that blocks repeated 3-grams in the summary
output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=3,
    no_repeat_ngram_size=3,
)[0]

summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
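The snippet above leaves `article_text` as a placeholder. The widget example in the front matter writes a dialogue as one `Speaker: message` line per turn, so a minimal sketch, assuming that layout is what the model expects, could build the input string from a list of chat messages. The `format_dialogue` helper is hypothetical and is not part of this repository or of 🤗 Transformers.

```python
from typing import Iterable, Tuple


def format_dialogue(messages: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into one 'Speaker: text' line per turn.

    Hypothetical helper: the layout mirrors the widget example in this card;
    it is an assumption that it matches the fine-tuning data exactly.
    """
    lines = []
    for speaker, text in messages:
        # Collapse internal whitespace and newlines so each turn stays on one line
        text = " ".join(text.split())
        if text:  # skip empty messages (e.g. media-only posts)
            lines.append(f"{speaker.strip()}: {text}")
    return "\n".join(lines)


chat = [
    ("Джефф", "Могу ли я обучить модель 🤗 Transformers на Amazon SageMaker?"),
    ("Филипп", "Конечно, вы можете использовать новый контейнер для глубокого обучения HuggingFace."),
    ("Джефф", "Хорошо."),
]
article_text = format_dialogue(chat)
print(article_text)
```

The resulting string can then be passed to the tokenizer in place of `"..."`. Dropping empty turns and flattening line breaks keeps each speaker turn on its own line, matching the widget example.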