---
language:
- ru
- ru-RU
tags:
- mbart
inference:
  parameters:
    no_repeat_ngram_size: 4
    num_beams: 5
datasets:
- IlyaGusev/gazeta
- samsum
- samsum (translated to RU)
widget:
- text: | 
    Джефф: Могу ли я обучить модель 🤗 Transformers на Amazon SageMaker? 
    Филипп: Конечно, вы можете использовать новый контейнер для глубокого обучения HuggingFace. 
    Джефф: Хорошо.
    Джефф: и как я могу начать? 
    Джефф: где я могу найти документацию? 
    Филипп: ок, ок, здесь можно найти все: https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face

model-index:
- name: "mbart_ruDialogSum"
  results:
  - task: 
      name: Abstractive Dialogue Summarization
      type: abstractive-text-summarization 
    dataset:
      name: "SAMSum Corpus (translated to Russian)" 
      type: samsum
    metrics:
       - name: Validation ROUGE-1
         type: rouge-1
         value: 34.5
       - name: Validation ROUGE-L
         type: rouge-l
         value: 33
       - name: Test ROUGE-1
         type: rouge-1
         value: 31
       - name: Test ROUGE-L
         type: rouge-l
         value: 28
---
### 📝 Description

MBart for Russian summarization, fine-tuned for **dialogue** summarization.


This model was first fine-tuned by [Ilya Gusev](https://hf.co/IlyaGusev) on the [Gazeta dataset](https://huggingface.co/datasets/IlyaGusev/gazeta). We then **fine-tuned** that model on the [SAMSum dataset](https://huggingface.co/datasets/samsum) **translated to Russian** with the Google Translate API.
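
For reference, the translation step can be reproduced roughly as follows. This is a minimal sketch, assuming the `deep-translator` package as a stand-in client for the Google Translate API; the exact translation pipeline used for this model is not published in this card.

```python
from datasets import load_dataset
from deep_translator import GoogleTranslator  # assumed client, not necessarily the original one

translator = GoogleTranslator(source="en", target="ru")
samsum = load_dataset("samsum", split="train")

def translate_example(example):
    # Translate both the dialogue and its reference summary into Russian
    example["dialogue"] = translator.translate(example["dialogue"])
    example["summary"] = translator.translate(example["summary"])
    return example

samsum_ru = samsum.map(translate_example)
```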

🤗 Moreover, we have implemented a **Telegram bot, [@summarization_bot](https://t.me/summarization_bot)**, that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗
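
A dialogue-summarization bot of this kind can be sketched as follows. This is a hypothetical minimal example, assuming `python-telegram-bot` v20+ and a `/summarize <text>` command; the actual @summarization_bot source is not part of this card.

```python
from telegram import Update
from telegram.ext import ApplicationBuilder, CommandHandler, ContextTypes
from transformers import MBartForConditionalGeneration, MBartTokenizer

model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

async def summarize(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # Summarize whatever text follows the /summarize command
    dialogue = " ".join(context.args)
    input_ids = tokenizer(
        [dialogue], max_length=600, truncation=True, return_tensors="pt"
    )["input_ids"]
    output_ids = model.generate(
        input_ids=input_ids, num_beams=3, no_repeat_ngram_size=3
    )[0]
    await update.message.reply_text(
        tokenizer.decode(output_ids, skip_special_tokens=True)
    )

app = ApplicationBuilder().token("YOUR_BOT_TOKEN").build()  # placeholder token
app.add_handler(CommandHandler("summarize", summarize))
app.run_polling()
```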


### ❓ How to use with code
```python
from transformers import MBartTokenizer, MBartForConditionalGeneration

# Download model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"   
tokenizer =  AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

article_text = "..."  # the dialogue to summarize (in Russian)

input_ids = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

output_ids = model.generate(
    input_ids=input_ids,
    top_k=0,                 # ignored here: beam search does not sample
    num_beams=3,             # beam search width
    no_repeat_ngram_size=3,  # block repeated 3-grams in the summary
)[0]


summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
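
Note that the hosted inference widget uses slightly different generation settings, taken from `inference.parameters` in the metadata above. To reproduce the widget's output, pass those values instead:

```python
# Generation settings matching the card's inference widget
output_ids = model.generate(
    input_ids=input_ids,
    num_beams=5,
    no_repeat_ngram_size=4,
)[0]
```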