---
license: apache-2.0
datasets:
- artemsnegirev/ru-word-games
language:
- ru
metrics:
- exact_match
pipeline_tag: text2text-generation
---

The model was trained on the companion [dataset](https://huggingface.co/datasets/artemsnegirev/ru-word-games). Minibob guesses a word from its description, modeling the well-known Alias word game.
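
The companion dataset can be pulled with the `datasets` library for inspection or further training. A minimal sketch (the exact splits and column names are documented on the dataset card, not assumed here):

```python
from datasets import load_dataset

# Load the companion word-games dataset from the Hugging Face Hub;
# see the dataset card for the actual splits and columns.
ds = load_dataset("artemsnegirev/ru-word-games")
print(ds)
```
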
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

prefix = "guess word:"

def predict_word(prompt, model, tokenizer):
    # "..." in a prompt marks the hidden word; map it to T5's sentinel token
    prompt = prompt.replace("...", "<extra_id_0>")
    prompt = f"{prefix} {prompt}"

    input_ids = tokenizer([prompt], return_tensors="pt").input_ids

    # Beam search with several returned sequences gives a set of candidate words
    outputs = model.generate(
        input_ids.to(model.device),
        num_beams=5,
        max_new_tokens=8,
        do_sample=False,
        num_return_sequences=5
    )

    candidates = set()

    for tokens in outputs:
        candidate = tokenizer.decode(tokens, skip_special_tokens=True)
        candidate = candidate.strip().lower()

        candidates.add(candidate)

    return candidates

model_name = "artemsnegirev/minibob"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# "it is an animal with hooves, people ride it"
prompt = "это животное с копытами на нем ездят"

print(predict_word(prompt, model, tokenizer))
# {'верблюд', 'конь', 'коня', 'лошадь', 'пони'}
```
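
Since the model is tagged `text2text-generation`, it can also be called through the generic `pipeline` API. A minimal sketch, assuming the `guess word:` prefix is prepended manually as in `predict_word` above:

```python
from transformers import pipeline

# Minimal sketch: the text2text-generation pipeline wraps tokenization and generation;
# the "guess word:" prefix still has to be added by hand.
guesser = pipeline("text2text-generation", model="artemsnegirev/minibob")

result = guesser("guess word: это животное с копытами на нем ездят", max_new_tokens=8)
print(result[0]["generated_text"])
```
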
A detailed GitHub-based [tutorial](https://github.com/artemsnegirev/minibob) with the pipeline and source code for building Minibob is available.