Perceiver IO masked language model
This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created from C4 and English Wikipedia. It is weight-equivalent to the deepmind/language-perceiver model but based on implementation classes of the perceiver-io library. It can be created from the deepmind/language-perceiver model with a library-specific conversion utility. Both models generate equal output for the same input.
The content of the deepmind/language-perceiver model card also applies to this model, except for the usage examples. Refer to the linked card for further model and training details.
Model description
The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the Perceiver IO paper (UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).
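As a rough illustration of what UTF-8 byte tokenization with a vocabulary of 262 means, the following sketch maps each byte of a string to one token ID. It assumes that the 262 entries are the 256 possible byte values plus 6 special tokens occupying the lowest IDs. The names NUM_SPECIAL_TOKENS, encode_bytes and decode_bytes are hypothetical and not part of the perceiver-io or transformers API; use the real tokenizer for actual model inputs.

```python
from typing import List

# Illustrative sketch only, not the library tokenizer.
# Assumption: 6 special tokens take IDs 0..5, byte values follow at an offset of 6.
NUM_SPECIAL_TOKENS = 6
VOCAB_SIZE = 256 + NUM_SPECIAL_TOKENS  # 262, as stated in the model description


def encode_bytes(text: str) -> List[int]:
    """Map each UTF-8 byte of `text` to one token ID."""
    return [b + NUM_SPECIAL_TOKENS for b in text.encode("utf-8")]


def decode_bytes(token_ids: List[int]) -> str:
    """Invert `encode_bytes`, skipping IDs in the special-token range."""
    raw = bytes(i - NUM_SPECIAL_TOKENS for i in token_ids if i >= NUM_SPECIAL_TOKENS)
    return raw.decode("utf-8", errors="replace")


ids = encode_bytes("héllo")
print(len(ids))           # 6: "é" takes two bytes in UTF-8
print(decode_bytes(ids))  # héllo
```

One consequence of this scheme is that sequence length is counted in bytes rather than words, so non-ASCII characters consume more than one position.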
Intended use
Although the raw model can be used directly for masked language modeling, the main use case is fine-tuning. This can be either fine-tuning with masked language modeling and whole word masking on an unlabeled dataset (example), or fine-tuning on a labeled dataset with the pretrained encoder of this model used for weight initialization (example).
Usage examples
To use this model you first need to install the perceiver-io library with the text extension:
pip install perceiver-io[text]
The model can then be used with PyTorch. Either use the model and tokenizer directly:
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm # auto-class registration
repo_id = "krasserm/perceiver-io-mlm"
model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"
encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)
# get predictions for 9 [MASK] tokens (exclude [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)
print(tokenizer.decode(masked_token_predictions))
missing.
or use a fill-mask pipeline:
from transformers import pipeline
from perceiver.model.text import mlm # auto-class registration
repo_id = "krasserm/perceiver-io-mlm"
masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"
filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)
print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
missing.
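The slice -10:-1 in the first example works because the nine [MASK] tokens sit directly before the trailing [SEP] token. When masks appear elsewhere in the text, their positions can be looked up instead of hard-coded. The sketch below shows the idea on a plain Python list. The token IDs (MASK_ID = 3, SEP_ID = 5) and the helper name mask_positions are assumptions made for illustration; in real code the mask ID would come from tokenizer.mask_token_id.

```python
from typing import List

# Hypothetical IDs for illustration; read the real value from tokenizer.mask_token_id.
MASK_ID, SEP_ID = 3, 5


def mask_positions(input_ids: List[int], mask_id: int = MASK_ID) -> List[int]:
    """Return the indices of all mask tokens in a single encoded sequence."""
    return [i for i, token_id in enumerate(input_ids) if token_id == mask_id]


# Three ordinary tokens, nine masks, then the trailing [SEP]
input_ids = [110, 111, 112] + [MASK_ID] * 9 + [SEP_ID]
positions = mask_positions(input_ids)

# For masks at the end of the text this matches the slice -10:-1.
assert positions == list(range(len(input_ids)))[-10:-1]
print(positions)  # [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

With the real model, such a list of indices could presumably be applied as outputs.logits[0, positions].argmax(dim=-1) in place of the fixed slice.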
Model conversion
The krasserm/perceiver-io-mlm model has been created from the source deepmind/language-perceiver model with:
from perceiver.model.text.mlm import convert_model
convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
Citation
@article{jaegle2021perceiver,
title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
journal={arXiv preprint arXiv:2107.14795},
year={2021}
}