Pretrained on full-text paragraphs from PMC (PubMed Central) using a masked language modeling task. The corpus is mostly biology and medical papers.
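For readers unfamiliar with masked language modeling, here is a minimal sketch of how a training example can be built from a paragraph. It is illustrative only: the `[MASK]` symbol, the 15% masking rate, the whitespace tokenization, and the helper name `mask_tokens` are all assumptions, not details confirmed for this model.

```python
import random

# Hypothetical mask symbol; the model's real tokenizer may use a different one.
MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Build one masked language modeling example.

    Returns (masked, labels). labels[i] holds the original token when
    position i was masked, and None otherwise. The model is trained to
    predict the original token at each masked position.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(tok)         # keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(None)        # no loss is computed at this position
    return masked, labels


# Whitespace splitting stands in for a real subword tokenizer (assumption).
paragraph = "The protein binds to the receptor and activates downstream signaling"
tokens = paragraph.split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
print(" ".join(masked))
```

In practice a subword tokenizer and a library data collator would replace the whitespace split and the hand-written loop. The sketch only shows the shape of the inputs and targets.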