Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

kwang2049/TSDAE-scidocs2nli_stsb

This is a model from the paper "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning". This model was only trained with the TSDAE objective on scidocs in an unsupervised manner. Training procedure of this model:

  1. Initialized with bert-base-uncased;
  2. Unsupervised training on scidocs with the TSDAE objective;

The pooling method is CLS-pooling.

Usage

To use this model, an convenient way is through SentenceTransformers. So please install it via:

pip install sentence-transformers

And then load the model and use it to encode sentences:

from sentence_transformers import SentenceTransformer, models
dataset = 'scidocs'
model_name_or_path = f'kwang2049/TSDAE-{dataset}'
model = SentenceTransformer(model_name_or_path)
model[1] = models.Pooling(model[0].get_word_embedding_dimension(), pooling_mode='cls')  # Note this model uses CLS-pooling
sentence_embeddings = model.encode(['This is the first sentence.', 'This is the second one.'])

Evaluation

To evaluate the model against the datasets used in the paper, please install our evaluation toolkit USEB:

pip install useb  # Or git clone and pip install .
python -m useb.downloading all  # Download both training and evaluation data

And then do the evaluation:

from sentence_transformers import SentenceTransformer, models
import torch
from useb import run_on
dataset = 'scidocs'
model_name_or_path = f'kwang2049/TSDAE-{dataset}'
model = SentenceTransformer(model_name_or_path)
model[1] = models.Pooling(model[0].get_word_embedding_dimension(), pooling_mode='cls')  # Note this model uses CLS-pooling
@torch.no_grad()
def semb_fn(sentences) -> torch.Tensor:
   return torch.Tensor(model.encode(sentences, show_progress_bar=False))
result = run_on(
   dataset,
   semb_fn=semb_fn,
   eval_type='test',
   data_eval_path='data-eval'
)

Training

Please refer to the page of TSDAE training in SentenceTransformers.

Cite & Authors

If you use the code for evaluation, feel free to cite our publication TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning:

@article{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and  Gurevych, Iryna", 
    journal= "arXiv preprint arXiv:2104.06979",
    month = "4",
    year = "2021",
    url = "https://arxiv.org/abs/2104.06979",
}
Downloads last month
5
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.