How to get vectorized output

#1
by irfanzafar - opened

Hi, I want to vectorize my Arabic data using this transformer. Can someone help me with this?
Thanks

I know this reply is very late, but I hope it helps you or someone else 🤗 (you can run it on Google Colab).

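```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load the Arabic BERT tokenizer and encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic")
model = AutoModel.from_pretrained("asafaya/bert-base-arabic")

# "My name is Majed and I am a programmer"
text = "اسمي ماجد و انا مبرمج"

# Tokenize, then run the model without tracking gradients (inference only)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, num_tokens, hidden_size):
# one vector per token, including the special [CLS] and [SEP] tokens
embeddings = outputs.last_hidden_state

# Average over the token dimension to get one vector per sentence,
# shape (1, hidden_size)
sentence_embedding = torch.mean(embeddings, dim=1)
print(sentence_embedding)
```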
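For a whole dataset you will usually want to encode many sentences per call. With `padding=True`, shorter sentences are filled with padding tokens, and a plain `torch.mean(..., dim=1)` would average those in too. Below is a minimal sketch, assuming you batch your texts, that uses the `attention_mask` returned by the tokenizer to average only over real tokens. `masked_mean_pool` and `embed_texts` are hypothetical helper names, not part of `transformers`, and mean pooling is just one option (taking the `[CLS]` vector is another).

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    returns:           (batch, hidden)
    """
    # Broadcastable mask of shape (batch, seq_len, 1) in the same dtype
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Sum only the vectors of real tokens
    summed = (last_hidden_state * mask).sum(dim=1)
    # Count real tokens per sentence; clamp avoids division by zero
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


def embed_texts(texts, tokenizer, model, batch_size=16):
    """Hypothetical helper: embed a list of strings, batch_size at a time."""
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence in the batch; truncate overly long ones
        inputs = tokenizer(batch, padding=True, truncation=True,
                           return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        chunks.append(masked_mean_pool(outputs.last_hidden_state,
                                       inputs["attention_mask"]))
    # One row per input text: (len(texts), hidden)
    return torch.cat(chunks, dim=0)


# Toy check: the third position is padding, so its large values are ignored
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # averages only the first two rows: [[2.0, 3.0]]
```

With `tokenizer` and `model` loaded as in the snippet above, `embed_texts(my_texts, tokenizer, model)` (where `my_texts` is your list of Arabic strings) would return one vector per sentence; call `.numpy()` on the result if you want to feed it to scikit-learn or a similar library.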
