How to get vectorized output
#1 opened by irfanzafar
Hi, I just want to vectorize my Arabic data using this transformer. Can someone help me with this?
Thanks.
I know it's late, but I hope this helps you or someone else 🤗 (you can run it on Google Colab).
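from transformers import AutoTokenizer, AutoModel
import torch

# Load the Arabic BERT tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic")
model = AutoModel.from_pretrained("asafaya/bert-base-arabic")

# Sample sentence: "My name is Majed and I am a programmer"
text = "اسمي ماجد و انا مبرمج"
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Token-level embeddings, shape (1, seq_len, hidden_size)
embeddings = outputs.last_hidden_state
# Average over the token dimension to get one vector for the sentence
sentence_embedding = torch.mean(embeddings, dim=1)
print(sentence_embedding)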
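One thing to watch if you encode several sentences at once: with `padding=True` the shorter sentences get padding tokens, and a plain `torch.mean` over `dim=1` would average those in too. Below is a minimal sketch of mean pooling that uses the attention mask to skip padding. The helper name `masked_mean_pool` is my own (it is not part of transformers), and I'm assuming the same tokenizer and model as in the snippet above.

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings per sentence, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) float tensor
    attention_mask:    (batch, seq_len) tensor, 1 = real token, 0 = padding
    """
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over hidden dims
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Zero out padding positions, then sum over the token dimension
    summed = (last_hidden_state * mask).sum(dim=1)
    # Number of real tokens per sentence; clamp guards against division by zero
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


# Hypothetical usage with the tokenizer and model loaded earlier:
#   inputs = tokenizer(list_of_texts, padding=True, return_tensors="pt")
#   with torch.no_grad():
#       outputs = model(**inputs)
#   sentence_embeddings = masked_mean_pool(
#       outputs.last_hidden_state, inputs["attention_mask"]
#   )
```

For a single sentence with no padding this should give the same result as `torch.mean(embeddings, dim=1)`, so it only matters for batches.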