How does this model deal with OOV (out-of-vocabulary) words?
#6 by HeitorC - opened
I've been reading and couldn't find data on this specific topic. Can this model or the bge-reranker detect subword tokens? How do they deal with completely unseen or made-up words?
Hi, both the bge embedding model and the bge reranker encode text into a sequence of tokens. Unseen words are split into several subword tokens, so there are no true out-of-vocabulary words at the tokenizer level.
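To make the splitting concrete, here is a toy sketch of WordPiece-style greedy longest-match segmentation, the general scheme BERT-family tokenizers (which the BGE models build on) use for unseen words. The mini-vocabulary and the made-up word `frobnicates` are hypothetical, purely for illustration; the real tokenizers ship a vocabulary of tens of thousands of pieces.

```python
# Toy WordPiece-style segmentation: split a word greedily into the
# longest subword pieces found in the vocabulary. Continuation pieces
# carry a "##" prefix, as in BERT-family tokenizers.
# The vocabulary below is a hypothetical mini-vocab for illustration.
VOCAB = {"frob", "##nic", "##ate", "##s", "un", "##believ", "##able"}

def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        # Shrink the candidate substring until it is in the vocabulary.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # mark as a continuation piece
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return [unk]  # nothing matched: fall back to the unknown token
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("frobnicates"))   # a made-up word still maps to known pieces
print(wordpiece("unbelievable"))
```

So even a completely made-up word ends up as a sequence of known pieces, and the model builds its representation from those pieces; only text whose characters match no piece at all falls back to `[UNK]`.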