How does this model deal with OOV (out-of-vocabulary) words?
#6 by HeitorC - opened
I've been reading and couldn't find data on this specific topic. Can this model or the bge-reranker detect subword tokens? How do they deal with completely unseen or made-up words?
Hi, both the bge embedding model and the bge reranker encode text into a sequence of tokens. Unseen words are split into several subword tokens, so there are no true out-of-vocabulary words at the tokenizer level.
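To make the splitting concrete, here is a toy sketch of WordPiece-style greedy longest-match segmentation, the general scheme BERT-family tokenizers (which the BGE models build on) use for unseen words. The mini-vocabulary and the made-up word `frobnicates` are hypothetical, purely for illustration; the real tokenizers ship a vocabulary of tens of thousands of pieces.

```python
# Toy WordPiece-style segmentation: split a word greedily into the
# longest subword pieces found in the vocabulary. Continuation pieces
# carry a "##" prefix, as in BERT-family tokenizers.
# The vocabulary below is a hypothetical mini-vocab for illustration.
VOCAB = {"frob", "##nic", "##ate", "##s", "un", "##believ", "##able"}

def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        # Shrink the candidate substring until it is in the vocabulary.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # mark as a continuation piece
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return [unk]  # nothing matched: fall back to the unknown token
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("frobnicates"))   # a made-up word still maps to known pieces
print(wordpiece("unbelievable"))
```

So even a completely made-up word ends up as a sequence of known pieces, and the model builds its representation from those pieces; only text whose characters match no piece at all falls back to `[UNK]`.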