Questions about model & architecture
Hello!
I just stumbled upon this when looking at recent Sentence Transformer models, and I think it's quite interesting to see the custom architecture (although I haven't yet figured out what's new about it compared to e.g. RoBERTa). Would you like to share some information about it?
I also wanted to let you know that Sentence Transformers recently had a big v3.0 update, which refactored the training. Old training scripts should mostly still work, but training can now also be done with a SentenceTransformerTrainer that resembles the transformers Trainer, in case you're familiar with that one. Notably, it's now much easier to track the performance of your model during training, via Weights and Biases/Tensorboard integrations and better callbacks. I think it might be quite useful for you. The updated training documentation can be found here: https://sbert.net/docs/sentence_transformer/training_overview.html
The produced model cards are also much more meaningful, see e.g. other recent Sentence Transformer models like cristuf/bge-base-financial-matryoshka.
Also, once your model is ready for people to use, feel free to reach out and I can spread the word on the socials.
cc @dangvantuan
- Tom Aarsen
Hi @tomaarsen,
I am using the XLMRoberta architecture but training it only on French and English, so I have customized it into a Bilingual model for these languages. The model is still at the experimental stage. I am currently training it on NLI data and will share it with you soon. I am also experimenting with Sentence Transformers v3.0.
Tuan
> so I have customized it into a Bilingual model for these languages.
Out of curiosity, have you trained a custom tokenizer on English/French data? The XLM-R default tokenizer has a lot of tokens that you won't end up using that'll 1) slow down inference and 2) potentially reduce your performance.
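A rough way to put numbers on this, sketched with the standard library only. The helper names are made up for illustration, the token ids below are a toy stand-in (in practice they would come from running the tokenizer over an English/French corpus), and the 250,002 x 768 figures are the xlm-roberta-base config values as I recall them, so double-check them against the actual config:

```python
from typing import Iterable, Sequence


def vocab_usage(
    tokenized_texts: Iterable[Sequence[int]], vocab_size: int
) -> tuple[int, float]:
    """Return (number of distinct token ids seen, fraction of the vocabulary used)."""
    seen: set[int] = set()
    for ids in tokenized_texts:
        seen.update(ids)
    return len(seen), len(seen) / vocab_size


def embedding_parameters(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in the input embedding matrix (one row per token)."""
    return vocab_size * hidden_size


# Toy stand-in for tokenizer output: two "sentences" over a 20-token vocabulary.
toy_ids = [[0, 5, 7, 2], [0, 5, 9, 9, 2]]
used, fraction = vocab_usage(toy_ids, vocab_size=20)
print(used, fraction)  # 5 distinct ids out of 20 -> 0.25

# Assumed xlm-roberta-base sizes: 250,002 tokens, hidden size 768.
print(embedding_parameters(250_002, 768))  # 192001536
```

If the measured fraction on real English/French data turns out to be small, most of those embedding rows are never touched, which is the case where a smaller custom tokenizer starts to look attractive.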
I'm glad that you've discovered Sentence Transformers v3.0, I like to think that it can help make your life a bit easier.
I'll happily follow your progress along.
- Tom Aarsen
Hi @tomaarsen,
I checked the MTEB leaderboard, but I only see Ranking Average (2 datasets) and Summarization Average (1 dataset) displayed.
The metrics for the other tasks are not shown. Could you tell me what might be causing this?
Thank you.
Tuan
Heya!
I'm OOO now so it's a bit hard to tell, but it might be possible to figure it out by going to the individual tabs and seeing where 1) this model is missing and/or 2) what tasks exist that you don't seem to have scores for. That might be a good start.
- Tom Aarsen
Hi @tomaarsen,
Could you refresh the MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard), please?
Thank you so much!
Tuan