Questions about model & architecture

#1
by tomaarsen HF staff - opened

Hello!

I just stumbled upon this when looking at recent Sentence Transformer models, and I think it's quite interesting to see the custom architecture (although I haven't yet figured out what's new about it compared to e.g. RoBERTa). Would you like to share some information about it?

I also wanted to let you know that Sentence Transformers recently had a big v3.0 update, which refactored training. Old training scripts should mostly still work, but training can now also be done with a SentenceTransformerTrainer that resembles the transformers Trainer, in case you're familiar with that one. Notably, it's now much easier to track the performance of your model during training, via Weights & Biases/TensorBoard integrations and better callbacks. I think it might be quite useful for you. The updated training documentation can be found here: https://sbert.net/docs/sentence_transformer/training_overview.html
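For reference, a minimal sketch of what a v3.0-style training setup could look like. The helper name `build_trainer`, the `anchor`/`positive` column names, the choice of `MultipleNegativesRankingLoss`, the hyperparameters, and the output directory are all illustrative assumptions, not details of the model discussed here.

```python
def build_trainer(model_name, anchors, positives, output_dir="output/st-v3-sketch"):
    """Hypothetical helper: set up a SentenceTransformerTrainer for (anchor, positive) pairs."""
    # Imports live inside the function so the sketch can be read and imported
    # even where the libraries are not installed.
    from datasets import Dataset
    from sentence_transformers import (
        SentenceTransformer,
        SentenceTransformerTrainer,
        SentenceTransformerTrainingArguments,
    )
    from sentence_transformers.losses import MultipleNegativesRankingLoss

    model = SentenceTransformer(model_name)

    # Assumed data layout: one column of anchors, one column of matching positives.
    train_dataset = Dataset.from_dict({"anchor": anchors, "positive": positives})

    # Assumed loss; pick whichever loss matches your data format.
    loss = MultipleNegativesRankingLoss(model)

    args = SentenceTransformerTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=16,
        logging_steps=100,
        report_to="tensorboard",  # or "wandb" / "none"
    )

    return SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        loss=loss,
    )
```

Calling `.train()` on the returned trainer would then start training, with the logs going to whichever integration `report_to` names.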

The produced model cards are also much more meaningful, see e.g. other recent Sentence Transformer models like cristuf/bge-base-financial-matryoshka.

Also, once your model is ready for people to use, then feel free to reach out and I can share the word on the socials.

cc @dangvantuan

  • Tom Aarsen
La Javaness org

Hi @tomaarsen
I am using the XLMRoberta architecture but training it only on French and English, so I have customized it into a Bilingual model for these languages. The model is still at an experimental stage. I am currently training it on NLI data and will share it with you soon. I am also experimenting with Sentence Transformers v3.0.
Tuan

> so I have customized it into a Bilingual model for these languages.

Out of curiosity, have you trained a custom tokenizer on English/French data? The XLM-R default tokenizer has a lot of tokens that you won't end up using that'll 1) slow down inference and 2) potentially reduce your performance.
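To put a rough number on the unused tokens, here is a back-of-the-envelope sketch. It assumes the published `xlm-roberta-base` configuration (250,002 vocabulary entries, hidden size 768) and a purely hypothetical 60,000-token English/French vocabulary. It only counts input-embedding parameters and makes no claim about speed or quality.

```python
XLMR_VOCAB_SIZE = 250_002      # xlm-roberta-base vocabulary size
HIDDEN_SIZE = 768              # xlm-roberta-base hidden size
BILINGUAL_VOCAB_SIZE = 60_000  # hypothetical size of a custom EN/FR vocabulary


def embedding_params(vocab_size, hidden_size):
    """Number of parameters in a token embedding table of the given shape."""
    return vocab_size * hidden_size


full = embedding_params(XLMR_VOCAB_SIZE, HIDDEN_SIZE)
trimmed = embedding_params(BILINGUAL_VOCAB_SIZE, HIDDEN_SIZE)
saved = full - trimmed

print(f"full embedding table:    {full:,} parameters")
print(f"trimmed embedding table: {trimmed:,} parameters")
print(f"difference:              {saved:,} parameters")
```

Under these assumptions the embedding table drops from about 192M to about 46M parameters.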

I'm glad that you've discovered Sentence Transformers v3.0; I like to think that it can help make your life a bit easier.
I'll happily follow along with your progress.

  • Tom Aarsen
La Javaness org

Hi @tomaarsen
I checked the MTEB leaderboard, but I only see Ranking Average (2 datasets) and Summarization Average (1 dataset) displayed.
[Attached screenshot: image.png]
The metrics for the other tasks are not displayed. Do you know what might be causing this?
Thank you.
Tuan

Heya!
I'm OOO now so it's a bit hard to tell, but it might be possible to figure it out by going to the individual tabs and seeing where 1) this model is missing and/or 2) what tasks exist that you don't seem to have scores for. That might be a good start.
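That comparison boils down to a set difference per tab. A minimal sketch, using made-up placeholder task names and scores rather than real MTEB data (the function name `missing_tasks` is hypothetical as well):

```python
def missing_tasks(expected_by_tab, reported_scores):
    """Return, per leaderboard tab, the sorted task names that have no reported score."""
    missing = {}
    for tab, tasks in expected_by_tab.items():
        absent = sorted(set(tasks) - set(reported_scores))
        if absent:
            missing[tab] = absent
    return missing


# Placeholder task names, not the real MTEB task list.
expected_by_tab = {
    "Classification": ["ExampleClassificationA", "ExampleClassificationB"],
    "Reranking": ["ExampleRerankingA"],
}

# Placeholder values, not real results.
reported_scores = {"ExampleRerankingA": 0.0, "ExampleClassificationA": 0.0}

gaps = missing_tasks(expected_by_tab, reported_scores)
print(gaps)
```

Any tab that shows up in `gaps` is one where at least one task still needs a score before an average can be shown.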

  • Tom Aarsen
La Javaness org

Hi @tomaarsen
Could you please refresh the MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard)?
Thank you so much!
Tuan
