Update README.md
README.md CHANGED

```diff
@@ -6,7 +6,7 @@ Our best attempt at reproducing [RankT5 Enc-Softmax](https://arxiv.org/pdf/2210.
 
 1. We use a SPLADE first stage for the negatives vs GTR in the paper
 2. We train using PyTorch vs Flax in the paper
-3.
+3. ~~We use the original t5-3b vs Flan T5-3b in the paper~~ -> Actually the paper also uses t5-3b
 4. The head is not exactly the same: here we add Linear->LayerNorm->Linear and make a mistake by not including a nonlinearity, whereas the original paper uses just a dense layer. Fixing this should improve our performance, since we currently add extra layers without actually using them correctly.
 
 This leads to what seems to be slightly worse performance (42.8 vs 43.? in the paper), and it appears slightly worse on BEIR as well.
```
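The head difference in point 4 can be sketched in PyTorch. This is a minimal illustration, not the repository's actual code: the hidden size, pooling, and module names are assumptions. The point is that without a nonlinearity between the two linear layers, the stack is just an affine map, so the extra layer adds capacity that is never really used.

```python
import torch
import torch.nn as nn

hidden = 1024  # assumed encoder hidden size; the real value depends on the T5 variant

# Head as described here: Linear -> LayerNorm -> Linear with no nonlinearity,
# so the two linear layers collapse to a single affine transformation.
head_as_implemented = nn.Sequential(
    nn.Linear(hidden, hidden),
    nn.LayerNorm(hidden),
    nn.Linear(hidden, 1),
)

# The fix suggested in the README: add a nonlinearity between the layers
# so the extra layer can actually contribute.
head_with_nonlinearity = nn.Sequential(
    nn.Linear(hidden, hidden),
    nn.LayerNorm(hidden),
    nn.ReLU(),
    nn.Linear(hidden, 1),
)

# The original RankT5 paper instead projects with a single dense layer.
head_paper = nn.Linear(hidden, 1)

# e.g. pooled encoder outputs for two query-document pairs
x = torch.randn(2, hidden)
print(head_as_implemented(x).shape)  # torch.Size([2, 1])
```

All three heads map a pooled encoder representation to one relevance logit per query-document pair; they differ only in how much of the added depth is actually usable.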