
GradCache implementation?

#4
by serialcoder - opened

Hi @MrLight,

Your implementation in the Tevatron library only uses gradient accumulation, and GradCache isn't supported. Is gradient accumulation good enough to enable large batch sizes instead of GradCache? Thanks.

Castorini org

Hi,

GradCache is not used in the original implementation, as the current GradCache does not support DeepSpeed yet.
Gradient accumulation would be good enough here.
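For reference, here is a minimal sketch of how gradient accumulation reaches a large effective batch size. This is a hypothetical toy loop, not Tevatron's actual trainer; the model, optimizer, and batch sizes are made up for illustration, and the effective per-device batch size is the micro-batch size times the number of accumulation steps.

```python
import torch
from torch import nn

# hypothetical toy setup, just to make the accumulation pattern concrete
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]  # micro-batches of 4

accumulation_steps = 8  # effective batch size = 4 * 8 = 32 per device

optimizer.zero_grad()
for step, (x, y) in enumerate(dataloader):
    loss = nn.functional.mse_loss(model(x), y) / accumulation_steps  # scale so accumulated grads average
    loss.backward()                                                  # gradients add up in .grad across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one optimizer update per accumulation_steps micro-batches
        optimizer.zero_grad()
```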

Xueguang

Thank you @MrLight. Btw, in the forward function of the RankLlama model, the target is set to zero:

https://github.com/texttron/tevatron/blob/2e5d00ee21d5a7db0bd2ea1463c9150a572106d4/examples/rankllama/modeling.py#L36

Shouldn't this method accept a labels parameter, which would then be used to calculate the loss? As far as I can see, there isn't a way to signal "positive" vs. "negative" pairs to the model. Am I missing something?

Castorini org

Hi @serialcoder ,

ranker_logits.view(self.train_batch_size, -1)

The reranker logits are reshaped so that the first score in each group belongs to the positive pair, so the target is set to index 0 for each group.
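To illustrate, here is a minimal sketch with hypothetical shapes (not the exact Tevatron code): each group of passages puts the positive first, so after the reshape, column 0 of every row is the positive's score, and a target of zeros picks it out in the cross-entropy loss.

```python
import torch
from torch import nn

train_batch_size = 4   # groups (queries) per device, hypothetical
train_group_size = 8   # 1 positive followed by 7 negatives per group, hypothetical

# one relevance score per (query, passage) pair, with the positive ordered first in each group
ranker_logits = torch.randn(train_batch_size * train_group_size, 1)
grouped_logits = ranker_logits.view(train_batch_size, -1)  # shape: (batch, group_size)

# index 0 in every row is the positive passage, so the target is all zeros
target = torch.zeros(train_batch_size, dtype=torch.long)
loss = nn.CrossEntropyLoss()(grouped_logits, target)
```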
