About self.scale_mask_softmax in modeling_chatglm.py
#91
by
shl97
- opened
What is the meaning and definition of self.scale_mask_softmax in modeling_chatglm.py? It is used in lines 300-302:
if self.scale_mask_softmax:
    self.scale_mask_softmax.scale = query_key_layer_scaling_coeff
    attention_probs = self.scale_mask_softmax(attention_scores, attention_mask.contiguous())
self.scale_mask_softmax has a scale attribute and is called like a function, but the only other place it appears is line 378, where it is set to None: self.scale_mask_softmax = None
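
For context, here is a minimal sketch of the kind of object the quoted lines seem to expect: a callable with a settable scale attribute that scales the scores, masks them, and applies softmax. The class name ScaledMaskedSoftmax, the mask_value default, and the use of plain Python lists instead of tensors are all assumptions made for illustration; this is not code from modeling_chatglm.py. Note also that if the attribute is only ever assigned None, the `if self.scale_mask_softmax:` check is falsy and the branch is skipped.

```python
import math


class ScaledMaskedSoftmax:
    """Hypothetical stand-in, not the class used by ChatGLM."""

    def __init__(self, scale=1.0, mask_value=-10000.0):
        self.scale = scale            # settable from outside, as in the quoted lines
        self.mask_value = mask_value  # assumed large negative fill for masked slots

    def __call__(self, scores, mask):
        # scores: list of floats; mask: list of bools (True = position is masked out)
        scaled = [self.mask_value if m else s * self.scale
                  for s, m in zip(scores, mask)]
        top = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]


softmax = ScaledMaskedSoftmax()
softmax.scale = 2.0  # mirrors: self.scale_mask_softmax.scale = query_key_layer_scaling_coeff
probs = softmax([1.0, 1.0, 5.0], [False, False, True])

# With the attribute left as None, the guarded branch never runs.
scale_mask_softmax = None
branch_taken = bool(scale_mask_softmax)
```

Defining `__call__` is what lets an instance be used "as a function" while still carrying attributes such as scale.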