Model Weights Not Initialized
Hi,
I am trying to run cdsBERT using the provided code. When loading the model on either CPU or GPU, I get warnings about some weights not being loaded from the checkpoint. Is this normal behavior? I also get an AttributeError (see below).
2023-09-25 16:40:21.633712: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-25 16:40:40.924721: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of BertForMaskedLM were not initialized from the model checkpoint at lhallee/cdsBERT and are newly initialized: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "../scripts/test.py", line 52, in
matrix_embedding = model(**example).last_hidden_state.cpu()
AttributeError: 'MaskedLMOutput' object has no attribute 'last_hidden_state'
Hello! This is the feature-extraction checkpoint, so use BertModel instead of BertForMaskedLM. I have updated the documentation and uploaded the MLM checkpoint. Please see our preprint and/or the model cards for the difference between the checkpoints. Let me know if there are any other issues!
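To illustrate why the AttributeError occurs: BertForMaskedLM returns a MaskedLMOutput (logits only), while BertModel returns an output that carries last_hidden_state. The sketch below uses a tiny randomly initialized BertConfig so it runs offline; in practice you would load the real checkpoint with BertModel.from_pretrained("lhallee/cdsBERT"), and the config sizes here are arbitrary assumptions, not the cdsBERT ones.

```python
import torch
from transformers import BertConfig, BertModel, BertForMaskedLM

# Tiny toy config (arbitrary sizes, NOT the cdsBERT config) so this runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)

mlm = BertForMaskedLM(config).eval()   # head for masked-LM training
base = BertModel(config).eval()        # bare encoder for feature extraction

input_ids = torch.tensor([[2, 5, 9, 3]])
with torch.no_grad():
    mlm_out = mlm(input_ids=input_ids)    # MaskedLMOutput: has .logits, no .last_hidden_state
    base_out = base(input_ids=input_ids)  # has .last_hidden_state and .pooler_output

print(hasattr(mlm_out, "last_hidden_state"))  # False -> the AttributeError above
print(base_out.last_hidden_state.shape)       # torch.Size([1, 4, 32])
```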
Hi,
Thank you for the suggestion. I updated BertForMaskedLM to BertModel and can now extract the features. However, I still get the following warning:
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-29 11:25:37.062213: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of BertModel were not initialized from the model checkpoint at lhallee/cdsBERT and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
I would be grateful if you could let me know whether this is normal.
Yes, this is normal with some BERT models. The pooler_output (as opposed to last_hidden_state) gives a vector based on the [CLS] token that you can train when fine-tuning on other tasks. However, if you use it without training, it will be random, since the pooler weights are newly initialized. If you only use last_hidden_state, this does not cause any problems.
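The contrast above can be sketched as follows: the (untrained) pooler_output should be avoided, while a mask-aware mean pooling of last_hidden_state gives a usable sequence embedding. Again this uses a tiny randomly initialized config so it runs offline; in practice you would load the checkpoint with BertModel.from_pretrained("lhallee/cdsBERT"), and the config sizes are arbitrary assumptions.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny toy config (arbitrary sizes, NOT the cdsBERT config) so this runs offline.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config).eval()

input_ids = torch.tensor([[2, 5, 9, 3]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=attention_mask)

# pooler_output passes [CLS] through the pooler head; here those weights are
# freshly initialized, so this vector is effectively random without fine-tuning.
print(out.pooler_output.shape)  # torch.Size([1, 32])

# Mask-aware mean pooling over last_hidden_state does not touch the pooler
# at all, so the "newly initialized" warning is irrelevant to it.
mask = attention_mask.unsqueeze(-1).float()
mean_embedding = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(mean_embedding.shape)     # torch.Size([1, 32])
```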