MTL Classifier - Discrepancy between train and validation mappings
Good afternoon,
I would like to share my thoughts on how the task label mappings are currently created during training. As far as I can tell from the code, the mappings for train and validation are created independently of each other in preload_and_process_data (mtl/data.py#L97), but in load_and_preprocess_data (mtl/data.py#L45) they are saved to the same file. If the mappings differ (for example, some classes are missing in validation), this causes several problems later in training and validation: the validation loss reported in hyperparameter tuning is wrong, and load_and_evaluate_test_model (mtl/eval_utils.py#L54) fails if the test dataset also has a different number of classes.
I think that only printing the mappings is not enough to surface this issue. One of the following should be implemented for clarity:
- raise an error if the mappings do not agree and do not proceed with training,
- create the mapping from the joint training, validation and test datasets, save it, and then load the same mapping for all three datasets.
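
To illustrate both options, here is a rough sketch of what I have in mind. It assumes a mapping is simply a dict from label string to integer index; the helper names (build_label_mapping, check_mappings_agree, build_joint_mapping) and the toy labels are made up for this example and are not taken from the mtl code.

```python
from typing import Dict, Iterable


def build_label_mapping(labels: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: map each distinct label to an index (sorted for determinism)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def check_mappings_agree(train_map: Dict[str, int], val_map: Dict[str, int]) -> None:
    """Option 1: refuse to train when the two mappings differ."""
    if train_map != val_map:
        missing_in_val = sorted(set(train_map) - set(val_map))
        missing_in_train = sorted(set(val_map) - set(train_map))
        raise ValueError(
            "Train and validation label mappings differ: "
            f"missing in validation={missing_in_val}, "
            f"missing in train={missing_in_train}"
        )


def build_joint_mapping(*label_sets: Iterable[str]) -> Dict[str, int]:
    """Option 2: build one mapping from the labels of all splits together."""
    joint = [label for labels in label_sets for label in labels]
    return build_label_mapping(joint)


# Toy data: validation is missing the "bird" class.
train_labels = ["cat", "dog", "bird"]
val_labels = ["cat", "dog"]
test_labels = ["dog", "bird"]

# Built independently, the same label gets a different index per split.
train_map = build_label_mapping(train_labels)
val_map = build_label_mapping(val_labels)

# A single mapping shared by all three splits avoids the mismatch.
joint_map = build_joint_mapping(train_labels, val_labels, test_labels)
```

With the toy data above, "cat" maps to a different index in train_map and val_map, so check_mappings_agree(train_map, val_map) raises a ValueError, while joint_map can be saved once and reused for every split.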
Best,
Milos