Notebook errors

#1
by pastel1010 - opened

Running the notebook as-is on a local Nvidia 3090 gives the following errors:

  1. File ~/doubutsu-2b-pt-756/venv/lib/python3.10/site-packages/torch/nn/functional.py:3086, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
    3084 if size_average is not None or reduce is not None:
    3085 reduction = _Reduction.legacy_get_string(size_average, reduce)
    -> 3086 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

ValueError: Expected input batch_size (11160) to match target batch_size (3870).

This is because the notebook patchifies the image, and the modelling file ALSO patchifies that image, so you get N^2 patches instead of N.
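To illustrate the double patchification described above, here is a minimal toy sketch in plain Python. It is not the doubutsu_next code: the `patchify` helper, the 8x8 "image", and the 2x2 grid are all hypothetical and chosen only to show how the tile count goes from N to N^2 when the same split is applied twice.

```python
def patchify(image, grid):
    """Split a square 2D list into grid x grid equally sized tiles.

    Hypothetical helper for illustration; assumes len(image) is divisible by grid.
    """
    size = len(image)
    step = size // grid
    tiles = []
    for r in range(0, size, step):
        for c in range(0, size, step):
            tiles.append([row[c:c + step] for row in image[r:r + step]])
    return tiles


# Toy 8x8 "image" and a 2x2 grid, so N = 4 tiles per pass.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
grid = 2
n = grid * grid

# One pass: the number of tiles the model expects.
once = patchify(image, grid)

# Two passes: each tile from the first pass is split again.
twice = [t for tile in once for t in patchify(tile, grid)]

print(f"single pass: {len(once)} tiles, double pass: {len(twice)} tiles")
```

With more visual tokens than the labels were built for, the flattened logits and targets no longer have the same length, which is the kind of mismatch cross_entropy reports. The practical fix is to patchify in only one place.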

  2. With that fixed, training runs out of memory (OOM) after 2-3 iterations on DOCCI images. That should not happen, because this is only a 2B model in float16 with a LoRA, and it should be readily trainable on consumer 24GB hardware.

Partially fixed by adding:
model.text_model.gradient_checkpointing_enable()
before the LoraConfig.

(This uses 21GB but that's still far higher than it should be)
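As a rough check on why 21GB looks high, here is a back-of-the-envelope sketch of the memory taken by the frozen base weights alone. It assumes "2B" means exactly 2e9 parameters stored at 2 bytes each; the real parameter count, the vision encoder, and the adapter size are not stated in this thread, so treat the result as an order-of-magnitude estimate only.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Return the memory needed to store n_params weights, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: "2B" taken as exactly 2e9 parameters.
base_params = 2_000_000_000
# float16 stores each parameter in 2 bytes.
fp16_bytes = 2

base_gib = weight_memory_gib(base_params, fp16_bytes)
print(f"frozen fp16 base weights: about {base_gib:.1f} GiB")
```

Under these assumptions the base weights take under 4 GiB, so most of the reported usage would have to come from elsewhere, such as activations kept for the backward pass, which is the part gradient checkpointing reduces.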

qresearch org

all true. the notebook has this error because we worked on it with doubutsu rather than doubutsu_next; this should be fixed.
the gradient checkpointing point is also true, we forgot it. we work with an a100, so memory was not a problem for us.

if you got it working feel free to send a PR to the notebook

I did get it working but I won't submit a PR to anyone who thinks an A100 is "smol".

qresearch org

alright then

qtnx changed discussion status to closed
qresearch org

this is now fixed
