Notebook errors
Running the notebook as-is on a local NVIDIA RTX 3090 gives the following errors:
- File ~/doubutsu-2b-pt-756/venv/lib/python3.10/site-packages/torch/nn/functional.py:3086, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
      3084     if size_average is not None or reduce is not None:
      3085         reduction = _Reduction.legacy_get_string(size_average, reduce)
  ->  3086     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
  ValueError: Expected input batch_size (11160) to match target batch_size (3870)
This is because the notebook patchifies the image and the modelling file ALSO patchifies it, so you get N^2 patches instead of N (see the sketch below).
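For anyone hitting the same shape mismatch, here is a minimal, self-contained illustration of what one patchify pass produces; the `patchify` helper is hypothetical (doubutsu's actual code differs) and assumes square images with non-overlapping patches.

```python
import torch

def patchify(x: torch.Tensor, p: int = 14) -> torch.Tensor:
    """Split (B, C, H, W) images into N = (H/p) * (W/p) flattened patch tokens."""
    B, C, H, W = x.shape
    patches = x.unfold(2, p, p).unfold(3, p, p)  # (B, C, H/p, W/p, p, p)
    patches = patches.permute(0, 2, 3, 1, 4, 5)  # (B, H/p, W/p, C, p, p)
    return patches.reshape(B, -1, C * p * p)     # (B, N, C*p*p)

img = torch.randn(1, 3, 224, 224)
print(patchify(img).shape)  # torch.Size([1, 256, 588]) -> N = 256 patch tokens
```

Because the notebook ran an equivalent step and the modelling file then patchified again, the vision sequence length was inflated, so the flattened logits no longer lined up with the labels (the 11160 vs 3870 batch_size mismatch above). The fix is to hand the raw image to the model and let the modelling file patchify exactly once.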
- Upon fixing that, training OOMs after 2-3 iterations on DOCCI images, which should not be happening: it's only a 2B model in float16 with a LoRA, and should be readily trainable on consumer 24GB hardware.
Partially fixed by adding:
model.text_model.gradient_checkpointing_enable()
before the LoraConfig.
(This uses 21GB, which is still far higher than it should be.)
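For reference, here is a hedged sketch of that fix in context. It assumes the model loads through transformers with trust_remote_code and exposes a `text_model` backbone (as the traceback path suggests); the repo id and every LoRA hyperparameter below are assumptions for illustration, not values from the notebook.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "qresearch/doubutsu-2b-pt-756",  # assumed repo id, inferred from the venv path above
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Enable activation checkpointing on the text backbone BEFORE building the
# LoRA adapter: activations get recomputed during backward instead of stored,
# trading compute for the memory that was causing the OOM.
model.text_model.gradient_checkpointing_enable()

lora_config = LoraConfig(
    r=16,                                 # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # hypothetical module names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model.text_model = get_peft_model(model.text_model, lora_config)
```

One known pitfall when combining checkpointing with a mostly-frozen base model: you may also need to call `model.text_model.enable_input_require_grads()` so gradients actually flow through the recomputed activations.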
All true. The notebook has this error because we developed it against doubutsu rather than doubutsu_next; this should be fixed.
Gradient checkpointing is also true, we forgot: we work with an A100, so it was fine enough.
If you got it working, feel free to send a PR to the notebook.
I did get it working, but I won't submit a PR to anyone who thinks an A100 is "smol".
alright then
This is now fixed.