Size of tensor a does not match size of tensor b in InSilicoPerturbation
Thank you for making this model available!
I have previously run InSilicoPerturber successfully on a subset of the Genecorpus-30M dataset (~2,700 cells) with the same parameters (I am simply using Genecorpus-30M to test how the in silico perturbation functions work). However, after the changes pushed on Aug 2, 2023 to fix the attention mask issue, I now receive the following error:
isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb=["ENSG00000135100"],
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data=None,
                        cell_states_to_model=None,
                        max_ncells=None,
                        emb_layer=-1,
                        forward_batch_size=50,
                        nproc=16,
                        token_dictionary_file="/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")
isp.perturb_data("/home/ubuntu/Geneformer",
                 "/data/subset_genecorpus/",
                 "/data/subset_genecorpus/delete_cell/",
                 "delete_cell_HNF1A")
Filter (num_proc=16): 100%|███████| 2741/2741 [00:12<00:00, 214.74 examples/s]
Map (num_proc=16): 100%|███████████████| 37/37 [00:12<00:00, 2.92 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 159.23 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 166.24 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 165.63 examples/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 974, in perturb_data
    self.in_silico_perturb(model,
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 1052, in in_silico_perturb
    cos_sims_data = quant_cos_sims(model,
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 444, in quant_cos_sims
    cos_sims += [cos(minibatch_emb, minibatch_comparison).to("cpu")]
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 87, in forward
    return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: The size of tensor a (2047) must match the size of tensor b (2046) at non-singleton dimension 1
I've referenced Discussion #85 to help with this issue; however, changing the batch size to 200 still raises the same error. I have also pulled the latest version of Geneformer.
Could I get some help understanding why this error is now being raised? Thank you!
Hi there, thanks for bringing this issue up! We've just updated the code to address the issue. Thanks for your interest!
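For anyone who hits the same traceback on an older checkout, here is a minimal sketch of the kind of failure it reports: a cosine similarity taken over two tensors whose sequence lengths differ by one. It does not reproduce or diagnose the actual cause inside in_silico_perturber.py. The tensor names, the batch size of 2, the embedding size of 8, `dim=2`, and the `describe_mismatch` helper are all assumptions made up for illustration; only the lengths 2047 and 2046 come from the error message above.

```python
import torch

# Stand-in embeddings shaped (batch, sequence length, embedding size).
# Only the sequence lengths 2047 and 2046 are taken from the error message;
# the batch size and embedding size are arbitrary small values.
original_emb = torch.randn(2, 2047, 8)
perturbed_emb = torch.randn(2, 2046, 8)

# Assumed setup: similarity computed along the embedding axis.
cos = torch.nn.CosineSimilarity(dim=2)


def describe_mismatch(a, b):
    """Hypothetical helper: list (dim, size_a, size_b) wherever sizes differ."""
    if a.dim() != b.dim():
        return [("ndim", a.dim(), b.dim())]
    return [
        (d, size_a, size_b)
        for d, (size_a, size_b) in enumerate(zip(a.shape, b.shape))
        if size_a != size_b
    ]


# Inspect the shapes before calling cos, so the mismatch is visible up front.
mismatches = describe_mismatch(original_emb, perturbed_emb)
print(mismatches)

# Calling cos on tensors that cannot be broadcast raises a RuntimeError.
try:
    cos(original_emb, perturbed_emb)
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```

In this sketch the helper reports a difference at dimension 1 (2047 versus 2046), which is the same dimension named in the RuntimeError above. Printing the shapes of `minibatch_emb` and `minibatch_comparison` just before the `cos(...)` call in `quant_cos_sims` would be one way to confirm which of the two tensors is a token short.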