TowerInstruct takes twice as much space as TowerBase
#3, opened by bpop
Hello all,
I've been experimenting with both TowerBase and TowerInstruct. When I load them in Python, they both have the expected number of parameters (6.7 billion, give or take). However, they take up drastically different amounts of space in my .cache: TowerInstruct is 26G, while TowerBase is only 13G. I imagine this is because TowerBase's weights are stored as bf16 (2 bytes per parameter), whereas TowerInstruct's are stored as fp32 (4 bytes per parameter), which would account for the factor of two. But was this on purpose?
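As a rough sanity check on that guess, here is a minimal sketch of the arithmetic. It assumes the checkpoint size is dominated by the weights and uses the approximate 6.7 billion parameter count from above; the helper name `checkpoint_size_gib` is just something I made up for illustration.

```python
# Bytes needed to store one parameter in each floating-point format.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def checkpoint_size_gib(num_params: float, dtype: str) -> float:
    """Estimate on-disk weight size in GiB, ignoring metadata and tokenizer files."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 6.7e9  # approximate; the exact count is slightly different

for dtype in ("bf16", "fp32"):
    print(f"{dtype}: {checkpoint_size_gib(num_params, dtype):.1f} GiB")
```

The fp32 estimate comes out at exactly twice the bf16 one, and both land close to the 13G and 26G I see on disk, so the dtype difference seems like a plausible explanation.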