Currently I have 12+24GB of VRAM across two GPUs and I get Out Of Memory errors all the time when I try to fine-tune 33B models. 13B works fine, but the results aren't very good, so I'd like to try 33B. I wonder if it's worth replacing my 12GB GPU with a 24GB one. Thanks!
Start with LoRA rank=1, 4-bit, flash-attention-2, context 256, batch size=1, and scale up until you reach your maximum. QLoRA on a 33B definitely works on just 24GB; it worked back a few months ago.
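Roughly, that starting point looks like this. A minimal sketch, assuming transformers, peft, bitsandbytes, and flash-attn are installed; the model name is a placeholder for whatever 33B-class checkpoint you're tuning:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-30b"  # placeholder 33B-class model

# 4-bit NF4 quantization (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # needs a recent transformers
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=1,                                # start at rank 1, raise until OOM
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Batch size 1 goes in your TrainingArguments; context 256 is enforced
# at tokenization time, e.g. tokenizer(text, truncation=True, max_length=256).
```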
I have some issues with flash attention. With 48GB I can go up to rank 512 with batch size 1 and max length 768. My last run was max length 1024, batch size 2, gradient accumulation 32, rank 128, and it gives pretty nice results.
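For reference, that last run translates to something like the following. A sketch only; the output path is a placeholder and the exact trainer wiring depends on your setup:

```python
from transformers import TrainingArguments
from peft import LoraConfig

lora_config = LoraConfig(r=128, lora_alpha=16, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="qlora-33b-out",       # placeholder path
    per_device_train_batch_size=2,    # batch size 2
    gradient_accumulation_steps=32,   # "gradient accumulation 32" above
    bf16=True,
)
# Max length 1024 is applied when tokenizing, e.g.:
# tokenizer(batch["text"], truncation=True, max_length=1024)
```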
Thanks for sharing!
It should work on a single 24GB GPU as well, with either QLoRA or alpaca_lora_4bit. You won't get big batches or big context, but it's good enough.
Thanks! I have some problems loading GPTQ models with the transformers loader.
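In case it helps, the plain transformers loading path for a GPTQ checkpoint looks roughly like this. A sketch, assuming optimum and auto-gptq are installed and a reasonably recent transformers; the repo name is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/guanaco-33B-GPTQ"  # example GPTQ repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers detects the GPTQ quantization config in the repo and
# dispatches to auto-gptq kernels under the hood
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```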