Currently I have 12+24GB VRAM, and I get Out Of Memory all the time when I try to fine-tune 33B models. 13B is fine, but the outcome is not very good, so I would like to try 33B. I wonder if it's worth it to replace my 12GB GPU with a 24GB one. Thanks!

  • AaaaaaaaaeeeeeB · 10 months ago

    Start with LoRA rank=1, 4-bit, flash-attention-2, context 256, batch size=1, then scale up until you reach your maximum. QLoRA on a 33B definitely works on just 24GB; it worked a few months ago.
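
    A minimal sketch of that starting point, assuming the Hugging Face transformers + peft + bitsandbytes stack (the model name, target modules, and alpha here are just example choices, not anything from this thread):

    ```python
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "huggyllama/llama-30b"  # example 33B-class checkpoint; use your own

    # Load the base weights in 4-bit NF4 so they fit in 24GB
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
        attn_implementation="flash_attention_2",  # needs flash-attn installed
    )
    model = prepare_model_for_kbit_training(model)

    # Start tiny (rank 1), then raise rank/context/batch until you hit OOM
    lora = LoraConfig(
        r=1,
        lora_alpha=16,  # assumed value; tune as needed
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()
    ```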

  • kpodkanowiczB · 10 months ago

    I have some issues with flash attention. With 48GB I can go up to rank 512 with batch size 1 and max length 768. My last run was max length 1024, batch size 2, gradient accumulation 32, rank 128, and it gives pretty nice results.
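
    For comparison, that last run maps onto trainer settings roughly like these — a sketch assuming the transformers Trainer API; the learning rate, alpha, and output directory are placeholders I'm adding, not values from the comment. The effective batch size works out to 2 × 32 = 64:

    ```python
    from peft import LoraConfig
    from transformers import TrainingArguments

    lora = LoraConfig(r=128, lora_alpha=256, task_type="CAUSAL_LM")  # alpha assumed

    # Effective batch = per_device_train_batch_size * gradient_accumulation_steps
    #                 = 2 * 32 = 64
    args = TrainingArguments(
        output_dir="qlora-33b",          # placeholder
        per_device_train_batch_size=2,
        gradient_accumulation_steps=32,
        learning_rate=2e-4,              # assumed; a common QLoRA default
        bf16=True,
        logging_steps=10,
    )
    # Max length 1024 is enforced at tokenization time, e.g.
    # tokenizer(batch["text"], truncation=True, max_length=1024)
    ```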

  • a_beautiful_rhindB · 10 months ago

    It should work on a single 24GB GPU with either QLoRA or alpaca_lora_4bit. You won't get big batches or big context, but it's good enough.

    • tgredditfcOPB · 10 months ago

      Thanks! I have some problems loading GPTQ models with the transformers loader.
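
      In case it's useful: recent transformers versions can load GPTQ checkpoints directly once optimum and auto-gptq are installed — a minimal sketch, with a stand-in model name:

      ```python
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "TheBloke/example-33B-GPTQ"  # stand-in; use the checkpoint you have

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      # transformers picks up the GPTQ quantization config stored in the repo;
      # requires `pip install optimum auto-gptq`
      model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
      ```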