I want to fine-tune some LLMs with my own dataset, which contains very long examples (a little over 2048 tokens). VRAM usage jumps up by several GB just from increasing the Cutoff Length from 512 to 1024.

Is there a way to feed those long examples into the models without increasing VRAM usage significantly?

  • wind_dude · 10 months ago

    You can try switching the attention implementation to something like FlashAttention, for example:
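
    A minimal sketch with Hugging Face Transformers, assuming the `flash-attn` package is installed and the GPU supports it (the model name is just a placeholder; substitute whatever base model you are fine-tuning):

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model name -- use whichever base model you are fine-tuning.
    model_name = "meta-llama/Llama-2-7b-hf"

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # attn_implementation="flash_attention_2" swaps in FlashAttention 2, which
    # avoids materializing the full attention matrix, so the attention layer's
    # memory grows roughly linearly (not quadratically) with sequence length.
    # Requires the flash-attn package, an Ampere or newer GPU, and fp16/bf16.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )
    ```

    Note that this only reduces the attention memory; activations from the other layers still grow with sequence length, so longer cutoffs will still cost some extra VRAM.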