I want to fine-tune some LLMs with my own dataset, which contains very long examples (a little over 2048 tokens). VRAM usage jumps up by several GB just from increasing the Cutoff Length from 512 to 1024.
Is there a way to feed those long examples into the models without increasing VRAM significantly?
You can try switching the attention implementation to something like FlashAttention. It avoids materializing the full seq_len x seq_len attention matrix, so memory grows roughly linearly with sequence length instead of quadratically.
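A minimal sketch of how that looks if you load the model through Hugging Face transformers (assuming you have the flash-attn package installed and a GPU that supports it; the model name and max_length below are just placeholders for your own setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model name; swap in whichever base model you're fine-tuning.
model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # FlashAttention requires fp16/bf16
    attn_implementation="flash_attention_2",  # use FlashAttention-2 kernels instead of the default attention
)

# Tokenize with the longer cutoff; with FlashAttention the attention memory no
# longer blows up quadratically as max_length grows.
inputs = tokenizer(
    "a very long training example ...",
    truncation=True,
    max_length=2048,
    return_tensors="pt",
)
```

If you're training through a UI or framework rather than your own script, look for a flash attention toggle or an `attn_implementation` setting in its config, since most of them pass this flag through to transformers.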