Fitting 70B models on a 4 GB GPU: the whole model, no quantization or distillation or anything!

vatsadev · 2 years ago

hackerllama · 2 years ago

Hey there! I think this is doing offloading?

If so, it’s not a new thing. Check out https://huggingface.co/docs/accelerate/usage_guides/big_modeling for a guide on it, with code and videos.
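For a rough sense of why offloading makes this possible, here is a back-of-the-envelope sketch. It assumes fp16 weights (2 bytes per parameter), 80 transformer layers (as in Llama-2-70B-style models), and weights spread evenly across layers. It ignores embeddings, activations, and the KV cache, so treat the numbers as ballpark only.

```python
# Rough memory arithmetic for layer-by-layer offloading.
# Assumptions (not from the original post): fp16 weights, 80 layers,
# weights split evenly across layers, no activations or KV cache counted.

def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 70e9   # 70B parameters
N_LAYERS = 80     # assumed layer count
GPU_GB = 4        # GPU memory from the post title

full_model_gb = weights_gb(N_PARAMS)        # whole model in fp16
per_layer_gb = full_model_gb / N_LAYERS     # one layer resident at a time
fits_one_layer = per_layer_gb < GPU_GB

print(f"full model: {full_model_gb:.1f} GB")
print(f"one layer:  {per_layer_gb:.2f} GB")
print(f"one layer fits in {GPU_GB} GB: {fits_one_layer}")
```

So the full model never sits on the GPU at once; only one layer's worth of weights needs to be there at any moment, and the rest waits in CPU RAM or on disk.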

Tiny_Arugula_5648 · 2 years ago

This is one of those cases where proving something can be done doesn’t make it useful. It has to be one of the least efficient ways to do inference. It’s like the people who got Doom running on an HP printer: great, you did it, but it’s the worst possible version.
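To put a rough number on the inefficiency, here is a quick estimate. It assumes the weights are fp16 (about 140 GB for 70B parameters), that every generated token has to stream all of them through the GPU, and that the bottleneck is a hypothetical 3 GB/s read from disk. That bandwidth figure is an illustrative guess, not a measurement, and compute time is ignored.

```python
# Ballpark cost of streaming all weights for each generated token.
# Assumptions (illustrative, not measured): 140 GB of fp16 weights,
# 3 GB/s sustained read bandwidth, compute time ignored.

MODEL_GB = 140.0          # 70B params * 2 bytes
READ_GB_PER_S = 3.0       # hypothetical disk-to-GPU bandwidth

def seconds_per_token(model_gb: float, read_gb_per_s: float) -> float:
    """Time to stream the full set of weights once."""
    return model_gb / read_gb_per_s

sec_per_tok = seconds_per_token(MODEL_GB, READ_GB_PER_S)
tok_per_hour = 3600 / sec_per_tok

print(f"~{sec_per_tok:.0f} s per token, ~{tok_per_hour:.0f} tokens per hour")
```

Under those assumptions you wait the better part of a minute per token, which is why this is more of a proof of concept than something you would use for real work.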