  • I’m late to the party on this one.

    I’ve been loving the 2.4BPW EXL2 quants from Lone Striker recently, specifically using Euryale 1.3 70B and LZLV 70B.

    Even at the smaller quant, they’re very capable, and leagues ahead of smaller models in terms of comprehension and reasoning. Min-P sampling parameters have been a big step forward, as well.

    The only downside I can see is the limit on context length when running on a single 24GB VRAM card. Perhaps further testing of Nous-Capybara 34B at 4.65BPW on EXL2 is in order.
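    For anyone unfamiliar with min-P: the idea is to keep only tokens whose probability is at least some fraction of the top token's probability, then renormalize. A toy sketch of that rule (my own illustration, not exllamav2's actual implementation):

    ```python
    # Toy min-P filter: keep tokens with prob >= min_p * max(probs), renormalize.
    # (Hypothetical illustration of the sampling rule, not any library's real code.)

    def min_p_filter(probs, min_p=0.05):
        """Zero out tokens below the min-P threshold and renormalize the rest."""
        threshold = min_p * max(probs)
        kept = [p if p >= threshold else 0.0 for p in probs]
        total = sum(kept)
        return [p / total for p in kept]

    # A peaked distribution prunes aggressively; a flat one keeps more candidates.
    peaked = [0.90, 0.05, 0.03, 0.02]
    flat = [0.30, 0.25, 0.25, 0.20]
    print(min_p_filter(peaked, min_p=0.1))  # only the 0.90 token survives
    print(min_p_filter(flat, min_p=0.1))    # all four tokens survive
    ```

    The appeal over top-p/top-k is that the cutoff adapts to how confident the model is, which is likely why it pairs well with heavily quantized models.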


  • Agreed - I’m personally using 70B models at 2.4BPW EXL2 quants, as well. They hold up great even at a small quantization as long as sampling parameters are set correctly, and the models are subjectively more pleasant in prose (Euryale 1.3 and LZLV both come to mind).

    At 2.4BPW, they fit into 24GB of VRAM and inference is extremely fast. EXL2 also appears to be a very promising quantization method; I believe its potential upsides are yet to be fully leveraged.
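    A quick back-of-envelope check on why that fits (my own arithmetic, weights only, ignoring KV cache and overhead):

    ```python
    # Weights-only footprint of a 70B-parameter model at 2.4 bits per weight.
    # Rough estimate for illustration; real usage adds KV cache and runtime overhead.
    params = 70e9
    bits_per_weight = 2.4

    gib = params * bits_per_weight / 8 / 2**30  # bits -> bytes -> GiB
    print(f"~{gib:.1f} GiB of weights")  # ~19.6 GiB, leaving headroom within 24 GiB
    ```

    That leftover few GiB is exactly what gets eaten by context, which matches the context-length ceiling mentioned above.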