nmcfarlB to LocalLLaMA@poweruser.forum · English · 2 years ago

Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse

neuralmagic.com

Key Takeaways
  • We expanded our Sparse Fine-Tuning research results to include Llama 2.
  • The results include 60% sparsity with INT8 quantization and no drop in accuracy.
  • DeepSparse now supports accelerated inference of sparse-quantized Llama 2 models, with inference speeds 6-8x faster than the baseline at 60-80% sparsity.
  • We used some interesting algorithmic techniques in order…
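
In case it's useful, here is a rough sketch of what running one of those sparse-quantized Llama 2 checkpoints through DeepSparse's text-generation pipeline might look like. The model stub below is a placeholder (the real SparseZoo / Hugging Face identifiers are in the linked post), and the exact constructor and call arguments may differ between DeepSparse versions, so treat this as illustrative rather than the documented API.

    # Illustrative only: assumes DeepSparse with LLM support is installed
    # (e.g. something like `pip install "deepsparse[llm]"`), and that MODEL
    # points at a real sparse-quantized Llama 2 export (a SparseZoo stub or
    # a local ONNX deployment). The stub below is a placeholder, not a real ID.
    from deepsparse import TextGeneration

    MODEL = "zoo:llama2-7b-pruned60-quantized"  # placeholder identifier

    # Build the CPU text-generation pipeline; the sparsity and INT8
    # quantization are properties of the exported model itself.
    pipeline = TextGeneration(model=MODEL)

    # Generate a short completion on CPU.
    output = pipeline(
        prompt="Explain sparse fine-tuning in one paragraph.",
        max_new_tokens=128,
    )
    print(output.generations[0].text)

The 6-8x speedup described in the post comes from the runtime exploiting the zeroed weights and INT8 kernels at inference time, not from anything in the calling code.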

LocalLLaMA@poweruser.forum


Community to discuss Llama, the family of large language models created by Meta AI.

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 4 users / day
  • 4 users / week
  • 4 users / month
  • 4 users / 6 months
  • 3 local subscribers
  • 4 subscribers
  • 1.03K Posts
  • 5.96K Comments
  • Modlog
  • mods:
  • communick@poweruser.forum
  • BE: 0.19.11
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org