nmcfarlB to LocalLLaMA@poweruser.forum · English · 2 years ago

Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse

neuralmagic.com

Key Takeaways
  • We expanded our Sparse Fine-Tuning research results to include Llama 2.
  • The results include 60% sparsity with INT8 quantization and no drop in accuracy.
  • DeepSparse now supports accelerated inference of sparse-quantized Llama 2 models, with inference speeds 6-8x faster than the baseline at 60-80% sparsity.
  • We used some interesting algorithmic techniques in order…
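
In case it's useful, here is a rough sketch of what running one of those sparse-quantized Llama 2 checkpoints through DeepSparse's text-generation pipeline might look like. The model stub below is a placeholder (the real SparseZoo / Hugging Face identifiers are in the linked post), and the exact constructor and call arguments may differ between DeepSparse versions, so treat this as illustrative rather than the documented API.

    # Illustrative only: assumes DeepSparse with LLM support is installed
    # (e.g. something like `pip install "deepsparse[llm]"`), and that MODEL
    # points at a real sparse-quantized Llama 2 export (a SparseZoo stub or
    # a local ONNX deployment). The stub below is a placeholder, not a real ID.
    from deepsparse import TextGeneration

    MODEL = "zoo:llama2-7b-pruned60-quantized"  # placeholder identifier

    # Build the CPU text-generation pipeline; the sparsity and INT8
    # quantization are properties of the exported model itself.
    pipeline = TextGeneration(model=MODEL)

    # Generate a short completion on CPU.
    output = pipeline(
        prompt="Explain sparse fine-tuning in one paragraph.",
        max_new_tokens=128,
    )
    print(output.generations[0].text)

The 6-8x speedup described in the post comes from the runtime exploiting the zeroed weights and INT8 kernels at inference time, not from anything in the calling code.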

LocalLLaMA@poweruser.forum


Community to discuss Llama, the family of large language models created by Meta AI.

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 4 users / day
  • 4 users / week
  • 4 users / month
  • 4 users / 6 months
  • 3 local subscribers
  • 4 subscribers
  • 1.03K Posts
  • 5.96K Comments
  • Modlog
  • mods:
  • communick@poweruser.forum
  • BE: 0.19.11
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org