Balance-B to LocalLLaMA@poweruser.forum · English · 2 years ago

Incoming: TensorRT-LLM version 0.6 with support for MoE, new models and more quantization

github.com

Update TensorRT-LLM by kaiyux · Pull Request #524 · NVIDIA/TensorRT-LLM
Model Support: Mixture of Experts support. Features: fMHA support for chunked attention and paged KV cache. Baichuan FP8 quantization support. Memory optimization: reduced host memory when buildin...
