Incoming: TensorRT-LLM version 0.6 with support for MoE, new models and more quantization

github.com

Balance-B to LocalLLaMA@poweruser.forum · English · 2 years ago

Update TensorRT-LLM by kaiyux · Pull Request #524 · NVIDIA/TensorRT-LLM

Model Support: Mixture of Experts support. Features: fMHA support for chunked attention and paged KV cache; Baichuan FP8 quantization support. Memory optimization: Reduced host memory when buildin...
