Are there any other tools like vLLM or TensorRT that can be used to speed up LLM inference?

DataLearnerAI · 2 years ago

Alibaba open-sourced a 72B model called Qwen-72B: Qwen/Qwen-72B · Hugging Face

It supports Chinese and English, and its performance on MMLU is remarkable.

DataLearnerAI · 2 years ago

Are there any other tools like vLLM or TensorRT that can be used to speed up LLM inference?

DataLearnerAI · 2 years ago

In most scenarios, models with extended context are optimized for long sequences. If the sequence is not very long, it is often recommended to use a regular model.