Given that you have a V100 GPU at your disposal, I'm just curious what different folks here use for inference with Llama-based 7B and 13B models. Also, would you use FastChat alongside vLLM to handle the conversation template?
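For context, by "conversation template" I mean formatting the chat turns into the prompt string the model expects before handing it to vLLM. Here's a minimal sketch of hand-rolling the Llama-2 chat format instead of pulling in FastChat's templates (the format below is for Meta's llama-2-chat models; other finetunes use different templates):

```python
# Hand-rolled single-turn Llama-2 chat prompt, as an alternative to
# FastChat's conversation templates. Format per Meta's llama-2-chat models.

def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user message in Llama-2 chat markup."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    "Summarize vLLM in one sentence.",
)
print(prompt)
```

The resulting string can then be passed straight to vLLM, e.g. `llm.generate([prompt], sampling_params)`, without FastChat in the loop.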