Given that you have a V100 GPU at your disposal, I'm just curious what different folks here use for inference with Llama-based 7B and 13B models. Also, would you use FastChat alongside vLLM to handle the conversation template?
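For context, by "conversation template" I mean formatting the chat turns into the prompt string the model expects before handing it to vLLM. Here's a minimal sketch of hand-rolling the Llama-2 chat format instead of pulling in FastChat's templates (the format below is for Meta's llama-2-chat models; other finetunes use different templates):

```python
# Hand-rolled single-turn Llama-2 chat prompt, as an alternative to
# FastChat's conversation templates. Format per Meta's llama-2-chat models.

def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user message in Llama-2 chat markup."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    "Summarize vLLM in one sentence.",
)
print(prompt)
```

The resulting string can then be passed straight to vLLM, e.g. `llm.generate([prompt], sampling_params)`, without FastChat in the loop.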