Inference Speed When Running Local Models

Frequent-Let231B to LocalLLaMA@poweruser.forum · English · 1 year ago

I am running a LLaMA 13B instance (via GPT4All) and am finding inference to be quite slow, especially for summarization. Does anyone have recommendations for models that can summarize inputs of 4k+ tokens very quickly?
