Using Oobabooga’s webui in the cloud.
I didn’t notice it immediately, but apparently once I breach the context limit, or shortly after the fact, the inference time increases significantly. For example, at the beginning of the conversation a single message generates at about 13-16 tps. After crossing the threshold, the speed keeps dropping until it bottoms out at around 0.1 tps.
Not only that, but the text also starts repeating. For example, certain features of the character or their actions start coming up in almost every subsequent message with nearly identical wording, like some sort of broken record. It’s not impossible to steer the plot forward, but it gets tiring, especially with the huge delay on top of that.
Is there any solution or workaround for these problems?
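For anyone hitting the same wall: one common workaround is to trim the oldest chat turns yourself so the prompt never overflows the model’s context window, which is what usually triggers both the slowdown and the looping. Below is a minimal sketch, assuming the webui was started with its OpenAI-compatible API enabled (--api, default port 5000) and using a rough 4-characters-per-token estimate in place of the model’s real tokenizer — the endpoint, context size, and heuristic are all assumptions to adapt to your setup.

```python
# Minimal sketch: keep chat history under the model's context window by
# dropping the oldest turns before each request.
# Assumptions: webui running with --api on localhost:5000 (its
# OpenAI-compatible endpoint), and a crude 4-chars-per-token estimate.
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"
CTX_LIMIT = 4096        # set to your model's actual context length
RESERVE = 512           # tokens left free for the reply
CHARS_PER_TOKEN = 4     # rough heuristic, not the real tokenizer

def estimate_tokens(messages):
    # Very rough prompt-size estimate based on character count.
    return sum(len(m["content"]) for m in messages) // CHARS_PER_TOKEN

def trim_history(messages):
    # Keep the system prompt (index 0) and the newest turns; drop the
    # oldest user/assistant messages until the prompt fits the window.
    trimmed = list(messages)
    while estimate_tokens(trimmed) > CTX_LIMIT - RESERVE and len(trimmed) > 2:
        del trimmed[1]  # remove the oldest non-system message
    return trimmed

def chat(messages):
    payload = {"messages": trim_history(messages), "max_tokens": RESERVE}
    resp = requests.post(API_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Truncating this way keeps the speed stable but loses the oldest context; a common refinement is to summarize the dropped turns into the system prompt instead of discarding them outright.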
That would be amazing. I think something like that could even be included in ooba’s official extension repo.