Mundane_Definition_8B to LocalLLaMA@poweruser.forum · English · 1 year ago

need advice for reducing inference time


I’m using mistral-7b to understand how LLMs work.

Does anyone have ideas for reducing its inference time?

Please don’t recommend setting the number of tokens to 1. :)



Ok_Post_149B
1 year ago
I just wrote a tutorial on how you can scale Mistral-7b to many GPUs in the cloud. I hope it’s useful to you. I’m not sure whether you’re looking to do on-demand inference or inference on a bunch of inputs.

https://www.reddit.com/r/LocalLLaMA/comments/17k2x62/i_scaled_mistral_7b_to_200_gpus_in_less_than_5/
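
For the "inference on a bunch of inputs" case, the general idea can be sketched roughly like this. This is not the code from the linked tutorial, just a minimal illustration of splitting prompts across workers. `shard`, `run_batch`, and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in for a real model call on one GPU or machine.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(items, num_workers):
    """Split items into num_workers round-robin shards."""
    return [items[i::num_workers] for i in range(num_workers)]


def fake_generate(prompts):
    # Placeholder for a real model call (hypothetical); it only echoes input.
    return [f"completion for: {p}" for p in prompts]


def run_batch(prompts, num_workers=4, generate=fake_generate):
    """Run generate() on each shard in parallel, then restore prompt order."""
    shards = shard(prompts, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(generate, shards))
    # Shard w holds prompts w, w + num_workers, w + 2 * num_workers, ...
    out = [None] * len(prompts)
    for w, shard_out in enumerate(results):
        for j, text in enumerate(shard_out):
            out[w + j * num_workers] = text
    return out


if __name__ == "__main__":
    for line in run_batch(["a", "b", "c", "d", "e"], num_workers=2):
        print(line)
```

Note that this only raises total throughput across many inputs. It does nothing for the latency of a single prompt, which is the on-demand case.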