Radiant-Practice-270B to

LocalLLaMA@poweruser.forumEnglish · 2 years ago

Why is a single A100 so slow?


I'm using an A100 PCIe 80GB with the CUDA 11.8 toolkit and driver 525.x.

But when I run inference on CodeLlama 13B with oobabooga (web UI), I only get 5 tokens/s.

That is very slow.

Is there any config or other setting I need for the A100?


easyllaamaB
2 years ago
Try using GGUF. This format works well on a single GPU, especially since you have 80 GB of VRAM. I think you can run a 70 GB GGUF with all layers on the GPU.
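
If you want to try this outside the web UI, here is a rough sketch using the llama-cpp-python bindings. It assumes the package is installed with CUDA support. The model path is hypothetical, and the helper names and the 4096 context size are only placeholders for illustration. Setting n_gpu_layers to -1 asks llama-cpp-python to offload every layer to the GPU.

```python
def gguf_load_kwargs(model_path, n_ctx=4096):
    """Build keyword arguments for llama_cpp.Llama with full GPU offload.

    n_gpu_layers=-1 requests that all layers be offloaded to the GPU.
    n_ctx=4096 is an assumed context size, not a recommendation.
    """
    return {
        "model_path": model_path,
        "n_gpu_layers": -1,
        "n_ctx": n_ctx,
    }


def load_model(model_path):
    """Load a GGUF model with every layer on the GPU."""
    # Imported here so the helper above works without the package installed.
    from llama_cpp import Llama  # needs llama-cpp-python built with CUDA

    return Llama(**gguf_load_kwargs(model_path))


if __name__ == "__main__":
    # Hypothetical file name; point this at your own GGUF file.
    llm = load_model("models/codellama-13b.Q4_K_M.gguf")
    out = llm("def fibonacci(n):", max_tokens=64)
    print(out["choices"][0]["text"])
```

If the speed is still low after this, check that the llama-cpp-python build was actually compiled with CUDA, since a CPU-only build accepts n_gpu_layers but keeps the work on the CPU.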