Radiant-Practice-270B to LocalLLaMA@poweruser.forum · English · 1 year ago

How can I improve inference performance to a normal range?



At work we are using four A100 cards (GPUs 0 and 1 are NVLinked, and GPUs 2 and 3 are NVLinked), and I am curious how to connect all four cards. Also, with the four A100s, inference seems slower and the token throughput is much lower than with the 4060 Ti I use at home. Why might this be? When I check with nvidia-smi, the VRAM is fully used on all four cards, but the volatile GPU utilization is not 100% on all of them; it is usually something like 100, 70, 16, 16. (The server runs RHEL 8 with KVM passthrough.)
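For anyone who wants to log the same per-GPU numbers instead of reading them off the screen, here is a minimal sketch. It assumes Python 3 and that `nvidia-smi` is on the PATH; the helper names (`parse_gpu_csv`, `underused`, `query_gpus`) and the 50% threshold are made up for illustration. The utilization values in `SAMPLE` are the ones from the post, while the memory figures are placeholders, not measurements. Separately, `nvidia-smi topo -m` prints the link matrix between the cards, which shows which pairs are joined by NVLink.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "index,utilization.gpu,memory.used,memory.total"

# Utilization values (100, 70, 16, 16) come from the post.
# The memory columns are placeholder numbers, not real measurements.
SAMPLE = """0, 100, 40000, 40960
1, 70, 40000, 40960
2, 16, 40000, 40960
3, 16, 40000, 40960
"""


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts.

    Assumes every field is numeric; a field reported as "[N/A]"
    would raise ValueError here.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:
            continue  # skip blank lines
        index, util, used, total = (f.strip() for f in fields)
        rows.append({
            "index": int(index),
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return rows


def underused(rows, threshold_pct=50):
    """Return the indices of GPUs whose utilization is below the threshold."""
    return [r["index"] for r in rows if r["util_pct"] < threshold_pct]


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Demonstration on the sample text, so the sketch runs without a GPU.
rows = parse_gpu_csv(SAMPLE)
print("GPUs below 50% utilization:", underused(rows))
```

On the server itself, `underused(query_gpus())` would give the same list from live data. A single reading only shows one moment, so sampling it in a loop during generation would be more telling.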

Radiant-Practice-270B (OP) · 1 year ago
Sorry for the late reply. I already tested that; it is better than the CodeLlama 13B model, but at 30 tokens/s …