I have a cluster of 4 A100 GPUs (4x80GB) and want to run meta-llama/Llama-2-70b-hf. I’m a beginner and need some guidance.

- I need a script to run the model.

- Is 4xA100 enough to run the model, or is it more than required?

I only need the model for inference, not training.
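
For context, here is the kind of minimal script I have in mind, based on the standard `transformers` + `accelerate` approach with `device_map="auto"` to shard the weights across the GPUs. I haven't been able to verify it works, so corrections are welcome (the prompt text is just a placeholder):

```python
# Minimal sketch: load Llama-2-70B sharded across 4 GPUs for inference.
# Assumes `transformers` and `accelerate` are installed and the gated
# meta-llama repo license has been accepted on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~140 GB of weights in fp16, so it should fit in 4x80 GB
    device_map="auto",          # lets accelerate split layers across the 4 GPUs
)

prompt = "Explain the difference between fp16 and bf16 in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

My rough math: 70B parameters at 2 bytes each in fp16 is about 140 GB, and 4x80 GB = 320 GB total, so the weights plus KV cache should fit comfortably. Is that reasoning correct?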