I have a cluster of 4 A100 GPUs (4x80GB) and want to run meta-llama/Llama-2-70b-hf. I’m a beginner and need some guidance.

- I need a script to run the model.

- Is 4xA100 enough to run the model, or is it more than required?

I only need the model for inference, not training.
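
For context, here is the kind of minimal script I have in mind, based on the standard `transformers` + `accelerate` approach with `device_map="auto"` to shard the weights across the GPUs. I haven't been able to verify it works, so corrections are welcome (the prompt text is just a placeholder):

```python
# Minimal sketch: load Llama-2-70B sharded across 4 GPUs for inference.
# Assumes `transformers` and `accelerate` are installed and the gated
# meta-llama repo license has been accepted on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~140 GB of weights in fp16, so it should fit in 4x80 GB
    device_map="auto",          # lets accelerate split layers across the 4 GPUs
)

prompt = "Explain the difference between fp16 and bf16 in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

My rough math: 70B parameters at 2 bytes each in fp16 is about 140 GB, and 4x80 GB = 320 GB total, so the weights plus KV cache should fit comfortably. Is that reasoning correct?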