I'm running Llama-2 7B in Google Colab on a 40 GB A100, but it's using 26.8 GB of VRAM. Is that normal? I also tried the 13B version, but the system ran out of memory. Yes, I know the quantized versions are almost as good, but I specifically need unquantized.
https://colab.research.google.com/drive/10KL87N1ZQxSgPmS9eZxPKTXnobUR_pYT?usp=sharing
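Doing the math, it looks like the model is loading in the default fp32: roughly 6.7B params × 4 bytes ≈ 27 GB, which matches what I'm seeing (and 13B × 4 bytes ≈ 52 GB would explain the OOM). A minimal sketch of what I could try instead, loading in fp16 via the standard transformers API — still unquantized, since Meta's released weights are fp16. The model id here is an assumption about which checkpoint is in use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # default is fp32: ~6.7B params * 4 bytes ~= 27 GB
    device_map="auto",          # place weights on the A100 (needs accelerate)
)
```

With fp16 the 7B model should sit around 13-14 GB, and 13B around 26 GB, so both would fit on the 40 GB card without quantizing anything.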