I have access to a single 80 GB A100 GPU and would like to train an LLM with a GPT-like architecture from scratch. Does anyone know how to calculate the maximum model size that would fit in memory?
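
As a rough starting point, here is a back-of-envelope sketch of the kind of calculation I mean. It assumes mixed-precision training with Adam at about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus the two Adam moments), and it treats 80 GB as 80e9 bytes. The function name `max_params` and the `activation_fraction` knob are made up for illustration; the share of memory that activations actually take depends on batch size, sequence length, and checkpointing, so that number is a guess, not a measurement.

```python
def max_params(gpu_bytes: float,
               bytes_per_param: float = 16.0,
               activation_fraction: float = 0.0) -> float:
    """Rough upper bound on the parameter count that fits in GPU memory.

    bytes_per_param: assumed cost of weights + gradients + optimizer state
        (16 is the usual figure for mixed-precision Adam).
    activation_fraction: assumed share of memory reserved for activations
        and other overhead (hypothetical knob, must be in [0, 1)).
    """
    if not 0.0 <= activation_fraction < 1.0:
        raise ValueError("activation_fraction must be in [0, 1)")
    usable_bytes = gpu_bytes * (1.0 - activation_fraction)
    return usable_bytes / bytes_per_param


GPU_BYTES = 80e9  # assumption: 80 GB taken as 80e9 bytes

# Ceiling if activations cost nothing (they never do in practice).
print(f"{max_params(GPU_BYTES) / 1e9:.1f}B parameters")
```

With no memory set aside for activations this gives a ceiling of about 5B parameters, so the real limit should be noticeably lower. Is this the right way to think about it, or is there a more accurate method?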