I come from computer vision, working with convnets that are relatively small in size and parameter count yet perform quite well (e.g. the ResNet family, YOLO, etc.).
Now I am getting into NLP, and transformer-based architectures tend to be huge, so I have trouble fitting them in memory.
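For context, here is my rough back-of-the-envelope estimate of why training memory blows up (a minimal sketch assuming plain fp32 training with Adam, which keeps two extra moment buffers per parameter; the parameter counts are approximate and activations are not included, even though they often dominate):

```python
# Rough memory estimate for training a transformer with Adam in fp32.
# Activations are excluded; they depend on batch size and sequence length.
def training_memory_gb(num_params: float) -> float:
    bytes_per_param = 4 + 4 + 8  # weights + gradients + Adam moment estimates (m, v)
    return num_params * bytes_per_param / 1024**3

for name, n in [("BERT-base (~110M params)", 110e6), ("GPT-2 (~1.5B params)", 1.5e9)]:
    print(f"{name}: ~{training_memory_gb(n):.1f} GB before activations")
```

Even this rough count puts the larger models beyond a single consumer GPU.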
What infrastructure do you use to train these models (GPT-2, BERT, or even the bigger ones)? Cloud computing, HPC, something else?