I am trying to build small convolutional regression models under a very tight constraint on model size (at most a few thousand parameters).

Are there any rules of thumb, gold standards, or best practices to consider here? For example: should I prefer depth over width, do skip connections add anything at such small scales, and are there any training tricks that might boost performance?
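To make the budget concrete, here is a rough sketch of how the depth-versus-width trade-off looks in terms of parameter count. Everything in it is an assumption for illustration: plain 2D convolutions with 3x3 kernels and biases, one input and one output channel, and made-up channel widths (`deep_narrow`, `shallow_wide`) that are not my actual models.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of one standard 2D convolution with a k x k kernel."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def stack_params(channels, k=3):
    """Total parameters of a plain conv stack.

    `channels` lists the channel width at every stage, input first,
    so consecutive pairs are the (c_in, c_out) of each layer.
    """
    return sum(conv2d_params(a, b, k) for a, b in zip(channels, channels[1:]))


# Hypothetical configurations, both around 2k parameters.
deep_narrow = [1, 8, 8, 8, 8, 1]   # five conv layers, 8 channels wide
shallow_wide = [1, 14, 14, 1]      # three conv layers, 14 channels wide

deep_total = stack_params(deep_narrow)
wide_total = stack_params(shallow_wide)

print("deep/narrow :", deep_total)
print("shallow/wide:", wide_total)
```

Under these assumptions, roughly the same budget buys either five narrow layers or three wider ones, which is the kind of choice I am asking about.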

Any hints or pointers on where to look are greatly appreciated.