If I instruction-tune an LLM on a dataset where each sample is randomly generated and slotted into a set of prompt templates, so that the dataset is effectively unbounded, and I train for a fixed number of steps, is that worse than training on a fixed-size dataset for the same number of steps? My intuition is that it's worse: the model will almost never see the same example twice, so it may struggle to pick up patterns from the data. I've trained a couple of models this way for thousands of steps, and they don't seem to have learned anything that transfers to more complicated test examples.
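
To make the setup concrete, here's a toy sketch of the two regimes I'm comparing (the templates, slot values, and sample counts are made up for illustration, not my actual data):

```python
import random

# Toy stand-ins for my actual prompt templates and slot values.
TEMPLATES = [
    "Add {a} and {b}.",
    "What is the sum of {a} and {b}?",
]

def random_sample():
    # Freshly generated slots -> any given example is almost never repeated.
    a, b = random.randint(0, 999), random.randint(0, 999)
    prompt = random.choice(TEMPLATES).format(a=a, b=b)
    return {"prompt": prompt, "response": str(a + b)}

# Regime A: effectively unbounded dataset, one sample per step, no repeats.
streamed = (random_sample() for _ in range(10_000))

# Regime B: fixed dataset of 1,000 samples, revisited over ~10 epochs
# for the same 10,000 training steps.
fixed = [random_sample() for _ in range(1_000)]
epochs = (fixed[i % len(fixed)] for i in range(10_000))
```

In both regimes the model takes the same number of gradient steps; the only difference is whether each example is seen once (A) or roughly ten times (B).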