I’ve been delving into the LLM space recently and have been working with llama-2-7b on HF, just trying to understand the model as well as ways to modify it. I played around with a couple of financial news/sentiment models, which were cool, but these are typically models that have already been fine-tuned for that domain. I’m wondering how much it matters to have a model trained on a specific area, and then to build a RAG process around it for similar documents?

Taking financial analysis as an example, if I train a model using a dataset of finance-concept Q&A from a book, progressing to more detailed Q&A on specific interactions between those concepts, this should just help the model predict the ‘next word’ in such situations. I’d think such a model could be useful, but I could see RAG simply doing this better.

However, I’m wondering if training on, let’s say, 50% of some set of source data and then using the other 50% for RAG would provide any benefit? The source data would be similar, but would come from various authors/sources, so the combination might provide some additional context that wouldn’t be gained from either training or RAG alone.
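To make the 50/50 idea concrete, here is a rough sketch of the split I have in mind. The function name `split_sources`, the placeholder document names, and the fixed seed are all made up for illustration; the only point is that the two halves are disjoint, so the RAG corpus contains nothing the model saw during training.

```python
import random


def split_sources(docs, train_fraction=0.5, seed=0):
    """Shuffle the source documents and split them into two disjoint sets.

    The first set would be turned into fine-tuning data, the second would
    be indexed for retrieval. Both names are hypothetical.
    """
    shuffled = list(docs)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder document IDs standing in for the real finance sources.
docs = [f"doc_{i}" for i in range(10)]
train_docs, rag_docs = split_sources(docs)
```

If the sources come from several authors, it might be worth splitting per author instead of across the whole pool, so that each half still has a mix of voices. That is a guess on my part, not something I have tested.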

Feel free to let me know if this is a stupid way to think about it, but simplistically, it feels like training is like putting a mask on (it’s still llama, but now it’s Anthony Bourdain llama, or Banker llama) whereas RAG is just Ctrl-F’ing really well. Does putting the mask on before Ctrl-F’ing make your results better, or is it the same as just Ctrl-F’ing?
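For what it’s worth, here is a toy version of what I mean by “Ctrl-F’ing really well”. It is only a sketch: it uses word-count cosine similarity where a real RAG setup would normally use learned embeddings and a vector store, and the helper names (`retrieve`, `build_prompt`) and the three sample sentences are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, corpus, k=1):
    """Put the retrieved text in front of the question."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up sample corpus; real documents would be chunked finance sources.
corpus = [
    "A bond's price falls when interest rates rise.",
    "Free cash flow is operating cash flow minus capital expenditures.",
    "A stock split increases the share count without changing market value.",
]
query = "What happens to a bond price when interest rates rise?"
top = retrieve(query, corpus)
prompt = build_prompt(query, corpus)
```

The retrieval step is identical whichever model reads the prompt. The “mask” question is then whether a fine-tuned llama does more with that same prompt than the base llama would.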

Intuitively, I’d think the mask first does make a difference, but I’d appreciate any thoughts!