Hello, I’m still new to this, but I want to focus on using RAG and a vector DB to store all my personal and work-related data.

I’m seeking a better understanding of how things work.

I’m interested in covering multiple domains, such as “Sales,” “Marketing,” and “Security.”

I plan to use an embedding model to create embeddings and then store them in a Vector Database. When I interact with my LLM, it should retrieve relevant data based on my prompt and feed this into my LLM query.
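Something like this is what I’m picturing (just a sketch, assuming sentence-transformers as the embedding model and Chroma as the vector DB; the collection name and documents are placeholders I made up):

```python
# Sketch: embed documents and store them in a vector DB,
# then retrieve nearest neighbors for a prompt.
# Assumes sentence-transformers + chromadb; data and names are placeholders.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model would do
client = chromadb.PersistentClient(path="./my_vector_db")
collection = client.get_or_create_collection("personal_docs")

docs = [
    "To restart the firewall service, run: sudo systemctl restart firewalld",
    "Standard discount for new customers is 10% on the first order",
]

collection.add(
    ids=[f"doc-{i}" for i in range(len(docs))],
    documents=docs,
    embeddings=model.encode(docs).tolist(),
)

# Retrieval: embed the user prompt and find the nearest neighbors
query = "What's the command for restarting the firewall?"
results = collection.query(
    query_embeddings=model.encode([query]).tolist(),
    n_results=2,
)
print(results["documents"][0])
```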

For instance:

“What’s the command for xyz?” or “Create me a good offer for xyz.”

As I understand it, the backend runs a semantic search for something like “Create me a good offer.” Based on similarity (nearest neighbors), it retrieves the most relevant chunks for my prompt, and the system prompt for the LLM is then built from this information so it can deliver the best possible answer.
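If I’ve understood that right, the glue step would look roughly like this (the `llm_complete` function is just a stand-in for whatever LLM API ends up being used):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Stuff the retrieved chunks into the context part of the prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# retrieved_chunks would come from the vector DB query above;
# llm_complete is a placeholder for the actual LLM call (API or local model)
prompt = build_prompt("Create me a good offer for xyz", results["documents"][0])
# answer = llm_complete(prompt)
```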

Now, the big question is… when creating my dataset to store in the vector DB, should I label the data with tags like [M] for marketing or [S] for sales? That way, when I type my prompt and add the label [S], the semantic search can more accurately determine where to look.
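What I mean concretely: instead of baking literal [S]/[M] tags into the text, I could store the domain as metadata and filter on it at query time (again assuming Chroma; the `domain` field is my own naming):

```python
# Store the domain as metadata instead of a tag inside the text itself
sales_doc = "Standard discount for new customers is 10% on the first order"
collection.add(
    ids=["sales-001"],
    documents=[sales_doc],
    embeddings=model.encode([sales_doc]).tolist(),
    metadatas=[{"domain": "sales"}],
)

# At query time, restrict the nearest-neighbor search to one domain
results = collection.query(
    query_embeddings=model.encode(["Create me a good offer for xyz"]).tolist(),
    n_results=3,
    where={"domain": "sales"},
)
```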

Does this approach make sense, or could it lead to more problems than it solves?

I mean, I asked GPT-4, but that’s not the same as asking someone who might have more in-depth knowledge about this.

Thanks!