Point me towards some basic dataset preparation tips for LLM's?

ArtifartX · 1 year ago

Point me towards some basic dataset preparation tips for LLM's?

Tiny_Arugula_5648 · 1 year ago

Go to huggingface and look at the multitude of datsets that have already been prepped and read whatever documentation and papers that have been published. Go through the data and get a sense of what the data looks like and how it’s structured.

ArtifartX · 1 year ago

Yea, doing this is part of what spurred the question, because I began to notice some datasets that were very clean and ordered into data pairs, and others that seemed formatted differently, and others still that seemed like they were fed a massive chunk of unstructured text. It made me confused on if there were some sort of standards or not that I was not aware of.