Fine-tune as in gradient updates, or as in ICL?
Go back to your history: Cauchy is the earliest person I’m aware of to have used gradient descent, and he motivated it as follows:
“one ordinarily starts by reducing them to a single one by successive eliminations, to eventually solve for good the resulting equation, if possible. But it is important to observe that 1° in many cases, the elimination cannot be performed in any way; 2° the resulting equation is usually very complicated, even though the given equations are rather simple”
That is, gradient descent is useful when you have a rough idea of when you are close to the minimum, but you don’t want to go through the hassle of algebra. (Realistically, if you can solve it with gradient descent, you could probably solve it algebraically; we just don’t have the same stupidly-easy-to-implement computational routines for the algebraic route.)
https://www.math.uni-bielefeld.de/documenta/vol-ismp/40_lemarechal-claude.pdf
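To make the point concrete, here’s a minimal sketch on a least-squares toy problem (the problem, names, and step size are all mine, not Cauchy’s): the “algebraic” route solves the normal equations by elimination, while gradient descent just iterates on the gradient and lands at the same minimum.

```python
# Minimize f(x) = ||Ax - b||^2 two ways, per Cauchy's observation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))  # toy overdetermined system (assumption)
b = rng.normal(size=5)

# Algebraic route: eliminate by solving the normal equations A^T A x = A^T b.
x_alg = np.linalg.solve(A.T @ A, A.T @ b)

# Gradient-descent route: x <- x - lr * grad f(x), with grad f(x) = 2 A^T (Ax - b).
# lr = 0.01 is an illustrative step size, small enough to converge here.
x = np.zeros(2)
lr = 0.01
for _ in range(5000):
    x -= lr * 2 * A.T @ (A @ x - b)

print(np.allclose(x, x_alg, atol=1e-6))  # both routes agree on the minimizer
```

The descent loop never performs an elimination; it only needs gradient evaluations, which is exactly the hassle-avoidance being described.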
Just to be clear, you aren’t doing fine-tuning here as in gradient updates — you are using the base model + ICL?