I was going to try knowledge distillation, but they modified their tokenizer.
Either way, Neo has a 125M model, so a 248M model is 2× that. I imagine this could be useful for shorter-context tasks. Idk, or to continue training for very tight use cases.
I came across it while looking for tiny Mistral config JSONs to replicate.
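For anyone hunting the same thing, here's a rough sketch of what a "tiny" Mistral-style `config.json` might look like. Every hyperparameter below is an illustrative guess (not the actual 248M model's values); the field names follow the Hugging Face `config.json` schema for `MistralForCausalLM`.

```python
import json

# Hypothetical tiny Mistral-style config — sizes are made-up placeholders,
# not the real model's hyperparameters.
tiny_mistral_config = {
    "architectures": ["MistralForCausalLM"],
    "model_type": "mistral",
    "hidden_size": 512,
    "intermediate_size": 1536,
    "num_hidden_layers": 8,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,   # grouped-query attention (fewer KV heads)
    "max_position_embeddings": 2048,
    "vocab_size": 32000,
    "rms_norm_eps": 1e-5,
    "rope_theta": 10000.0,
    "tie_word_embeddings": False,
}

# Dump it in the same shape you'd find in a model repo's config.json.
print(json.dumps(tiny_mistral_config, indent=2))
```

With a file like this you can instantiate a randomly initialized tiny model via `MistralConfig.from_json_file(...)` and train from scratch, which is the usual route when the tokenizer of the model you wanted to distill from doesn't match.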