Anyone Tried Adding New Languages to Open Source AI Models? Need Advice!

nefarkederki · 1 year ago

AutomataManifold · 1 year ago

I know there’s several projects for finetuning llama for Chinese. I haven’t worked on them but it might be worth looking in to what they did.

nefarkederki · 1 year ago

Hey there! Thanks for the tip. I have made some research and found out this : https://github.com/ymcui/Chinese-LLaMA-Alpaca

So for those who are interested here is what I understand from what needs to be done :

If Llama2 tokenizer does not support your language, you need to expand that vocabulary first
You’ll need data for further fine tuning, and also for instruction tuning.
And you will need money :D For training