I am pretty confused about setting up SFTTrainer fine-tuning of a model like Mistral-7B. I’ve seen a ton of different example notebooks that all use different parameters, and the documentation on HuggingFace isn’t clarifying things. My questions are the following:

  1. After I set up a LoraConfig, i.e.

    from peft import LoraConfig

    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
    )

    Is it enough to just pass this in via the SFTTrainer peft_config argument, or is it also required to call “model = get_peft_model(model, peft_config)”? I’ve seen a bunch of notebooks skip that call and “model = prepare_model_for_kbit_training(model)”, and I’ve seen others say they’re important. I assumed passing the config as an SFTTrainer argument maybe bypasses the need to call those functions directly (see the sketch after this list). Do I also need to call merge_and_unload() after training?
  2. There seems to be no consensus on the tokenizer setup. I’ve seen people say padding_side="left" is correct, and others say that doesn’t work. Do I need to add tokenizer.pad_token = tokenizer.eos_token? What is the full proper setup for the tokenizer, and where is the source of truth on this anyway? The Mistral website? (My current attempt is in the tokenizer sketch after this list.)
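
For reference, here is a minimal sketch of the full setup I’m trying for question 1, with a toy dataset and throwaway training arguments just so it runs end to end; the model name and the SFTTrainer keyword names (dataset_text_field, max_seq_length, tokenizer) come from the notebooks/trl version I’ve been copying, so they may differ in newer releases:

    import torch
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig, TrainingArguments)
    from trl import SFTTrainer

    model_name = "mistralai/Mistral-7B-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token   # see question 2

    # Base model loaded in 4-bit via bitsandbytes, as in the notebooks.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
        device_map="auto",
    )

    peft_config = LoraConfig(
        lora_alpha=16, lora_dropout=0.1, r=64, bias="none", task_type="CAUSAL_LM",
    )

    # Toy dataset and training arguments just so the example runs.
    train_dataset = Dataset.from_dict({"text": ["Hello world.", "Another example."]})
    args = TrainingArguments(output_dir="out", per_device_train_batch_size=1, max_steps=1)

    # Option A: hand the base model plus peft_config to SFTTrainer and let it
    # apply the adapter itself.
    trainer = SFTTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        max_seq_length=512,
        tokenizer=tokenizer,
        peft_config=peft_config,
    )

    # Option B (what some notebooks do instead): wrap the model manually and
    # pass the already-wrapped PEFT model, with no peft_config argument.
    # model = prepare_model_for_kbit_training(model)
    # model = get_peft_model(model, peft_config)
    # trainer = SFTTrainer(model=model, args=args, train_dataset=train_dataset,
    #                      dataset_text_field="text", max_seq_length=512,
    #                      tokenizer=tokenizer)

    trainer.train()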
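
And this is the tokenizer setup I’ve pieced together for question 2; I don’t know whether the padding_side choice is actually right, which is exactly what I’m asking:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    # The Mistral tokenizer ships without a pad token, so the notebooks I've
    # seen reuse the EOS token for padding.
    tokenizer.pad_token = tokenizer.eos_token

    # Some notebooks use right padding for training and only switch to left
    # padding for batched generation; others set left here from the start.
    tokenizer.padding_side = "right"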

Thank you for the help. I am new to LLM finetuning and want to make sure I’m understanding this properly.

  • kpodkanowiczB
    1 year ago

    I have never gotten Flash Attention to work despite testing both padding sides, but I’m due to do a clean installation sometime next month. Currently, I use right padding without FA.

    AFAIK you need to run model = get_peft_model(model, peft_config), as you need to pass the PEFT model as the argument to SFTTrainer.
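
    A rough, untested sketch of what I mean, reusing the model, tokenizer, peft_config, and train_dataset from the question above (the exact SFTTrainer keyword names can differ between trl versions):

        from peft import get_peft_model, prepare_model_for_kbit_training
        from trl import SFTTrainer

        # Only needed when the base model was loaded in 4-bit/8-bit.
        model = prepare_model_for_kbit_training(model)

        # Wrap the base model with the LoRA adapter before building the trainer.
        model = get_peft_model(model, peft_config)

        trainer = SFTTrainer(
            model=model,                  # already a PeftModel, so no peft_config here
            train_dataset=train_dataset,
            dataset_text_field="text",
            max_seq_length=512,
            tokenizer=tokenizer,
        )
        trainer.train()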