Zephyr-7B QLoRA Benchmark for Summarization and Classification

llama-ben · 1 year ago

No-Link-2778 · 1 year ago

Would you consider benchmarking the non-DPO model as well? The DPO version seems to regress on NLP tasks compared with the original SFT model.

vasileer · 1 year ago

Are the fine-tuned models published anywhere?

Method	Zephyr-7B-β Zero-Shot	Zephyr-7B-β Few-Shot	Fine-Tuning + QLoRA	Fine-Tuning + QLoRA + NEFTune	Fine-Tuning + QLoRA + Full Module Tuning	Fine-Tuning + QLoRA + NEFTune + Full Module Tuning
ROUGE-1 (in %)	33.93	35.99	52.84	52.97	53.50	53.05
ROUGE-2 (in %)	11.21	12.97	27.75	28.44	29.66	29.23
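
To make the summarization numbers easier to compare, here is a minimal sketch that computes each method's absolute gain (in percentage points) over the few-shot baseline. The scores are copied from the table above; the dictionary keys are shortened labels and the helper name `gains_over` is just illustrative, not part of the original benchmark code.

```python
# ROUGE-1 / ROUGE-2 scores (in %) copied from the table above.
# Keys are shortened method labels chosen for this sketch.
rouge = {
    "Zero-Shot": (33.93, 11.21),
    "Few-Shot": (35.99, 12.97),
    "QLoRA": (52.84, 27.75),
    "QLoRA + NEFTune": (52.97, 28.44),
    "QLoRA + Full Module Tuning": (53.50, 29.66),
    "QLoRA + NEFTune + Full Module Tuning": (53.05, 29.23),
}


def gains_over(baseline, scores):
    """Return (ROUGE-1, ROUGE-2) gains in percentage points over `baseline`."""
    base_r1, base_r2 = scores[baseline]
    return {
        name: (round(r1 - base_r1, 2), round(r2 - base_r2, 2))
        for name, (r1, r2) in scores.items()
        if name != baseline
    }


gains = gains_over("Few-Shot", rouge)
for name, (d1, d2) in gains.items():
    print(f"{name}: ROUGE-1 {d1:+.2f}, ROUGE-2 {d2:+.2f}")

# Variant with the largest ROUGE-1 gain over few-shot.
best = max(gains, key=lambda name: gains[name][0])
print("Best by ROUGE-1:", best)
```

On these numbers, every fine-tuned variant lands within about one ROUGE-1 point of the others, while all of them sit roughly 17 points above the few-shot baseline.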

Training samples (fraction)	Zephyr-7B-β	Zephyr-7B-β w/ NEFTune	Zephyr-7B-β w/ Full Module Tuning	Zephyr-7B-β w/ NEFTune + Full Module Tuning
266 (2.5%)	46.05	49.61	65.36	67.23
533 (5%)	55.66	60.33	72.26	72.94
1066 (10%)	66.48	64.65	73.29	72.82
2666 (25%)	66.73	68.04	74.27	75.85
5332 (50%)	69.54	72.10	74.83	74.40
10664 (100%)	74.90	72.93	77.76	77.86
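
The classification table is easier to read as differences. The sketch below, using only the numbers from the table above, computes the change from adding NEFTune (with and without full module tuning) and the change from adding full module tuning alone at each training-set size. The table does not name its metric, so the code just treats the values as scores; the function and variable names are illustrative.

```python
# Scores copied from the table above, keyed by number of training samples.
# Column order: base, NEFTune, Full Module Tuning, NEFTune + Full Module Tuning.
scores = {
    266: (46.05, 49.61, 65.36, 67.23),
    533: (55.66, 60.33, 72.26, 72.94),
    1066: (66.48, 64.65, 73.29, 72.82),
    2666: (66.73, 68.04, 74.27, 75.85),
    5332: (69.54, 72.10, 74.83, 74.40),
    10664: (74.90, 72.93, 77.76, 77.86),
}


def deltas(table):
    """Per training-set size, return point differences between configurations."""
    out = {}
    for n, (base, neft, full, full_neft) in table.items():
        out[n] = {
            "neftune": round(neft - base, 2),               # NEFTune vs. base
            "neftune_on_full": round(full_neft - full, 2),  # NEFTune on top of full module tuning
            "full": round(full - base, 2),                  # full module tuning vs. base
        }
    return out


result = deltas(scores)
for n, d in result.items():
    print(f"{n:>6}: NEFTune {d['neftune']:+.2f}, "
          f"NEFTune on full {d['neftune_on_full']:+.2f}, "
          f"full modules {d['full']:+.2f}")
```

Read this way, full module tuning helps at every size, with the gain shrinking from 19.31 points at 266 samples to 2.86 points at the full 10664. The NEFTune effect is mixed: it is positive at some sizes and negative at others, for example at 1066 samples in both settings.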