Hey all! A friend and I have been building with open-source LLMs for a while now (originally for other project ideas) and found that quickly iterating with different fine-tuning datasets is super hard. Training a model, setting up some inference code to try out the model and then going back and forth took 90% of our time.

That’s why we built Haven, a service to quickly try out different fine-tuning datasets and base-models. Going from uploading a dataset to chatting with the resulting model now takes less than 5 minutes (using a reasonably sized dataset).

We fine-tune the models using low-rank adapters, which not only means that the changes made to the model are very small (only 30mb for a 7b parameter LLM), it also allows us to host many fine-tuned models very efficiently by hot swapping adapters on demand. This helped us reduce cold-start times to below one second. Research has shown that low-rank fine-tuning performance stays almost on-par with full fine-tuning.

We charge $0.004/1k training tokens. New accounts start with $5 in free credits so you can get started for free. You can export all the models to Huggingface.

Right now we support Llama-2 and Zephyr (which is itself a fine-tune of Mistral) as base-models. We’re gonna add some more soon. We hope you find this useful and we would love your feedback!

This is where to find it:
https://haven.run/

  • torque-mcclydeOPB
    link
    fedilink
    English
    arrow-up
    1
    ·
    10 months ago

    Yes, our datasets usually have a few hundred examples. We do support arbitrarily large datasets though, the fine-tuning just takes a little longer.

    For deploying and scaling we’re using Modal, it’s a “serverless” GPU provider that we found to be very user-friendly.