Hey r/MachineLearning!

Last year, u/rajatarya showcased how we scaled Git to handle large datasets. One piece of feedback we kept getting was that people didn’t want to move their source code over to XetHub.

So we built a GitHub app & integration that lets you continue storing code in GitHub while XetHub handles the large datasets & models.

https://about.xethub.com/blog/xetdata-scale-github-repos-100-tb

We’ve enjoyed using it to host open-source LLMs like Llama2 and Mistral side by side with our fine-tuning code.

The whole thing is in beta, so we’re eager for any feedback you have to offer :)