Hey LLM enthusiasts! Together AI just rolled out some incredible updates that are game changers in the world of large language models. Here’s a quick rundown:
- Together Inference Engine Launched: Prepare to be blown away! This new inference engine integrates cutting-edge techniques like FlashAttention-2, Flash-Decoding, and Medusa. It's touted as the fastest inference service around, significantly outpacing competitors. Plus, they've slashed prices:
- 7B model at $0.0002/1K tokens
- 13B model at $0.000225/1K tokens
- 70B model at $0.0009/1K tokens
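To put those per-1K-token prices in perspective, here's a quick back-of-the-envelope cost calculator (the function and price table are just my own sketch of the numbers above, not anything from Together's API):

```python
# Per-1K-token prices from the announcement above (USD).
PRICE_PER_1K = {"7B": 0.0002, "13B": 0.000225, "70B": 0.0009}

def cost_usd(model_size: str, tokens: int) -> float:
    """Return the dollar cost of running `tokens` tokens through a model tier."""
    return PRICE_PER_1K[model_size] * tokens / 1000

# A full million tokens through the 70B model:
print(cost_usd("70B", 1_000_000))  # 0.9 -> about 90 cents
```

So even heavy 70B usage comes out to under a dollar per million tokens at these rates.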
- Introducing Together Custom Models: This is huge for AI teams! Build your own state-of-the-art LLM with total ownership after creation. The process includes top-tier tech like FlashAttention-2, DoReMi, and DSIR. They've already helped Arcee develop a custom legal LLM at lightning speed.
- Together GPU Clusters Expansion to 20 Exaflops: Formerly known as Together Compute, this offers dedicated, high-speed GPU training clusters with NVIDIA's finest, like H100 and A100 GPUs. It's perfect for those needing flexible, scalable, and efficient model training with robust support. Startups like Pika Labs have already saved over $1M in 5 months!
Together AI is pushing the frontiers in LLM performance and scalability. Check out their new website and the updated inference stack at api.together.ai. Can’t wait to see what the community builds with these tools!
This post was shamelessly generated with GPT-4. I got the newsletter and thought I'd share; cheap inference of open-source LLMs is very relevant to this sub even if it isn't "local".