tl;dr: I’m considering building a budget machine for tinkering with LLMs, but I’m not sure whether this is a good idea or how to go about it.

For context: I work in a university department. I currently have access to a 2080 Ti on a shared machine, and we’re in the process of acquiring a small server with 2 L40 cards. So for any larger experiments, I will be able to use this shared machine.

However, I would like to have my own small machine for tinkering: trying different models and techniques, playing around, and preparing larger experiments to be run on the server. My focus is on teaching and education, not on state-of-the-art research.

Since I’m aiming for a good amount of VRAM, the 4060 Ti 16GB seems to be the most obvious choice; I also like its low power requirements (in terms of both energy and cooling). But this card seems to have a poor reputation overall. I’m also not sure what the current sweet spot is for the CPU and memory – I have completely lost track of Intel’s and AMD’s generations over the last few years.

Some additional comments on common suggestions:

  • I simply like to have my own hardware, and cloud services seem to be more expensive in the long run.
  • There is not really a good market for used GPUs where I’m located (Singapore), so the common suggestion “go with a used 3090” does not really work.

Any good suggestions, or am I naive with my idea of a budget machine? Thanks a lot!

  • ttkciarB
    1 year ago

    You can absolutely do interesting and useful things with very little hardware by using quantized models, especially if you don’t mind slow inference. My preferred quantization is q4_K_M (with GGUF and llama.cpp).

    I started with a spare Lenovo T560 Thinkpad with 8GB of RAM, which handled 7B models with no problem. That’s a $120 eBay purchase. Once I was hooked, I shifted to one of the Dell T7910s in the homelab and moved up to larger models.

    I’m still not using a GPU for anything. It’s been CPU inference, which is slow but otherwise great.

    You could get just about any $300 desktop and put a decent GPU in it (16GB VRAM will allow fast inference with 13B models, and 24GB should allow heavily-quantized 30B) and enjoy fast inference. The most expensive bit is the GPU.
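
As a rough sanity check on those sizes, here is a minimal back-of-the-envelope sketch. It assumes memory use is roughly the quantized weights plus a flat allowance for the KV cache and buffers. The figure of about 4.8 bits per weight for q4_K_M and the 1.5 GB overhead are assumptions for illustration, not measured values, and the function names are hypothetical. Real usage depends on context length and the runtime.

```python
# Back-of-the-envelope memory estimate for a quantized model.
# ASSUMPTIONS (illustrative, not measured): ~4.8 bits/weight for q4_K_M,
# and a flat 1.5 GB allowance for KV cache and runtime buffers.

Q4_K_M_BITS = 4.8          # assumed average bits per weight
DEFAULT_OVERHEAD_GB = 1.5  # assumed flat overhead


def estimate_memory_gb(params_billion, bits_per_weight=Q4_K_M_BITS,
                       overhead_gb=DEFAULT_OVERHEAD_GB):
    """Return an approximate memory footprint in GB (1 GB = 1e9 bytes)."""
    # params (in billions) * bits per weight / 8 bits per byte = GB of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion, capacity_gb, bits_per_weight=Q4_K_M_BITS,
         overhead_gb=DEFAULT_OVERHEAD_GB):
    """True if the estimated footprint is within the given RAM/VRAM capacity."""
    return estimate_memory_gb(params_billion, bits_per_weight,
                              overhead_gb) <= capacity_gb


if __name__ == "__main__":
    # (model size in billions of parameters, available memory in GB)
    for size, capacity in [(7, 8), (13, 16), (30, 24)]:
        need = estimate_memory_gb(size)
        verdict = "fits" if fits(size, capacity) else "does not fit"
        print(f"{size}B at q4_K_M: ~{need:.1f} GB, {verdict} in {capacity} GB")
```

Under these assumptions the estimate is consistent with the sizes mentioned above, but the margin for a 30B model on 24GB is thin, so a longer context or a larger quantization could push it over.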

    See this sub’s wiki for more detailed hardware tips.