So for background, I’ve had some interest in LLMs and other AI for a year or so. I’ve used online LLMs like ChatGPT but haven’t tried running my own because my hardware is 10 years old. I’m considering getting a new PC and want to know whether to splash out on one that can do high-end LLM stuff.

I’ve read up a fair bit but have some questions that hopefully aren’t too stupid.

1.) It looks like VRAM is the biggest hardware limit for model size. What are some good hardware options at different price points? Are there really expensive options that blow consumer stuff out of the water? Is now a good time to buy or is there something worth waiting for?
2.) Open source models seem to be dependent on the trainers giving away their expensively acquired work. Are you anticipating model releases to replace Llama 2, and when?
3.) Is retraining or fine-tuning possible for ordinary users? Is this meaningfully different from having a ‘mission’ or instruction set added to the beginning of each prompt/context?
4.) I think I understand parameter size and compression, but what determines the token context size a model can handle? GPT-4’s new massive context size is very handy.
5.) I’m interested in ‘AutoGPT’ type systems (or response + validation etc). Can this work in series mode, where you only have 1 model running at a time? It seems like having specialised models could be useful. Would loading different models most suited to each particular ‘subroutine’ slow things down a lot? Are these systems difficult to set up, or is it just a matter of feeding the output of one query into the input of the next (while adding on previous relevant context)?
6.) Is the same type of hardware setup good for both LLMs and Stable Diffusion, or do they have separate setups for good bang/buck?
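
For reference, the arithmetic behind question 1 can be sketched roughly as below. It assumes the memory needed for weights is simply parameter count times bits per weight. The function name and the 20% overhead allowance for context cache and runtime buffers are illustrative guesses, not measured figures, so treat the output as a ballpark only.

```python
def estimate_memory_gb(params_billions: float,
                       bits_per_weight: float,
                       overhead_fraction: float = 0.2) -> float:
    """Ballpark memory needed to hold a model, in GB (10^9 bytes).

    overhead_fraction is an assumed allowance for context cache and
    runtime buffers; real usage depends on the software and context length.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_fraction) / 1e9


# Llama 2 comes in 7B, 13B and 70B sizes; compare a few precisions.
for size in (7, 13, 70):
    for bits in (16, 8, 4):
        gb = estimate_memory_gb(size, bits)
        print(f"{size}B at {bits}-bit: ~{gb:.1f} GB")
```

If this is about right, compressing (quantising) a model to fewer bits per weight matters as much as the parameter count when working out whether a model fits on a given card.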

Many thanks to anyone who can help!

  • EvokerTCGOPB
    10 months ago

    I haven’t tried Mac and don’t know what the software ecosystem is like. Have you tried it or seen it working?

    It looks like it has shared memory rather than dedicated VRAM. I would guess this is slower than dedicated GPU memory but faster than the RAM sticks in a normal PC?