It’s working great so far. Just wanted to share and spread awareness that running multiple instances of webui (oobabooga) is basically a matter of having enough RAM. I just finished running three models simultaneously (taking turns, of course). I only offloaded one layer to the GPU per model, used 5 threads per model, and set every context to 4K. (The computer has a 6-core CPU, 6 GB VRAM, and 64 GB RAM.)
The models used were:
dolphin-2.2.1-ashhlimarp-mistral-7b.Q8_0.gguf
causallm_7b.Q5_K_M.gguf
mythomax-l2-13b.Q8_0.gguf (I meant to load a 7B for this one, though)
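For anyone who wants to try it, the setup above can be sketched as three separate webui launches on different ports. This is just a sketch, not my exact commands; the flag names (`--n-gpu-layers`, `--threads`, `--n_ctx`, `--listen-port`) may vary with your webui version and loader, so check `python server.py --help` first.

```shell
# Each instance gets its own port; 1 GPU layer, 5 threads, 4K context per model.
python server.py --model dolphin-2.2.1-ashhlimarp-mistral-7b.Q8_0.gguf \
  --loader llama.cpp --n-gpu-layers 1 --threads 5 --n_ctx 4096 --listen-port 7861 &

python server.py --model causallm_7b.Q5_K_M.gguf \
  --loader llama.cpp --n-gpu-layers 1 --threads 5 --n_ctx 4096 --listen-port 7862 &

python server.py --model mythomax-l2-13b.Q8_0.gguf \
  --loader llama.cpp --n-gpu-layers 1 --threads 5 --n_ctx 4096 --listen-port 7863 &
```

With 5 threads each on a 6-core CPU you’re oversubscribed if two models generate at once, which is fine here since they take turns anyway.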
I like it because it’s similar to the group chat on character.ai, but without the censorship, and I can edit any of the responses. Downsides are having to copy/paste between all the instances of the webui, and one of the models kept focusing on a single character instead of both. Also, I’m not sure how far the context could be pushed before the GPU runs out of memory.
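On the context question, a rough back-of-envelope estimate of KV-cache size may help. Assuming Llama-2 7B architecture numbers (32 layers, 32 heads, head dim 128) and an fp16 cache, and noting that with only 1 of 32 layers offloaded, only a small fraction of this would actually land in VRAM:

```python
# Back-of-envelope KV-cache size for one Llama-style 7B model.
# Assumed numbers (Llama-2 7B): 32 layers, 32 heads, head_dim 128, fp16 cache.
n_layers, n_heads, head_dim = 32, 32, 128
bytes_per_elem = 2       # fp16
n_ctx = 4096             # the 4K context used above

# K and V each hold n_ctx * n_heads * head_dim elements per layer.
kv_bytes = 2 * n_layers * n_ctx * n_heads * head_dim * bytes_per_elem
print(f"KV cache per model: {kv_bytes / 2**30:.1f} GiB")  # → 2.0 GiB
```

So each 7B model at 4K context wants about 2 GiB of KV cache on top of its weights; most of that stays in system RAM when nearly all layers run on the CPU, which is probably why 6 GB of VRAM held up fine here.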