Deadlibor to LocalLLaMA@poweruser.forum · English · 1 year ago
What UI do you use and why?
57 comments
mcmoose1900 · English · 1 year ago

> I don't know of a model that fits in a 3090 and takes that much time to inference on
Yi-34B-200K is the base model I’m using. Specifically the Capybara/Tess tunes.
I can squeeze 63K context on it at 3.5bpw. It's actually surprisingly good at continuing a full-context story, referencing details throughout and such.
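A rough sketch of why that fits: at 3.5 bits per weight, the weights of a ~34B-parameter model take about 15 GB, leaving the rest of a 24 GB 3090 for the KV cache and overhead. All numbers here are back-of-the-envelope assumptions, not measured values:

```python
# Back-of-the-envelope VRAM estimate for a 34B model at 3.5 bpw on a 24 GB card.
# Rough figures only; real usage adds cache, activations, and framework overhead.

params = 34e9        # assumed Yi-34B parameter count
bpw = 3.5            # quantization bits per weight

weights_gb = params * bpw / 8 / 1e9   # bits -> bytes -> GB
free_gb = 24 - weights_gb             # what's left for KV cache etc.

print(f"weights: ~{weights_gb:.1f} GB")   # ~14.9 GB
print(f"left over: ~{free_gb:.1f} GB")    # ~9.1 GB
```

That leftover ~9 GB is what the long (63K-token) context has to squeeze into, which is why the bpw choice matters so much here.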
Anyway, I am on Linux, so no GPU swap like Windows. I am indeed using it in a chat/novel-style chat, so the context does scroll and get cached in ooba.