• 1 Post
  • 6 Comments
Joined 10 months ago
Cake day: November 17th, 2023



  • By loading a 20B Q4_K_M model (50/65 layers offloaded seems to be the fastest from my tests) I currently get around 0.65 t/s with a low context size of 500 or less, and about 0.45 t/s nearing the max 4096 context.

    Sounds suspicious. I use Yi-Chat-34B-Q4_K_M on an old 1080 Ti (11 GB VRAM) with 20 layers offloaded and get around 2.5 t/s. That is on a Threadripper 2920 with 4-channel RAM (also 3200), though I don’t think that would make such a big difference. Of course, with 4 channels I have twice your RAM bandwidth, but I’m running a 34B model and loading only 20 layers on the GPU…
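    The bandwidth comparison above can be sketched numerically. Token generation for the CPU-resident layers is roughly memory-bandwidth bound, since the weights left in system RAM are streamed once per token. The model-size figure below is an illustrative assumption, not a measurement from either setup:

    ```python
    def ram_bandwidth_gbs(channels: int, mts: int = 3200, bus_bytes: int = 8) -> float:
        """Theoretical peak DDR4 bandwidth in GB/s (MT/s x 8-byte bus per channel)."""
        return channels * mts * bus_bytes / 1000

    dual = ram_bandwidth_gbs(2)  # typical desktop board: 51.2 GB/s
    quad = ram_bandwidth_gbs(4)  # Threadripper quad-channel: 102.4 GB/s

    # Hypothetical example: suppose ~6 GB of a Q4_K_M model stays in system RAM
    # after partial GPU offload. The bandwidth-only ceiling on generation speed:
    cpu_resident_gb = 6.0
    print(f"dual-channel ceiling: {dual / cpu_resident_gb:.1f} t/s")
    print(f"quad-channel ceiling: {quad / cpu_resident_gb:.1f} t/s")
    ```

    Real throughput lands well below these ceilings (compute, cache effects, the GPU-resident layers), but the ratio shows why quad-channel only doubles the RAM-side budget, not the overall speed.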



  • Answers like this (“I can do no harm”) to questions like this clearly show how dumb LLMs really are and how far away we are from AGI. They have basically no idea what they are being asked or what their own answer says. Just a big, fancy T9 =)

    In light of this, the drama at OpenAI, with their arguments about the danger of AI capable of destroying humanity, looks especially funny.