• 1 Post
  • 6 Comments
Joined 10 months ago
Cake day: November 17th, 2023



  • By loading a 20B Q4_K_M model (50/65 layers offloaded seems to be the fastest from my tests) I currently get around 0.65 t/s with a low context size of 500 or less, and about 0.45 t/s nearing the max 4096 context.

    Sounds suspicious. I use Yi-Chat-34B-Q4_K_M on an old 1080 Ti (11 GB VRAM) with 20 layers offloaded and get around 2.5 t/s. That is on a Threadripper 2920 with 4-channel RAM (also 3200), though I don’t think that would make such a big difference. Of course, with 4 channels I have twice your RAM bandwidth, but I’m running a 34B model and loading only 20 layers on the GPU…
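    The bandwidth comparison above can be sketched numerically. Token generation for the CPU-resident layers is roughly memory-bandwidth bound, since the weights left in system RAM are streamed once per token. The model-size figure below is an illustrative assumption, not a measurement from either setup:

    ```python
    def ram_bandwidth_gbs(channels: int, mts: int = 3200, bus_bytes: int = 8) -> float:
        """Theoretical peak DDR4 bandwidth in GB/s (MT/s x 8-byte bus per channel)."""
        return channels * mts * bus_bytes / 1000

    dual = ram_bandwidth_gbs(2)  # typical desktop board: 51.2 GB/s
    quad = ram_bandwidth_gbs(4)  # Threadripper quad-channel: 102.4 GB/s

    # Hypothetical example: suppose ~6 GB of a Q4_K_M model stays in system RAM
    # after partial GPU offload. The bandwidth-only ceiling on generation speed:
    cpu_resident_gb = 6.0
    print(f"dual-channel ceiling: {dual / cpu_resident_gb:.1f} t/s")
    print(f"quad-channel ceiling: {quad / cpu_resident_gb:.1f} t/s")
    ```

    Real throughput lands well below these ceilings (compute, cache effects, the GPU-resident layers), but the ratio shows why quad-channel only doubles the RAM-side budget, not the overall speed.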



  • Answers like this (“I can do no harm”) to questions like this clearly show how dumb LLMs really are and how far away we are from AGI. They have basically no idea what they are being asked or what their own answer says. Just a big, fancy T9 =)

    In light of this, the drama at OpenAI, with their arguments about the danger of AI capable of destroying humanity, looks especially funny.