So I literally just got Zephyr 7B beta GPTQ running.

(I tried GGUF first but was only getting 0.5 tokens per second, versus 8 now. If anyone knows what was up with that, please let me know. I'm on an RTX 3070 laptop with 16GB VRAM, 32GB RAM, and a high-end i7 12700 (20 threads).

Settings were: gpu-layers 30; n_ctx tried at 2k, 4k, and 32k; threads 20.)

Anyhow, I immediately wanted to test Zephyr by telling it to create a new joke that doesn’t exist (tested simply by googling the joke without the punchline). I was very impressed. Here it is:

You

Create a joke

AI

How does Santa Claus enter homes without being detected by cameras? He uses claymation! (Laughter)

You

Please explain the joke

AI

The joke plays off of two popular concepts - Santa Claus and home security cameras. In this case, it suggests that Santa could be using stop-motion animation (called “claymation”) to avoid detection by cameras placed around people’s houses during Christmas time. It’s a silly, lighthearted pun that makes use of wordplay to create humor.

You

How does stop-motion animation make sense for not being detected by cameras?

AI

The punchline relies on the fact that most people associate claymation with stop-motion animation, which involves moving objects frame by frame to create the illusion of motion. This technique requires physical manipulation rather than digital alterations, meaning that no matter how sophisticated modern camera technology becomes, it won’t be able to capture Santa as he moves around people’s houses in this way.

You

I still don’t understand why stop-motion animation would stop the cameras from detecting Santa. Please explain like i’m 5.

AI

Well, let’s say that Santa uses claymation to move around at night when everyone is sleeping. Since claymation is done by physically moving things, the cameras can’t see him because they only take pictures when everything is still. So, even though Santa might leave presents under the tree, the cameras wouldn’t be able to catch him because they think nothing is happening!

  • Evening_Ad6637B
    1 year ago

    Threads 20 is too high. Set t=8 or t=10, but not more.

    GPU layers should be 35, I think. In any case, all layers should fit on the GPU.

    That way you should get around a 10x inference speedup compared to your current setup.
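
The reply's suggestions can be sketched as a small settings helper. This is only an illustration: `suggest_llama_settings` is a hypothetical name, and the rule "half the logical threads, capped at 10" is an assumption chosen to match the advice above, not a measured optimum. The keyword names `n_threads`, `n_gpu_layers`, and `n_ctx` are the ones llama-cpp-python's `Llama` constructor accepts, where `n_gpu_layers=-1` asks it to offload every layer.

```python
def suggest_llama_settings(logical_threads, max_threads=10, n_ctx=4096):
    """Return keyword arguments for a GGUF loader (hypothetical helper).

    Assumption: using about half of the logical threads, capped at
    max_threads, avoids oversubscribing the CPU. Tune for your machine.
    """
    n_threads = max(1, min(logical_threads // 2, max_threads))
    return {
        "n_threads": n_threads,
        "n_gpu_layers": -1,  # -1 = offload all layers in llama-cpp-python
        "n_ctx": n_ctx,
    }


settings = suggest_llama_settings(logical_threads=20)
print(settings)

# Possible usage, assuming llama-cpp-python is installed with GPU support
# and the model path below is replaced with a real GGUF file:
# from llama_cpp import Llama
# llm = Llama(model_path="path/to/model.gguf", **settings)
```

If the model does not fit in VRAM with every layer offloaded, replacing `-1` with a smaller explicit layer count and keeping the context size modest is the usual fallback.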