Let’s say, hypothetically, that I’m GPU-poor and a simpleton who has never gone beyond oobabooga-ing and koboldcpp-ing, and I want to run models larger than Mistral at more than 2 tokens per second. Speculative decoding is my only option, right? What’s the easiest way to do it? Do any UIs support it out of the box?
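
For context, the closest thing I’ve turned up so far is the speculative decoding section of the llama-cpp-python docs, which (if I’m reading it right) lets you pass a `draft_model` when you load a GGUF. Something like the sketch below — I haven’t actually run it, and the model path and `num_pred_tokens` value are just my guesses, so correct me if this isn’t the easy route:

```python
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# Load the target model with prompt-lookup decoding as the "draft model".
# The path is a placeholder; the docs suggest ~10 draft tokens on GPU
# and ~2 on CPU-only machines.
llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload as many layers as will fit
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),
)

out = llm("Q: Why is speculative decoding faster? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

But that’s a Python script, not a UI, which is why I’m asking — is there anything with an actual frontend that wires this up for me out of the box?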