Hi, speculative decoding runs a small model and a large model together with a sampler in between, and in this instance the sampler's job is to NOT skew the probability distributions while doing so. There's a fairly simple Python implementation of this idea here. Is there a way we can adjust the probability distributions of either the small model or the large model for the task of generation?
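For reference, here is a minimal sketch of the standard speculative sampling accept/resample rule, which is what keeps the sampler from skewing the distribution. It assumes both models' next-token distributions are plain Python lists over the same small vocabulary. The function names (`residual_distribution`, `speculative_step`, `output_distribution`) are hypothetical and not taken from the linked implementation.

```python
import random
from typing import List, Sequence, Tuple


def residual_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q): where to resample after a rejected draft token."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p == q everywhere, so a rejection cannot happen; fall back to p.
        return list(p)
    return [d / total for d in diff]


def speculative_step(p: Sequence[float], q: Sequence[float],
                     draft_token: int, rng=random) -> Tuple[int, bool]:
    """Accept or replace one draft token.

    p: large (target) model's distribution over the vocabulary
    q: small (draft) model's distribution that draft_token was sampled from,
       so q[draft_token] > 0
    Returns (emitted_token, accepted).
    """
    # Accept the draft with probability min(1, p(x) / q(x)).
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    # Otherwise resample from the leftover mass the large model wanted.
    residual = residual_distribution(p, q)
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False


def output_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Exact probability of each emitted token after draft + accept/resample."""
    # q(x) * min(1, p(x) / q(x)) == min(p(x), q(x))
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accepted)
    residual = residual_distribution(p, q)
    return [a + reject_mass * r for a, r in zip(accepted, residual)]
```

With `p = [0.5, 0.3, 0.2]` and `q = [0.2, 0.5, 0.3]`, `output_distribution(p, q)` comes back equal to `p` up to floating-point error. The draft model only affects how often tokens are accepted, not which distribution comes out. The guarantee is only that the output matches whatever `p` the large model hands to this step.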