In addition to what others said, exl2 is very sensitive to its calibration dataset, which it uses to decide where to spend those "variable" bits.
Most online quants calibrate on wikitext, but I believe you can get better results by quantizing the model yourself on your own chat logs, especially below 4 bpw.
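For reference, here's a minimal sketch of how you might build such a calibration file. As far as I know, exllamav2's `convert.py` takes a .parquet calibration file via `-c`, and reads a `text` column the same way it does from the wikitext parquet. The `my_chats.jsonl` filename and the `role: content` template below are just placeholders for whatever your own logs look like:

```python
# Sketch: turn your own chat logs into a parquet calibration file for
# exllamav2's convert.py. Assumes convert.py reads a "text" column,
# like the wikitext parquet most online quants are calibrated on.
# Requires pandas plus pyarrow (or fastparquet) for to_parquet().
import json
import pandas as pd

rows = []
with open("my_chats.jsonl") as f:  # hypothetical export of your chats
    for line in f:
        msg = json.loads(line)
        # Format turns the way you actually chat, so the calibration
        # data matches what the model sees at inference time.
        rows.append(f"{msg['role']}: {msg['content']}")

pd.DataFrame({"text": ["\n".join(rows)]}).to_parquet("calibration.parquet")

# Then point the quantizer at it, e.g. (flags from memory of recent
# exllamav2 versions, double-check against the repo's README):
#   python convert.py -i ./model-fp16 -o ./work -cf ./model-exl2 \
#       -b 3.5 -c calibration.parquet
```

The point of matching your own chat template is that the measurement pass then weights exactly the token distributions you'll hit in practice, which is where the sub-4bpw gains come from.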
Probably a Vulkan driver quirk. I'd report it to Huawei if you can find a channel; they probably have a business interest in running LLMs on their phones and would fix it if they knew.