Hi all!
I’m doing an experiment in which I use mistral-7b-instruct to describe an algorithm for a given task, then prompt it to write Python code using that algorithm as guide rails of sorts, and then run the code in a Python REPL to check the results.
It works reasonably well, but sometimes the generated code raises exceptions. I pipe the exceptions back to the model and ask it to fix them, but after a couple of iterations the model OOMs. I had lucky streaks in the beginning, when the model produced good-quality code that just worked, but now I see that the probability of it generating working code is quite low, about 30%.
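For context, here is a stripped-down sketch of the loop I mean (`generate` is a stand-in for whatever inference call is used, and the prompts are illustrative, not my actual ones):

```python
import traceback

def generate(prompt: str) -> str:
    """Placeholder for the actual mistral-7b-instruct call
    (e.g. via transformers or llama.cpp)."""
    raise NotImplementedError

MAX_FIX_ATTEMPTS = 3

def solve(task: str):
    # Step 1: ask the model to describe an algorithm for the task.
    algorithm = generate(f"Describe an algorithm for: {task}")
    # Step 2: ask for Python code, using the algorithm as guide rails.
    code = generate(f"Write Python code following this algorithm:\n{algorithm}")
    for _ in range(MAX_FIX_ATTEMPTS):
        try:
            exec(code, {})  # "shoot the code into the REPL"
            return code     # ran without exceptions
        except Exception:
            tb = traceback.format_exc()
            # Step 3: pipe the exception back and ask the model for a fix.
            code = generate(
                f"This code raised an exception:\n{code}\n\n"
                f"Traceback:\n{tb}\nPlease fix the code."
            )
    return None  # gave up after MAX_FIX_ATTEMPTS repair rounds
```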
So, I have two questions:
- How can I clean up memory between questions, and how do you run chats with more than 2-3 turns without the model OOMing? (There’s a sketch of the kind of per-turn cleanup I mean after this list.) I could theoretically get more VRAM (I can try to get an A100 80GB; right now I’m on V100s with 32GB), but I suspect there is a better way.
- Is there a better small-ish model than mistral-7b-instruct for Python code generation? I’m fine with having more than one model in the pipeline, including a (possibly quantised) 13B model, but I really, really don’t want to bother with bigger models since I’m in Russia and renting GPUs is crazy expensive here (over $3 per hour for a V100!).
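To make the first question concrete, this is the sort of per-turn cleanup I mean (a minimal sketch assuming a PyTorch + transformers stack; the function name and `keep_last` parameter are illustrative):

```python
import gc
import torch

def reset_between_questions(history: list[str], keep_last: int = 2) -> list[str]:
    """Illustrative cleanup between chat turns. Truncating the history
    bounds the context (and thus KV-cache) growth that I suspect is
    what eventually OOMs."""
    history = history[-keep_last:]  # drop old turns so the prompt stays short
    gc.collect()                    # release Python references to old tensors
    torch.cuda.empty_cache()        # return cached blocks to the CUDA allocator
    return history
```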
Thanks in advance!