Looking for speed and accuracy. Any suggestions on cloud hosts?
Guided output was already mentioned, but I'll add how this can be done even with a very weak model.
You use the text completion endpoint, where you construct the prompts yourself.
You specify the context and make it stand out as a separate block.
Then in the prompt you ask it to fill in one specific detail (just one value of the JSON).
In the completion part (i.e. after the assistant turn) you pre-write the output in JSON format up to the first value, and you stop streaming after the closing " character.
Then you change the prompt to ask for the next value, append it as the next attribute to the JSON you are generating, start generation again, and stop at the next ". It's very, very fast - you barely generate any tokens; most of the work is prompt evaluation.
Test it manually; once you get good results, ask GPT-4 to write you a Python wrapper that does it.
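Just to make the loop concrete, here's a rough sketch of what such a wrapper could look like. It assumes an OpenAI-style raw completions server at localhost; the URL, model name, context, field names, and prompt template are all made up for illustration, so adapt them to your setup.

```python
# Rough sketch of the field-by-field JSON extraction loop described above.
# Assumes an OpenAI-compatible /v1/completions (raw text) endpoint; the
# endpoint URL, model name, and prompt template are assumptions, not a
# specific server's API.
import json
import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed local server
MODEL = "my-weak-model"                             # assumed model name

CONTEXT = """--- CONTEXT ---
John Smith, 42, lives in Oslo and works as a carpenter.
--- END CONTEXT ---"""

FIELDS = ["name", "age", "city", "occupation"]

def extract(context: str, fields: list[str]) -> dict:
    partial = "{"        # the JSON object we pre-write attribute by attribute
    result = {}
    for field in fields:
        prompt = (
            f"{context}\n\n"
            # 1) Ask for exactly one detail in the prompt.
            f"Question: What is the {field}? Answer with the value only.\n"
            # 2) Pre-write the completion up to the opening quote of the
            #    value, so the model only has to generate the value itself.
            f"Assistant: {partial}\"{field}\": \""
        )
        resp = requests.post(BASE_URL, json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": 32,
            "temperature": 0,
            "stop": ["\""],   # 3) stop as soon as the closing quote appears
        })
        value = resp.json()["choices"][0]["text"].strip()
        result[field] = value
        # 4) Append the finished attribute and move on to the next field.
        partial += f"\"{field}\": \"{value}\", "
    return result

print(json.dumps(extract(CONTEXT, FIELDS), indent=2))
```

Each call only generates a handful of tokens for the value itself; everything else is prompt eval, which is why it stays fast even on a weak model.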