  • HumanEval is 164 function signatures with corresponding docstrings, and evaluation happens by running the generated code in Docker against a set of unit tests (a rough sketch of what a single task looks like is at the end of this comment). The “extra” numbers come from HumanEvalPlus, which adds several more unit tests per problem on top.

    Merging models might improve their capabilities, but this one was not able to spot an out-of-bounds access on a wrongly declared vector, so there is no chance it magically completes complex Python code at what is basically GPT-4 level.
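
    For anyone who has not seen the benchmark, here is a rough sketch of what one task and its check look like (loosely modelled on the first HumanEval problem; the real harness and tests differ):

    ```python
    # Illustrative only - loosely modelled on the first HumanEval problem, not
    # the real harness. The model is given the signature + docstring and must
    # write the body; scoring runs the result (sandboxed, e.g. in Docker)
    # against unit tests.

    def has_close_elements(numbers, threshold):
        """Check if any two numbers in the list are closer to each other than threshold."""
        for i, a in enumerate(numbers):
            for b in numbers[i + 1:]:
                if abs(a - b) < threshold:
                    return True
        return False

    def check(candidate):
        # HumanEvalPlus simply adds more asserts like these per problem.
        assert candidate([1.0, 2.0, 3.0], 0.5) is False
        assert candidate([1.0, 2.8, 3.0], 0.3) is True

    check(has_close_elements)
    ```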

  • Guiding output was already mentioned, but maybe I will mention how this can be done even with a very weak model.

    You use the text-completion endpoint, where you construct the prompts yourself.
    You specify the context and make it stand out as a separate block.
    Then in the prompt you ask the model to fill in one specific detail (just one value of the JSON).
    In the completion part (i.e. after "assistant") you already pre-write the output in JSON format up to the first value.
    You stop streaming after the " sign.
    Then you change the prompt to ask for the next value, append what was generated as the next attribute of the JSON you are building, and start generation again, stopping at " as before.

    Very, very fast - you barely generate any tokens; it is mostly prompt evaluation.

    Test it manually first; once you have good results, ask GPT4 to write you a Python wrapper to do it (a rough sketch of what such a wrapper could look like is below).
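
    A minimal sketch of this loop, assuming an OpenAI-style /v1/completions server on localhost; the URL, prompt template, context and field names here are made-up placeholders:

    ```python
    # Minimal sketch: fill a JSON object one value at a time, stopping at each
    # closing quote. Assumes an OpenAI-style text-completion endpoint on
    # localhost; URL, prompt template, context and fields are placeholders.
    import requests

    API_URL = "http://localhost:8080/v1/completions"   # assumed local server
    CONTEXT = "John Smith, 42 years old, works as a plumber in Boston."  # example context
    FIELDS = ["name", "age", "occupation", "city"]      # attributes to extract

    def complete(prompt: str) -> str:
        # One short completion; the server stops generating at the closing quote.
        resp = requests.post(API_URL, json={
            "prompt": prompt,
            "max_tokens": 32,
            "temperature": 0,
            "stop": ["\""],
        })
        return resp.json()["choices"][0]["text"]

    partial_json = "{"
    for field in FIELDS:
        # Pre-write the JSON up to the opening quote of the next value, so the
        # model only produces that single value before hitting the stop string.
        partial_json += f'\n  "{field}": "'
        prompt = (
            "### Context\n" + CONTEXT + "\n\n"
            f'### Task\nFill in the "{field}" field of the JSON below.\n\n'
            "### Assistant\n" + partial_json
        )
        partial_json += complete(prompt).strip() + '",'

    print(partial_json.rstrip(",") + "\n}")  # valid JSON (all values as strings)
    ```

    If the server caches the prompt prefix, each step only evaluates the few tokens appended since the previous call, which is why this stays fast even on weak models.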


  • This is really interesting work! I’m doing research on Contrastive Decoding and have pretty good results so far; moreover, reading this post I realized it might fix my issues with picking the right alpha.

    I have a suggestion for OP and people reading this post - could we start collecting the “go-to” questions that this community uses for testing? It will be easier to automate, and then we could publish all outputs at once and let people rank whether they like each output or not.

    This way it will be much easier for small teams and individuals to make meaningful progress.