Hello after a long time :)
I am TokenBender.
Some of you may remember my previous model - codeCherryPop
It was very kindly received so I am hoping I won’t be killed this time as well.
Releasing EvolvedSeeker-1.3B v0.0.1
A 1.3B model with 68.29% on HumanEval.
The base model is quite cracked, I just did with it what I usually try to do with every coding model.
Here is the model - https://huggingface.co/TokenBender/evolvedSeeker_1_3
I will post this in TheBloke’s server for GGUF but I find that Deepseek coder’s GGUF sucks for some reason so let’s see.
EvolvedSeeker v0.0.1 (First phase)
This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on 50k instructions for 3 epochs.
I have mostly curated instructions from evolInstruct datasets and some portions of glaive coder.
Around 3k answers were modified via self-instruct.
Recommended format is ChatML, Alpaca will work but take care of EOT token
This is a very early version of 1.3B sized model in my major project PIC (Partner-in-Crime)
Going to teach this model json/md adherence next.
I will just focus on simple things that I can do for now but anything you guys will say will be taken into consideration for fixes.
Interesting, is Partner in Crime (PIC) like an open source co-pilot type project? I haven’t heard of it before (did you coin this phrase yourself, or is it well known)?
I ask because the tasks you describe (json/md/function calling/empathy) and then the name itself, all basically make it sound like the “open source” models equivalent of a co-pilot model.
Do you plan to fine tune the 7b as well?
Ok, it finally downloaded and I’ve spent a few minutes with it. It keeps getting into endless pathways of jaron (e.g., “fair play make world communal environment tolerant embraces diversity embrace equity promote unity instill resilience proactive leadership” and it just goes on like that–no punctuation, no connecting words–until it reaches the token limit.) What loader and settings work best with this model?
Try the chat inference code mentioned in the model card if you’re running it on GPU. The size is good enough to test on free colab as well.
That definitely works better. I wouldn’t trust it too far though. It just told me I can remove the first part of a file with one seek() and one truncate() call…
Thank you. Really interesting. I have a question for you. Do you happen know of there are any trained from scratch coding model projects? The reason I ask is I have a very specific idea about how to best teach an LLM to program, but it requires changing some details at the very base encoding level and a change in presentation of the training data. I’ve been programming for over 30 years now and I strongly suspect there is this fairly simple trick to improving coding models, so I’d like to look at something open source that starts from the very beginning. Then I can investigate how hard it would be to implement what I’m thinking. The design I have should result in very small but capable models.