Hello fellow llamas!!!

Here is what I am hacking on:

I am exploring new ways to build generative AI foundational models without traditional math-centric training costs and resources. I am trying to lower the bar for anyone looking to build and share models that are:

- task-trained - models are trained to do very specific task(s) with only the required datasets (explicitly overfitting for known use case(s) instead of generalizing/underfitting and having to search through the entire internet to respond)

- modular - because the models only know about these smaller, task-trained dataset(s), they will hopefully respond faster than today’s large models

- device-native - models are targeted for constrained environments that do not have GPU clusters or excess RAM/CPU/storage/connectivity

- open source - since the weights are public domain, the derived intelligence should be public domain

- type of foundational model: weight-derived (blog: https://matlok.ai/ docs: https://bampe-weights.readthedocs.io/en/latest/)

I believe there may be some math/stats proofs that are missing (see the smooth-brain), but I want to push this modular, Lego-block-like approach in hopes of reaching parity with a new generation of foundational models. One of my fundamental assumptions is that if I substantially reduce the training corpus, a smaller, overfit model will hopefully be faster than a traditionally trained large language model. The initial, slimmer model-building process should also hopefully run on IoT devices and plug in to existing distributed architectures (device-native).

What are you doing next - Initial use case?

I need help with a good initial use case (please let me know if you have better ones!). Current best idea of the week/last 3 days: I believe this approach and knowledge system of assembling weight-derived models should be shared so we can ensure concepts like an “ethical watermark” for Asimov’s Laws of Robotics are always present in all pre-trained AI model weights using cosine similarity searches. As this approach matures, we should be able to audit and report on what these models know, and I think we need a community-driven project to tackle it.
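To make the cosine similarity idea a bit more concrete, here is a minimal sketch of what a "watermark search" over weights could look like. This is not code from the bampe-weights repo: the `find_watermark` helper, the threshold, and the toy numbers are all hypothetical, and it assumes the weights have already been flattened into a plain list of floats.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: all-zero vectors match nothing
    return dot / (norm_a * norm_b)


def find_watermark(weights, watermark, threshold=0.99):
    """Slide the watermark over a flat weight list and return (offset, score) hits.

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    hits = []
    width = len(watermark)
    for offset in range(len(weights) - width + 1):
        window = weights[offset:offset + width]
        score = cosine_similarity(window, watermark)
        if score >= threshold:
            hits.append((offset, score))
    return hits


# Toy data (made up): a scaled copy of the watermark is embedded at offset 3.
watermark = [0.5, -1.0, 0.25, 2.0]
weights = [0.1, 0.9, -0.3] + [2 * w for w in watermark] + [0.7, -0.2]

hits = find_watermark(weights, watermark)
for offset, score in hits:
    print(f"possible watermark at offset {offset} (cosine similarity {score:.4f})")
```

Cosine similarity ignores scale, so a watermark that has been uniformly rescaled still matches. A real audit would need something faster than a linear scan (for example a vector index), and whether a watermark survives fine-tuning is an open question.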

tl;dr

It’s early days, but I believe we can reuse existing AI tensor weights complemented with smaller “fine-tuning”-sized datasets to build small, fast, high-quality generative models.

PoC repository:

https://github.com/matlok-ai/bampe-weights

Inputs

Extracted tensor weight from a GPT2 model.safetensors file:


https://raw.githubusercontent.com/matlok-ai/gen-ai-datasets-for-bampe-weights/main/docs/images/safetensors/gpt2/in/idata__h.0.attn.c_attn.weight.png
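For anyone curious what "extracting a tensor weight" involves, here is a rough stdlib-only sketch of reading one tensor out of a safetensors file. It relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). The helper names are hypothetical, it only handles `F32` tensors, and the demo writes a tiny stand-in file rather than the real GPT2 one. The tensor name is taken from the image filename above and is an assumption about how the key appears in the actual file. In practice the `safetensors` library is the better tool; this just shows what is in the file.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path):
    """Return (header dict, byte offset where the tensor data buffer begins)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header, 8 + header_len


def read_f32_tensor(path, name):
    """Return (shape, flat list of floats) for one F32 tensor (hypothetical helper)."""
    header, data_start = read_safetensors_header(path)
    info = header[name]
    if info["dtype"] != "F32":
        raise ValueError(f"this sketch only handles F32, got {info['dtype']}")
    begin, end = info["data_offsets"]  # offsets are relative to the data buffer
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        raw = f.read(end - begin)
    count = (end - begin) // 4  # 4 bytes per float32
    return info["shape"], list(struct.unpack(f"<{count}f", raw))


def write_demo_file(path, name, shape, values):
    """Write a one-tensor safetensors file so the sketch runs without a download."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {name: {"dtype": "F32", "shape": shape, "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


# Demo with made-up values; the tensor name is assumed from the image filename.
tensor_name = "h.0.attn.c_attn.weight"
demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_demo_file(demo_path, tensor_name, [2, 3], [0.0, 0.5, -1.0, 1.5, 2.0, -2.5])

shape, values = read_f32_tensor(demo_path, tensor_name)
print(tensor_name, "shape:", shape, "first values:", values[:3])
```

To try it on a real file, point `read_safetensors_header` at a downloaded `model.safetensors` and print the header keys first, since the exact tensor names and dtypes may differ from what is assumed here.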

Outputs

Predicted weight-derived file for use in a new type of foundational generative AI model

This screenshot is an example of “trained weights” for a new type of foundational generative AI model (referred to as a weight-derived model):

https://raw.githubusercontent.com/matlok-ai/gen-ai-datasets-for-bampe-weights/main/docs/images/safetensors/gpt2/out/gpu-generated_predicted-model-weights__layer__h.0.attn.c_attn.weight__chunk__0.png

Thanks for the help, guidance, and assistance keeping up with the insane speed of this ecosystem!

Reach out if you want more info - my email is in the profile

  • buildinstuff5432OPB
    10 months ago

    Wow. This project is off to a great start and is reusing today’s generation of AI models/techniques to explore alternative models for a new generation.

    I am excited to see I’m not the only one fired up about addressing today’s model limitations like context size/window (https://github.com/arthurwolf/llmi/blob/main/README.md#recursive-redaction). Once we pop the weights out, we can reuse the weights in a new model configuration that has a larger context size (hopefully haha!).

    Are you thinking about using a multimodal transformer for the “Thinking with code” section, or something new and exciting I’ve never heard of (https://github.com/arthurwolf/llmi/blob/main/README.md#thinking-with-code)? I like the “Checking for Accuracy” section too (https://github.com/arthurwolf/llmi/blob/main/README.md#checking-for-accuracy). This is what I’m thinking of as a watermark for verifying that a model’s at-rest weights have “trained knowledge”, kind of like security scanning container images at rest in the CI/CD space, versus verifying that the model answered the question(s) correctly while running/in-memory.

    I could keep going, but what do you think are the next steps for your project?