The question is: is that still faster than system memory or not?
It's a 3060 laptop, so only 6 GB, and the model plus embeddings etc. is at like 5.8 GB.
I know that CUDA is used, VRAM is full, and I get the message at the beginning. What is your hardware setup?
Do you also use llama_index and then langchain, or did you build it more or less from llama_cpp and langchain without llama_index?
Okay, it's working now. I needed to install nvcc separately and change the CUDA_HOME environment variable. Also, to install nvcc I had to get the symlinks working manually, but with 15 minutes of Google searching I got it to work :D Thank you all :)
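In case it helps anyone hitting the same thing, here is a rough Python sketch of what I checked and how the rebuild looked. The paths and the exact CMAKE_ARGS flag name are assumptions (the flag has changed between llama-cpp-python versions), so adjust for your setup:

```python
import os, shutil, subprocess, sys

# What I checked before the rebuild (values are just what a typical Linux install looks like):
print("CUDA_HOME:", os.environ.get("CUDA_HOME"))  # should point at the CUDA toolkit root
print("nvcc:", shutil.which("nvcc"))              # None means nvcc still isn't on PATH

# Rebuild llama-cpp-python against CUDA. LLAMA_CUBLAS is the flag on older versions;
# check the docs for the version you actually have.
env = dict(os.environ, CMAKE_ARGS="-DLLAMA_CUBLAS=on", FORCE_CMAKE="1")
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env,
    check=True,
)
```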
Yup, that was part of it :) It's working now, thank you.
I'm looking to do something similar. Using RAG pipelines might be useful, as far as I understand, to give the model extra context about the sites you want to summarize.
https://agi-sphere.com/retrieval-augmented-generation-llama2/
Maybe you already know all this, but I'm also new and just recently stumbled upon it :)
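Not sure how you have things set up, but a minimal sketch of that kind of pipeline in plain langchain (no llama_index) could look roughly like this; the model paths, chunk sizes, input file and embedding model are just placeholders:

```python
from langchain.llms import LlamaCpp
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# Split the page text into chunks so the retriever can pull only the relevant parts.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(open("page.txt").read())  # placeholder input file

# Embed the chunks and build a local vector store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_texts(chunks, embeddings)

# Local Llama 2 via llama.cpp; n_gpu_layers depends on how much VRAM you have.
llm = LlamaCpp(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048, n_gpu_layers=32)

# Retrieval + generation: the retrieved chunks get stuffed into the prompt as extra context.
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.run("Summarize the main points of this page."))
```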
I did this :) I should have specified, but when reinstalling I set both flags as env variables again.
Okay, thank you guys. So this only really makes sense if I want to run different models on the different GPUs, or if I have something so big that I need the 48 GB of VRAM and can deal with the slower speeds :) Thanks for the feedback.
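For the second case (one big model split across two cards), llama-cpp-python exposes a tensor_split parameter for this as far as I can tell; a rough sketch, where the model path and the 50/50 split are just placeholders to tune for your cards:

```python
from llama_cpp import Llama

# Load one large model split across two GPUs.
# tensor_split gives the proportion of the weights placed on each device.
llm = Llama(
    model_path="llama-2-70b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # roughly half the weights on each card
    n_ctx=2048,
)

print(llm("Q: Why split a model across GPUs? A:", max_tokens=64)["choices"][0]["text"])
```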