Need help estimating if my speed is expected. Llama_index

Noxusequal · 2 years ago

I know that CUDA is being used: VRAM is full and I get the message at the beginning. What is your hardware setup?

Do you also use llama_index and then langchain, or did you build it more or less from llama_cpp and langchain without llama_index?

harrro · 2 years ago

I’m using langchain with qdrant as the vector store.

VRAM is full

How is a 7B model maxing out your VRAM? A 7B model at 4-bit with 4k context should not fill the 12 GB of VRAM on a 3060.
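For a rough sanity check, here is a back-of-the-envelope sketch of that estimate. The thread doesn't say which 7B model is in use, so the layer count (32) and hidden size (4096) are assumptions taken from the Llama 7B architecture, the KV cache is assumed to be fp16, and `estimate_vram_gib` is just an illustrative helper, not part of any library. Real 4-bit quant formats use a bit more than 4 bits per weight, and the embedding model and CUDA buffers add overhead on top.

```python
def estimate_vram_gib(
    n_params=7e9,         # assumed parameter count for a "7B" model
    bits_per_weight=4.0,  # idealized 4-bit quantization
    n_layers=32,          # assumption: Llama 7B layer count
    hidden_size=4096,     # assumption: Llama 7B hidden size
    n_ctx=4096,           # 4k context
    kv_bytes=2,           # assumption: fp16 KV cache
):
    """Return (weights_gib, kv_cache_gib) as a rough estimate."""
    weights = n_params * bits_per_weight / 8
    # One key and one value vector per layer per token in the context.
    kv_cache = 2 * n_layers * n_ctx * hidden_size * kv_bytes
    gib = 2**30
    return weights / gib, kv_cache / gib


weights_gib, kv_gib = estimate_vram_gib()
print(f"weights ~{weights_gib:.2f} GiB, KV cache ~{kv_gib:.2f} GiB, "
      f"total ~{weights_gib + kv_gib:.2f} GiB")
```

Under these assumptions that comes to roughly 3.3 GiB of weights plus 2 GiB of KV cache, so a bit over 5 GiB before any overhead: well under 12 GB, but tight on a 6 GB card.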

Noxusequal · 2 years ago

It's a 3060 laptop, so only 6 GB, and the model plus embeddings etc. come to about 5.8 GB.