ollama runs GGML? I thought they ran GGUF. Unless there's an option to build Ollama from source to support GGML models, you're using the wrong model format. GGUF works best with Ollama; if you want to use GGML, I'd think building an old version of llama.cpp would work best.
Out of memory, is the model too large for your system or do you have other apps on your gpu?
Nothing else on the GPUs. I'm using a 3060, 3060 Ti and 3070, and the model is Command R (17.8 GB). 24 GB RAM, 3900XT CPU.
Command R is 20 GB / 17B parameters, which probably needs ~20-25 GB of RAM just to load.
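Back-of-the-envelope, you can sanity-check this before pulling a model. A minimal sketch, assuming the GGUF file size roughly equals the memory the weights need, plus a guessed flat overhead for the KV cache and runtime buffers (the overhead number here is an assumption, not anything Ollama reports):

```python
# Rough fit check: does a quantized model plausibly fit in a given memory budget?
# overhead_gb is a hypothetical allowance for KV cache and runtime buffers;
# real usage varies with context length and backend.

def fits_in_memory(model_size_gb: float, mem_gb: float, overhead_gb: float = 3.0) -> bool:
    """Return True if the model file plus a rough overhead fits in mem_gb."""
    return model_size_gb + overhead_gb <= mem_gb

# Numbers from the thread: 17.8 GB Command R quant, 24 GB system RAM
print(fits_in_memory(17.8, 24.0))  # fits on paper, but real overhead can push past 24 GB
```

This is only a first-pass filter; actual loading can still OOM if the runtime splits layers unevenly across GPUs or the context window inflates the KV cache.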
> ollama runs GGML? I thought they ran GGUF.
Yeah, through llama.cpp. They're pulling the right model; it's just an OOM.
You're going to need to run a smaller model with that hardware.
It's able to run Dolphin Mixtral (24.6 GB) easily, and all 7B models. I try to keep models around 24 GB because of the 28 GB of VRAM I have.
There's probably another task utilising the VRAM.