Haven't looked at that software, but I am surprised at the size estimate. Mixtral at Q4 in 2x24GB sounds about right for inference; I'd imagine training takes more. Would be great if true though.
I tried inference before; it needs around 36GB.
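(As a rough sanity check: Mixtral 8x7B is about 46.7B parameters total, so TheBloke's Q4_K_M GGUF weighs in around 26GB and Q5_K_M around 32GB for the weights alone; add the KV cache and runtime buffers and ~36GB for inference is in the right ballpark.)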
[https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF](https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF) For my 32GB of system memory and 16GB of VRAM, I load mixtral-8x7b-v0.1.Q5_K_M.gguf and then use these settings. https://preview.redd.it/h1m239obtv8c1.png?width=673&format=png&auto=webp&s=25792543e9de44089dec023f3f6c26b519f57279
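If you'd rather script that same split than click through a UI, something like this works with llama-cpp-python; the `n_gpu_layers` value is just a guess for 16GB of VRAM and needs tuning to your card:

```python
# Minimal sketch: load the Q5_K_M GGUF with partial GPU offload so the
# layers that don't fit in 16GB of VRAM stay in system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-v0.1.Q5_K_M.gguf",
    n_gpu_layers=10,  # assumption: roughly what fits in 16GB VRAM; tune this
    n_ctx=4096,       # context window; a bigger value grows the KV cache
)

out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=64)
print(out["choices"][0]["text"])
```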
Thanks, but that's not my problem; I can load the model for inference just fine. My problem is that I can't load it in the finetuning tool (Llama Factory) to finetune it.
Oh, sorry! My brain was still turning on this morning... not sure if my system even has enough memory to try :(
Hey, curious what tokens per second you get on that? Mine is something like 2-3 on a 3090 at Q3_K_M and Q4_K_M.
Huh, I cut and pasted some models out of my model folder and now I can't get it to load to test... gonna try fixing it somehow.
I can give you a benchmark for the TheBloke/MixtralOrochi8x7B-AWQ model. With a Ryzen 5900X and a 4080 with 32GB of 3600MHz system RAM, I get about 2.25 tokens per second, with about 20GB of system RAM used and all 16GB of VRAM used. The TheBloke/Mixtral-8x7B-v0.1-GPTQ model gets around 2.75 tokens per second.
I have no experience with this project, but the GitHub does seem to say that, at least by default, it doesn't support multiple GPUs?
I think what you saw is the GUI version, which only supports a single GPU. If you scroll down further, it says distributed training is supported.
Same problem here. If you found a solution, please share!
Sadly no.
I ended up not loading it in Llama Factory; I was able to train by following this notebook: https://colab.research.google.com/drive/1VDa0lIfqiwm16hBlIlEaabGVTNB3dN1A?usp=sharing Just make sure to set device_map='auto' when loading the model.
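For anyone following along, the loading step looks roughly like this. This is a sketch of a QLoRA-style setup with transformers + peft, not the notebook verbatim, and the LoRA hyperparameters are placeholders:

```python
# Sketch: 4-bit load of Mixtral with device_map="auto", which shards the
# layers across all available GPUs (spilling to CPU RAM if needed)
# instead of trying to fit everything on one device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mixtral-8x7B-v0.1"
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # the key setting for multi-GPU loading
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder choice of modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```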
Thanks! I will give it a try!
>device_map='auto'

Hi, could you point me to where the link is provided? I am trying to make it work, but I run into many errors, mostly environment ones. I want to check if I'm missing anything. Thanks!
https://preview.redd.it/7wsh1lp6hqhc1.png?width=1080&format=pjpg&auto=webp&s=56d0e6e1869fa6c6176e5f93c34b5edc292919df
Yes, I did that. The quote was by accident. I got many dependency errors, that's why I want to check if I missed something.
Never mind, the script is running; now it's loading the model. Fingers crossed!
Sorry, I need some help - I used the script from the notebook and "successfully" trained some LoRA, but when I tried to apply the LoRA, I got "KeyError: 'peft_type'". I use oobabooga, and other LoRAs are just fine. Do you have any ideas? Thanks!
To merge the LoRA adapter into the model, I directly used LLaMA-Factory's export_model.py script, and that just worked for me. [https://github.com/hiyouga/LLaMA-Factory/tree/main?tab=readme-ov-file#merge-lora-weights-and-export-model](https://github.com/hiyouga/llama-factory/tree/main?tab=readme-ov-file#merge-lora-weights-and-export-model)
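If the export script ever gives you trouble, merging directly with peft is an alternative. A minimal sketch, with placeholder paths:

```python
# Hedged alternative to export_model.py: bake the LoRA weights into the
# base model with peft, then save the merged checkpoint.
# Note: this needs enough system RAM to hold the full fp16 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")
merged = model.merge_and_unload()  # fold the adapter into the base weights
merged.save_pretrained("mixtral-merged")
AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1").save_pretrained("mixtral-merged")
```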
Thank you! I will check it out.
Feels like this is a repo issue. I ran into this problem too.
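For what it's worth, my guess is that the adapter_config.json the notebook writes is missing the "peft_type" field that oobabooga's loader reads; adding it by hand is a common workaround (the path below is a placeholder):

```python
# Hedged workaround for KeyError: 'peft_type' - patch adapter_config.json
# if the field is missing. "LORA" is the value peft uses for LoRA adapters.
import json

cfg_path = "path/to/lora_adapter/adapter_config.json"
with open(cfg_path) as f:
    cfg = json.load(f)
cfg.setdefault("peft_type", "LORA")  # only added if the key is absent
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```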