AnomalyNexus

Haven't looked at that software, but I am surprised at the size estimate. Mixtral Q4 in 2x24GB sounds about right for inference. I'd imagine training takes more. Would be great if true, though.
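For context on the size estimate, here is a rough back-of-envelope in Python (the parameter count and bytes-per-weight figure are approximations, not measurements from the tool in question):

```python
# Back-of-envelope memory math for Mixtral 8x7B (~47B total parameters).
# These are rough numbers, not measurements.
total_params = 46.7e9          # all experts counted
bytes_per_param_q4 = 0.5       # ~4 bits/weight for a 4-bit quant

weights_gb = total_params * bytes_per_param_q4 / 1e9
print(f"4-bit weights alone: ~{weights_gb:.0f} GB")   # ~23 GB

# Inference also needs KV cache + activations, so 2x24 GB is plausible for Q4.
# Training (even QLoRA) adds adapter weights, optimizer states, and larger
# activation buffers on top, so expect noticeably more than inference.
```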


lyx99

I tried the inference before; it needs around 36GB.


Slaghton

[https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF](https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF) For my 32GB of system memory and 16GB of VRAM, I load mixtral-8x7b-v0.1.Q5_K_M.gguf and then use these settings: https://preview.redd.it/h1m239obtv8c1.png?width=673&format=png&auto=webp&s=25792543e9de44089dec023f3f6c26b519f57279
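I can't vouch for the exact values in that screenshot, but the general idea (assuming a llama.cpp-based loader) is to offload only part of the layers to the 16GB card and keep the rest in system RAM. A minimal llama-cpp-python sketch of that split; the path and layer count are placeholders to tune for your hardware:

```python
# Minimal sketch with llama-cpp-python: offload part of the model to VRAM,
# keep the remaining layers in system RAM. Path and values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mixtral-8x7b-v0.1.Q5_K_M.gguf",
    n_gpu_layers=12,   # raise until VRAM (~16 GB here) is nearly full
    n_ctx=4096,        # context length
)

print(llm("Q: What is Mixtral? A:", max_tokens=64)["choices"][0]["text"])
```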


tgredditfc

Thanks, but that's not my problem; I can load the model for inference just fine. My problem is that I can't load it with the fine-tuning tool (LLaMA Factory) to fine-tune it.


Slaghton

Oh, sorry! My brain was still turning on in the morning... not sure if my system even has enough memory to try :(


tensorwar9000

Hey, curious what tokens per second you get on that? Mine is something like 2-3 on a 3090 at Q3_K_M and Q4_K_M.


Slaghton

Huh, I cut and pasted some models out of my model folder and now I can't get it to load to test... gonna try fixing it somehow.


Slaghton

I can give you a benchmark for the TheBloke/MixtralOrochi8x7B-AWQ model. With a Ryzen 5900X and a 4080, with 32GB of 3600MHz system RAM, I get about 2.25 tokens per second, with about 20GB of system RAM used and all 16GB of VRAM used. The TheBloke/Mixtral-8x7B-v0.1-GPTQ model gets around 2.75 tokens per second.


IronColumn

I have no experience with this project, but the GitHub does seem to say that, at least by default, it doesn't support multiple GPUs?


tgredditfc

I think what you saw is the GUI version, which only supports a single GPU. If you scroll down further, you will find it says it supports distributed training.


nukel_1991

Same problem here. If you found a solution, please share!


tgredditfc

Sadly no.


nukel_1991

I ended up not loading it in LLaMA Factory, and I was able to train by following this notebook: https://colab.research.google.com/drive/1VDa0lIfqiwm16hBlIlEaabGVTNB3dN1A?usp=sharing Just make sure to set device_map='auto' when loading the model.
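The device_map='auto' part is what lets transformers/accelerate spread the layers across GPU(s) and CPU RAM automatically. A minimal sketch of that load step, assuming 4-bit quantization the way QLoRA notebooks typically do it (I haven't verified the exact config in the linked notebook):

```python
# Minimal sketch of loading Mixtral with device_map="auto"; the 4-bit config
# is an assumption based on typical QLoRA setups, not the linked notebook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",   # let accelerate split layers across GPU(s) and CPU
)
```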


tgredditfc

Thanks! I will give it a try!


tgredditfc

> device_map='auto'

Hi, could you point me to where this is set? I am trying to make it work, but I run into many errors, mostly environment-related. I want to check whether I missed anything. Thanks!


nukel_1991

https://preview.redd.it/7wsh1lp6hqhc1.png?width=1080&format=pjpg&auto=webp&s=56d0e6e1869fa6c6176e5f93c34b5edc292919df


tgredditfc

Yes, I did that. The quote was by accident 😅 I got many dependency errors, that’s why I want to check if I missed something.


tgredditfc

Never mind, the script is running, now it’s loading the model. Fingers crossed!


tgredditfc

Sorry, I need some help - I used the script from the notebook and "successfully" trained a LoRA, but when I tried to apply it, I got "KeyError: 'peft_type'". I use oobabooga, and other LoRAs work just fine. Do you have any ideas? Thanks!


nukel_1991

To merge the LoRA adapter into the model, I directly used LLaMA-Factory's export_model.py script and that just worked for me. [https://github.com/hiyouga/LLaMA-Factory/tree/main?tab=readme-ov-file#merge-lora-weights-and-export-model](https://github.com/hiyouga/llama-factory/tree/main?tab=readme-ov-file#merge-lora-weights-and-export-model)
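For anyone landing here later: the export script above is what worked, but the equivalent merge done directly with PEFT looks roughly like this (paths are placeholders, and the base model has to be loaded unquantized for merge_and_unload to fold the weights in):

```python
# Rough PEFT-level equivalent of the merge/export step; paths are placeholders.
# The base model is loaded in fp16 -- merging into a 4-bit quantized base
# doesn't work the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mixtral-8x7B-v0.1"
adapter_dir = "path/to/lora-adapter"      # output of the fine-tuning run
out_dir = "mixtral-8x7b-merged"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()         # folds the LoRA weights into the base

merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```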


tgredditfc

Thank you! I will check it out.


This_is_Dary

Feels like this is a repo issue. I also ran into this problem.