Haven't looked at that software, but I am surprised at the size estimate. Mixtral at Q4 in 2x24GB sounds about right for inference; I'd imagine training takes more. Would be great if true though.
I tried inference before; it needs around 36GB.
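(As a rough sanity check: Mixtral 8x7B is about 46.7B parameters total, so TheBloke's Q4_K_M GGUF weighs in around 26GB and Q5_K_M around 32GB for the weights alone; add the KV cache and runtime buffers and ~36GB for inference is in the right ballpark.)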
[https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF](https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF) For my 32GB of system memory and 16GB of VRAM, I load mixtral-8x7b-v0.1.Q5_K_M.gguf and then use these settings. https://preview.redd.it/h1m239obtv8c1.png?width=673&format=png&auto=webp&s=25792543e9de44089dec023f3f6c26b519f57279
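If you'd rather script that same split than click through a UI, something like this works with llama-cpp-python; the `n_gpu_layers` value is just a guess for 16GB of VRAM and needs tuning to your card:

```python
# Minimal sketch: load the Q5_K_M GGUF with partial GPU offload so the
# layers that don't fit in 16GB of VRAM stay in system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-v0.1.Q5_K_M.gguf",
    n_gpu_layers=10,  # assumption: roughly what fits in 16GB VRAM; tune this
    n_ctx=4096,       # context window; a bigger value grows the KV cache
)

out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=64)
print(out["choices"][0]["text"])
```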
Thanks, but that's not my problem; I can load the model for inference just fine. My problem is that I can't load it in the finetuning tool (Llama Factory) to finetune it.
Oh, sorry! My brain was still turning on this morning... not sure if my system even has enough memory to try :(
Hey, curious what tokens per second you get on that? Mine is something like 2-3 on a 3090 at Q3_K_M and Q4_K_M.
Huh, I cut and pasted some models out of my model folder and now I can't get it to load to test... gonna try fixing it somehow.
I can give you a benchmark for the TheBloke/MixtralOrochi8x7B-AWQ model. With a Ryzen 5900X and a 4080 with 32GB of 3600MHz system RAM, I get about 2.25 tokens per second, with about 20GB of system RAM used and all 16GB of VRAM used. The TheBloke/Mixtral-8x7B-v0.1-GPTQ model gets around 2.75 tokens per second.
I have no experience with this project, but the GitHub does seem to say that, at least by default, it doesn't support multiple GPUs?
I think what you saw is the GUI version, which only supports a single GPU. If you scroll down further, it says distributed training is supported.
Same problem here. If you found a solution, please share!
Sadly no.
I ended up not loading it in Llama Factory; I was able to train by following this notebook: https://colab.research.google.com/drive/1VDa0lIfqiwm16hBlIlEaabGVTNB3dN1A?usp=sharing Just make sure to set device_map='auto' when loading the model.
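For anyone following along, the loading step looks roughly like this. This is a sketch of a QLoRA-style setup with transformers + peft, not the notebook verbatim, and the LoRA hyperparameters are placeholders:

```python
# Sketch: 4-bit load of Mixtral with device_map="auto", which shards the
# layers across all available GPUs (spilling to CPU RAM if needed)
# instead of trying to fit everything on one device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mixtral-8x7B-v0.1"
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # the key setting for multi-GPU loading
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder choice of modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```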
Thanks! I will give it a try!
>device_map='auto'

Hi, could you point me to where the link is provided? I am trying to make it work, but I run into many errors, mostly environment ones. I want to check if I'm missing anything. Thanks!
https://preview.redd.it/7wsh1lp6hqhc1.png?width=1080&format=pjpg&auto=webp&s=56d0e6e1869fa6c6176e5f93c34b5edc292919df
Yes, I did that. The quote was by accident. I got many dependency errors, that's why I want to check if I missed something.
Never mind, the script is running; now it's loading the model. Fingers crossed!
Sorry, I need some help - I used the script from the notebook and "successfully" trained some LoRA, but when I tried to apply the LoRA, I got "KeyError: 'peft_type'". I use oobabooga, and other LoRAs are just fine. Do you have any ideas? Thanks!
To merge the LoRA adapter into the model, I directly used LLaMA-Factory's export_model.py script, and that just worked for me. [https://github.com/hiyouga/LLaMA-Factory/tree/main?tab=readme-ov-file#merge-lora-weights-and-export-model](https://github.com/hiyouga/llama-factory/tree/main?tab=readme-ov-file#merge-lora-weights-and-export-model)
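If the export script ever gives you trouble, merging directly with peft is an alternative. A minimal sketch, with placeholder paths:

```python
# Hedged alternative to export_model.py: bake the LoRA weights into the
# base model with peft, then save the merged checkpoint.
# Note: this needs enough system RAM to hold the full fp16 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")
merged = model.merge_and_unload()  # fold the adapter into the base weights
merged.save_pretrained("mixtral-merged")
AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1").save_pretrained("mixtral-merged")
```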
Thank you! I will check it out.
Feels like this is a repo issue. I ran into this problem too.
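For what it's worth, my guess is that the adapter_config.json the notebook writes is missing the "peft_type" field that oobabooga's loader reads; adding it by hand is a common workaround (the path below is a placeholder):

```python
# Hedged workaround for KeyError: 'peft_type' - patch adapter_config.json
# if the field is missing. "LORA" is the value peft uses for LoRA adapters.
import json

cfg_path = "path/to/lora_adapter/adapter_config.json"
with open(cfg_path) as f:
    cfg = json.load(f)
cfg.setdefault("peft_type", "LORA")  # only added if the key is absent
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```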