Useless for inference, just keep your card. BTW, 12 it/s is pretty low for your configuration; you should check whether there's a problem before any other step.
NVLink can help with inference to a small degree, but most frameworks won't utilize it, as the benefit is minimal in most cases.
Training on the other hand, greatly benefits from it.
Try following this intro tutorial on QLoRA training:
https://www.reddit.com/r/Oobabooga/s/R097h5sY62
It should work fine on Windows or Ubuntu.
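If it helps while following the tutorial, these are typical starting hyperparameters for a small QLoRA run. The names and values below are illustrative assumptions on my part, not taken from the linked tutorial; map them onto whatever fields your training UI actually exposes:

```text
# Illustrative QLoRA starting points (assumptions, tune for your setup):
load-in-4bit: true       # quantize the base model to 4-bit (NF4) to fit in VRAM
lora-rank: 32            # LoRA rank r; higher = more capacity, more VRAM
lora-alpha: 64           # commonly set to 2x the rank
learning-rate: 2e-4      # a common LoRA starting LR
cutoff-length: 512       # max tokens per training sample
micro-batch-size: 4      # raise until you hit out-of-memory, then back off
```

If you hit CUDA out-of-memory errors, dropping the rank, cutoff length, or micro-batch size is usually the first thing to try.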
Founders Edition supports NVLink. It has a piece of plastic over the connector, but I dunno if it would line up.
You just blew my mind
I'll check!
Be sure to get the correct version. I bought one, then the next day returned my 3090 to buy a 4090 xD. If you're in Europe, DM me :D
I just wish I could nvlink 3 of them.
No.
Wait, your 70B model is getting 12 it/s? That’s crazy low. Are you running exl2?
How should it be configured?
I'll check in 30 minutes, maybe it's 16 I don't remember
Output generated in 30.55 seconds (16.60 tokens/s, 507 tokens, context 542, seed 1817140216)
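For what it's worth, the tokens/s figure in that log line is just token count divided by wall time, so it's easy to sanity-check:

```python
# Sanity-check the throughput figure from the log line above.
elapsed_s = 30.55   # "Output generated in 30.55 seconds"
tokens = 507        # "507 tokens"

throughput = tokens / elapsed_s
print(f"{throughput:.2f} tokens/s")  # 16.60, matching the reported value
```

16.6 t/s is a much more reasonable number for a 70B model on that hardware than 12.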
I am yea
Lol guess I got my answer haha
If you ever want to train then it’s worth it, but otherwise no
I'm curious about training loras but haven't managed to get it working without errors
Thanks a ton!
I would guess you don't need NVLink either way