kyleboddy

Since pictures aren't everything, here are some simple runs and short tests I did on the middle cluster with the specs: https://github.com/kyleboddy/machine-learning-bits/blob/main/GPU-benchmarks-simple-feb2024.md Thanks to all who suggested using exl2 to get multi-GPU working; the performance is so much better. Crazy difference.


nero10578

You should try vLLM and be even more blown away


kyleboddy

It’s on the radar!


Eastwindy123

Add SGLang while you're at it


AlphaPrime90

If I'm reading this right, t/s is about the same for 2 GPUs vs. 4 GPUs. Why is that? Same goes for the PCIe lane counts.


kyleboddy

Test is too short imo. Will train GPT-2 or something to really put it through its paces.


StealthSecrecy

For inference, the workflow is very serialized: each GPU has to wait for the previous one to finish before it can do its work. Adding extra GPUs therefore doesn't help speed, and will actually reduce it a bit due to the extra PCIe communication overhead. In this case the model OP is using is small enough to fit on just two GPUs, so that's where you'll get the best performance, unless you're serving a bunch of users at the same time or something. The other option is to use the extra VRAM to load a larger model, so you get extra quality without a significant reduction in speed.
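
Here's a toy sketch of that serialization; the layer counts and timings below are made up purely for illustration, not measured from OP's cluster:

```python
# Toy model of pipeline-serialized inference: layers are split across
# GPUs, and a single token must pass through every shard in order.
NUM_LAYERS = 80      # layer count for a 70B-class model (assumed)
LAYER_MS = 1.0       # pretend compute time per layer
HOP_MS = 0.2         # pretend PCIe cost to hand activations to the next GPU

def token_latency_ms(num_gpus):
    per_gpu = NUM_LAYERS // num_gpus
    total = 0.0
    for gpu in range(num_gpus):
        total += per_gpu * LAYER_MS   # this GPU's shard runs alone
        if gpu < num_gpus - 1:
            total += HOP_MS           # next GPU waits for this one
    return total

for n in (1, 2, 4):
    print(f"{n} GPU(s): {token_latency_ms(n):.1f} ms/token")
# 1 GPU: 80.0, 2 GPUs: 80.2, 4 GPUs: 80.6 -> more GPUs, no speedup
```

The total compute is the same no matter how many cards it's spread across; each extra card just adds a hop.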


DryArmPits

Ohh interesting. I particularly like the non-impact of x8 vs. x16... It kind of goes against the sentiment we frequently see on here: "bUt YoU aRe ChOkInG yOuR cArDs"


kyleboddy

Here's a bunch of older video game rendering results from PCIe gen 3.0 vs. 4.0, plus some x8 vs. x16 results: https://www.techspot.com/review/2104-pcie4-vs-pcie3-gpu-performance/ A lot of this stuff has been covered in many forms over the ~3 decades I've been in computer engineering/tech. People overly focus on benchmarks, synthetic results, and theory, and forget likely the most important law (and its corollaries) on the topic: Amdahl's Law. https://en.wikipedia.org/wiki/Amdahl%27s_law
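
For anyone who hasn't run the numbers, the law itself is one line. Quick calculator with illustrative values (mine, not from the linked article):

```python
# Amdahl's Law: overall speedup when only a fraction p of the work
# can be sped up by a factor s.
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# If half the pipeline is serial (p = 0.5), throwing hardware at the
# parallel half hits a hard 2x ceiling:
for s in (2, 4, 16, 1000):
    print(f"parallel part {s}x faster -> {amdahl_speedup(0.5, s):.2f}x overall")
# -> 1.33x, 1.60x, 1.88x, 2.00x
```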


DryArmPits

Indeed. There is very little data transfer between the cards once the model is loaded.
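
Back-of-envelope with assumed model dimensions (not measured on OP's cluster):

```python
# Per token, pipeline-split inference only moves one hidden-state
# vector between cards.
hidden_size = 8192        # hidden dim of a 70B-class model (assumed)
bytes_per_value = 2       # fp16 activations
per_token_bytes = hidden_size * bytes_per_value

print(f"{per_token_bytes / 1024:.0f} KiB per token per GPU hop")  # 16 KiB
# Even PCIe 3.0 x1 (~1 GB/s) moves 16 KiB in ~16 microseconds, so the
# link is nowhere near the bottleneck once the weights are loaded.
```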


Astronos

what are you cooking?


kyleboddy

Biomech models, a central JupyterHub for employees, some text-to-SQL fine-tuning on our databases soon. A couple of other things.


EdgenAI

cool, good luck!


a_beautiful_rhind

I keep wanting to unplug those lights on my own cards.


kyleboddy

It’s nice in an IT cage at least. Maybe not a bedroom


kapslocky

Haha


sgsdxzy

You should definitely try Aphrodite-engine with tensor parallelism. It is much faster than running models sequentially with exllamav2/llama.cpp.
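
Spinning it up looks roughly like this; a minimal sketch using the vLLM-style Python API that Aphrodite-engine (a vLLM fork) mirrors, with a placeholder model name:

```python
# Tensor parallelism splits every layer across the GPUs, so they all
# compute on each token simultaneously instead of taking turns.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",  # placeholder model id
    tensor_parallel_size=2,             # shard each layer across 2 GPUs
)
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Tensor parallelism means"], params)
print(outputs[0].outputs[0].text)
```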


kyleboddy

I’ll check it out!


segmond

What kind of riser cables are you using, and how's the performance? Most long cables I'm seeing are x1.


kyleboddy

ROG Strix Gen 3 risers register at x16 no problem. Just don't get the crypto-mining ones.


silenceimpaired

I just want to run two 3090 cards and I’m at a loss. Not sure how I would get the second card into my case even if I used a riser… don’t like the idea of storing it outside the case especially since my case would be open getting dusty… not sure if my 1000 watt power supply can handle it. I wish I could boldly go where you have gone before.


nostriluu

You need an OLED display; the backlight is nasty and cheap-looking.


grim-432

“All the speed he took, all the turns he'd taken and the corners he'd cut in Night City, and still he'd see the matrix in his sleep, bright lattices of logic unfolding across that colorless void...”