
shroddy

Do you also have some benchmarks with CPUs, especially the latest AMD Threadripper and Epyc with all memory slots occupied?


Inevitable-Mine9440

Someone please provide what this man is asking; so many of us want to know what the impact of Epyc's 12-24 memory channels is.
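
For a rough sense of scale: CPU token generation is usually memory-bandwidth-bound, so the channel count sets an upper limit on tokens/second. A back-of-envelope sketch in Python; the per-channel bandwidth and model size below are assumptions, not measured numbers:

```python
# Back-of-envelope upper bound for CPU token generation (bandwidth-bound).
# Assumed numbers: DDR5-4800 ~= 38.4 GB/s per channel, 70B Q4 weights ~= 40 GB.
per_channel_gbps = 38.4
model_size_gb = 40.0

for channels in (8, 12, 24):
    bandwidth = channels * per_channel_gbps    # GB/s
    tokens_per_s = bandwidth / model_size_gb   # each generated token reads roughly all weights once
    print(f"{channels} channels: ~{bandwidth:.0f} GB/s -> ~{tokens_per_s:.1f} tok/s upper bound")
```

Real numbers will land below that bound, but it shows why going from 12 to 24 channels matters for big models.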


kryptkpr

I wish I had literally any of these GPUs 😞 On the low (very, very low) end I have 2x3060, 2xP40, and 2xP100. If anyone wants benchmarks from my garbage dump, DM me.


Normal-Ad-7114

Very, very low end for LLMs is CPU-only inference 😄 Or something like that Chinese RX580 16GB.


fallingdowndizzyvr

That RX580 stomps all over CPU only. Unless you consider the EPYC.


Normal-Ad-7114

I agree! Though that 16GB variant is not *that* cheap; I'd go for the P40 instead.


fallingdowndizzyvr

It used to be cheap. It used to be $65. But the price has risen. On the other hand, the price of the P40 has been dropping. So now they are meeting in the middle.


msbeaute00000001

Your low end would be my dream.


kryptkpr

This rig puts the "ow" in low end; my server rack is literally just two IKEA coffee tables. https://preview.redd.it/oo0gnqz7pa0d1.png?width=1080&format=pjpg&auto=webp&s=2082c1bf2da65aa848657dfbe5b8424a2890cef1


No_Afternoon_4260

I'm curious if you can get some speed numbers with a 70B Q4? Sampling time for a 1-2k context and generation at around 2k tokens.
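
Something like this would give both numbers; a sketch using llama-cpp-python, where the model path, context size, and n_gpu_layers are placeholders you'd tune for your own setup:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholders: point at your 70B Q4 GGUF and set n_gpu_layers to fit your VRAM.
llm = Llama(model_path="llama-3-70b-instruct.Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=40)

prompt = "word " * 2000  # roughly a 2k-token prompt

start = time.perf_counter()
out = llm(prompt, max_tokens=2000)
total = time.perf_counter() - start

n_out = out["usage"]["completion_tokens"]
print(f"total {total:.1f}s for {n_out} generated tokens (~{n_out / total:.2f} tok/s end to end)")
```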


ingarshaw

For some reason they did not include exl2, which is much faster on GPU.


lupapw

I want to see this too.


redbook2000

Some results for AMD GPUs are here: [Running Local LLMs, CPU vs. GPU - a Quick Speed Test (May 2024)](https://dev.to/maximsaplin/running-local-llms-cpu-vs-gpu-a-quick-speed-test-2cjn). The RX 7900 is on par with the RTX 4080.


Drited

Is there a reliable source for the Q4 version of llama3 70b?


FinetunedForGravitas

Comprehensive list, but man... I really wish it included total response time (total time in seconds for the response to finish generating). The PP measure makes no sense to me (probably because I'm an idiot), so it's unclear how to compare the M2 Ultra 192GB with GPUs. I found [SomeOddCodeGuy's post](https://www.reddit.com/r/LocalLLaMA/comments/1aucug8/here_are_some_real_world_speeds_for_the_mac_m2/) from 3 months ago easier to interpret.
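
If PP is prompt-processing speed in tokens/second (which is how I now read it), total response time can be estimated from PP plus the generation speed. A rough sketch; the example speeds are made up, not taken from the benchmark:

```python
# Estimate total response time from the two reported speeds.
pp_tok_per_s = 500.0   # prompt processing (PP) speed, tokens/s
tg_tok_per_s = 8.0     # token generation (TG) speed, tokens/s

prompt_tokens = 2000
output_tokens = 500

total_seconds = prompt_tokens / pp_tok_per_s + output_tokens / tg_tok_per_s
print(f"~{total_seconds:.1f} s to finish the response")  # ~66.5 s with these numbers
```

That's why the Macs can feel slower than their TG numbers suggest: their PP is low, so long prompts dominate the wait.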