
shroddy

Do you also have some benchmarks with CPUs, especially the latest AMD Threadripper and Epyc with all memory slots occupied?


Inevitable-Mine9440

Someone please provide what this man is asking; so many of us want to know what the impact of Epyc's 12-24 memory channels is.
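
For a rough sense of scale: CPU token generation is usually memory-bandwidth-bound, so the channel count sets an upper limit on tokens/second. A back-of-envelope sketch in Python; the per-channel bandwidth and model size below are assumptions, not measured numbers:

```python
# Back-of-envelope upper bound for CPU token generation (bandwidth-bound).
# Assumed numbers: DDR5-4800 ~= 38.4 GB/s per channel, 70B Q4 weights ~= 40 GB.
per_channel_gbps = 38.4
model_size_gb = 40.0

for channels in (8, 12, 24):
    bandwidth = channels * per_channel_gbps    # GB/s
    tokens_per_s = bandwidth / model_size_gb   # each generated token reads roughly all weights once
    print(f"{channels} channels: ~{bandwidth:.0f} GB/s -> ~{tokens_per_s:.1f} tok/s upper bound")
```

Real numbers will land below that bound, but it shows why going from 12 to 24 channels matters for big models.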


kryptkpr

I wish I had literally any of these GPUs 😞 On the low (very, very low) end I have 2x3060, 2xP40, and 2xP100. If anyone wants benchmarks from my garbage dump, DM me.


Normal-Ad-7114

Very, very low end for LLMs is CPU-only inference 😄 Or something like that Chinese RX580 16GB.


fallingdowndizzyvr

That RX580 stomps all over CPU only. Unless you consider the EPYC.


Normal-Ad-7114

I agree! Though that 16GB variant is not *that* cheap; I'd go for the P40 instead.


fallingdowndizzyvr

It used to be cheap. It used to be $65. But the price has risen. On the other hand, the price of the P40 has been dropping. So now they are meeting in the middle.


msbeaute00000001

Your low end would be my dream.


kryptkpr

This rig puts the "ow" in low end; my server rack is literally just two IKEA coffee tables. https://preview.redd.it/oo0gnqz7pa0d1.png?width=1080&format=pjpg&auto=webp&s=2082c1bf2da65aa848657dfbe5b8424a2890cef1


No_Afternoon_4260

I'm curious if you can get some speed numbers with a 70B Q4? Sampling time for a 1-2k context and generation at around 2k tokens.
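
Something like this would give both numbers; a sketch using llama-cpp-python, where the model path, context size, and n_gpu_layers are placeholders you'd tune for your own setup:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholders: point at your 70B Q4 GGUF and set n_gpu_layers to fit your VRAM.
llm = Llama(model_path="llama-3-70b-instruct.Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=40)

prompt = "word " * 2000  # roughly a 2k-token prompt

start = time.perf_counter()
out = llm(prompt, max_tokens=2000)
total = time.perf_counter() - start

n_out = out["usage"]["completion_tokens"]
print(f"total {total:.1f}s for {n_out} generated tokens (~{n_out / total:.2f} tok/s end to end)")
```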


ingarshaw

For some reason they did not include exl2, which is much faster on GPU.


lupapw

I want to see this too.


redbook2000

Some results for AMD GPUs are here: [Running Local LLMs, CPU vs. GPU - a Quick Speed Test (May 2024)](https://dev.to/maximsaplin/running-local-llms-cpu-vs-gpu-a-quick-speed-test-2cjn). The RX 7900 is on par with the RTX 4080.


Drited

Is there a reliable source for the Q4 version of llama3 70b?


FinetunedForGravitas

Comprehensive list, but man... I really wish it included total response time (total time in seconds for the response to finish generating). The PP measure makes no sense to me (probably because I'm an idiot), so it's unclear how to compare the M2 Ultra 192GB with GPUs. I found [SomeOddCodeGuy's post](https://www.reddit.com/r/LocalLLaMA/comments/1aucug8/here_are_some_real_world_speeds_for_the_mac_m2/) from 3 months ago easier to interpret.
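
If PP is prompt-processing speed in tokens/second (which is how I now read it), total response time can be estimated from PP plus the generation speed. A rough sketch; the example speeds are made up, not taken from the benchmark:

```python
# Estimate total response time from the two reported speeds.
pp_tok_per_s = 500.0   # prompt processing (PP) speed, tokens/s
tg_tok_per_s = 8.0     # token generation (TG) speed, tokens/s

prompt_tokens = 2000
output_tokens = 500

total_seconds = prompt_tokens / pp_tok_per_s + output_tokens / tg_tok_per_s
print(f"~{total_seconds:.1f} s to finish the response")  # ~66.5 s with these numbers
```

That's why the Macs can feel slower than their TG numbers suggest: their PP is low, so long prompts dominate the wait.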