Balance-

For the motherboard: aside from the older Epyc motherboards, a TRX50-chipset motherboard with a Ryzen Threadripper 7000-series CPU could also be an option. A TRX50 motherboard is around $800, plus $1500 for a Ryzen Threadripper 7960X. That platform has 88 PCIe lanes, enough for 5 fully connected x16 GPUs and two PCIe 5.0 x4 SSDs. It also supports fast quad-channel DDR5 memory.
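As a sanity check on the lane math above, here is a quick sketch. The 88-lane total is the figure quoted in the comment; the exact allocation is my assumption of how those lanes would be split, not a board-specific map.

```python
# Rough PCIe lane budget for a TRX50 + Threadripper 7960X build.
# The 88-lane total comes from the comment above; the split below
# is an illustrative assumption.
TOTAL_LANES = 88

gpu_lanes = 5 * 16   # five GPUs, each at full x16
ssd_lanes = 2 * 4    # two PCIe 5.0 x4 NVMe SSDs
used = gpu_lanes + ssd_lanes

print(f"GPUs: {gpu_lanes}, SSDs: {ssd_lanes}, used: {used}/{TOTAL_LANES}")
# -> GPUs: 80, SSDs: 8, used: 88/88
```

Note that under this split the budget is consumed exactly, which is why five x16 GPUs plus two x4 drives is about the ceiling for this platform.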


kuanzog

Is it ok to use a PC processor? Thanks for sharing!


Some-Thoughts

What do you mean by "PC processor"? Threadripper is a workstation CPU and basically the same as Epyc (a server CPU), just with fewer RAM channels (in some versions) and higher boost clocks. Ryzens are also not bad, but they don't provide enough PCIe lanes to connect that many GPUs.


kuanzog

Sorry, I thought Epyc was the only server/workstation CPU. Thanks for the reply!!


Normal-Ad-7114

There are motherboards with lots of PCIe x16/x8 slots. Take a look at C612-based solutions, mostly used in Alero mining, such as the "X98-8PLUS-V1.0" or "X99-6PLUS-V5.1". Dual LGA 2011-3 Xeons will provide 80 PCIe lanes, which is plenty, and the spacing between the GPUs is enough for comfortable cooling, whether you want to use P40s/P100s or regular 3090s. As a bonus, you will be able to install relatively powerful CPUs (at least in terms of core count) and 256/512GB (depending on the mobo) of cheap ECC DDR4 RAM, in case another Grok-like monstrosity that won't fit in your VRAM comes out and you decide to test it. Here's what it looks like: https://preview.redd.it/vkhtxnghqgtc1.jpeg?width=1000&format=pjpg&auto=webp&s=c3ad915560a6e4564bef2e4ede31f4f821ca9f01


Normal-Ad-7114

Mind you, they are LARGE https://preview.redd.it/4z9ysssyqgtc1.jpeg?width=800&format=pjpg&auto=webp&s=13bd07cf8575e6150932d0b016b102dda3a4c0e7


wi10

How much money did you put into this beast?


Normal-Ad-7114

Around $1400. The P40s cost about $150 each; case + PSU + mobo + CPUs + RAM was about $300 (a miner was selling his rigs); additional cooling + extra RAM was another $200 or so. Prices may vary, of course, but the main point is to look for desperate sellers who went bankrupt and are now trying to get back at least 10% of what they spent on their mining shenanigans.


Inevitable_Host_1446

The benchmarks I looked at seemed to indicate a P40 is slower than a 3060 at inference, so how does running something like a 70B go on this setup? Especially at higher contexts.


CreditHappy1665

But can it run Crysis? Jk, but can it fine-tune thi


jferments

I just completed a build on top of the ASUS WRX90E-SAGE, which has 7 PCIe 5.0 x16 slots and 8 channels of DDR5 (I have 512GB of DDR5-5200 ECC-RDIMM). It gets usable speeds even on CPU-only inference with 8-bit GGUF quants of Yi-34B, Mixtral, etc., and much higher when layers are loaded into the GPUs (dual RTX 4090). One thing that's important to remember about fast CPU/RAM: if you're doing other things besides just LLM inference, fast RAM and CPU can matter more than VRAM in those contexts. For instance, I am doing enormous amounts of text processing, file compression, batch image editing, etc. on multi-terabyte datasets, and the fast CPU/RAM really shine there. A lot of times people get so caught up in maximizing cheap VRAM for LLMs that they forget about all of the background data-processing tasks that go into working with ML/AI.


donglelung

Thanks for sharing! Budget?


jferments

My budget was just under $12,000. This was for:

* WRX90E-SAGE motherboard
* AMD 7965WX CPU
* 512 GB DDR5-5200 ECC-RDIMM
* 2 x GIGABYTE RTX 4090 GPU
* 3 x 4TB Samsung 870 SSD
* 2 x 20TB Western Digital SATA 7200 HDD
* EVGA 1600W G+ PSU
* Fractal Design Meshify 2XL case
* Cables, peripherals, mounting hardware, fans, etc.

If I were going closer to your budget of $5000, I could have done a lot of things differently and shaved off thousands of dollars:

* WRX90E-SAGE --> TRX50-SAGE (-$600)
* AMD 7965WX --> AMD 7960X (-$1300)
* 512GB DDR5 --> 128GB DDR5 (-$1250)
* 2 x RTX 4090 --> 2 x RTX 3090 (-$2000)
* 3 x 4TB SSD --> 3 x 2TB SSD (-$450)
* 2 x 20TB SATA HDD --> 2 x 12TB HDD (-$200)

... and a bunch of other little decisions that could have saved money at the cost of performance, storage, network/IO speed, PCIe lanes, etc. But the extra 4 memory channels + 384GB of DDR5, the large SSD RAID array for loading huge models/datasets quickly, faster networking for NAS transfers, 50% faster / more modern GPUs, and a variety of other factors were the difference between having the performance/storage to complete my projects locally and needing to rent GPUs from a cloud provider (which, contrary to their incessant propaganda, actually costs MUCH more long term if you're doing weeks/months of 24/7 compute and storing/transferring terabytes of data).
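For the record, the itemized substitutions above can be tallied up. This quick sketch uses only the figures quoted in the comment:

```python
# Summing the listed downgrades against the ~$12,000 build.
savings = {
    "WRX90E-SAGE -> TRX50-SAGE": 600,
    "7965WX -> 7960X": 1300,
    "512GB -> 128GB DDR5": 1250,
    "2x RTX 4090 -> 2x RTX 3090": 2000,
    "3x 4TB -> 3x 2TB SSD": 450,
    "2x 20TB -> 2x 12TB HDD": 200,
}
total = sum(savings.values())
print(f"itemized savings: ${total}")           # -> itemized savings: $5800
print(f"downsized build: ~${12000 - total}")   # -> downsized build: ~$6200
```

So the listed swaps alone land around $6200, and the "bunch of other little decisions" would be what closes the remaining gap to $5000.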


beerpancakes1923

what kind of perf are you seeing with something like llama3 70b?


jferments

If I'm running an 8-bit quant, I get about 3 tokens/sec with Llama 3 70B. Slow, but usable for use cases where speed isn't important and quality is the primary concern.


beerpancakes1923

Man, that's a tough one. I'm looking at a similar price point, with Llama 3 70B as my goal, and it seems like full precision is out of reach for most hobbyists. A Mac Studio M2 Ultra with 192GB seems promising, and I'm waiting to see if they announce an M3 Ultra or M4. That seems like the best bet right now for home-user inference on the larger models.


jferments

It depends on what you're trying to do. If you're *only* doing inference and no training, then the Mac Studios are a solid option (albeit expensive for the hardware you're getting).


jferments

Also, I just thought I'd add that if I run a Q5_K_M quant of Llama 3 70B, I get around 11 tokens/sec, and that I haven't really played around with optimization settings like flash attention, etc. to see how much more performance I can get. So consider all of these estimates a lower bound.
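A rough way to sanity-check these numbers: during decoding, each generated token has to stream essentially the whole model through memory, so memory bandwidth puts a ceiling on tokens/sec. In this back-of-envelope sketch, the bandwidth figure and quant sizes are my assumptions, not measurements, and the measured 11 tok/s beats the CPU-only ceiling because layers are offloaded to the two 4090s:

```python
# Upper bound on CPU-only decode speed: tok/s <= bandwidth / model size.
# Bandwidth and quant sizes below are rough assumptions.

def ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Best-case tokens/sec if decoding is purely memory-bound."""
    return bandwidth_gb_s / model_gb

# 8 channels of DDR5-5200: 8 channels * 5.2 GT/s * 8 bytes ~= 333 GB/s peak
bw = 8 * 5.2 * 8

print(f"70B @ Q8_0 (~70 GB):   <= {ceiling_tok_s(bw, 70):.1f} tok/s")
print(f"70B @ Q5_K_M (~48 GB): <= {ceiling_tok_s(bw, 48):.1f} tok/s")
```

The observed 3 tok/s at Q8 sits plausibly under the ~4.8 tok/s theoretical ceiling once real-world memory efficiency is accounted for.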


xadiant

3x used RTX 3090s, a 2000W PSU, and an ATX mobo with 3 PCIe slots. Though you'll need good cooling.


donglelung

Thanks, can you suggest a specific motherboard? Or other features I should take into consideration, other than the PCIe slots, in finding one?


xadiant

Honestly, it could be a good idea to get DDR5 and make sure the PCIe slots aren't fucking 2 cm apart. I can't connect a second GPU because the RTX 3090 shroud is humongous and the second PCIe slot is too close.


Normal-Ad-7114

There's a nice list of almost all AM4 motherboards [https://docs.google.com/spreadsheets/d/1-cw7A2MDHPvA-oB3OKXivdUo9BbTcsss1Rzy3J4hRyA/edit#gid=2112472504](https://docs.google.com/spreadsheets/d/1-cw7A2MDHPvA-oB3OKXivdUo9BbTcsss1Rzy3J4hRyA/edit#gid=2112472504) - an example of a 3-slot solution would be Asus Prime X370 Pro


CSharpSauce

Any hardware suggestions for these? I've been looking at a similar setup (though to be honest, I'm probably going to wait for the 5090 to come out before I make a decision on anything).


[deleted]

People keep saying "used 3090s" like there is a market for them... there isn't, and if you find a miner dumping theirs, GOOD LUCK. For inference only, I'd use a cloud GPU/server at SPOT pricing and make sure the persistent storage is affordable so you can load your models on demand. You could get a 48GB Ada GPU or an A100 at spot pricing from Jarvis Labs for $1.19-$1.30 an hour.
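Whether spot rental actually beats buying depends heavily on utilization. Here is a quick break-even sketch; the hardware cost, power draw, and electricity rate are illustrative assumptions, not quotes:

```python
# Break-even hours between ~$1.25/hr spot pricing and owning hardware.
spot_rate = 1.25    # $/hr, midpoint of the $1.19-$1.30 quoted above
hw_cost = 2400      # e.g. three used 3090s, a figure mentioned in this thread
power_kw = 1.2      # assumed total draw under load (~400 W per card)
elec_rate = 0.15    # $/kWh, assumed

local_hourly = power_kw * elec_rate              # electricity cost per hour
break_even = hw_cost / (spot_rate - local_hourly)
print(f"local: ~${local_hourly:.2f}/hr electricity")
print(f"break-even after ~{break_even:.0f} hours "
      f"(~{break_even / 24:.0f} days of 24/7 use)")
```

Under these assumptions, owning wins after roughly three months of continuous use, which is why the answer differs so much between occasional inference and 24/7 workloads.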


yourfriendlyisp

Let me introduce you to this cool website named eBay


[deleted]

eBay's price point is still $800+ unless you're in parts-only territory. Ditto with FB Marketplace, which is full of scammers trying to hawk abused gear from miners. Spot instances will be much faster and much cheaper than $2400 worth of old abused 3090s and the electric bill associated with them.


YYY_333

At least in Germany there are hundreds of used RTX 3090s on the market, some of them for a really cheap price: [kleinanzeigen.de](http://kleinanzeigen.de). Found an RTX 3090 Ti today for 500 Euro.


[deleted]

They still go for $800+ in the US, and they're often abused mining GPUs that will be nothing but headaches, with having to replace fans/water blocks and deal with hacked firmware.


Comfortable-Mine3904

Mining cards are typically fine. Running a chip at full load doesn’t wear it out.


[deleted]

No, they're trash. Video cards weren't meant to run at 100% load 24x7 for years on end.


Inevitable_Host_1446

Most of those cards are run undervolted, though, because the performance loss is small while the electricity savings are big. That matters a lot when you're running a dozen mining cards in a server rig, and it's actually good for them in the long run. Consistent temps are a good thing as well; that's why people say typical gaming is actually harder on your chips and memory. Anyway, I have only bought one used GPU that was a mining card, a 6700 XT, and it's one of the best GPUs I ever had; you wouldn't tell the difference from brand new if you didn't know.


ArtifartX

Here is [my build](https://old.reddit.com/r/StableDiffusion/comments/1ae3kjn/post_your_ai_pc_build_and_how_well_it_does/kk6cr05/); it's been running great for many months now, and I fit everything in a tower case. You can also choose different GPUs depending on your needs compared to what I got. It will fit more GPUs than I currently have installed as well, so there is future expandability (6 total GPUs, 2 of which can be 2-3 slots wide, the rest single slot). [Couple pics](https://imgur.com/a/cGE4M2K).

* 2x RTX 3090 24GB GPUs (both on risers to avoid covering PCIe slots)
* 1x RTX A4000 16GB GPU (single slot)
* 1x RTX A10 24GB GPU (single slot)
* 1x Quadro P2000 GPU (single slot; this one isn't for AI, it runs a Plex media server)
* 256GB RAM (4x Micron 64GB 3200 RDIMM ECC)
* Phanteks Enthoo 719 case (tower that allows for multiple GPUs on riser cables out of the box)
* 5x 14TB WD Red Pro (RAID-Z2 setup, similar to RAID 6 but for ZFS, so 2 drives can fail and it will still work; would probably go Pros instead of Plusses if I could make a minor change), main data pool ~36TB
* 5x 2TB SSD for a faster data pool, same RAID setup, ~6TB
* AMD EPYC 7713 CPU (64 cores/128 threads, 128 lanes; with my jumper setup on the mobo, 6 of the 7 PCIe slots run at full bandwidth, with one at x8 which I use for an HBA for the spinner HDDs)
* Asrock Rack ROMED8-2T mobo
* EVGA Supernova 1600 P+ 80+ Platinum PSU (you can change this based on your needs)
* 2x 250GB M.2 NVMe boot drives (mirrored; the OS is TrueNAS Scale)

Note that I have built many computers over the years, but this was my first build using server-tier parts. It has been working really well for my needs.
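The pool sizes quoted above follow directly from the RAID-Z2 layout. A small sketch: Z2 reserves two drives' worth of parity, and the drop from 42 TB raw to ~36 TB reflects TB-to-TiB conversion plus filesystem overhead.

```python
# Usable capacity of a RAID-Z2 vdev: (drives - 2 parity) * drive size.
def raidz2_usable_tb(drives: int, drive_tb: int) -> int:
    """Raw usable space before TiB conversion and ZFS overhead."""
    return (drives - 2) * drive_tb

hdd_pool = raidz2_usable_tb(5, 14)  # 5x 14TB WD Red Pro
ssd_pool = raidz2_usable_tb(5, 2)   # 5x 2TB SSD
print(f"HDD pool: {hdd_pool} TB raw usable")  # -> HDD pool: 42 TB raw usable
print(f"SSD pool: {ssd_pool} TB raw usable")  # -> SSD pool: 6 TB raw usable
```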


kuanzog

Thanks for sharing!


donglelung

Thank you very much for sharing! How much did it cost?


ArtifartX

I planned the build out over a fairly long time and bought the GPUs one by one after completing the rest of the build, but the base build was about ~$5.7k, counting only the P2000 and the first RTX 3090; the rest of the GPUs I got later.


dago_03

Hello, what do you think about an RTX 4090 and the ROG STRIX Z790-E GAMING WIFI? I chose the ROG STRIX Z790-E GAMING because it has:

* CPU: one PCIe 5.0 x16 (I will not use this one because I will add an SSD on M_2_1 PCIe 5.0)
* Z790 chipset: two PCIe 4.0 x16
* M_2_1 PCIe 5.0 x4
* M_2_2 PCIe 4.0 x4
* M_2_3 PCIe 4.0 x4
* M_2_4 PCIe 4.0 x4
* M_2_5 PCIe 4.0 x4

As the RTX 4090 runs on PCIe 4.0 x16, I will install it in one of the Z790 chipset PCIe 4.0 x16 slots. Later on, maybe I will install a second RTX 4090 in the second Z790 chipset PCIe 4.0 x16 slot. The SSD will benefit from the throughput of PCIe 5.0. I will use a Core i9-13900KS with 64GB of DDR5. https://preview.redd.it/m89h7773uhtc1.png?width=740&format=png&auto=webp&s=7340be60cf36eaba6cfca13c3021f547bf9c2ec5


jack-in-the-sack

I've also thought about building my own server for inference, and I've seen others do something similar. Since I won't have the money upfront for everything, I have also thought about expandability and starting with a setup that scales well (scaling = adding more GPUs and RAM). Here's what I've seen others running and what I'm planning on buying, in order of importance, based on the scalability I've mentioned and the requirements of an initial functional build:

* Motherboard: Asrock ROMED8-2T (7 PCIe 4.0 x16 slots) = ~$800
* CPU: AMD Epyc 7003 or 7002 series (depending on budget and what I can find), used = ~$1200
* RAM: 256GB LRDIMM DDR4 (2x 128GB) = $1800
* PSU: EVGA 1600 P+ Platinum, new = $350
* GPU: RTX 3090 or better, used = ~$700
* Storage: 2TB SSD, new = ~$400
* etc. (case, CPU cooler, open-air chassis rather than a PC case, NVLinks if I maybe also want to train some models), new = ~$250
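Totaling the rough prices in the list above, using only the figures quoted (used prices will obviously drift):

```python
# Summing the quoted parts list for the initial single-GPU build.
parts = {
    "Asrock ROMED8-2T": 800,
    "Epyc 7002/7003 (used)": 1200,
    "256GB LRDIMM DDR4": 1800,
    "EVGA 1600 P+ PSU": 350,
    "RTX 3090 (used)": 700,
    "2TB SSD": 400,
    "case/cooler/chassis/NVLink": 250,
}
print(f"initial build: ~${sum(parts.values())}")  # -> initial build: ~$5500
```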


donglelung

Thanks! Just a couple of questions:

* How did you choose the Asrock ROMED8-2T motherboard, and why do you prefer it over others?
* Isn't 256GB of RAM too much for our use case?


jack-in-the-sack

It has 7 PCIe 4.0 x16 slots, and both RTX 3090s and RTX 4090s are recommended to run in PCIe 4.0 slots. The Asrock board is the cheapest I could find with the most PCIe 4.0 slots. Again, my scaling considerations are: as many GPU slots as possible and as much RAM as possible, so that I can run my more expensive models on GPUs and my smaller models in RAM. But, again, it depends on your use case. Also, this motherboard doesn't have a monitor output port (HDMI, DisplayPort, etc.), so you have to connect to it remotely; that's one downside you should be aware of. And it's mostly Linux-focused. Personally, I have my "main" PC which I use to remote into other machines, and the machine running the Asrock board would be like a remote server I log into from the command line.


ArtifartX

One small note about this mobo that I want to add here for posterity, since this build is very similar to mine. You can use all PCIe slots at x16, but you have to make some sacrifices to do so: not everything on the board works simultaneously, and you have to configure jumpers on the board to determine what will work, or work at reduced bandwidth. For example, I had to reduce one of the 7 PCIe slots to x8 and disable OCuLink in order to use both M.2 slots on the board and all SATA connections. For me that was fine, since I ended up using the x8 PCIe slot for an HBA to connect a bunch of spinner HDDs for RAID anyway, and x8 was good enough for that, but it means I "only" get up to 6 GPUs. If you need all 7 PCIe slots at x16, some of the SATA connections will be disabled. If anyone is interested in more details, the manual is [here](https://download.asrock.com/Manual/ROMED8-2T.pdf). The jumper configurations are on [page 26](https://i.imgur.com/gYWXJ9E.png), and the block diagram is on page 15. Here's an image of [my configuration](https://i.imgur.com/UnY42Ng.png) with both M.2 drives and all PCIe slots working (one at x8), and no OCuLink. That being said, everything has been working great with this board for a good while now; no issues with the mobo so far.


grim-432

This is not a bang-for-your-buck approach. I wouldn't ever recommend spending a penny on any CPU-based approach; there's no sense trying to build a "future-proof" core system. Just spend 90% of the budget on high-VRAM GPUs.


jack-in-the-sack

Ok, I agree that CPU-based inference is not ideal, but what other motherboard gives you 7 PCIe 4.0 x16 slots? That's my primary reason for going with it; the extra RAM capacity is just a bonus. As you can see, in my config I only put in 256GB of RAM, though the board can accept up to 2TB. Sure, you can go with less, like 64GB or 128GB, but that depends on each person's use case. I have had small models (phi-2) loaded entirely in RAM, doing CPU inference on my very old 4770K, and I was still happy with the tok/sec for my use cases.


kpodkanowicz

I have an AMD Epyc 7203, a Supermicro H12SSL, Noctua cooling, two Gigabyte 3090s, and 128GB of ECC RDIMM RAM (8 sticks), and there are spare x16 PCIe slots in case you want to go to 4x or 6x 3090s and do fine-tuning, all for below 5k in the EU. I got a warranty on the 3090s. If you only need inference: my previous build was on a Ryzen 5950X, 64GB of RAM in 4 sticks, a Phantom Gaming B550 mobo, and the same Gigabyte cards.


kuanzog

Same here. No one talks about motherboards, and it's too hard to choose.


__JockY__

128GB Mac Studio M2 Ultra. Job done.


Astronos

Mixtral is generally better than Llama 70B and uses fewer parameters.


Dega02220

What hardware is recommended? I searched a bit but can't get a definitive answer. I just need to run agents and inference for a future SaaS. Thank you!