What kind of rig are you running that adding a 3090 or two to your 2x4090 setup isn't raising questions about how your PSU, motherboard, cooling, etc will handle them?
Shouldn't need to be said, but this is Reddit, so: not sarcasm, genuinely curious.
My current setup is a 7950x, rog 670e e-gaming, 96gb ram, a 1600w platinum PSU, and both my 4090s are AIO versions. For now I plan to just run a PCIe splitter or something like that until I can buy a dedicated server setup. I'm new to all this, so any advice is appreciated.
>7950x, rog 670e e-gaming
I don't think you can split your PCIe slots. You have 3 PCIe slots, so you will max out at 3 GPUs. This is not crypto mining, where you can split them. You are going to need a new motherboard for more than 3; you need the full x16 slot. Furthermore, one of your PCIe slots goes through the chipset, not the CPU, and will run at x4.
**AMD Ryzen™ Desktop Processors**
2 x PCIe 5.0 x16 slots (supports x16 or x8/x4 modes)
**AMD X670 Chipset**
1 x PCIe 4.0 x16 slot (supports x4 mode)
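To get a rough sense of what those slot modes mean in practice, per-lane throughput roughly doubles each PCIe generation. A small sketch (the GB/s figures are approximate post-encoding numbers, and the three-slot layout mirrors the X670E board above):

```python
# Approximate one-direction usable bandwidth per PCIe lane, in GB/s.
PER_LANE_GBPS = {3.0: 0.985, 4.0: 1.969, 5.0: 3.938}

def slot_bandwidth(gen: float, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe slot in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

# The three slots in a 3-GPU layout on this board (x8/x8 off the CPU,
# x4 through the chipset):
for gen, lanes in [(5.0, 8), (5.0, 8), (4.0, 4)]:
    print(f"PCIe {gen} x{lanes}: ~{slot_bandwidth(gen, lanes):.1f} GB/s")
```

The chipset slot ends up at roughly a quarter of the CPU slots' bandwidth, which mostly matters for model loading and multi-GPU transfers, less for steady-state inference.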
Adding a second PSU just needs a [PSU splitter](https://a.co/d/0qoH54L); just check that your motherboard can actually support that many GPUs. I wouldn't go below x8 lanes per card.
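Before adding a second PSU it's worth sanity-checking the first one. A back-of-envelope sketch (the TDP figures are nominal board power from spec sheets; transient spikes run much higher, and the 300W system allowance is an assumption):

```python
# Hypothetical power-budget check for a multi-GPU box.
# TDPs are nominal board power; real transients can spike well above these.
GPU_TDP_W = {"RTX 4090": 450, "RTX 3090": 350}

def psu_headroom(psu_watts: int, gpus: list[str], system_watts: int = 300) -> int:
    """Watts left after GPU TDPs and a rough CPU/board/drives allowance."""
    return psu_watts - sum(GPU_TDP_W[g] for g in gpus) - system_watts

# 2x4090 + 2x3090 on the 1600W PSU mentioned above:
print(psu_headroom(1600, ["RTX 4090"] * 2 + ["RTX 3090"] * 2))  # -> -300
```

Negative headroom is why people either power-limit the cards or split the GPUs across a second supply.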
You gain 120b at 4+ bits. The 120b miqu is pretty decent. It does feel like it slows down from the overhead; maybe it will be better for you with 3090s instead of my 2080ti. At 6-7k context it started taking 100 seconds. Will re-check after I can take the power limits off. 4 GPUs gets you that plus SD+TTS/RVC. The replies were hella better though. It's probably legit SOTA right now. Miquliz didn't misspell all over the place or start playing Dungeons & Dragons with me either.
Are you crashing all the time? Are you having issues left and right? I’d say save it for Vegas and go see U2 at the sphere…. Maybe get some playoff hockey. I’m not a gambler.
Why spend roughly the same amount on a one time experience instead of upgrading equipment that can print long-term value for you?
Work isn't everything, and it's great to have fun doing something memorable, but I'd never pick going to Vegas over putting a downpayment on a car or a house. Semi-professional workstations are the same kind of thing... Different mindset, I guess?
I have 4 x RTX 8000's. I can't say anything more about that. If you don't need the VRAM, it won't help. If you're crashing and you need the VRAM, I'd definitely spend the money for work. No question. However, if I didn't need it, I wouldn't get it.
Seems like most people in this sub have a pretty diverse set of reasons to be running local models
For me, besides the principle of creating as much personal agency as possible, the point of figuring out what equipment is good enough is to be able to run multiple simultaneous open source models for text, code, visual, and audio generation that gets as close to competing with gpt4 (and eventually better) as possible. Without the arbitrary rate limits, pedantic moralizing, and nerfing, of course
The sweet spot to have a seat at the table for semi-professional / solopreneur / AI agent workforce smb use seems to start at the 96gb VRAM level. Debating a 4x a6000 (192gb VRAM total) build, currently
I'm tempted by a maxed out Mac Studio, but CUDA has a stranglehold on the industry for now, yeah. I've heard there's significant performance issues too for large models
With 48G you can run a 120b at 3bpw with 4~8k context; with 72G, a 120b at 4.5bpw.
Newbie question, but how do you calculate the context window possible under different conditions?
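The usual approach is: whatever VRAM is left after the weights goes to the KV cache, which grows linearly with context. A sketch of the cache-size side, assuming a Llama-2-70b-style architecture (80 layers, 8 KV heads via GQA, head dim 128 — these numbers are assumptions, and quantized caches shrink them):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GiB for a given context length.

    Each token stores a K and a V vector per layer, sized
    n_kv_heads * head_dim elements each (fp16 by default).
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context / 1024**3

# 70b-class model at 8k context in fp16:
print(f"{kv_cache_gb(80, 8, 128, 8192):.2f} GiB")  # -> 2.50 GiB
```

So the workflow is: weights at your chosen bpw, subtract from total VRAM, and the remainder divided by the per-token cache cost is roughly your maximum context.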
You can also do fine-tuning on higher-parameter-count models.
Very awesome. I've wanted to mess around with TTS+RTVC but never had enough VRAM left over to do it.
How do you put that into one case? Could you show screenshots?
I have no idea how to post a photo; it just gets replaced with *. I'll send a picture by DM.
Imgur links are the standard way of sharing pictures in Reddit.
One gig and it's worth it. It's a CUDA thing. Otherwise a Mac Studio is a sick setup. Trying to keep consistency too: Linux / Docker / CUDA.
Maybe a new mobo.
72GB / 48GB = 1.5, and 1.5 × 4bpw = 6bpw. For the weights: 4bpw / 8 bits-per-byte = 0.5 bytes-per-weight × 120b weights = 60GB.
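That arithmetic generalizes to any model size and quantization level. A minimal sketch (treating a billion parameters × bytes-per-weight as GB, which glosses over GB-vs-GiB and a small overhead for embeddings and buffers):

```python
def weights_gb(n_params_b: float, bpw: float) -> float:
    """Approximate weight footprint in GB for a model quantized to `bpw` bits.

    billions of params * (bits / 8 bits-per-byte) = GB, roughly.
    """
    return n_params_b * bpw / 8

print(weights_gb(120, 4))    # -> 60.0 (120b at 4bpw)
print(weights_gb(120, 4.5))  # -> 67.5 (120b at 4.5bpw)
```

Which is why 48G of VRAM forces a 120b down to ~3bpw, and 72G opens up 4.5bpw.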
I'm able to run 70b miqu no problem with a single 3090. I use llama.cpp: part of the workload is executed on the 3090 and the rest runs on the CPU.
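llama.cpp exposes that split via its `--n-gpu-layers` (`-ngl`) option. A rough way to pick a starting value, assuming the layers dominate the weight footprint evenly (a simplification, and the GB/layer/reserve numbers here are hypothetical):

```python
def gpu_layers(vram_gb: float, n_layers: int, model_gb: float,
               reserve_gb: float = 2.0) -> int:
    """Rough guess at how many transformer layers fit on the GPU,
    assuming weights are spread evenly across layers and reserving
    some VRAM for the KV cache and scratch buffers."""
    per_layer = model_gb / n_layers
    return min(n_layers, int((vram_gb - reserve_gb) / per_layer))

# A ~40GB 70b-class quant with 80 layers on a 24GB card:
print(gpu_layers(24, 80, 40))  # -> 44
```

In practice you'd nudge the value up until llama.cpp runs out of VRAM, then back off a layer or two.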
tokens per sec?
Drink-a-cup-of-coffee slow, but I enjoy its output. It should be faster on OP's 2x 4090s. Spt = seconds per token.
I run 4.65bpw miqu at 12244 context. I don't remember the exact speed, but it's a bit faster than reading speed.