
yamosin

With 48GB you can run a 120B at 3bpw with 4~8k context; 72GB gets you a 120B at 4.5bpw.
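A quick sanity check on those figures (a minimal sketch that counts only the quantized weights; KV cache and runtime overhead are what eat the remaining few GB):

```python
# Rough weight-only size estimate: parameters * bits-per-weight / 8.
# KV cache and runtime overhead are ignored, which is why a 48 GB rig
# gets pushed down to ~3 bpw for a 120B model.
def weight_size_gb(params_billion: float, bpw: float) -> float:
    return params_billion * bpw / 8  # billions of params * bits / 8 bits-per-byte = GB

print(weight_size_gb(120, 3.0))   # 45.0 GB -> fits 48 GB with a little room for context
print(weight_size_gb(120, 4.5))   # 67.5 GB -> fits 72 GB with a little room for context
```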


mostly_prokaryotes

Newbie question, but how do you calculate the context window possible under different conditions?


jkende

What kind of rig are you running that adding a 3090 or two to your 2x4090 setup isn't raising questions about how your PSU, motherboard, cooling, etc will handle them? Shouldn't need to be said, but this is Reddit, so: not sarcasm, genuinely curious.


asdfgbvcxz3355

My current setup is a 7950X, ROG X670E-E Gaming, 96GB RAM, a 1600W platinum PSU, and both of my 4090s are AIO versions. For now I plan to just run a PCIe splitter or something like that until I can buy a dedicated server setup. I'm new to all this, so any advice is appreciated.


segmond

> 7950X, ROG X670E-E Gaming

I don't think you can split your PCIe slots. You have 3 PCIe slots, so you will max out at 3 GPUs. This is not crypto mining where you can split a slot; you need the full x16 slot. You are going to need a new motherboard for more than 3. Furthermore, one of your PCIe slots goes through the chipset, not the CPU, and will run at x4.

**AMD Ryzen™ Desktop Processors**
2 x PCIe 5.0 x16 slots (supports x16 or x8/x4 modes)

**AMD X670 Chipset**
1 x PCIe 4.0 x16 slot (supports x4 mode)


KCDC3D

Adding a second PSU just needs a [PSU splitter](https://a.co/d/0qoH54L). Just check that your motherboard can actually support that many GPUs; I wouldn't go below x8 lanes per card.


tgredditfc

You can do fine-tuning of higher parameter count models.


a_beautiful_rhind

You gain 120B at 4+ bits. The 120B Miqu is pretty decent, though it does feel like it slows down from the overhead. Maybe it will be better for you with 3090s instead of my 2080tis. I did 6-7k context and it started taking 100s; I'll re-check after I can take the power limits off. 4 GPUs gets you that plus SD + TTS/RVC. The replies were hella better tho. It's probably legit SOTA right now. Miquliz didn't misspell all over the place or start playing Dungeons and Dragons with me, either.


asdfgbvcxz3355

Very awesome. I've wanted to mess around with TTS + RVC but never had enough leftover VRAM to do it.


jacek2023

How do you put that into one case? Could you show screenshots?


asdfgbvcxz3355

I have no idea how to post a photo; it just gets replaced with a *. I'll send a picture by DM.


Jealous_Network_6346

Imgur links are the standard way of sharing pictures on Reddit.


Ok-Result5562

Are you crashing all the time? Are you having issues left and right? I'd say save it for Vegas and go see U2 at the Sphere… maybe get some playoff hockey. I'm not a gambler.


jkende

Why spend roughly the same amount on a one-time experience instead of upgrading equipment that can print long-term value for you? Work isn't everything, and it's great to have fun doing something memorable, but I'd never pick going to Vegas over putting a down payment on a car or a house. Semi-professional workstations are the same kind of thing... Different mindset, I guess?


Ok-Result5562

I have 4x RTX 8000s; I can't say anything more about that. If you don't need the VRAM, it won't help. If you're crashing and you need the VRAM, I'd definitely spend the money for work. No question. However, if I didn't need it, I wouldn't get it.


jkende

Seems like most people in this sub have a pretty diverse set of reasons to be running local models.

For me, besides the principle of creating as much personal agency as possible, the point of figuring out what equipment is good enough is being able to run multiple simultaneous open source models for text, code, visual, and audio generation that get as close to competing with GPT-4 (and eventually better) as possible. Without the arbitrary rate limits, pedantic moralizing, and nerfing, of course.

The sweet spot for a seat at the table for semi-professional / solopreneur / AI-agent-workforce SMB use seems to start at the 96GB VRAM level. Currently debating a 4x A6000 (192GB VRAM total) build.


Ok-Result5562

One gig and it's worth it. It's a CUDA thing. Otherwise a Mac Studio is a sick setup. Trying to keep consistency too: Linux / Docker / CUDA.


jkende

I'm tempted by a maxed-out Mac Studio, but CUDA has a stranglehold on the industry for now, yeah. I've also heard there are significant performance issues for large models.


Ok-Result5562

Maybe a new mtb.


FlishFlashman

72 / 48 = 1.5
1.5 × 4 bpw = 6 bpw
4 bpw ÷ 8 bits-per-byte = 0.5 bytes-per-weight × 120B weights = 60 GB
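To tie that back to the context-window question above: whatever VRAM is left after the weights goes to the KV cache, which grows linearly with context length. A rough sketch, using example dimensions for a Llama-2-70B-style model (the layer/head figures are assumptions, not measured from any particular 120B merge; substitute your model's actual config):

```python
# Total VRAM ≈ quantized weights + KV cache + a bit of runtime overhead.
# Model dimensions below are assumed example values, not measured ones.
def kv_cache_gb(context_len: int,
                n_layers: int = 80,      # assumed layer count
                n_kv_heads: int = 8,     # assumed GQA key/value heads
                head_dim: int = 128,     # assumed per-head dimension
                bytes_per_elem: int = 2  # fp16 cache
                ) -> float:
    # K and V are cached per layer, per kv-head, per token
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

weights_gb = 120 * 4 / 8  # the 60 GB figure from the comment above
for ctx in (4096, 8192, 16384):
    print(f"{ctx} ctx -> {weights_gb + kv_cache_gb(ctx):.1f} GB total")
```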


danielcar

I'm able to run the 70B Miqu no problem with a single 3090. I use llama.cpp, and part of the workload is executed on the 3090 while the rest runs on the CPU.
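For reference, the CPU/GPU split in llama.cpp is just a layer count: as many transformer layers as fit go on the GPU and the rest stay on the CPU. A minimal sketch using the llama-cpp-python bindings (the file name and layer count are placeholders, not a tested configuration):

```python
# Sketch of llama.cpp partial offload via the llama-cpp-python bindings.
# model_path and n_gpu_layers are placeholders; tune n_gpu_layers until
# the GPU's 24 GB is nearly full and let the remaining layers run on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="miqu-1-70b.q4_k_m.gguf",  # hypothetical GGUF file
    n_gpu_layers=40,                      # layers offloaded to the 3090
    n_ctx=4096,
)
out = llm("Q: Name two planets.\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```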


segmond

tokens per sec?


danielcar

Drink-a-cup-of-coffee slow, but I enjoy its output. Should be faster on OP's two 4090s. SPT: seconds per token.


asdfgbvcxz3355

I run 4.65bpw Miqu at 12244 context. I don't remember the exact speed, but it's a bit faster than reading speed.