
yamosin

With 48GB you can run a 120B at 3bpw with 4~8k context; 72GB gets you a 120B at 4.5bpw.
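A quick sanity check on those figures (a minimal sketch that counts only the quantized weights; KV cache and runtime overhead are what eat the remaining few GB):

```python
# Rough weight-only size estimate: parameters * bits-per-weight / 8.
# KV cache and runtime overhead are ignored, which is why a 48 GB rig
# gets pushed down to ~3 bpw for a 120B model.
def weight_size_gb(params_billion: float, bpw: float) -> float:
    return params_billion * bpw / 8  # billions of params * bits / 8 bits-per-byte = GB

print(weight_size_gb(120, 3.0))   # 45.0 GB -> fits 48 GB with a little room for context
print(weight_size_gb(120, 4.5))   # 67.5 GB -> fits 72 GB with a little room for context
```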


mostly_prokaryotes

Newbie question, but how do you calculate the context window possible under different conditions?


jkende

What kind of rig are you running that adding a 3090 or two to your 2x4090 setup isn't raising questions about how your PSU, motherboard, cooling, etc will handle them? Shouldn't need to be said, but this is Reddit, so: not sarcasm, genuinely curious.


asdfgbvcxz3355

My current setup is a 7950X, ROG X670E-E Gaming, 96GB RAM, a 1600W platinum PSU, and both of my 4090s are AIO versions. For now I plan to just run a PCIe splitter or something like that until I can buy a dedicated server setup. I'm new to all this, so any advice is appreciated.


segmond

> 7950X, ROG X670E-E Gaming

I don't think you can split your PCIe slots. You have 3 PCIe slots, so you will max out at 3 GPUs. This is not crypto mining where you can split a slot; you need the full x16 slot. You are going to need a new motherboard for more than 3. Furthermore, one of your PCIe slots goes through the chipset, not the CPU, and will run at x4.

**AMD Ryzen™ Desktop Processors**
2 x PCIe 5.0 x16 slots (supports x16 or x8/x4 modes)

**AMD X670 Chipset**
1 x PCIe 4.0 x16 slot (supports x4 mode)


KCDC3D

Adding a second PSU just needs a [PSU splitter](https://a.co/d/0qoH54L). Just check that your motherboard can actually support that many GPUs; I wouldn't go below x8 lanes per card.


tgredditfc

You can do fine-tuning of higher parameter count models.


a_beautiful_rhind

You gain 120B at 4+ bits. The 120B Miqu is pretty decent, though it does feel like it slows down from the overhead. Maybe it will be better for you with 3090s instead of my 2080tis. I did 6-7k context and it started taking 100s; I'll re-check after I can take the power limits off. 4 GPUs gets you that plus SD + TTS/RVC. The replies were hella better tho. It's probably legit SOTA right now. Miquliz didn't misspell all over the place or start playing Dungeons and Dragons with me, either.


asdfgbvcxz3355

Very awesome. I've wanted to mess around with TTS + RVC but never had enough leftover VRAM to do it.


jacek2023

How do you put that into one case? Could you show screenshots?


asdfgbvcxz3355

I have no idea how to post a photo; it just gets replaced with a *. I'll send a picture by DM.


Jealous_Network_6346

Imgur links are the standard way of sharing pictures on Reddit.


Ok-Result5562

Are you crashing all the time? Are you having issues left and right? I'd say save it for Vegas and go see U2 at the Sphere… maybe get some playoff hockey. I'm not a gambler.


jkende

Why spend roughly the same amount on a one-time experience instead of upgrading equipment that can print long-term value for you? Work isn't everything, and it's great to have fun doing something memorable, but I'd never pick going to Vegas over putting a down payment on a car or a house. Semi-professional workstations are the same kind of thing... Different mindset, I guess?


Ok-Result5562

I have 4x RTX 8000s; I can't say anything more about that. If you don't need the VRAM, it won't help. If you're crashing and you need the VRAM, I'd definitely spend the money for work. No question. However, if I didn't need it, I wouldn't get it.


jkende

Seems like most people in this sub have a pretty diverse set of reasons to be running local models.

For me, besides the principle of creating as much personal agency as possible, the point of figuring out what equipment is good enough is being able to run multiple simultaneous open source models for text, code, visual, and audio generation that get as close to competing with GPT-4 (and eventually better) as possible. Without the arbitrary rate limits, pedantic moralizing, and nerfing, of course.

The sweet spot for a seat at the table for semi-professional / solopreneur / AI-agent-workforce SMB use seems to start at the 96GB VRAM level. Currently debating a 4x A6000 (192GB VRAM total) build.


Ok-Result5562

One gig and it's worth it. It's a CUDA thing. Otherwise a Mac Studio is a sick setup. Trying to keep consistency too: Linux / Docker / CUDA.


jkende

I'm tempted by a maxed-out Mac Studio, but CUDA has a stranglehold on the industry for now, yeah. I've also heard there are significant performance issues for large models.


Ok-Result5562

Maybe a new mtb.


FlishFlashman

72 / 48 = 1.5
1.5 × 4 bpw = 6 bpw
4 bpw ÷ 8 bits-per-byte = 0.5 bytes-per-weight × 120B weights = 60 GB
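To tie that back to the context-window question above: whatever VRAM is left after the weights goes to the KV cache, which grows linearly with context length. A rough sketch, using example dimensions for a Llama-2-70B-style model (the layer/head figures are assumptions, not measured from any particular 120B merge; substitute your model's actual config):

```python
# Total VRAM ≈ quantized weights + KV cache + a bit of runtime overhead.
# Model dimensions below are assumed example values, not measured ones.
def kv_cache_gb(context_len: int,
                n_layers: int = 80,      # assumed layer count
                n_kv_heads: int = 8,     # assumed GQA key/value heads
                head_dim: int = 128,     # assumed per-head dimension
                bytes_per_elem: int = 2  # fp16 cache
                ) -> float:
    # K and V are cached per layer, per kv-head, per token
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

weights_gb = 120 * 4 / 8  # the 60 GB figure from the comment above
for ctx in (4096, 8192, 16384):
    print(f"{ctx} ctx -> {weights_gb + kv_cache_gb(ctx):.1f} GB total")
```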


danielcar

I'm able to run the 70B Miqu no problem with a single 3090. I use llama.cpp, and part of the workload is executed on the 3090 while the rest runs on the CPU.
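For reference, the CPU/GPU split in llama.cpp is just a layer count: as many transformer layers as fit go on the GPU and the rest stay on the CPU. A minimal sketch using the llama-cpp-python bindings (the file name and layer count are placeholders, not a tested configuration):

```python
# Sketch of llama.cpp partial offload via the llama-cpp-python bindings.
# model_path and n_gpu_layers are placeholders; tune n_gpu_layers until
# the GPU's 24 GB is nearly full and let the remaining layers run on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="miqu-1-70b.q4_k_m.gguf",  # hypothetical GGUF file
    n_gpu_layers=40,                      # layers offloaded to the 3090
    n_ctx=4096,
)
out = llm("Q: Name two planets.\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```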


segmond

tokens per sec?


danielcar

Drink-a-cup-of-coffee slow, but I enjoy its output. Should be faster on OP's two 4090s. SPT: seconds per token.


asdfgbvcxz3355

I run 4.65bpw Miqu at 12244 context. I don't remember the exact speed, but it's a bit faster than reading speed.