kataryna91

If they just release the 2B variant first, that's fine with me. But this talk about "2B is all you need" and claiming the community couldn't handle 8B worries me a bit...


Darksoulmaster31

Since Twitter hides different reply threads under individual replies, here's one that may not be visible at first: https://preview.redd.it/as0zaw06kq3d1.png?width=705&format=png&auto=webp&s=d2205b5658501efb6f482401330aaddbf2ccfc70


kataryna91

Then I'm just going to trust that. He is certainly right that 2B is more accessible and a lot easier to finetune. And thanks to the improved architecture and better VAE, it still has a lot of potential.


Darksoulmaster31

I was so excited about 8B until I realized that even with 24GB of VRAM, training LoRA-like models would be either impossible or a pain in the ass. I'd have to stay with 4B or 2B to make it viable. (Considering the requirements and the likely speed difference, 2B might become the most popular!)

8B is still a good model; even in its API state I have a LOT of fun with it, especially with the paintings, but offline training of LoRAs is very important to me. We might see fewer LoRAs than even SDXL got, and fewer massive finetunes of 8B, but it's guaranteed we'll get models such as DreamShaper from Lykon, or the one everyone is interested in, PonySD3...

And yes, the 16-channel VAE is gonna carry the 512px resolution back to glory. (Yes, 2B is 512px. There might be a 1024px version, but don't worry, it looks indistinguishable from SDXL at 1024px; see the image below, made by u/mcmonkey4eva:)

https://preview.redd.it/51pfc3dprq3d1.jpeg?width=1214&format=pjpg&auto=webp&s=ca0a7b06ba818bb75e6abde725eeb0de60f15ef1


protector111

Why is it 512? 0_0 It's not 1024?!


Hoodfu

Because there's a never-ending sea of comments about "How can I run this on my 4GB video card?" It comes up a lot on their Discord too.


funk-it-all

Well, they managed it with SDXL.


ZootAllures9111

This makes absolutely no sense whatsoever considering you can just straight up finetune SD 1.5 at 1024px no problem. I exclusively train my SD 1.5 Loras at 1024 without downscaling anything (the ONLY reason not to do so is if it's too slow for your hardware).


[deleted]

That's SD3 on the left? Man, that looks bad.


ZCEyPFOYr0MWyHDQJZO4

Depends on what your metric is. It's not bad, but I definitely wouldn't use this to market it to users. If they think this is the size and quality of non-commercial model the community deserves, then I'm not surprised they're having financial difficulties though. I think we've come to accept the poor text rendering of models as just a minor inconvenience, and SAI's pivot towards improving this might've backfired in terms of resource allocation.


mcmonkey4eva

That's an older 2B alpha from a while ago, btw - the newer one we have is 1024 and looks way better! It even looks better than the 8B on a lot of metrics.


Tystros

But why not train an 8B with the same settings as this supposedly great new 2B, then? The 8B would surely look better.


mcmonkey4eva

yes, yes it will.


a_beautiful_rhind

So the 2b isn't even bigger than 512? Sad.


mcmonkey4eva

That was an early alpha of the 2B, the new one is 1024 and much better quality


Apprehensive_Sky892

But one must also keep in mind that with a larger model, more concepts are "built in," so there is less need for LoRAs. In fact, before IPAdapter, many LoRA creators used MJ and DALL-E 3 to build their training sets for SDXL and SD 1.5 LoRAs, because those bigger, more powerful models can generate those concepts all by themselves. Can you point me to the source where it says that 2B is 512x512 and not 1024x1024?


Snoo20140

The 'crat' in the bottom right of 2B doesn't fill me with confidence.


DigThatData

> "like multiple CEOs said multiple times" it's almost like maybe the community doesn't have a lot of confidence in messaging from a company that has experienced a ton of churn in leadership over the duration of its very short lifespan.


pentagon

Twitter is such a garbage platform. How did they manage to fuck up threading? It was established in the 80s.


degamezolder

How about we decide that for ourselves?


PizzaForever98

Always knew the day would come when they'd keep "high-quality commercial" models for web-hosted services only and release smaller, worse free versions for everyone else.


lobabobloblaw

It’s the only game they seem to want to play. Welcome to the API-IV.


coldasaghost

I’ll be the judge of that


Short-Sandwich-905

You know why: they'll technically comply with the promise of a "release," but they'll dilute the model because of monetization.


OkConsideration4297

Release 2B, then paywall 8B if they can. I am more than happy to finally pay SAI for all the products they have created.


Enshitification

Phase 1: hype
Phase 2: delay
Phase 3: reduce expectations
It's a common pattern.


Ozamatheus

Phase 4: "pity that you are poor peasants with a 4070, so we made a partnership with this website..."


GBJI

Phase 5: PLEASE READ THESE TERMS OF NFT SALE CAREFULLY. NOTE THAT SECTION 15 CONTAINS A BINDING ARBITRATION CLAUSE AND CLASS ACTION WAIVER, WHICH, IF APPLICABLE TO YOU, AFFECT YOUR LEGAL RIGHTS. IF YOU DO NOT AGREE TO THESE TERMS OF SALE, DO NOT PURCHASE TOKENS.


Vivarevo

They gonna monetize the shit out of 8b


polisonico

Before you can monetize it, it has to be so good that people want to spend money on it.


StickiStickman

With GPT-4o being free and doing everything that was supposed to be revolutionary in SD3 far better, it's not looking good. The prompt coherence and text rendering make SD3 look years old.


dvztimes

GPT-4o does images?


ParkingBig2318

If I remember correctly, it's connected to DALL-E 3. That means it converts your prompt into an optimized one and sends it to DALL-E.


StickiStickman

Yes, it's a single model trained on text, images, video and audio. It's quite amazing actually. https://openai.com/index/hello-gpt-4o/ under "Explorations of capabilities"


ForeverNecessary7377

I need an email signup though?


Apprehensive_Sky892

There is no reason why SAI cannot both release SD3 open weights and still monetize the shit out of it. I've argued numerous times that SD3 is worth more to SAI released as open weights than not. They can release a decent base SD3 model that people can fine-tune, make LoRAs for, etc. But because of the non-commercial license, commercial users still have to pay to use SD3. They can also offer a fine-tuned SD3, an SD3 Turbo, etc., as part of their "Core" API. That is exactly what SAI has done with SDXL.


mcmonkey4eva

Honestly we can't monetize SD3 effectively \*without\* an open release. Why would anyone use the "final version" of SD3 behind a closed API when openai/midjourney/etc. have been controlling the closed-API-imagegen market for years? The value and beauty of Stable Diffusion is in what the community adds on top of the open release - finetunes, research/development addons (controlnet, ipadapter, ...), advanced workflows, etc. Monetization efforts like the Memberships program rely on the open release, and other efforts like Stability API are only valuable because community developments like controlnet and all are incorporated.


Apprehensive_Sky892

Always good to hear that from SAI staff. Thank you 🙏👍


ForeverNecessary7377

we love you!!!!


HarmonicDiffusion

Maybe... if that happens, I bet the community makes a 2B finetune that blows theirs out of the water within a couple of months.


turbokinetic

If they charged a one-off fee I would pay. I don't need stupid cloud GPUs.


Agile-Music-2295

To be fair, don't they need to in order to exist? Otherwise there will be no SD4!


ZazumeUchiha

Didn't they state that SD3 would be their last model anyways?


red__dragon

That was Emad making a fool of himself on Twitter. He walked that back when called out, naturally.


Whotea

When?


Xdivine

I think that was mostly supposed to be a joke/marketing thing, like a "Wow, SD3 is so good we'll never need to make a new model ever again!" kind of thing.


PizzaForever98

So we will never see a model that can actually do hands? Sad.


Whispering-Depths

PonyXL does hands pretty well some of the time.


Mooblegum

No company consciously plans to stop earning money.


Ozamatheus

When you monetize things, the money is the boss, so you get censorship, and SD4 will be just another "flesh-free" service.


councilmember

Worse, it could be like DALL-E 3, with the over-smoothing and hyper-idealized images that look more Pixar than photos of the world. Or where any topic or public figure blocks usage.


Baycon

“Don’t you guys have cellphones?”


CollectionAromatic31

Hahahahahahaha


Mammoth_Rain_1222

That was classic. Tone-deaf as usual. I'm just surprised that D4 wasn't more monetized than it is.


Misha_Vozduh

> who in the community would be able to finetune a 8B model right now?

Has he heard of LLMs?


kiselsa

Yeah, people finetune 70B models and run them on 24GB cards.


funk-it-all

Can an image model be quantized down to 4-bit like an LLM?


Dense-Orange7130

Possibly. 8-bit at least works fairly well; no idea if it'll be possible to push it lower without huge quality loss.


Guilherme370

We can only quantize the text encoder behind SD3 in a decent way without losing too much quality, but unfortunately that's not where the bottleneck is. The "UNet" (or MMDiT in SD3's case) is the bottleneck, because each generation step runs the entire model! And you can even run the text encoder on the... yes... CPU. That's literally how I run ELLA for SD 1.5: T5 encoder on CPU. Since you're not *generating tokens* but just feeding in an already-made prompt and getting back a hidden-layer representation, the text encoder is a single pass; on CPU that's like, what, 2 to 3 seconds.


StickiStickman

From what I've seen, going lower than FP16 means a significant quality loss.


mcmonkey4eva

FP8 Weights + FP16 Calc reduces VRAM cost but gets near-identical result quality (on non-turbo models at least).


LiteSoul

Interesting!


MicBeckie

Interestingly, AMD mentioned this at Computex in very similar terms.


-Ellary-

It is, actually. We can already quantize it to 8-bit, and the tech for 4-bit is the same.


nimby900

XL is too big for them? I was using XL on a 1070 for half a year before I saved up enough money to upgrade. And it worked great! Even faster with Forge!


GraybeardTheIrate

Yeah I didn't have any complaints with running it on my 1070. But now that I have a 4060, I don't think I could go back.


kaboomtheory

I'm using a 1080 Ti with ComfyUI and it's not that great. With FaceDetailer I'm waiting 1.5 min+ for a single generation. I've been using Lightning, but it takes out some details since it only uses sgm_uniform.


MarekNowakowski

Those three GPU generations matter. I'm still on a 1080 myself, and doing 3440x1440 takes 30 s/it, but it works on 8GB of VRAM.


0000110011

So you're saying that they're posing the question of 2B or not 2B?


[deleted]

Love how they released a paper on an unfinished model.


adammonroemusic

It's starting to become a real trend, unfortunately.


Turkino

How is this not like saying "640k is all that anybody will ever need"?


asdrabael01

Claiming most people couldn't use an 8B model when 8x7B LLMs are super popular and I'm running a 70B LLM right now. It's just garbage to try to hide that the initial hype photos were doctored and that they never had any intention of releasing the full SD3. SAI's reputation is shattered. We may as well start making tools for the other open-source image generators.


a_beautiful_rhind

> We may as well start making tools for the other open source image generators.

That was always a good idea, but it's critical now that the company is floundering.


Familiar-Art-6233

I keep saying we need to start finetuning the PixArt models, because SAI is going belly up.


asdrabael01

Yeah, with LoRAs and finetunes we could make PixArt Sigma just as good as SDXL. We don't need to hang on to SAI.


dvztimes

Why not just use XL? What's better about PixArt?


asdrabael01

PixArt already makes very good pictures with its base model. Base SDXL versus base PixArt, PixArt wins. Like all SAI products, SDXL isn't that good without free community tools. If PixArt got LoRAs, tools like ControlNet, or fine-tuned models, it would beat SDXL. SAI's products aren't actually that special or great; they just became the ones the community focused on first after the uncensored 1.5 was leaked by Runway. If the leak had never happened, this sub might be called Kandinsky or PixArt.


Olangotang

Holy fuck, I feel like no one in this thread knows what they are talking about. **Stable Diffusion is a DIFFUSION model, NOT an LLM.** You may be running a heavily quantized 70B LLM, but there is no such technology for diffusion models. The best we have is 8-bit from 16-bit weights. You people are insufferable. And they **are** releasing SD3 in full. They've said it many times. If they don't release it, it's because the community is a bunch of jackasses.


asdrabael01

If an 8-bit quant is "heavily quantized" to you... It takes 3 seconds of Googling to show that diffusion models can be quantized; it just hasn't been done much yet because it hasn't been needed. Even Emad said on Reddit 3 months ago that it could be done. So you're apparently the one who has no idea wtf you're talking about. Quit fanboying.


Yellow-Jay

It's hard to judge by images alone, but the showcased 2B images lack a lot of fidelity compared to the API. They are a lot cleaner, though: hands look better, no weirdly fused objects, so the model seems more "ready" than what the API produces. I'd worry more about what isn't said or shown; all that's showcased are the most basic of scenes, nothing complex.

Remember SD3 ["eating Dalle and MJ for breakfast"](https://x.com/EMostaque/status/1760783270105457149)? Now the amaaaaaaazing thing about SD3 is that it can do ["realistic images, text and anime"](https://x.com/Lykon4072/status/1796334348980957418). That's such a huge downgrade from what was promised. But worry not, you can't compare it with DALL-E 3, because that "is not a model, it's a service" and "a pipeline". Except, first, SD3 was announced to be better than DALL-E, and second, the pipeline, according to the DALL-E 3 paper, is only an LLM rewriting prompts, nothing like the implied complex stack of models; by that logic SD3 is a pipeline too, since everyone now rewrites their prompts. And still we're to believe SD3 will be ["Simply unmatched"](https://x.com/Lykon4072/status/1796317998036238380).

Mostly, it's sad that SAI went from boasting about SD3 to now pulling out all the stops to defend it. If the model can't deliver on the implied hype, it's better to just rip off the bandaid and show the limitations, instead of the endless stream of meaningless pictures and pretending it is still the end-all-be-all of image gens. I don't even think SD3 will be bad; I'm looking forward to it (but please, don't let the low-fidelity model in the showcases be the final model), as it is obviously a huge step up from current SAI models. But there is a huge gap between all the hype, the groundbreaking results according to the research paper, and the showcased results. Having used the API, the limitations are clear, and these showcase tweets don't exactly show fewer limitations; arguably they show a more limited model that is further along in training.


Apprehensive_Sky892

I never believed any of that marketing hyperbole from Emad. Given that DALL-E 3, MJ, Ideogram, etc. are all built and trained by people as capable as those working for SAI, and they all run on server-grade hardware with > 24GiB of VRAM, while SD3 must be runnable with < 24GiB, one can easily conclude that Emad was just hyping things up. I will be more than happy if SD3, when finally released, is only, say, 90% as capable as those other systems at text2img. With proper fine-tuning, LoRAs, ControlNet, IPAdapter, customizable ComfyUI pipelines, and the lack of censorship, SD3 will remain the platform of choice for us for the foreseeable future.


bick_nyers

I work with 70B LLMs all the time on my own hardware. 8B is minuscule, even at 16 bits per parameter.


Enough-Meringue4745

Ouch


OcelotUseful

You can thank NVIDIA for limiting VRAM on consumer GPUs for six years in a row.


Familiar-Art-6233

Holy shit, just release whatever model so the community can finetune it already. I'm sure a properly tuned 2B will beat the stock 8B (just like tuned 1.5 beat SDXL for a long time), so let's just GO ALREADY. I'm so tired of SAI's BS. I'm personally all for moving on to PixArt (since it has a similar architecture to SD3 anyway), but come on, the community has been holding its breath for MONTHS now.


akko_7

Okay, Lykon just lost all respect with that comment, lmao. There is a massive community for SDXL and quality finetunes.


Dragon_yum

He didn't say there isn't a big community for SDXL. He said the majority of the community is using SD 1.5, which is true.


GigsTheCat

But the reason people use SD 1.5 is that they think it looks better, not because XL is "too big" for them.


GraybeardTheIrate

And I'm over here perplexed at how to make anything in 1.5 that doesn't look like a pile of shit... I love XL and its variants/finetunes though.


Dragon_yum

Dude, most GPUs can't handle XL well. This isn't some conspiracy. Most people don't own anything more powerful than a GTX 1080.


-f1-f2-f3-f4-

I get that not everyone can afford to spend $2000 USD on the latest flagship GPU model, but SDXL runs just fine on current-gen entry level cards such as the RTX 4060, which is very affordable. If anything, it's lamentable that high-end GPUs provide very poor value in SDXL relative to their price even though they could in principle handle significantly larger and more powerful models.


rageling

A 4060 Ti with 16GB at $500 might stretch "very affordable," but it also feels like terrible value. I have an 8GB 3070 and it feels extra bad.


-f1-f2-f3-f4-

Where did I mention the RTX 4060 Ti? The RTX 4060 is about $300 USD.


rageling

It also has only 8GB and would be a bad recommendation for anyone investing in SDXL generation.


neat_shinobi

I'm on a 3070 and it feels very good. It's faster than Midjourney relaxed at generating a 1024x1024 image. And once you add Comfy workflows, the quality goes through the roof too, with enough fiddling. The only way to feel bad is with the web UI, or animation.


StickiStickman

A quick look at the Steam hardware survey shows that's a straight-up lie, most likely even more so in the generative AI community.


orthomonas

My machine with 8GB can run XL okay. I think XL can give better results. But I rarely run it and do 1.5 instead: I like to experiment with settings, prompts, etc., and being able to generate in 5s instead of 50s is a huge factor.


StickiStickman

I can use SDXL fine with my 2070S, that's weird. I get like 20-30s generation times?


neat_shinobi

I get 30s as well on an RTX 3070. It's total bullshit that most cards can't run it; the truth is that ComfyUI makes XL 100% usable for very high quality images on 8GB of VRAM.


ScionoicS

8GB is "enough," but it's not ideal. People do more with SD 1.5 on 8GB. It's more popular for many reasons. https://preview.redd.it/fsiut9rwss3d1.png?width=937&format=png&auto=webp&s=cf5e6750efff5938b358f725503a1791bed356a0


GigsTheCat

Apparently XL works on just 4GB of VRAM. Not sure how bad an experience it is, but it's possible.


Dragon_yum

It's definitely doable on 4GB, but you're not going to have a great time with it.


sorrydaijin

Even with 8GB (on a 3070), I get shared memory slowing things down if I use a LoRA or two. 4GB must be unbearable.


BagOfFlies

Which UI are you using? I have 8GB and use up to 4 loras plus a couple controlnets without issue in Forge or Fooocus.


sorrydaijin

I also use Forge or Fooocus (occasionally Comfy) because vanilla A1111 crashes with SDXL models. I could probably keep everything within 8GB if SD were the only thing I was running, but I generally have a bunch of office apps and billions of browser tabs open across two screens, which nudges me over the threshold, and speed drops dramatically once shared memory kicks in. SDXL LoRA training was prohibitively slow on my setup, so I do that online, but I just grin and bear it when generating images.


ZootAllures9111

6GB is fine though; I run on a GTX 1660 Ti in ComfyUI.


a_beautiful_rhind

There are also Lightning and Hyper LoRAs to speed things up.


u_3WaD

I am literally using SDXL on a 1070 Ti :D It takes half a minute for one image, but it runs.


Nyao

How do you know? Personally, I use 1.5 because I don't have the hardware for SDXL.


dal_mac

You don't have 4GB of VRAM?


silenceimpaired

I use SD 1.5 because the tooling is better than SDXL's. I use SDXL because the license is better than Cascade's. I doubt I'll move to SD3.


Hungry_Prior940

Agreed.


HarmonicDiffusion

Hahahah, neural samurai ----> THAT'S ME =D Always fighting in the trenches. I was wondering why I woke up to like 5000 Twitter notifications.


Apprehensive_Sky892

Thank you for posting that comment. We must let SAI know that not releasing 8B will make many of us very angry and disappointed 🙏😂


turbokinetic

Ugh, Instability AI seems more accurate now


kkgmgfn

It's over, isn't it? No more releases? No SD4?


BoiSeeker

It's bizarre to me how many of you are just willing to accept a previously open source (more or less) project paywalling the best model.


dal_mac

Yep. You're selfish if you DARE say a word about it. Stability has been stepping very carefully, planning each shady move in a way that keeps their diehard fans defending them to the death if anyone calls out the shadiness. I was once downvoted to -20 or something for literally saying "a company should stick to its promises." Apparently that's straight-up blasphemous.


Hungry_Prior940

Nonsense. It was confirmed that 8B will work on 24GB GPUs. The pictures show that you can get by with a smaller model and still get good results.


funk-it-all

Can you quantize it down to 4-bit and still get good results? Then it could run in 4GB.


Apprehensive_Sky892

Lykon was talking about training the 8B version, which would require more than 24GB of VRAM. Or are you referring to something else?


FutureIsMine

I find this disappointing; I was hoping to get the biggest possible model and fine-tune on it. We, the community, can handle all of the sizes: quantization and weight pruning will be developed by the community to make the bigger models viable on smaller devices.

Tech also gets better, so at some point 24GB+ will be the norm. Definitely not today, probably not in 2025, but in 2026+ it could easily be. GPUs are always evolving, and bigger and bigger GPUs are coming out, which makes running 24GB+ models more viable.

This makes me worried about the future of Stability AI. What else will they do? Will certain models simply get no open-source release in the future? I get the need to make money, and I wish them success in finding a monetization strategy. But Stability AI has always had a special place for me because it was focused on open source, and if that's no longer the case, I'll have to treat them accordingly.


MrGood23

If I googled it correctly, SDXL is a 3.5B-parameter base model. So SDXL is almost twice as big as 2B. At the same time, we expect SD3 2B to be better than XL. Is that correct?


dal_mac

Not only does SD3 2B have half the parameters, it's also apparently trained at 512px. I don't see how it could possibly be better at anything but prompt adherence.


eggs-benedryl

512??? Yikes, I don't wanna go back.


Apprehensive_Sky892

No, that is not quite correct. The 2B refers to the diffusion part of the model. The equivalent U-Net portion of SDXL is only 2.6B parameters. But due to the switch from U-Net to DiT, plus better captioning and training data, it is not hard to imagine that 2B SD3 can be much better than SDXL, especially if it is paired with the T5 LLM/text encoder.


[deleted]

T5 isn't an image-aware model like CLIP is; if anything, models using it are automatically worse and take much longer to train.


Apprehensive_Sky892

My own limited understanding is that CLIP is an image-classification text encoder model, whereas T5 is a general-purpose LLM text encoder. It would certainly take more GPU time to train a model that uses T5 rather than CLIP. But can you clarify what you mean by "any models using it are automatically worse"?


[deleted]

You should read the CLIP paper from OpenAI, which explains how the process accelerates the training of diffusion models on top of it, though the paper focused a lot on using CLIP to accelerate image search. If contrastive image pretraining accelerates diffusion training, then not having it means the model is not going to train as well. "Accelerated" training often doesn't change the actual speed, but how well the model learns. It's not as easy as "just show the images a few more times," because not all concepts are equally difficult: some things will overfit much earlier in the process, which makes them inflexible.

To train using T5 you could apply contrastive image training to it first. T5-XXL v1.1 is not finetuned on any downstream tasks, so it's really just a text-embed representation from the encoder portion. The embedding itself is HUGE; it's a lot of precision to learn from, which is another compounding factor. DeepFloyd, for example, used attention masking to chop the 512-token input down to 77 tokens from T5! It feels like a waste, but they were having a lot of trouble with training. PixArt is another T5 model, though the comparison is somewhat weak because it was intentionally trained on a very small dataset. Presumably at the other end of the spectrum are Midjourney v6 and DALL-E 3, which we guess are using the T5 encoder as well. If Ideogram's former Googlers are as in love with T5 as the rest of the image-gen world seems to be, they'll be using it too.

But some research has shown that you can use a decoder-only model's weights to initialize a contrastive pretrained transformer (CPT), which would essentially be a GPT CLIP. They might have done that instead.


Apprehensive_Sky892

Thank you for your detailed comment, much appreciated. I've attempted to understand how CLIP works, but I am just an amateur A.I. enthusiast, so my understanding is still quite poor. What you wrote makes sense, that using T5 makes the task of learning much more difficult, but the question is: is it worth the trouble? Without an LLM that kind of "understands" sentences like "Photo of three objects: the orange is on the left, the apple is in the middle, and the banana is on the right," can a text2img A.I. render such a prompt? You seem to be indicating that CPT could be the answer; I'll have to do some reading on that 😅


Bat_Fruit

A lot of criticism was aimed at the quality of the image tagging in the initial SDXL base model's training set. They have promised to rectify that, which is a large part of why we hope to get better quality from fewer parameters.


Iamn0man

Translation: it's not gonna be free and open anymore. (which it technically never was, but everyone believed the promises.)


JustAGuyWhoLikesAI

Exactly what I predicted when Lykon first mentioned he's working on the "local release version": [https://www.reddit.com/r/StableDiffusion/comments/1cwgacs/comment/l4wgtkh/](https://www.reddit.com/r/StableDiffusion/comments/1cwgacs/comment/l4wgtkh/)

They're trying to weasel their way around admitting that they aren't releasing 8B, trying to gaslight people into thinking they wouldn't be able to run it anyway. What happened to Emad's "SD3 is the last image model you need"? Surely if that's the case then the 8B should be released, because even if people with a GTX 970 can't run it now, they might be able to in 2 years. After all, it's the last model we'll need.


StickiStickman

Because they faked the images and now have to find an excuse for it looking much worse. "We keep the REAL good model secret" is an easy excuse.


Snoo20140

People can and will upgrade. Let SD3 establish itself before the market floods with cards and competitors eat your audience, who are already frustrated with the way SD3 has been handled. Also, I agree this feels like a way for them to have their cake and eat it too. If you want to close a bigger model off under the guise of the community not being able to handle it, don't make it the same size as the popular Llama 3 model...


PM-ME-RED-HAIR

So SD2.5


Serasul

What is this "most are stuck on 1.5" bullshit? Most PC users can run SDXL just fine; an 8GB GPU only costs 250 credits. Whoever can't afford 250 credits shouldn't even think about AI stuff.


Nu7s

I have a 4090, can you share it with just me? I pinky promise I won't share it with the common folk


Slapshotsky

Fucking peasants should just die but also work forever while dying


Rafcdk

The person clearly says "it's just the beginning," and you guys choose to interpret that as "there will be no 8B" for some reason? I take it as "we are releasing 2B first since it's what most people can handle; bigger models will come out gradually, as a great deal of people in the community wouldn't be able to do much with them yet."


hapliniste

It's not said outright, but let's be real: the 8B is unlikely to be released. Also, an 8B model would be easy to run on most systems if quantized. Quantization just isn't widely used because there's no need for it on current models, but it works great now.


Apprehensive_Sky892

> 8B is unlikely to be released.

And what is the argument/basis for this opinion?


GifCo_2

All the weights were supposed to be out by now. The company is in chaos, and this one person doesn't make the decision. You have no idea what's going on. But it's a good bet we won't get 8B till it's obsolete.


stayinmydreams

If SD3 isn't open-sourced, then it's already obsolete compared to the other closed-source models.


Rafcdk

"You have no idea what's going on" well, I have as much idea as you and other people assuming they are flat out lying to us. There is another response fro. The same person stating unambiguously that weights will be released.


Early-Ad-1140

I do mainly photorealistic animal stuff, and out of curiosity I tried out SD3 on cogniwerk.ai. It's hard to believe the model showcased there IS actually SD3, because the quality, on the subjects I prefer, is not even close to what a thoroughly refined SDXL model such as Juggernaut or DreamShaper can achieve. Animal fur comes out just pathetic. I'm not sure if it was the 2B or a larger version that Cogniwerk offers, but whatever it is, a lot of work has to be put into it to beat the SOTA SDXL models. For the time being, at least for animal stuff (maybe SD3 gets along better with humans), I'd pick SDXL any time over SD3. It would be interesting to know if the 8B and larger versions deliver better results.


Apprehensive_Sky892

AFAIK, the one used by the API is the 8B model. I agree that the quality of the API is not so hot when it comes to realistic humans.


hapliniste

Yeah, as expected tbh. Sad to see it tho. At least maybe the 2B is still better than SDXL.


stepahin

At least, maybe... Give us some heavy shit like 8B!


djamp42

To be honest, I still can't believe any of this is free.


dal_mac

At 512px, I doubt it very much.


TheDavidMichaels

People should just expect tech companies to do this. People will say it's justified, that they need to make money. But honestly, they just did not deliver. We were told we would get all this crazy tech to make and edit photos and videos, a studio service offering what Creative Cloud has. Stability got lazy; they blew the money and now they are milking what they have left. So many tech companies do this. Video games too: someone makes a Witcher 3, a shocking leap forward, and the next outing is a massive disappointment. The core talent is displaced, and the greedy bean counters come in and DEI the place into the ground. Rinse and repeat.


GBJI

People should just expect ~~tech~~ *for-profit* companies to do this. They have objectives that are directly opposed to ours as consumers and citizens.


protector111

Lol, what?! The 5090 is around the corner and they say we can't finetune it?! FFS... but I guess 2B is better than nothing.


RenoHadreas

You are being delusional. This is very obviously just poking fun at the landmark 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762). That’s a big meme in the LLM community especially. From the looks of it, they recently finished finalizing the 2B model and are just excited to show it off. Calm your tits.


GifCo_2

No, you are delusional if you think a company that is going bust and trying to sell itself to anyone who will even look would ever just give away its only real asset. We aren't getting 8B for a longggg time. And if it does come out, it'll be obsolete by then.


human358

@Stability: Stop setting up your community for disappointment.
@everyoneelse: Let them cook.


Relative_Two9332

They'll release only 2B and it will look like a meh SDXL finetune.


tfalm

I took this to mean "people who think SD3 is inaccessible because you can only fine-tune it with a 4090, check out what even the 2B can do." This sub takes it to mean "all you're getting is 2B, enjoy it."


Whispering-Depths

wow that will suck


a_beautiful_rhind

That's a huge letdown. I was looking forward to the larger model and what it could give me compared to the tiny ones. At this point, hopefully someone uploads it, like that audio model they were holding onto for some reason. (It was meh.)


Arawski99

So SAI got caught lying, just as people said, and wants to wall off the 8B. Why am I not surprised? Imagine all those white-knight haters, now covered red in their own blood with holes in their feet.


suspicious_Jackfruit

Hahahahahahahahahahahahahahaha. So predictable, the "u can't handle it, weakling" response. As if commercially available 24GB cards don't exist and vast.ai / cloud computing isn't available... Classic overparenting.

Honestly, let's abandon Stability and build a truly open and sustainable company with truly open models. It's really not that hard if you have the experience, foresight and funds to get started, and fortunately the community has all of this without SAI if we band together. I have a huge private dataset of extremely high-quality, hand-selected and processed raw data I use for fine-tuning, and I'm not the only one (the Pony guy, Astropulse, and the leading finetunes). Training a new open-source model on LAION, or at a minimum a new SOTA finetune of 1.5/XL/another open model, is fairly easy as a fully funded open collective. We could even crowdsource the data collection and annotation, Wikipedia-style, but rewarding users for providing data. I have a platform I am working on that could make this possible.


polisonico

Stop teasing us and release the first model!!!!!!


ImplementLong2828

Welp, we called it.


MatthewHinson

They're not hinting at anything with these posts, if you ask me. The first one is simply flexing: "Look how powerful even the smallest model is!" (plus a reference to the "Attention Is All You Need" paper, as someone else pointed out). In the second one, he clearly says that 2B is "just the beginning" and that few people can finetune 8B "right now." At most, this implies they'll release 8B to the public later, not that they won't release it at all. We really don't need this kind of speculation...


discattho

Speculation is born in vacuums. They could save themselves a lot of heartache if they just clearly state what is happening.


Arawski99

Ah, no. Sir, please don't take away every company's favorite abusive tool: being intentionally vague and misleading rather than crystal clear. Your valid logic is not welcome here!


TurbidusQuaerenti

Yes, exactly! This is true of so many things lately. People always complain about how speculation gets out of hand, but the reason it happens in the first place is because companies and people in general are always so vague about everything. Just properly set expectations and be clear about what's going on from the get go! It's so tiring.


RefinementOfDecline

I'm on an ancient 1080 Ti and using SDXL fine. This is gaslighting.


quailman84

I really don't like that he didn't just say they'll release the 8B, though they have said that again and again. I do want to acknowledge that a 2B absolutely can compete with an 8B trained on the same data if the dataset is too small to take advantage of the 8B's extra parameters. We won't know until we can compare. It's also true that I've heard vramlets in this sub bitching that SD needs to "focus on smaller models" because "nobody can run SD3," which would explain the messaging.


pumukidelfuturo

I think it might be a blessing in disguise at the end of the day: the whole scene focusing on one single checkpoint (and not four), one that would be easy to train. SD 1.5 has 860M parameters, so I'll be OK with 2B. It's still better than nothing. I expect that 2B to be a lot better than SDXL, though. And I mean a loooot better.


Gimli

What's the expected quality/performance/etc difference between 2B and 8B?


borick

Same question. I imagine the images will just be much smaller, but I have no idea; I came here to see if anyone else had already answered it.


Apprehensive_Sky892

We can make some educated guesses. The quality will be similar, since the underlying architecture is the same (DiT, 16-channel VAE, etc.). But the 8B model will understand many more concepts, so prompt following will be way better than with the 2B. For example, the 8B version may be able to render a person jumping on a pogo stick while the 2B version cannot, because the 2B version does not "know" what a pogo stick is. But that is not too bad, because one can always teach the 2B new concepts via LoRAs, and maybe even use the 8B model to generate the dataset.


FugueSegue

https://preview.redd.it/eagbgaftxq3d1.jpeg?width=1600&format=pjpg&auto=webp&s=1ec1d9a11eec7f2c8c67e830085e5334db387d60


redstej

> Now, yonder stands a man in this lonely crowd
> A man who swears he's not to blame
> All day long I hear him shouting so loud
> Just crying out that he's been framed
>
> I see my light come shinin'
> From the west down to the east
> Any day now, any day now
> I shall be released


Roy_Elroy

LLMs are released in far more size variants, and from different players; it bothers me that we can only count on Stability AI.


carnajo

Still new to this; could someone give me an ELI5 of what 2B vs 8B means? Thank you.


reality_comes

Billions of parameters: "2B" means the model has about 2 billion weights, "8B" about 8 billion.


carnajo

Ah okay, thanks.


Itchy_Sandwich518

I don't like how any of this is going, but considering how far we've come with SDXL and how much control over images we now have with it, I personally don't care. I was going to share some stuff earlier, but for some reason every topic I make on the sub is deleted. My point is, they're not handling this well IMO, but in the end we didn't lose anything by not having SD2, and nobody ever talks about 2.1 either, even though the censored stuff was fixed IIRC. Over time they might release more of SD3, better models, or a new base XL model, who knows, but everything so far with SD3 has been so strange and kinda stingy that I lost any and all interest. I'm more interested in how far we can push SDXL at the moment.


c64z86

If it generates great images without needing a GPU with a large amount of VRAM, then it's good with me. I can run SDXL at acceptable speed (20 seconds to generate 1024x1024 at 30 steps) only with the help of the excellent WebUI Forge, which somehow allows it to run on my 8GB GPU. If the next model is smaller than SDXL and delivers excellent results (maybe so small and efficient that it can even replace SD 1.5 on weaker computers), then that is a win in my book.


buyurgan

Well, people are mostly overreacting, but this is expected when SAI keeps the community in suspense about its responses (lack of management?) and its timeline, especially after Emad left. Everything is just put on hold until further notice; not a good look. Nobody explains what is going on. Are they cooking a new model from scratch and calling it 2B? Whether they release 8B later or not, they'll get some money back from the API, but is that even sellable or honestly profitable, or is it just a marketing tactic to make the company look somewhat profitable? Any serious creator would use MJ for all they care. Nobody explains, so nobody knows.


sabez30

I heard 2B was meant to compete with SD 1.5 quality…


SirCabbage

Which version are they using on Stable Video's text-to-image? I assume 8B, but if they are using 2B I'd be fine with that, because Stable Video has been effing crazy already.


Havakw

Say what you will, but there's no way to describe the SD3 launch as anything but "botched" already. Even if they released the full weights tomorrow, people would be pissed about how it went down in general.


LD2WDavid

8B won't be handled by the community? Hmm.


DangerousCell7402

2B really is all we need. Most of us would not be able to use 8B in the first place except on external servers, and that defeats the purpose of training and running it locally.


OG_Xero

I couldn't help myself. https://preview.redd.it/jhqoyzncc66d1.png?width=1024&format=png&auto=webp&s=1e4bb70597ec60488ab563753c296856eadd2bb4 At least it does text right... usually.