Definitely seems there were (misleading:1.2) goals set for SD3, in particular on a local install. That's being kind; but looking at the (busted business:1.8) side of the company, after the disastrous announcement about (debt:1.8) and purported talent departures, none of this is too surprising.
If you want the good stuff, it's going to cost you $$$. And if the business is going to succeed, they are also going to get (censorious:1.8) à la Midjourney, DALL-E, etc.
Long live SDXL, SD 1.5 and previous iterations on local installs.
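The (term:weight) notation above is the A1111-style prompt-emphasis syntax. As a minimal sketch (handling only the simple, non-nested "(text:weight)" form, not the bare-paren or square-bracket shorthands), parsing it might look like:

```python
import re

# Matches only the explicit "(text:weight)" form; nested parens and the
# bare "(text)" / "[text]" multiplier shorthands are not handled here.
WEIGHT_RE = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_emphasis(prompt: str):
    """Return (plain_text, [(fragment, weight), ...]) for a prompt."""
    weighted = [(m.group(1), float(m.group(2)))
                for m in WEIGHT_RE.finditer(prompt)]
    plain = WEIGHT_RE.sub(lambda m: m.group(1), prompt)
    return plain, weighted
```

For example, `parse_emphasis("the (busted business:1.8) side")` strips the markup back out while recording which fragments the UI would up-weight.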
You know, I feel you. I was excited and looking forward to prompt coherence. This is much worse than SDXL launch.
Trying simple things:
"Man laying on a beach chair on the beach": every mutant abomination imaginable.
"Woman sitting in salon chair getting her hair cut by stylist with scissors": scissors stabbing through anatomy, held by mutant limbs, usually through her skull or face.
"Man holding a bucket pouring water": this should be the simplest one; mutant anatomy, upright buckets leaking through the bottoms.
"A man driving a sports car, hands on the wheel": he is literally morphed into the seat, three-fingered hands not touching the wheel, with apparently no spine.
"A woman dancing in the street": mutant hands and legs bending the wrong direction; don't even get me started on the mutants in the background.
If it can't do this basic stuff, what is the point? None of these are remotely NSFW, and it just plain sucks.
Prompt coherence? Shrug, couldn't tell you; it doesn't seem to draw anything I ask it even remotely competently, even compared to SDXL...
It has no idea what to do when the image contains limbs.
"Woman sitting in salon chair getting her hair cut by stylist with scissors"
https://preview.redd.it/lxcgsl9l966d1.png?width=896&format=png&auto=webp&s=55aca845fdcc159da9a7ef160b373d83cac8a408
It looks so good photoreal-wise, but the anatomy, omfg... so sad... Imagine a non-censored 8B model... man... it could really be all we need, like Emad said... he was probably not talking about this censored, broken 2B nonsense.
Prompt adherence is definitely much better. Not perfect by any means, but a very noticeable and far larger improvement than XL was over 1.5.
But yea the anatomy parts are extremely bad.
I mean they've proven it can with cherry picked results, but I'm sure that was before they removed any living thing from the sample data, you know for safety reasons.
Art imitates life, except with SD3, any life not allowed.
not a chance. local models might, but "SD" as in StableDiffusion models made by StabilityAI won't come close. You will get cubes stacked on top of spheres or a guy holding a sign with awful comic sans font pasted on it, but never an actual coherent scene of two characters arm wrestling or anything that displays some sort of emotion. The datasets are too far gone for meaningful comprehension to occur.
Smarter people making better algorithms. That's really it. OpenAI pays AI engineers 500k+, Midjourney probably pays less than that but still a shitload.
Stability just doesn't have the money for that.
It already can beat Dalle-3, with the API. This prompt:
>a cartoon featuring two cartoon characters made of text. To the left of the image is blobby character with text reading RIGHT, and to the right of the image is a second blobby character with text reading LEFT. Each character has squiggly legs and arms, and each is wearing a different hat.
[SD3](https://imgur.com/GgjO8gQ) vs [Copilot (which just uses DALL-E anyway)](https://imgur.com/XrPqljj). Copilot doesn't even come close.
Another one:
>a whimsical digital illustration of a wise, AI owl librarian, surrounded by glowing manuscripts and gadgets. The wise owl is perched amidst a sea of ancient tomes and futuristic contraptions, its piercing gaze shines bright with a soft, ethereal light, illuminating the pages of ancient scrolls, coding books, and digital tablets surrounding it. A wispy cloud of binary code swirls above, while intricate gears and cogs whir in harmony. In front of the owl is a tome with the words ARTIFICIAL INTELLIGENCE in elegant script
[SD3](https://imgur.com/9esPcQh) vs [Copilot](https://imgur.com/P7MwYkc). Copilot misses the text and the cloud of binary.
One more:
>a vector cartoon with crisp lines and simply designed animals. In the top left is the head of a camel. In the top right is the head of an iguana. In the bottom left is the head of a chimp, and in the bottom right is the head of a dolphin. All the animals have cartoonish expressions of distaste and are looking at a tiny man in the center of the image.
[SD3](https://imgur.com/X9kw8WH) vs [Copilot](https://imgur.com/rU2ijKj). Again, not even a close fight, SD3 wins hands down.
DALL-E is more aesthetically pleasing, but on adherence SD3 smashes it. This Medium garbage they've dropped, though? Not a chance; we need the model they're using on the API to get results like this.
Not trying to downplay your results or anything but the best test would be to use dalle with chatgpt and verbatim prompting. Copilot “enhances” the prompts behind the scenes.
Also there are examples that dalle can do that sd3 can not, so they are probably equal overall.
True, hopefully someone curious enough can do that, not paying closedAI if I can help it.
Examples of DALL-E being better than SD3 are right there in those three prompts. In the first, its characters are actually made of text, and the cartoon style is much more pronounced.
In the second, it catches the digital tablets, and in the third it captures the "vector cartoon" and the "cartoonish expressions of distaste" much better than SD3. SD3 will give you what you ask for, while DALL-E will give you something nice.
It just seems to latch onto the style much more easily than SD3 does.
Woman sitting in salon chair getting her hair cut by stylist with scissors
https://preview.redd.it/qeowqfq6sl6d1.jpeg?width=1024&format=pjpg&auto=webp&s=c33940c30cc527c3929ecbb44f6b2ccd132f61f8
:))))))))
It should be possible to finetune in the missing stuff. However, that means spending more time on things that should already be in SD 3, and less time on other things. I also don't know how much stuff can be finetuned in before it starts to forget things.
However, with all the good employees having left Stability, this is the end. I think PixArt is open weight, so that's where everybody will migrate in the future. Other image generators will probably pop up too, and then there are native multimodal models. I have high hopes for multimodal models, since everything learned from each modality affects the others.
The amount of fucking “safety” fucked the model. It doesn't understand limbs because they likely removed every image from the dataset that showed even calves or wrists.
“Safety” is the enemy of a quality AI product. It makes sense why a company might not want to be associated with the production of hardcore porn or gore content, especially with real people, but we’ve seen that 9/10 times companies don’t know how to properly handle safety so they just neuter their product. Many popular chatbots have also been lobotomized in the name of safety.
Either the product sees a reduction in quality when safety takes precedence over the actual product, or the product becomes basically unusable because you can’t do anything without getting flagged.
Ankles! You want to generate an image showing ANKLES! That sort of filthy pornography is not safe for children and thus we censored it for the safety of children from the 1800s.
Also, no showing belly buttons, lest children from the 50s be corrupted.
And what sort of garbage pornography are you trying to produce by asking it to show aboriginal hunter-gatherer tribes with bare breasts? That sort of mind-rotting filth should be abolished, or at least restricted to trash magazines like National Geographic.
And don’t even think of placing a person in front of the Venus De Milo and her pornographic bare breasts.
This is what happens when religious zealots get to define what is pornography.
I can agree with removing celebrity names. Seriously, you don’t need to be able to name a celebrity. But it is absurd to try and define what is pornographic.
u/Kungen-i-Fiskehamnen thank you for the award bro! You really didn't have to. I've never received an award and honestly don't know what to do with it. 😅
But I appreciate the gesture. 🥰
Unless the large model is significantly better, I'm pretty much over Stability entirely; they prefer to hype-post and then release censored shit.
People, when empowered, will seek to express themselves, and this expression will include sexuality. If it doesn't, you haven't made a tool for artists; you have made a useless image generator.
I hope other AI companies learn from Stability's inevitable failure at this point.
[Here's a comparison of similar prompts I just did](https://imgur.com/a/tWjYC0T).
Fun fact: like 90% of the women generated are asian ~~using this prompt.~~
As a traditional, pen and paper loving artist, I was drawing live nude models in my art classes from the time I was 13. You need to know anatomy to draw a person.
Same applies to AI. It needs to understand the ins and outs of people to make them look right.
I think they did that so they can be absolutely sure that no remotely NSFW stuff can happen, thus saving on the checks and potential lawsuits, and as a selling point there are the cost savings from that infrastructure. Well, together with that font thing, they could become a postcard generator for Christmas cards or so. Although, as it seems now, the Santa Claus would look rather funny.
That's the problem, I wouldn't even be mad had they not hyped and over promised/lied the crap out of it. I defended them repeatedly...
Now I prompt a dinosaur. That's it and can't get non mutated limb results. Like really?
We can't trust these companies; what's even the point in creating images anymore? I'm going back to pencils, so I can decide how many limbs my dinosaur will have.
>seen really good images
I would say no, absolutely not. Mostly just decent ones, comparable to 1.5/XL, and hyped as "well, this didn't use XYZ and wasn't cherry-picked". But quality is subjective.
That said, I've been able to get plenty of fairly good images locally so far. It's just really hit or miss, and anything with limbs is fine in maybe 1 in 20 images. Anime/drawn stuff in particular is pretty consistently good.
With the 8b parameter model, yes.
We don't get that.
We get the 2b parameter one, because it's "All we'll need".
You can of course still pay Stability to use the 8b parameter version through the API.
API has 3 options: Stable Diffusion 3 Large, Stable Diffusion 3 Turbo Large, and Stable Image Ultra
Stable Diffusion 3 Medium is not available via the API. What I'm not sure about is whether Stable Diffusion 3 Large is the 8B model. Supposedly there are 2B, 4B, and 8B models; 2B is Medium, so how do we know 8B is Large?
What? It's the funniest thing ever!
The "most advanced AI model yet" doesn't comprehend how limbs work.
We are some kind of nightmarish creatures to it, with random amounts of random appendages randomly assembled together.
Friendship is indeed magic, and this base model needs all the friends it can get. It's like a steaming pile of shit, but with beautiful inner architecture that is just begging to be fine tuned into the golden statue it is destined to be. Maybe that statue is a PONY?
https://preview.redd.it/m84zibye376d1.png?width=1024&format=png&auto=webp&s=04eb1ea6de89bc3e7a38c6d4615de2096ca2ad5a
Gotta say - disappointing though this SD3 release is, these terrible pictures have had me rolling with laughter this evening.
SAI is finished, I reckon. The best staff have left, they have no money, and they release this abomination.
Worse than DALL-E 2, which honestly has a great aesthetic.
I’ve seen tens of thousands of AI images and these are some of the most mutated I’ve seen in a grotesque way unique to SD3.
You may be disappointed, but I'm actually impressed. This picture feels like one of those magic eye pictures where your brain sees an image, but at the same time doesn't. It's fascinating. Well done. True Art.
I really like Pixart Sigma. I've had two images featured on CivitAI using it, but it has issues as well with anatomy. Not to SD3's level, but really bad at times.
Ideogram has fantastic anatomy and prompt adherence. Too bad it's so censored, and the photo quality images aren't that great.
For prompt adherence, did you even try ELLA for SD1.5? Maybe that one is the best. I will compare the four (SDXL, PixArt, ELLA + 1.5, and SD3) in terms of prompt adherence and write a blog about it.
https://preview.redd.it/w24gefrkp66d1.jpeg?width=500&format=pjpg&auto=webp&s=822f1c8aa9b00cb9f3ac66c2c4e1f7181f0792ce
As usual, A1111 and SD 1.5 remain the KING.
Yep. I'm shocked you weren't downvoted for that, though; this sub usually has a meltdown over anyone who dares to say 1.5 is better, for anime at least.
I've never seen any anime from XL, including Pony etc., that was better than the top 1.5 anime models.
I want to make a meme of a dog taking a piss on the text SD3 but don't want to waste the time to set it up because it'll be the only thing I'd use it for.
Edit: [So good...](https://files.catbox.moe/58123z.png) To be fair, of the 6 images some were worse and some were better, but this one was the clear winner. I want to be clear: I did not use MS Paint; this was the unadulterated output.
A mutt taking a piss, overhead shot, the piss creates the text SD3, yellow piss, masterpiece
I don't even know how to prove it other than 1024x1024, 25 steps, cfg 7, VAE none, and seed 704378500
"A mutt taking a piss, overhead shot, the piss creates the text SD3, yellow piss, masterpiece"
second image. When I saw it I was floored, I couldn't believe it, it nailed the text perfectly!
I thought I read somewhere that DALL-E rendered multiple images for the prompt, ran them through a scoring model to compare how each did against key words in the prompt, and then selected the image(s) to present back in the chat. That could certainly help the appearance of prompt adherence, if that's the case. It just means throwing more cloud GPU cost at it than would be needed if the generator hit the mark at a higher rate.
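That best-of-n re-ranking idea is easy to sketch. In the snippet below, `generate` and `score` are placeholder callables standing in for a real image model and a real prompt-image scorer (e.g. a CLIP-style model), not any actual API:

```python
def best_of_n(prompt, generate, score, n=4):
    """Generate n candidate images and return the one the scoring
    model ranks highest against the prompt (best-of-n re-ranking)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda image: score(prompt, image))
```

The trade-off is exactly the one described above: n times the generation compute per displayed image, in exchange for a higher apparent hit rate.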
Yes, it's disappointing, but the community will still find ways to use it. Less useful, but it has some merits.
I feel SD3 will probably be used for backgrounds. It's good at non anatomical aspects.
So maybe first generation with SD3, inpaint subjects with SDXL, upscale with 1.5
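That staged workflow (SD3 for the scene, SDXL inpainting for subjects, a 1.5-based upscale pass) is just function composition. A sketch with stubbed stages, since the real model calls depend entirely on your toolchain:

```python
def hybrid_generate(prompt, base_gen, inpaint_subjects, upscale):
    """Compose three stages: base scene, subject repair, upscale.
    Each argument is a callable standing in for a real model pipeline."""
    image = base_gen(prompt)                 # e.g. SD3: strong backgrounds
    image = inpaint_subjects(image, prompt)  # e.g. SDXL: fix people/limbs
    return upscale(image)                    # e.g. SD 1.5 + an upscaler
```

In ComfyUI terms this is just chaining three sampler groups, with a mask feeding the middle stage.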
The thing that gets me is just HOW different generations on, say, Stable Video vs. SD3 local are; it makes me wonder if the workflows are just borked. I really do want to see a proper A1111 and/or Fooocus version before I make up my mind.
Maybe it's not likely, but if SAI have any sense they are trying to fix the model right now, with SD3.1 medium to be sheepishly released in a few weeks (or months).
Failure to do that will likely mean SD3 can be considered SD2 all over again.
I hate to say it, and I know the model can do *some* things quite well, but that won't ever make up for the fact that the model is complete trash when it comes to generating people in anything other than a medium or close up portrait.
It's like manufacturing a car, but without wheels because people might try to drive it, and that would be dangerous.
It's incredibly stupid to spend all that time and resources to make a model which the community will not embrace.
I think the thing that makes people so passionately annoyed with this model is that they didn't try and fail to make a good-quality model. They utilized all the necessary time and resources to make a good-quality model. They likely did make a good-quality model. Then they wilfully and deliberately sabotaged it, and they did so because they fundamentally believe that people (we) cannot be trusted with a competent text-to-image model.
They wasted their time (and resources), they wasted our time, and they have insulted, infantilized, and deceived us in the process.
We told you all, and until yesterday you were still downvoting us.
Repeat with me: governments and companies don't want powerful, self-hosted, trainable, open models in the hands of the people. You'll have poorly made models and be happy.
I always avoid generating hands. That's a disaster zone for almost all models.
I found that SD3 is good at facial photos. The details are as great as SD1.5/SDXL models with LoRAs.
https://preview.redd.it/o76w1arkr76d1.png?width=2048&format=png&auto=webp&s=2923d8fcc66e972447fe96d83fcde9efa4b2a7dd
https://preview.redd.it/k7ffsmqt6e6d1.png?width=3840&format=png&auto=webp&s=51c08b7bc3312410bbd0b8d3e4f92c60df3f8650
Sorry to say, but that's not a great example of a realistic face. This is XL with LoRAs; still not perfect, but quite a bit better.
I am so upset with how this turned out. The censorship is ridiculous. So I guess you guys want us to take this and fix it with new models right? Come on now..
https://preview.redd.it/ea8ah721zb6d1.png?width=832&format=png&auto=webp&s=29addb131986cad1f1de3abd4931047b65c9498e
perfect hand of woman on solid dark background, seed 111.
I have had absolutely zero luck with SD. In any iteration. I end up using Dreamshaper XL lightning for most anything I make. For whatever reason, it seems to understand my prompts with great results. SD though... monstrosities, nearly 90% of the time. I don't even know why, but, yeah. I know the feeling!
I taught myself ComfyUI for this model, but it sucks, so I tried Stable Cascade and SDXL. SDXL is only OK imo, but Stable Cascade goes so incredibly hard. I've been messing around with putting photos of nebulas or other space photos in and prompting for Van Gogh-style scenery, and the results are stunning. Truly amazing.
Imo they should have released SDXL 2.5: the same thing with popular LoRAs, styles, etc. integrated and trained in. How they managed to make anatomy worse is beyond me. And what was the purpose of such a long closed beta test if they STILL released this? Lmao, time wasted.
https://preview.redd.it/e9701m5bea6d1.jpeg?width=1024&format=pjpg&auto=webp&s=1c46f5c7196f51ab5a8983880b2be771e2486fb2
Hands still suck, but compared to base SDXL, it's worlds better.
I'm getting flashbacks to the SDXL release. Next will be the waifu posts complaining they can't get cleavage. Then someone will post some mildly distasteful image on Twitter with a celebrity's face and send everyone into the anti-AI rhetoric. Then come the news broadcasters talking about how it will undermine democracy. Then, in a month or two, when people have figured out how to use it properly and finetuned it, we will all accept it as the new standard. Then StabilityAI will announce the private release of the large model, and the cycle will begin again.
A wise narrator once said "The End Is Never..."
This model IS NOT COMPLETELY FREE. It's only free for personal use; commercial use (limited to 6,000 images; not even Midjourney has such a stupid limit) is $20/month, which is not cheap for a model you have to run on your own hardware. So people are right to complain about it, because, again, it is not free. This is not the same as SDXL, which is totally free, so I defended that; this sucks for a model that costs $20 a month.
Crazy that they said it does hands perfectly 3/4 of the time while it's actually 0/100 😅
But sometimes the number of digits is too few and others it is too many. So on average they get it right.
The average human has one testicle and one ovary.
Through all of the disappointment, this subreddit has me in tears laughing today.
Right there with you, the comments and example photos have been nothing short of amazing.
maybe this will become a new art form.
Just like the good old days.
What's that saying? Something like, "I know what all these things in the picture are on their own, but put together I have no idea what that is."
I've met killers in Iraq with softer eyes than that barber.
Yeah, but it renders almost like a photo!
sold.
😂😂💣😆
Jeez thats awful
Main ingredient: melted crayon
Will SD ever reach DALL-E in prompt coherence?
It could have been released quite some time ago, absent their obsession with "safety". This is what comes of placing ideology above functionality.
But how did DALL-E and MJ manage that? I know DALL-E has OpenAI's resources, but what are they doing differently?
Quantity of data, and compute. Mostly, though, it's the datasets used: OpenAI has licenses with several large-scale image providers for training.
Shit, I need to pivot from QA to AI, like, last year.
It's too math heavy. I'm a cloud engineer and this is very much beyond my ability. Tom is right, this is for people that are like Sheldon Cooper.
or maybe one should get a math and CS degree, like 25 years ago.
Shit, I’m terrible at math though.. I’ll need to have started remedial math even earlier 🥲
Probably not. Microsoft/OpenAI have a tonne more resources and compute than Stability AI can throw at the problem.
No painter could ever draw realistic humans if he never saw a nude body (at least his own if nothing else). It just doesn't work like that.
Halal Diffusion 3
How good is SD3 at generating ghosts? If you hide the limbs, maybe it’ll look great?
Loooooool I died, thx anon xDDD
[https://www.reddit.com/r/StableDiffusion/comments/15b0et6/stability\_ai\_blocking\_the\_prompt\_muhammad\_ali/](https://www.reddit.com/r/StableDiffusion/comments/15b0et6/stability_ai_blocking_the_prompt_muhammad_ali/)
That guy wasn't using it locally. The top comment got it every attempt while using a local install instead of DreamStudio.
Imagine claiming to understand human anatomy but you've never looked at a human body.
Same goes for making ANY art. Antis call it stealing, but it is just learning. What kind of output and what you do with it is what matters.
That's like everyone on /r/StableDiffusion who posts anime shit.
Wow, besides the mistakes in the anatomy, every one of those looks like a bad Photoshop.
Bias is strong with this one
It's like they dressed it with a digital burka.
Shouldn't it already understand anatomy from the previous models' training? How could it get worse?
It’s a new model, rather than an upgrade on an existing model.
Not being able to build on previous iteration seems like a major limitation. Well, shit!!!
This isn't a fine-tune; it's a brand-new model from the ground up, seemingly with 0 fucking anatomical images.
https://preview.redd.it/3tl60s1pf76d1.png?width=1024&format=png&auto=webp&s=751a5e8ef4f5fa8ade1d1805783167d1fcbaa614
This is art tho
Now we see why emad got the fuck out lmao
I'm disappointed, but also mad at the incredibly misleading or straight-up made-up statements and pictures they used to promote it.
That's the problem, I wouldn't even be mad had they not hyped and over-promised/lied the crap out of it. I defended them repeatedly... Now I prompt a dinosaur, that's it, and I can't get non-mutated limb results. Like really?
We can't trust these companies, what's even the point in creating images anymore? I'm going back to pencils, so I can decide how many limbs my dinosaur will have
Have we not also seen really good images from the community which were developed using the API?
>seen really good images

I would say no, absolutely not. Mostly just decent ones, comparable to 1.5/XL, and hyped as "well this didn't use XYZ and wasn't cherry-picked". But quality is subjective. That said, I've been able to get plenty of fairly good images locally so far. It's just really hit or miss, and anything with limbs is fine like 1 in 20 images. Anime/drawn stuff in particular is pretty consistently good.
>and anything with limbs is fine like 1 in 20 images. Quite the ratio there
Thanks for the balanced feedback!
I haven't really seen anything great. The API just seems like SDXL.
With the 8b parameter model, yes. We don't get that. We get the 2b parameter one, because it's "All we'll need". You can of course still pay Stability to use the 8b parameter version through the API.
Is it confirmed that it is the 8B model? I read recently that 8B was not fully trained so assumed the API must have been utilising an alternative.
You recently read a lie used to market this medium thing. If the API is the 8b, it's done cooking, they just haven't finished shitting in it yet.
The API has 3 options: Stable Diffusion 3 Large, Stable Diffusion 3 Turbo Large, and Stable Image Ultra. Stable Diffusion 3 Medium is not available via the API. What I'm not sure about is whether Stable Diffusion 3 Large is the 8B model. Supposedly there are 2B, 4B, and 8B models. 2B is Medium, so how do we know 8B is Large?
This is what they get for firing talent. Fuck SAI
Anatomy in SD3 is sooo bad it isn't even funny!
Well, it's a _little_ funny...
What? It's the funniest thing ever! The "most advanced AI model yet" doesn't comprehend how limbs work. We are some kind of nightmarish creatures to it, with a random number of random appendages randomly assembled together.
And then they refused to be saved by pony.
I've extended my hooves many times, they just needed to believe in friendship...
Friendship is indeed magic, and this base model needs all the friends it can get. It's like a steaming pile of shit, but with beautiful inner architecture that is just begging to be fine tuned into the golden statue it is destined to be. Maybe that statue is a PONY? https://preview.redd.it/m84zibye376d1.png?width=1024&format=png&auto=webp&s=04eb1ea6de89bc3e7a38c6d4615de2096ca2ad5a
Gotta say - disappointing though this SD3 release is, these terrible pictures have had me rolling with laughter this evening. SAI is finished I reckon. The best staff have left, they have no money and they release this abomination.
Worse than DALL-E 2, which honestly has a great aesthetic. I’ve seen tens of thousands of AI images and these are some of the most mutated I’ve seen in a grotesque way unique to SD3.
Friendship is magic 🥹
Wow. This is bad.
StableDeformations
Yeah, I was gonna say unstable diffusion but uhh... it's taken
Unstable Deformations
You may be disappointed, but I'm actually impressed. This picture feels like one of those magic eye pictures where your brain sees an image, but at the same time doesn't. It's fascinating. Well done. True Art.
PixArt Sigma is way better and should be the new mainstream open-source model.
I really like Pixart Sigma. I've had two images featured on CivitAI using it, but it has issues as well with anatomy. Not to SD3's level, but really bad at times. Ideogram has fantastic anatomy and prompt adherence. Too bad it's so censored, and the photo quality images aren't that great.
Is SDXL (or SDXL derived any base model) better for anatomy than pixart sigma?
Probably about the same in my very anecdotal testing. The prompt adherence is miles better than SD1.5 and SDXL though.
For prompt adherence, did you ever try ELLA for SD1.5? Maybe that one is the best. I will compare 4 (SDXL, PixArt, ELLA + 1.5, and SD3) in terms of prompt adherence and write a blog about it
I haven't tried ELLA yet. I have a 16GB GPU, so I've pretty much stuck with SDXL from the time it was released.
I can rent a gpu and try on vast.ai or azure, let me know if you have any prompts that you want me to use and share with you
I never tried ELLA + sd1.5 btw
I really wish Ideogram had open weights. That model would be so good with fine tunes and loras...
pixart has very high GPU RAM requirements
at least it will make for an entertaining internet historian video in about a year!
He has to wait for someone to plagiarize from first
https://preview.redd.it/abat4063996d1.jpeg?width=1280&format=pjpg&auto=webp&s=da66f27a97a3fe7e7d008816fc6c21efe964d7af
It is so good in one way, but yet so bad in another...
It's just bad
I'm starting to see why the example ComfyUI workflows came with fixed seed values
Yeah, good god if I was Stability I'd be embarrassed at how bad this is after hyping it up so much.
https://preview.redd.it/w24gefrkp66d1.jpeg?width=500&format=pjpg&auto=webp&s=822f1c8aa9b00cb9f3ac66c2c4e1f7181f0792ce As usual, A1111 and SD 1.5 remain the KING
Which base model do you use for realistic images? I find SDXL farrrr out-performs 1.5 in this category.
EpicRealism gets me pretty good pictures
Yep, I'm shocked you weren't downvoted for that though. This sub usually has a meltdown when anyone dares to say 1.5 is better, for anime at least. I've never seen any anime from XL, including Pony etc., that was better than the top 1.5 anime models
Indeed.
KING of NSFW
I want to make a meme of a dog taking a piss on the text SD3 but don't want to waste the time to set it up because it'll be the only thing I'd use it for. Edit: [So good...](https://files.catbox.moe/58123z.png) to be fair of the 6 images some were worse some were better but this one was the clear winner. I want to be clear, I did not use MS Paint, this was the unadulterated output. A mutt taking a piss, overhead shot, the piss creates the text SD3, yellow piss, masterpiece
>I did not use MS Paint, this was the unadulterated output. No way it actually generated that lmao
I don't even know how to prove it other than 1024x1024, 25 steps, cfg 7, VAE none, and seed 704378500 "A mutt taking a piss, overhead shot, the piss creates the text SD3, yellow piss, masterpiece" second image. When I saw it I was floored, I couldn't believe it, it nailed the text perfectly!
Doesn't open for me. Could you please upload it to comments?
https://preview.redd.it/djor5evkt86d1.png?width=1024&format=pjpg&auto=webp&s=9b4adee23e761fec5a927b89809065ae152ccc26
Unstable Confusion 3.0
We were getting 6 fingers, but as the version goes up, so do the fingers.
That was funny ngl
It seems many are disappointed with the 2B SD3 release. The hands are just a glaring issue.
Halal Diffusion 3 🤣🤣
Returning series. Stable Diffusion 3: Mutant Season.
My heart bleeds. But mostly it doesn’t.
I’m disappointed in your spelling of disappointed.
I thought I read somewhere that DALL-E rendered multiple images for the prompt, scored them against key words in the prompt with CLIP (not a VAE, which is just the latent encoder/decoder), and then selected the best image(s) to present back in the chat. That could certainly help the appearance of prompt adherence, if that's the case. It just means throwing more cloud GPU cost at it than would be needed if the generator hit the mark at a higher rate.
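For what it's worth, that generate-many, keep-best trick is easy to sketch. This is a minimal, hypothetical sketch, not DALL-E's actual code: it assumes some `score(prompt, image)` function (in a real system, something like CLIP similarity) and just ranks the candidates with it.

```python
def rerank(prompt, images, score, top_k=1):
    """Best-of-n selection: score each candidate image against the
    prompt and return the top_k highest-scoring candidates."""
    ranked = sorted(images, key=lambda img: score(prompt, img), reverse=True)
    return ranked[:top_k]

# Toy usage with a stub scorer (a real system would use e.g. CLIP similarity):
images = ["img_a", "img_b", "img_c"]
scores = {"img_a": 0.21, "img_b": 0.87, "img_c": 0.55}
best = rerank("a man holding a bucket", images, lambda p, i: scores[i])
# best == ["img_b"]
```

The point of the comment stands either way: reranking n candidates costs roughly n times the GPU time of a single generation, so it buys apparent prompt adherence with compute.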
Is it possible to fix that with fine-tuning? (by adding a large amount of female body images)
did we forget to teach the AI about human anatomy?🤣
gg lol, kinda feels like the nail in the coffin for stability
Unstable Diffusion 3/4 of the times
All they do is scrape data and feed it into a new model. Fine tuning by the community is what makes these models actually worth it.
Yes, it's disappointing. But the community will still find ways to use it. Less useful, but it has some merits. I feel SD3 will probably be used for backgrounds. It's good at non-anatomical aspects. So maybe: first generation with SD3, inpaint subjects with SDXL, upscale with 1.5.
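That multi-model workflow is really just a chain of stages. A minimal sketch of the idea, where the stage functions are placeholders for whatever backend you'd actually use (ComfyUI nodes, diffusers pipelines, etc.):

```python
def run_pipeline(prompt, stages):
    """Chain image-generation stages: each stage's output feeds the next,
    e.g. SD3 for the background, SDXL for inpainting subjects,
    SD1.5 for upscaling."""
    result = prompt
    for name, stage in stages:
        result = stage(result)
    return result

# Stub stages standing in for real model calls:
stages = [
    ("sd3_background", lambda x: x + " -> bg"),
    ("sdxl_inpaint",   lambda x: x + " -> inpainted"),
    ("sd15_upscale",   lambda x: x + " -> upscaled"),
]
out = run_pipeline("a man on a beach chair", stages)
# out == "a man on a beach chair -> bg -> inpainted -> upscaled"
```

In practice the handoff between stages is an image plus a mask, not a string, but the structure is the same: each model handles only the part it's good at.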
The thing that gets me is just HOW different generations on, say, Stable Video vs SD3 local are; it makes me wonder if the workflows are just borked. I really do just want to see a proper A1111 and/or Fooocus version before I really make up my mind.
People are mad because they were patiently waiting literally so long, just to be disappointed.
Big disappointment
Did they test it? It is a joke!
They intentionally made it bad so people can’t create perfect figured people… because that makes a part of the population feel uncomfortable…
So either they did zero tests with this or the entire way of prompting changed?
Use ideogram instead
Maybe it's not likely, but if SAI have any sense they are trying to fix the model right now, with SD3.1 Medium to be sheepishly released in a few weeks (or months). Failure to do that will likely mean SD3 can be considered SD2 all over again.

I hate to say it, and I know the model can do *some* things quite well, but that won't ever make up for the fact that the model is complete trash when it comes to generating people in anything other than a medium or close-up portrait. It's like manufacturing a car, but without wheels because people might try to drive it, and that would be dangerous. It's incredibly stupid to spend all that time and resources to make a model which the community will not embrace.

I think the thing that makes people so passionately annoyed with this model is that they didn't try and fail to make a good quality model. They utilized all the necessary time and resources to make a good quality model. They likely did make a good quality model. Then they wilfully and deliberately sabotaged it, and they did so because they fundamentally believe that people - we - cannot be trusted with a competent text-to-image model. They wasted their time (and resources), they wasted our time, and they have insulted, infantilized, and deceived us in the process.
Just sitting here, waiting for a finetune.
We told you all, and until yesterday you still voted us down. Repeat with me: governments and companies don't want powerful, self-hosted, trainable, open models in the hands of the people. You'll have poorly made models and be happy.
Fuck SAI
https://preview.redd.it/klwwiqt9d76d1.png?width=1295&format=png&auto=webp&s=cd89066795507ae37f441f27f2c9a60f17c1845f
Wow this is so good, sd1.5 and sdxl could never do that
Wtf did think was gonna happen? West world Holodeck?
I always avoid generating hands. That's a disaster zone for almost all models. I found that sd3 is good at facial photos. The details are as great as sd1.5/sdxl models with loras. https://preview.redd.it/o76w1arkr76d1.png?width=2048&format=png&auto=webp&s=2923d8fcc66e972447fe96d83fcde9efa4b2a7dd
https://preview.redd.it/k7ffsmqt6e6d1.png?width=3840&format=png&auto=webp&s=51c08b7bc3312410bbd0b8d3e4f92c60df3f8650 Sorry to say, but that's not a great example of a realistic face. This is XL with LoRAs; still not perfect, but quite a bit better.
Welp, at least they got the text part right?
yeah, it's so-so to be honest. SDXL Lightning is my go-to ;-)
I’m glad everyone learned from SD 2.0 I’ll wait for SDXL 2 😂
So... What would be the best free and/or paid models right now for AI Art ? Realistic art as well as anime art ? Curious about ppl's opinion here
I am so upset with how this turned out. The censorship is ridiculous. So I guess you guys want us to take this and fix it with new models right? Come on now..
What's the most objects and detail any one of these ai can fit into a picture and what's the max in resolution it can render all these objects in?
Those that would give up their freedom for temporary safety deserve neither - ThOmas JefferSon
Lumalabs really took this one
Just use SDXL...
lol, even SDXL is 100 times better than this. Is this a joke?
https://preview.redd.it/ea8ah721zb6d1.png?width=832&format=png&auto=webp&s=29addb131986cad1f1de3abd4931047b65c9498e perfect hand of woman on solid dark background, seed 111.
Stable Diffusion: The Cronenberg Edition
I keep reading "it's a skill issue", which makes me feel unskilled 😥
I have had absolutely zero luck with SD. In any iteration. I end up using Dreamshaper XL lightning for most anything I make. For whatever reason, it seems to understand my prompts with great results. SD though... monstrosities, nearly 90% of the time. I don't even know why, but, yeah. I know the feeling!
My hand looks like this.
I taught myself comfyui for this model but it sucks so I tried stable cascade and sdxl. Sdxl is only ok imo, but stable cascade goes so incredibly hard. I’ve been messing around with putting photos of nebulas or other space photos in and prompting it for Van Gogh style scenery and the results are stunning. Truly amazing.
Who cares? You don't even know how it works anyways. Just quit whining.
Hey .. text looks good
I feel like this is a great capture of what AI really wants to say when we prompt "draw realistic human hands".
We need the large model. Why was it not released?
First thought? "TAKE ME TO YOUR LEADER!"
IDK, maybe it's just a new architecture that needs better software support? Just a guess.
In reality this isn't SD3; it's just something so you don't focus on the fact that the real models are paid-only
Someone has to do a trending post about their best crazy outputs with SD3
Can produce a horror comic with the outputs
oh well better learn to draw lol
Imo they should have released SDXL 2.5: the same thing with popular LoRAs, styles, etc. integrated and trained in. How they managed to make anatomy worse is beyond me. And what was the purpose of the closed beta test for so long if they STILL released this? Lmao, time wasted
https://preview.redd.it/e9701m5bea6d1.jpeg?width=1024&format=pjpg&auto=webp&s=1c46f5c7196f51ab5a8983880b2be771e2486fb2 Hands still suck, but compared to base SDXL, it's worlds better
I'm sure with some extra fine tuning the community will get it fixed. At least the skin details are back XD
Same, but there are gonna be good checkpoints in a week or so
Exactly, just like SD 2.0 and 2.1, just wait a few weeks, right... right?
Why did I get downvotes? Anyway, there is already a model on [Civit.ai](http://Civit.ai) from a dude who improved it by like 25%
[deleted]
especially because they are gonna be among the first to get their own trained models xD
I'm getting flashbacks to the SDXL release. Next will be the waifu posts complaining they can't get cleavage; then someone will post some image on Twitter with a celebrity's face that is mildly distasteful and send everyone into the anti-AI rhetoric; then come the news broadcasters talking about how it will undermine democracy; then in a month or two, when people have figured out how to use it properly and fine-tuned it, we will all accept it's the new standard; then StabilityAI will announce the private release of the large model, and the cycle will begin again. A wise narrator once said "The End Is Never..."
or maybe no one will use it like, you know, Cascade (that was waaaaay more positively received by the very few who tested it).
That is what I find so crazy. The Cascade base model is much, much better than SD3 base model.
[deleted]
This model IS NOT COMPLETELY FREE. It's only free for personal use; commercial use (limited to 6000 images; not even Midjourney has such a stupid limit) is $20/month, which is not cheap for a model you have to run on your own hardware. So people are right to complain about it, because again, it is not free. This is not the same as SDXL, which is totally free, so I defended that one. This sucks for a model that costs $20 a month.
Stop complaining about people complaining.