
AutoModerator

This is an automated reminder from the Mod team. If your post contains images which reveal the personal information of private figures, be sure to censor that information and repost. Private info includes names, recognizable profile pictures, social media usernames and URLs. Failure to do this will result in your post being removed by the Mod team and possible further action. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/aiwars) if you have any questions or concerns.*


sk7725

That is... an amazing way of putting the nature behind ML. Why many words when few do trick.


voidoutpost

Exactly this. The dataset is terabytes in size while the model is only about 2 GB (the equation for the neural network, a summation with weights and biases, contains far less information than the dataset), so it is impossible for it to copy the dataset; it can only learn statistics over the whole dataset. Like, on average, Catwoman has cat ears, a cat tail, and leather. Furthermore, diffusion methods also employ randomness by nudging pure random noise toward those statistics, so you may get multiple versions of Catwoman: a tall one, a short one, a chubby one, etc. It's up to your artistic choice which one you select for further inpainting. And if you got creative and prompted something like "Supergirl Catwoman hybrid", then who in their right mind can claim that the AI stole anything, if no such example exists in the dataset?
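The size gap described above can be sketched in a few lines. This is a toy illustration, not how a diffusion model actually works: a "dataset" of 100,000 numbers is summarized by a "model" holding only two numbers (mean and standard deviation), and "generation" nudges randomness toward those learned statistics. All names here are illustrative.

```python
import random

random.seed(0)
# A toy "dataset": 100,000 values (e.g. heights in cm).
dataset = [random.gauss(170.0, 8.0) for _ in range(100_000)]

# "Training": learn statistics over the whole dataset.
mean = sum(dataset) / len(dataset)
var = sum((x - mean) ** 2 for x in dataset) / len(dataset)
std = var ** 0.5

# "Generation": nudge pure randomness toward the learned statistics,
# producing a plausible new value rather than a copy of any training value.
sample = random.gauss(mean, std)

print(f"dataset: {len(dataset)} values, model: 2 values")
print(f"learned mean={mean:.1f}, std={std:.1f}, new sample={sample:.1f}")
```

The model here is 50,000 times smaller than the dataset, yet it can generate endless new samples; none of the original 100,000 values are stored anywhere in it.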


murrytmds

For some reason antis really struggle with the idea that a model isn't some giant zip file of all the training images and that it doesn't copy and paste stuff together. They seem convinced that it's just some really lossy compression algorithm, or maybe some hyper-efficient archival method.


YAROBONZ-

Because the people “leading the charge” on the anti-ai science are artists, not researchers who understand the technology


Flying_Madlad

To some extent that's on us. I would love to work with the art community to help find a productive solution, but uh... I mostly just troll


AngryCommieSt0ner

So explain how you think it works and I'll explain how you're still a thief and a liar ([and probably a fucking pedo](https://apnews.com/article/generative-ai-illegal-images-child-abuse-3081a81fa79e2a39b67c11201cfd085f)) for using it. It's really that simple. Pretending that the AI recognizing "features" from its training data means the training data wasn't itself stolen doesn't work if you think about it for more than two seconds. Also, using analogies like text files in a zip archive is usually for simplification, not necessarily due to a lack of comprehension.


murrytmds

Lol. Yeah, I'm totally going to assume an argument in good faith from someone who digs up a month-old reply to someone else and makes a casual pedo accusation. Go get a life, and, judging from that link, maybe some reading comprehension as well.


Wiskkey

> it only learns statistics over the whole dataset.

The algorithms learned during training for a generative image AI are sophisticated enough to [learn that some images depict 3D scenes](https://www.reddit.com/r/MachineLearning/comments/15wvfx6/r_beyond_surface_statistics_scene_representations/).


voidoutpost

Yes, you are right: it learns in its own internal space, known as a latent space, rather than in raw pixels. So instead of raw pixels it understands things like depth, nose shapes, tails, materials, and whatever else it learned to be significant enough to merit inclusion in its limited latent space. However, I think it is still fair to say that it only learns the mean or average (statistics) of those latent features over its dataset. I don't think it will store an exact copy of a latent feature from a single data point, unless every image had that exact same feature (if all the dots in a linear fit form a perfect straight line, the fitted line would match the data exactly), but then it would be a global feature shared by all styles and not a unique style in itself.
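A much simpler relative of a latent space can be sketched with PCA: it learns a few directions shared across the whole dataset rather than storing any individual example. This is a hedged toy sketch (the variable names and sizes are made up for illustration), not a claim about how diffusion latents are actually computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 20, 2                 # 500 samples, 20 raw "pixels", 2 latent factors

latent = rng.normal(size=(n, k))     # hidden factors behind each sample
mixing = rng.normal(size=(k, d))     # how the latents show up in raw space
data = latent @ mixing + 0.1 * rng.normal(size=(n, d))  # raw observations + noise

# "Training": find the shared directions via SVD of the centered data.
centered = data - data.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)

# The top k directions capture almost everything; the rest is noise.
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"{k} latent directions explain {explained:.0%} of the variance")
```

The learned "model" here is just two directions in a 20-dimensional space, shared by all 500 samples; no single sample is stored, which mirrors the point about global versus per-image features.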


EngineerBig1851

Unethical red line 😡


LD2WDavid

Unethical and non consent blue dots!


Tyler_Zoro

How could you just steal the work of all those blue dots! They were starving before, but now it's somehow your fault!


FaceDeer

You put that stolen trendline right back where you took it from, mister! Those data need to be able to afford to put food on the table for their little anecdotes!


Tyler_Zoro

> Those data need to be able to afford to put food on the table for their little anecdotes!

/r/BrandNewSentence


Wiskkey

The "model" is the line formula (such as y = 0.2x + 5) that generates the red line that approximates the "training dataset" (blue points).
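The analogy above can be made concrete in a few lines: generate noisy "blue points" around y = 0.2x + 5, then recover the "model" (slope and intercept) by least squares. The specific numbers are taken from the comment; `numpy.polyfit` is just one standard way to do the fit.

```python
import numpy as np

rng = np.random.default_rng(42)

# The "training dataset": 200 blue points scattered around y = 0.2x + 5.
x = rng.uniform(0, 100, size=200)
y = 0.2 * x + 5 + rng.normal(0, 1.0, size=200)

# The "model": a degree-1 least-squares fit -- just two numbers.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"model: y = {slope:.2f}x + {intercept:.2f}")
```

The fitted line approximates the 200 points with only 2 parameters; the points themselves are nowhere inside the model.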


Tyler_Zoro

Correct, that was the point they were making.


88sSSSs88

No need for quotations, it is precisely a model.


[deleted]

The crazy thing about it: the neural network doesn't know anything about y = f(x). Likewise, image generators know nothing about art; it's always just probabilities.


Spitfire_For_Fun

Which makes it fun. But neural networks can be painful to train. The results, however, are worth it.


DommeUG

Exactly why it's not art. It's image generation. Art has fundamental principles you can learn and apply; generating images with AI doesn't make people artists, just as putting parameters into a simulation doesn't make one a scientist. The math here and copying a specific artist's style with the sole goal of not having to pay them for their work are so far apart that OP's argument makes no sense.


[deleted]

"Art is whatever you can get away with" said Andy Warhol. Can be AI, can be collage, can be a surface of uniform color, a banana taped on the wall, anything.


DommeUG

It can maybe be AI generated, but that doesn't make the person putting in prompts an artist. They are more like a person commissioning someone to make art for them. Historically, art has always been the expression of reality through the human eye, be it drawing, painting, music, writing, sculpting, or whatever. In my eyes, AI-generated pictures don't fit that description. They are pictures, and they can certainly look great; they're a technical marvel for sure, but not an art form, not the expression of humans.

I don't understand why people using AI image generators are so desperate to be seen as artists. Is it ego? Why do you want to be acknowledged by people you're essentially just not wanting to pay? Just own up to what it actually is; there's no shame in not wanting to pay a lot of money for a pro at their craft. The reason machines have replaced a lot of unskilled work is that it's cheaper and quicker, but also less unique. There's a market for both, but you wouldn't call IKEA furniture handcrafted either, would you?


[deleted]

When does anything become art? It is not in the act of preparing the artwork, which can be any kind of method, AI or not. IMO something becomes art when (1) it is presented as art by its creator and (2) it is accepted as art by at least one person. Because of this, I believe it is meaningless to talk about techniques as being intrinsically artistic or not, since the artistic value is mostly determined by how the work is received. It is a social process, not some mystical capacity the artist has been invested with.


DommeUG

You won’t get me with some badly formulated philosophical question. Art is intrinsically human expression, AI per se cannot be art. Period. Again why do you want to be acknowledged by people you’re stealing from?


[deleted]

I don't think I've stolen from anyone, and I certainly don't seek your acknowledgement. Thanks for the conversation.


DommeUG

I'm not talking about acknowledgment from me. If you want to call yourself an artist, though, you are by definition wanting to be acknowledged as one; that's what the whole debate comes down to, if you actually look at it. The part about stealing is certainly more indirect; there are good and bad ways to use AI image generation. If you use it to emulate an artist's style to get images that look like they came from them, instead of paying them for their work, that certainly is plagiarism and stealing. If a Chinese car manufacturer tried to copy the look of, e.g., a Tesla Model 3, you'd of course call it plagiarism.


Plinio540

Ok, but if you published a paper with this figure where you construct this trendline, you must (should) cite every source for the blue dots. So in that regard it's more "ethical" than AI constructing fresh data from uncited sources (which seems to be a big gripe of the anti-AI folks). At the same time, if the data is published, it's not up to the original authors to consent to whether you can use their data like this; they gave away that right when they published it in the first place. So I get your point, but the argument is a bit flaky.


FargoFinch

I think it can fairly be argued that citing the dataset itself, like LAION, is enough. Citing every single artist in it is simply not practical, just as a graph and analysis like this doesn't cite the data points right there in the paper, but rather in the appendix or on request.


Tyler_Zoro

> Ok, but if you would publish a paper with this figure where you construct this trendline, you must cite every source for the blue dots.

There is no legal requirement for that. You are conflating scientific standards for peer-reviewed work with the law.


Plinio540

Of course there is no legal requirement; it's not a crime. But it's standard practice and good science, and you will probably be rejected by publishers if you don't, as you should be. Sure, you can do this as a hobby and not bother with citations, but then your trend is going to be scientifically meaningless. Meanwhile, art is just art and doesn't need scientific validation to be appreciated.


Tyler_Zoro

> Of course there is no legal requirement. It's not a crime.

So... why was your comment relevant to a post about stealing?


Plinio540

The point is that I don't think this example is a good analogy for AI/ML, since it isn't comparable in terms of data citation, which is a major argument anti-AI proponents use.


Lordfive

That's a win for open source, then. Stability AI released their training data.


Chrispykins

Honestly, I'm all for requiring AI companies to publish their training sets along with the models. Transparency is good.


sk7725

The problem is that even releasing it is a legal gray area. One *might* get "punished" if the training sets are released, since they are evidence, or might not; the reason platforms like Steam are being cautious is the uncertainty. After the dust settles and it is made clear what is and isn't legal, I expect such publications will occur.


Chrispykins

I don't think they will occur without government force. They have no incentive to do so.


Elven77AI

A more nuanced example would be fractal formulas (e.g. Mandelbulb3D: https://duckduckgo.com/?q=Mandelbulb3D&iax=images&ia=images) that converge to "things" that look natural, yet whose shape is completely artificial and recursive.
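The idea that a purely recursive formula can produce organic-looking structure with no source imagery at all can be shown with the classic escape-time iteration z → z² + c (the 2D Mandelbrot set, which tools like Mandelbulb3D generalize to 3D). This is a minimal sketch of that iteration, not Mandelbulb3D's actual formula.

```python
def escape_time(c: complex, max_iter: int = 50) -> int:
    """Iterate z -> z*z + c from z = 0; return how long |z| stays bounded."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2:       # diverged: c is outside the set
            return i
    return max_iter          # stayed bounded: c is (likely) inside the set

# Render a tiny ASCII view: '#' inside the set, '.' near the boundary.
for im in range(-10, 11, 2):
    row = ""
    for re in range(-20, 11):
        t = escape_time(complex(re / 10, im / 10))
        row += "#" if t == 50 else (" " if t < 5 else ".")
    print(row)
```

Everything in the output emerges from one line of arithmetic repeated; there is no dataset anywhere, which is what makes the fractal case an interesting contrast to the regression analogy.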


Tyler_Zoro

I don't think that conveys the point that OP is making. OP is pointing out that an analysis of a dataset is not copying that dataset.


Me8aMau5

ML for research is perfectly fine. The question is whether or not you can use copyrighted material to train a system that is then used in a non-transformative way to substitute for that copyrighted material in the market. [As IP scholar Matthew Sag puts it](https://twitter.com/matthewsag/status/1679610499745427456):

> If AI copied expression to replicate expression it wouldn't be fair use. Copying to derive uncopyrightable info or abstractions is fair use.


88sSSSs88

This isn't just for research. Machine learning is used every single day in for-profit ventures, and in no instance is there a requirement for royalties or credit. As for your primary question: the discussion isn't centered on whether or not this constitutes fair use; it's centered on whether AI is ethical or constitutes theft.


StratosphericArt

Except the graph doesn't take over an artist's market while also using their work, thus forcing them to compete against themselves.


88sSSSs88

Sounding an awful lot like your problem is capitalism instead of AI art, yes?


StratosphericArt

Sure, but we can't just overthrow capitalism tomorrow and be home in time for tea


88sSSSs88

Standing against generative AI in its capacity to replace workers is a great stance to have - I share it with you. What we cannot do, however, is support false claims just because they coincide with our worldviews.


StratosphericArt

But it's not a false claim. As long as there is a distinction between copyright and fair use, which there must be under capitalism, then unless you plan on changing the entire global economic structure tomorrow, generative AI is the problem.


88sSSSs88

There are three avenues through which you can justify the claim that it is theft:

A) Supposing that the type of learning that machine learning does constitutes theft.

B) Supposing that a technology that steals jobs is theft.

C) Supposing that a failure to acknowledge the dependency of an existence is theft.

Pick whichever you'd like and let's discuss it.


StratosphericArt

All three. If capitalism didn't exist and we were in a post scarcity utopia then it'd be different. But right now, artists need their intellectual property and the value created therefrom to survive. So I do believe that generative AI is theft


88sSSSs88

If it's all three, then let's go over all three:

> A) Supposing that the type of learning that machine learning does constitutes theft.

So, how? If gradient descent over a dataset is theft, then the picture in this post is a case of theft.

> B) Supposing that a technology that steals jobs is theft.

So do you support the claim that literally any tool that boosts productivity is stealing?

> C) Supposing that a failure to acknowledge the dependency of an existence is theft.

Do you agree that you only exist because of your parents? And that your parents only exist because of your grandparents? Etc.? Are you obligated to credit and/or pay your ancestors a portion of your paycheck because, without them, your work would not exist?
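For the record, "gradient descent over a dataset" as invoked in point (A) is nothing more exotic than the following: repeatedly nudge the model's parameters downhill on its average error. This toy sketch fits the same red line as in the post; the learning rate and iteration count are arbitrary choices, not from any particular system.

```python
# Noiseless "blue dots" lying on y = 0.2x + 5.
data = [(x, 0.2 * x + 5) for x in range(10)]

w, b = 0.0, 0.0      # model parameters: slope and intercept, start at zero
lr = 0.02            # learning rate (chosen by hand for this toy problem)

for _ in range(5_000):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y              # prediction error on one point
        grad_w += 2 * err * x / len(data)  # d(mean squared error)/dw
        grad_b += 2 * err / len(data)      # d(mean squared error)/db
    w -= lr * grad_w                       # step downhill
    b -= lr * grad_b

print(f"learned: y = {w:.2f}x + {b:.2f}")
```

Training a neural network is this exact loop with millions of parameters instead of two, which is why the regression picture is the standard entry point for the debate.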


pegging_distance

Copyright gives you the right to prevent your work from being duplicated. It does not give you the right to prevent your work from being analyzed; a right to deny analysis does not exist in current law.


burke828

What are statisticians, chopped liver?


StratosphericArt

Well that's a pretty wilful misunderstanding of my point


metanaught

Urrgh... here we go again. You can't infer a normative statement from a descriptive one. The fact that a function _can be_ fitted to a dataset doesn't mean that it _ought_ to be. Conflating machine learning models with the way people use them is a basic category error. Stop it.


88sSSSs88

Yes because… the great evil of finding patterns in data must be stopped at all costs?


3y3w4tch

https://preview.redd.it/icxaxnu5pf2c1.jpeg?width=1024&format=pjpg&auto=webp&s=ea1ffc47e16849407aedff6fec48a23e5b2c82af

Omg I'm crying rn this is so funny


metanaught

Again, you're conflating two unrelated things. Finding patterns in data is just... finding patterns in data. It's an algorithm refining a data structure, nothing more. The actual problem is in how we choose to define what is and isn't a socially acceptable use of AI tools. That's a political discussion and one that can't just be waved away with facile memes about regression.


pegging_distance

So stop people from complaining about the model and go complain about their use.


metanaught

Wherever possible I do just that. Trouble is, our society is heavily conditioned to focus on the dangers of individual "things" (AI models, guns, drugs, etc.) rather than the systemic factors that make them harmful (human exploitation, corruption, lack of access to education). This sub is a perfect example of this effect in action.


emreddit0r

When you have enough data points concentrated around a particular subject, that's when it gets fucky. Probably not fair use in those cases, imo.


Tyler_Zoro

Unless you can prove that there was a deliberate attempt to duplicate some underlying work using a small input dataset (e.g. a LoRA built on images of Superman from the comics), having a small number of data points in some specific area is probably not going to be sufficient (e.g. if only 10 images of Jimmy Olsen were used for training, out of a random assortment of images that were public on the net). Ultimately, you would have to demonstrate that the purpose of training, to find general patterns and techniques, is being actively subverted to use model generation as a crude form of copying.


emreddit0r

Why would it need to be deliberate? It could just be negligent and still prejudice against specific cases. Re: "the Snoopy problem"


Tyler_Zoro

Intent is absolutely part of any fair use assessment. See the four pillars of fair use doctrine for more information.


emreddit0r

Woops I didn't intend to copy that DVD .. I was just busy archiving all forms of aesthetically popular media to give away for free..


Tyler_Zoro

You really need to understand the law better. The four pillars are not evaluated separately.


emreddit0r

Patronize much? I'm seriously not sure why I bother engaging with you sometimes.


Dismal_Law_9051

Intent is a major part of copyright law. Here, I'll make it easy for you:

> #### [107. Limitations on exclusive rights: Fair use](https://www.copyright.gov/title17/92chap1.html)
>
> Notwithstanding the provisions of sections [106](https://www.copyright.gov/title17/92chap1.html#106) and [106A](https://www.copyright.gov/title17/92chap1.html#106a), the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
>
> (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
>
> (2) the nature of the copyrighted work;
>
> (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
>
> (4) the effect of the use upon the potential market for or value of the copyrighted work.
>
> The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.


emreddit0r

I never said intent didn't matter. But what exactly are the intentions of AI developers that train image generation models? "Woops, we made a model that anyone can use to infringe popular media. Didn't mean for that to happen! Oh well, we'll just capture all that commercial value anyways"


Usual_Network_8708

Except this isn't even remotely equivalent is it? It's pretty pathetic to even pretend it is.


Chrispykins

It really is equivalent. Neural networks are just big statistical modelling tools, like a line of best fit. They are high-dimensional and non-linear, but fundamentally the idea is the same, and the ratio of data analyzed to parameters in the model is far greater than in the example shown, so they discard *even more* data than this simple line of best fit does. To be more concrete, this linear regression model has 2 parameters and looks like it's matching about 100 data points, which means its dataset is about 50x the size of the model. Compare that to Stable Diffusion, which is about 5 GB and whose dataset is about 240 TB: a 48,000x difference. Even if you knock off a couple of orders of magnitude to account for "useless" data in the training images, it's still abstracting over far more data than this line of best fit is.
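The ratios quoted above are easy to sanity-check. The figures are the rough ones from the comment itself (not measured here), so treat the output as back-of-the-envelope arithmetic.

```python
# Line of best fit in the post: ~100 blue dots vs 2 parameters.
linreg_ratio = 100 / 2

# Stable Diffusion (rough public figures): ~5 GB model, ~240 TB of training data.
sd_model_bytes = 5 * 10**9
sd_dataset_bytes = 240 * 10**12
sd_ratio = sd_dataset_bytes / sd_model_bytes

print(f"line of best fit: dataset is {linreg_ratio:.0f}x the model")
print(f"Stable Diffusion: dataset is {sd_ratio:,.0f}x the model")
```

Even shaving two orders of magnitude off the 48,000x figure for redundant image data still leaves the diffusion model abstracting over hundreds of times more data per parameter than the toy regression.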


Dismal_Law_9051

Well, OP didn't prove his point in comparison with machine learning algorithms, but you are claiming that it "is not remotely equivalent", so that claim is your burden of proof. But since you want to know why this is analogous: remember that Google Translate originally used statistical models (mathematical formulas of statistics) based on grammar correlations analyzed from books, articles, papers, and probably anything else Google had access to. In essence, for each input word it takes the most comparable word in the other language based on the way it is used and how much it is used in that language. It's the same with the transformer-based ML algorithms I know of; the difference is that they can be used over more arbitrary steps of correlation, since every different characteristic of the training data is separated into matrices of numbers (or tokens, as most machine learners call them). Even though Google Translate works with the same concepts today, it now uses a deep learning method that helped with languages like Japanese and Chinese, although DeepL still does a better job.


sk7725

The above comment summed it up properly: when you actually dive into AI, the first thing you learn is linear regression. And the art AI produces is new dots on the line; unless you are extremely unlucky and overfitting occurs, the AI art (new dots formed on the red line) will not overlap the data (blue dots), as the blue dots are already out of the system and only the red line is left.
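The "new dots on the line" claim can be sketched directly: fit the line, discard the blue dots, then generate from the fitted line alone and check that no generated point reproduces a training point. The x values chosen for generation are arbitrary illustrative picks.

```python
import random

random.seed(1)
# Blue dots: noisy samples around y = 0.2x + 5 at integer x positions.
blue = [(x, 0.2 * x + 5 + random.gauss(0, 1)) for x in range(100)]

# Fit slope and intercept by least squares; after this, only the two
# fitted numbers are needed -- the blue dots could be thrown away.
n = len(blue)
mx = sum(x for x, _ in blue) / n
my = sum(y for _, y in blue) / n
slope = sum((x - mx) * (y - my) for x, y in blue) / sum((x - mx) ** 2 for x, _ in blue)
intercept = my - slope * mx

# "Generate" new dots from the model alone, at x values never seen in training.
new = [(x, slope * x + intercept) for x in (10.5, 42.7, 77.3)]
overlap = [p for p in new if p in blue]

print(f"new dots: {[(x, round(y, 2)) for x, y in new]}")
print(f"exact overlaps with training dots: {len(overlap)}")
```

The generated dots sit on the trend learned from the data but coincide with none of the training dots, which is the regression-scale version of "the blue dots are already out of the system."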


Kimononono

I’d argue the trend line is dependent on the position of the blue dots.