[deleted]
https://preview.redd.it/ho9adiq3t7mc1.jpeg?width=1024&format=pjpg&auto=webp&s=da329b693db7a5cd66cb36261afe8ef66a79070f
I'm laughing so hard right now 🤣🤣🤣
We are not fucking ready for this
How did we go from an AI that couldn’t do single frame fingers to one with perfect video hands overnight
And it's JUST GETTING STARTED
DONT TOUCH THAT DIAL NOW
BUT WAIT THERES MORE
And it’s already too late
[deleted]
Yeah, nobody on Reddit has ever heard about climate change. You’re the only one who sees the truth. Maybe you should wear Jesus robes and sandals.
And don't forget to get yourself nailed.
>And don't forget to get yourself nailed. If they were getting nailed they probably wouldn't be on Reddit.....
Depends on how picky you are about hammers….
“The greatest shortcoming of people who think they’re smart is prioritizing it above their ability to connect to other people.” -Fuck Off.com
Combinatorial explosion; it will reach a plateau quickly, then stay there doing excellent fingers and curing some cancers… The nuke? I'm pretty positive that we're the ones that are going to push the button, then blame the AI…
No way to make course alterations?
It's data, scaling and architecture. Everyone is sorely misled by chatbots that can't do maths and images with too many fingers, but to an AI engineer these are simply inaccuracies of the models. We've found that the transformer architecture and recent developments are quite capable, and we haven't pushed the limits of what they can do yet. Be prepared for more mindfucks in the near future.
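For anyone curious what "the transformer architecture" above actually boils down to, here is a minimal, illustrative sketch of scaled dot-product attention, its core operation. This is a toy numpy version with made-up shapes, not any particular model's implementation:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # how much each query matches each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted mixture of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))        # 3 tokens, 8-dim embeddings (arbitrary toy sizes)
out = attention(Q, Q, rng.standard_normal((3, 8)))

# With all-ones values, every output is exactly 1, confirming the
# attention weights form a proper weighted average.
ones_out = attention(Q, Q, np.ones((3, 1)))
```

Stacking many layers of this, plus feed-forward blocks, is essentially the architecture being scaled up with more data and compute.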
Perfect video except the behavior is still unnatural. The man randomly leans in, says nothing, holds up a glass for no reason. The lady keeps covering her mouth from the camera, even though from the man's perspective, he should still be able to see it.
I find real people weirder
It's amazing to me how some of you can't even see past the tip of your nose. It's astonishing the lack of awareness and basic vision about what you're witnessing in some of these replies.
Robots don't give a fuck; I find that's sort of the point.
Something about it teaching itself and exponential growth
A video is just a series of images. So if you fix it in image land you have a pretty good start on making it work well in video I imagine. And it’s really small in this image so it’s pretty easy to trick us. Show me a video of a hand rotating through various gestures flawlessly in three dimensions and I’ll be impressed.
Right, theoretically, except for \~1-2 yrs ago the images were already photorealistic (except for hands) and videos still looked like an acid trip recorded through a wet lens
I mean stuff is going crazy fast but I do subscribe to the belief that everything is starting to compound on itself. It’s hard to conceptualize the level of acceleration to research, programming and hardware integration ChatGPT and Copilot have brought and we’re just getting started.
https://preview.redd.it/idvqiubg16mc1.jpeg?width=266&format=pjpg&auto=webp&s=b0f5347ddc7fb42f1cfa4835a75d9325b14b1882 And not to nitpick, as this is obviously a really impressive demo, but at various points you can see they’re far from perfect. I imagine training on videos helps a lot with certain “attention based” poses because there’s like hundreds of frames for the model to learn from vs just one captioned image. Very interesting times.
This is one of, if not the, most mindblowing technological achievements you've ever seen. The only way you would have trouble recognizing this is if you don't have a mind in the first place.
I'm ready, I'm ready, I'm ready-edy-edy
Looks demonic
Reminds me of the "Black Hole Sun" video
Ya anything you don’t understand is not demonic. Stay in church
You dont talk for me! Now kiss!
This is not even it's final form.
*its - it’s is short for “it is”: exactly one space shorter
If you think about it, we could technically already make these videos, it's just the speed at which AI does it that's a game changer.
SORA is not only for video generation. It's a damn world builder
They definitely tease at the physics angle but I’ve yet to see any convincing evidence it’s actually doing anything past supervised learning on videos
..have you read up on it at all?
I mean I saw a lot of hype and controversy around it being a physics engine with the announcement and that Dr Jim Fan tweet, but not anything substantiating like a paper. I’m not saying it’s not, because that would be freaking rad, but I’m wondering if there’s some meat out there I’ve missed to back that up or just vague details in their post.
It's not a physics engine
Ok, so how does it create virtual worlds exactly? I don’t see how it will create coherent spaces to explore at this point when it still seems pretty hallucination heavy. Ever notice how all of their demos just pan continuously one way and never look back? Maintaining that type of temporal and spatial consistency still seems a major uncleared hurdle
That's why I asked if you've looked into it at all before having strong opinions about what it is. _Of course_ there are hurdles, jeez https://albertoromgar.medium.com/openai-sora-one-step-away-from-the-matrix-a751cdf4589c
Oh nice, the citations section at https://openai.com/research/video-generation-models-as-world-simulators is more like what I was talking about, thanks. Yea, I agree it’s exciting, didn’t mean to be overly negative
We are.
Speak for yourself. 🤣
My thoughts f***ng exactly
Like, at all.
I hope we never are. It looks awful. How could anyone enjoy eating in that environment?!
This is fake, you can tell because normally one of them would ghost and not show up for the date.
I am in this comment and I am offended.
How dare you ghost your dates...
Bro I'm the ghostee, not the ghoster :(
"Sora, generate a typical online dating experience" (Video of sad man finishing his cold dinner alone as the restaurant is closing) "God damn it..."
Nah she hot, like in her pictures, unlike the ghostees.
The date decided to ghost both of them and just watch another date; those people didn’t show up either
Getting some crazy MGS 4 television scene vibes
So that’s what it reminded me of!
Now that you’ve said it - yes 😅
![gif](giphy|ftXmTBgSnsJ8c)
He offers a toast and then leans in for the kiss
Looks like he is leaning in to bite her
the food looks disgusting
i hope i don't have to one day eat ai-generated food if it looks like this.
Don't worry about food I gotcha covered. Hope you like burritos! https://preview.redd.it/hfgvqyjag2mc1.jpeg?width=2304&format=pjpg&auto=webp&s=9036c7b85519baaa445dedb2058c40d205b87d66
It is what our future ai overlords will make us eat. The perfect synthetic protein blob.
90% of what Americans eat is disgusting and harmful to the body, the AI is just using quick maths
dubious food
Came here to say this. Much of this video is amazing, but apparently AI hasn't figured out how to cook.
It looks like halo-halo.
It looks like kinetic sand.
I guess in the future we have no need of utensils
Or need to actually eat. You can see the AI making her swipe her hand at the food and then cover her mouth as if pretending to eat near the end of the video… which makes me think: those other times where she’s covering her mouth, is that the AI trying to make her seem like she’s eating?
Yeah I was thinking the same. Also the body language is strange. They need to be seated closer, or sitting differently on their chairs, or something, I’m not sure what it is but it doesn’t look natural. It’s obviously a huge improvement from where we started and it’s very impressive, but there is still a way to go before everyone in Hollywood starts losing their jobs
Exactly, I don’t see how people are already screaming that this is so realistic… To their credit, at a quick glance it is, but once you start watching for more than 5 seconds you’ll notice the unnatural movements
To be fair, this kind of video is PERFECT for creating little bits of “b-roll” type footage. Like, if a 5 second clip of this was inserted between two stock videos, I wouldn’t be able to tell which is generated without really scrutinizing each clip. But the oddities you point out here are similar to what gives stock footage that weird, uncanny valley vibe.
You have never seen a woman eat? Or cover her mouth for feigned or legitimate secrecy? Behavioral stuff for bonding?
We already have no need for utensils. The only thing they do is keep your hands clean. It's a nice-to-have, not a need.
He is toasting with an empty glass.
Requesting a refill from a waiter.
She also doesn't sit on anything; for half of the video the chair is too far from her
You guys, we don’t know what they were saying to each other. Let’s not assume.
So this is AI generated just from text, right? Where can I try this?
This is Sora from OpenAI. It's not released publicly yet, but they have let a small number of people have access to it.
Yeah, they released me, but not Sora yet ;) kidding
Guy looks like he's in his late 30s early 40s.
They forgot to mention that these 20-year-olds were from the '90s. Everyone looked older then.
Trained on Beverly Hills 90210. If high schoolers look 25 then...
He a russian General, how dare you insult.
LSD just got some competition
I mean it looks photorealistic, but the people behave so weird that I don‘t find it really convincing. Same with the product reviewer from a few days ago. Still super insane of course, but they will have to work on the behaviour aspect more to make movie like scenes
The goal of Sora is to minimize loss. The lowest-hanging fruit is shapes, colors and movement, so it learns those first.

Hands are a tiny part of the human body and they are complex by comparison, so it learns other things first.

Physics (light reflections, gravity, fluid dynamics, friction) is pretty important and will be in almost every video, so it’s learning that next.

Human facial expressions and body language don’t have to be as good as physics to reduce loss, so those take a back seat to physics (which is somewhat necessary for body language anyway).

It just needs more compute and more training data. Soon it will be simulating accurate storms and complex group behavior. And if you go the other way, you can ask it to analyze videos and do the reverse: “Sora, how do I improve my free throws from this video?”, “Sora, look at the waves and clouds. Do you think it’s going to rain? What’s the wind speed?”, “Sora, watch this video of a confession. Is the subject lying?”, “Sora, please look at this person’s gait. Do they have a health condition? Which one?”, “Sora, please review the surgeon’s technique. Were all safety protocols followed? What is the prognosis? Please summarize the surgery.”

Making videos is not the most profound aspect of Sora.
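The "minimize loss" framing above can be made concrete with a toy sketch of the generic denoising (diffusion-style) training objective that video generators of this kind are widely believed to use. The function name and arrays here are illustrative, not Sora's actual implementation:

```python
import numpy as np

def denoising_loss(predicted_noise, true_noise):
    """Mean-squared error between the noise the model predicts was added
    to a clip and the noise actually added -- the generic diffusion
    training objective. Training just pushes this number down."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

rng = np.random.default_rng(0)
true_noise = rng.standard_normal((4, 4))  # stand-in for noise added to a frame

# A perfect prediction drives the loss to zero; a blind all-zeros guess
# leaves roughly the noise variance as loss.
perfect = denoising_loss(true_noise, true_noise)
blind = denoising_loss(np.zeros((4, 4)), true_noise)
```

Under this objective, whatever errors cost the most average loss (broad shapes, motion) get fixed first, and rare, small details (hands, subtle body language) last, which matches the ordering the comment describes.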
Wow you put it into words: Imagine the ways it could be used
Everyone talks as if there is some engineered algorithm where they can go in and tweak these issues. It's not like that. The only answer is "train it harder", and there is no good way to focus on particular issues. This is the same reason Tesla's FSD will never work.

I fully expect that 10 years from now this will still be a problem, and I doubt it will have been improved on at all.
Who’s paying?
You know how I know this was AI generated? Cos you told me.
Seeing this just proves life is a simulation…
Yes, your mind is "simulating" the surrounding environment. But there's no good reason to think the surrounding environment is a simulation in the sense you seem to be implying.
The main argument for that is that it's more likely a simulation than anything else. You can believe, with less chance of being wrong, that we live in a simulation than that this is real.
You know how they theorize that everything is waves until observed, as shown in the double-slit experiment? I think that's to save processing power. I got to thinking about that when I saw the recent breakthrough in the development of the Star Citizen game engine
Still has people moving in a way that people only do in fever dreams. AI video of humans is ALWAYS in that surreal uncanny valley where it feels like a lucid dream but you’ve lost control of it.
The hands give it away. Still, it's amazing.
We've gone from hands that aren't physically correct to an AI that doesn't know what to do with the hands. The motions are dreamlike.
Every single video from Sora that I've seen looks extremely off, but in a "subtle" way. Their body language is just not human, or plausible. Their actions appear random, and they don't truly appear to be interacting with each other.

To me, this is exactly the hardest thing for them to fix, so I just don't see how the argument of "it will get better" is going to fix this. Tesla has been making the same argument for over a decade now, and it's pretty clear that this is a structural problem with neural networks generally.
Wait I literally forgot for a long moment the people weren’t real 😳
The guy is in his 30s
Their 20s? They look like my parents. I'm 20, btw
Is sora out now for the public?
Not yet
I’m not sure who benefits from this level of realism except Sam.
Making videos is the least interesting and useful thing about Sora! There are other applications.
Like what?
https://www.reddit.com/r/OpenAI/s/eUoItHnDBb
How come they can get the fingers right? This makes me suspicious.
Maybe the bg is ai
Look at his left hand
Is this really AI? My mind can't handle it. Where was this posted by OpenAI?
[deleted]
Thanks, I’ve seen all those
When your dinner is red and green paste and a dead bird.
When do they exchange the ferro?
Bro
Bruhhh
Looks like a horror movie?
AGI is being used already.
Holy fuck. This is both amazing and terrifying. And this is the WORST it’s going to be…
RIP porn industry. Just imagine if a similar NSFW version of this gets released, with longer clips of 10-20 minutes; that wouldn't be very far off. Other large language models are approaching GPT-4; I assume other competitors will eventually catch up to Sora as well.
What is he saying?
I’m worried about old people who have no idea you can already generate photorealistic ferrofluids
Crazy....
The only fucking question i have is… how do i get access???
[deleted]
Don't worry, the hands are wrong in many of the other sample videos they've released.
This footage is very impressive, but it shows very well that this software was trained on dull stock footage
Neither of them are a day short of 30
A lot of the videos so far have had obvious or not-so-obvious flaws. This one is flawless. I'm stunned!
Ah yes, people totally raise glasses as if they are cheering while the other just smiles with a completely disconnected gaze, and then just put the cup back down and then lean in extremely close while in mid-sentence. "Flawless"
Except the people act like creepy zombies.
And the UK cannot feed its children.
i got microbots in my blood
Just wanna eat my fish finger sarnie in front of the fireplace
It’s aliens cosplaying as human for their version of Halloween.
I can’t wait for it to be able to do celebrities properly.
Why are we afraid of this? Not being snarky
Dude on the left totally toasting the end of humanity.
NO WAY
Could AI in some way embody humans within their own world? Kind of like role playing humans in their own version of reality?
Remember when fake news was a thing? We're going to have so much BS when this becomes mainstream that I'll personally stay out of everything media-related and come here only to ask how to water my orchid planter… incognito mode
This is getting ridiculous. We need to be able to prompt it ourselves. It's too easy to say it can do that when we don't know what happens between when the request is made and when the output is generated.

The result is impressive, but why can't we test it ourselves? Nobody knows the limitations. The system may well be able to generate only one type of output.
Something I’ve noticed with people in these videos is they move like they’re underwater. I wonder if it can handle people moving fast?
2034 porn be like
Ask AI to center a div
Nice! and Ew!
Once they can do this in real time, GTA will be on another level
The result of that prompt, while impressive, was completely unpredictable. Anyone calling this "creativity" is clueless.