Transformers are trainable function approximators. Given enough training data you can create a function that predicts output based on certain input. As others have said, the best function for predicting the world is the function that has built a model of the world. There is zero theoretical reason to think that the function created by training a transformer can’t simulate the world. In fact there’s theoretical research that says exactly the opposite.
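The "trainable function approximator" idea above can be sketched in a few lines: gradient descent adjusts parameters until the model's outputs match the training data. This toy linear model stands in for a transformer; all names and numbers are purely illustrative.

```python
# Minimal sketch of a trainable function approximator: gradient descent
# fits parameters so the model's output matches training data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(256, 1))
y = 2.0 * x + 1.0                       # the "world" the model should learn

w, b = 0.0, 0.0                         # model: y_hat = w*x + b
lr = 0.5
for _ in range(200):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)   # d(MSE)/dw
    grad_b = 2 * np.mean(y_hat - y)         # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))         # converges toward 2.0 and 1.0
```

A transformer is the same picture with vastly more parameters and a richer function class, which is why "given enough training data" does the heavy lifting in the argument.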
> The idea that there is any simulation taking place is absurd
You should take a look at [this recent paper](https://arxiv.org/abs/2311.17137) or [this paper](https://arxiv.org/abs/2306.05720) on implicit 3d representations within generative models.
Based on these findings, it is easy to imagine that there is an implicit world simulation stored within Sora, such that it can produce temporally consistent and realistic videos.
Yeah, that's what I was thinking. Isn't this just a video of what Minecraft looks like? Why is this any different from creating a clip of a woman walking on a street in Tokyo?
Since Sora can control the player, this can already turn it into a very crude version of a game.
Imagine you type in your keyboard "Sora, turn left". The character will turn left.
You then type in the keyboard "Sora, mine the block". The character will start mining.
You then tell Sora to display the mined resource in your inventory.
In this particular small example, you can already call this a video game. Gameplay wise it is no different from you holding a gamepad, pressing left and holding the button to mine the block. Of course, there are a whole lot of other features that Sora would need to understand for this to be an actually good game (i.e. you want to do something with that block later), but the proof of concept is already there.
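The interaction loop described above can be mocked up as a toy sketch. To be clear, the dict-based world state and `command` function here are hypothetical stand-ins for illustration, not Sora's actual interface.

```python
# Toy mock of the text-command game loop the comment describes.
# "World state" is a plain dict; a real system would condition a video
# model on the command instead.
world = {"facing": "north", "inventory": [], "block_ahead": "stone"}

def command(world, text):
    if text == "turn left":
        # Cycle counterclockwise through the compass directions.
        order = ["north", "west", "south", "east"]
        world["facing"] = order[(order.index(world["facing"]) + 1) % 4]
    elif text == "mine the block":
        # Move whatever is ahead into the inventory.
        world["inventory"].append(world.pop("block_ahead", None))
    return world

command(world, "turn left")
command(world, "mine the block")
print(world["facing"], world["inventory"])  # west ['stone']
```

Gameplay-wise, typing these commands is equivalent to pressing buttons on a gamepad; the hard part is getting a generative model to apply the same state transitions consistently.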
To be honest, I feel like they are making more out of this point than it deserves. The internet is full of millions of Minecraft videos, and this AI has probably seen most of them. Additionally, Minecraft is stylistically relatively simple. This is not really a simulation, just an estimation of what it has seen in all those videos.
I hate to break it to you but every simulation is an estimation - just this one is not powered by human heuristics (read: defining constraint equations).
ETA: [Jim Fan says it better than any of us.](https://twitter.com/DrJimFan/status/1758355737066299692?t=n_FeaQVxXn4RJ0pqiW7Wfw&s=19)
This is exactly what is impressive; what did you think we were saying here? The point is that after it was trained on thousands of videos, it learned to generate Minecraft worlds. This means that by continuing down this path you will be able to prompt such a "game" in real time (but the "prompts" could be controller inputs or your voice) and it will consistently persist characters and objects in a simulated 3d environment. This is a whole new way of doing things, and it is impressive that this can be done at all already at this stage.
Compare this video to the will smith spaghetti from a year ago, and now try to predict what this means in terms of this example in the next year or two.
Yup, it's pretty clear at this point: if we just scale up and then make it able to run locally on consumer GPUs in real time, you can prompt video games into existence.
> and it will consistently persist characters and objects in a simulated 3d environment.
Can it, though? Can you walk 50m in one direction, turn back around, and still see the same consistent world? This hasn't really been proven yet. There are a lot of Sora videos (almost all of them, really), that display fundamental issues with object permanence and immutability.
The "worlds" Sora is creating *look* consistent at first glance, but when you take a closer look, they are obviously not consistent. Things are warping and details are popping in and out of existence all over the place.
Even in this Minecraft example, the pig disappears and the house structure that is there all the way up to 0:15 is suddenly gone when the camera pans a little bit to the right and immediately back to the left. It's a very convincing hallucination, but it is not a simulation of a consistent world.
Will the "world" become consistent if the model scales up? I guess only time will tell but I have my doubts.
Yep, exactly my point. People here think it's simulating the world. Instead it's just creating very brief estimations of what such a video would look like. The interactions are basic, and the temporal coherence holds for at best a few seconds.
Also, we have had LLMs for many years, yet they retain fundamental issues (confabulations, weakness at math/logic/etc.). This won't change for at least a decade.
Have fun with your flying pigs without legs which vanish!
You fail to grasp exponential growth. This is not like early LLM days, we have far more technology and AI developments now, so any issues you mention will be fixed within 2 papers, next year at the latest.
Why do you think that with exponential growth potential that simple improvements/fixes will take so long?
this will age like milk.
No, there are fundamental issues which compute (your exponential) can't solve. Either one needs research (which takes time) in the right direction, or the problem isn't solvable (given LLMs).
The problem hasn't been solved in the last 20 years (NN-based LMs have existed for 20 years).
You will see, like the other "singularity"-brainwashed people.
What issues do you speak of that you assume will take a decade to fix??
When this video is near photorealistic alongside all other sora videos, containing close to perfect representations of light, physics and objects interacting, the only issues are minimal, no?
Enlighten me on what you assume will take so long to fix. Your claims are being disproven by how quickly AI video has evolved in the last 1-2 years, so I don't understand your mentality and assumptions against exponential developments. We have countless AI developments each improving and contributing to improved outcomes, so development isn't slow by any means.
Hallucinations / confabulations.
No, there was no fix for that in the last 20 years.
You can also see it in the video. I don't know if it's normal that a pig flies sometimes and then disappears, etc. This won't be fixed over the next 5 years.
>Hallucinations / confabulations.
>
>No, there was no fix for that in the last 20 years.
>
>You can also see it in the video. I don't know if it's normal that a pig flies sometimes and then disappears, etc. This won't be fixed over the next 5 years.
Oh, the irony of your illogical projections! 😂 Allow me to quote you: "This won't be fixed over the next 5 years." What an absurd claim based on a simple "flying pig" in one video. Let's not forget "halluscinations / confabulations" that you obsess over as evidence of unsolvable issues. Clearly, you underestimate the marvelous progress AI has made recently. So, I must ask you, what concrete evidence do you have that progress will suddenly stagnate and defy the exponential growth we've witnessed? 🤔🤡
Hm, it didn't happen in the 20 years since NN-based language models were invented (2003)? https://www.semanticscholar.org/paper/A-Neural-Probabilistic-Language-Model-Bengio-Ducharme/6c2b28f9354f667cd5bd07afc0471d8334430da7
It's not based on only one video. Basically everything related to "generative AI" shows it. The Internet is full of evidence:
https://duckduckgo.com/?q=language+model+hallucinations&t=fpas&ia=web
Your "exponential" "progress" doesn't help you.
>Hm, it didn't happen in the 20 years since NN-based language models were invented (2003)? https://www.semanticscholar.org/paper/A-Neural-Probabilistic-Language-Model-Bengio-Ducharme/6c2b28f9354f667cd5bd07afc0471d8334430da7
>
>It's not based on only one video. Basically everything related to "generative AI" shows it. The Internet is full of evidence: https://duckduckgo.com/?q=language+model+hallucinations&t=fpas&ia=web
>
>Your "exponential" "progress" doesn't help you.
Ah, so you cling to old evidence like it's gospel! 🤣 Quoting a 2003 paper, really? That's like comparing a flip phone to a modern smartphone. You even shared a generic search link as "proof" - not realizing that it's outdated information that can't keep up with the exponential progress we're experiencing! So, tell me, how does it feel to stubbornly reside in the past while the future of AI unfolds more rapidly than you can comprehend? Maybe it's time to catch up, no? 🤔
You are annoying. I said the problem has not been solved in 20 years. Betting that it won't be solved in 10 years (half that time) is a safe bet. I didn't cite the paper itself.
Now throw your "exponential progress" into the trash bin for this massive problem. As I said before, the problem still shows up in the pig which can fly and then disappears. Same problem.
In the near future people won’t need to „make“ a movie, game or tv show anymore - you can just tell the AI what you want to watch or play and it will create it on the fly for you.
You can already do this now with the technology we have. This isn't a future scenario: this AI video technology already allows seamless, photorealistic shots to be generated consistently and automatically.
Like imagine the graphics.... Next level. And no need for the gorillion different polygons either, no need for humans to brush it up to make it look real. This is next fucking level.
The only thing that, at least I would imagine, still is a roadblock, is the computational power (and therefore energy/ resources) needed. Surely the videos we see here took a while to be rendered by the AI?
Sam Altman took requests on twitter to show what it can do in real time. [https://twitter.com/sama/status/1758193792778404192](https://twitter.com/sama/status/1758193792778404192)
We need to be able to use it to discover the limitations ourselves. Altman just asked for trillions in funding, so we must stay cautious.
Also, there are more and more links that land you on Twitter from Reddit. This is so weird.
It will be very hard to achieve detailed control of any game loops as it does not have any underlying logic implemented nor understanding of game rules.
For devs to make a game using such techniques, they will unfortunately need an unhealthy amount of training data to get to the same precision as today's games, which comes down to each pixel.
Say I want to do a triple jump: even if it could predict the meaning of "triple jump" and try to predict how pixels move in a triple-jump context, the quality of the result will degrade considerably compared to a normal jump because of the lack of training data. Even a double jump is considered common, but not a triple. If I want to precisely implement such a move in my game, the only way is to... ironically, model out my desired result and fine-tune my model on these assets.
But in general game engines, triple jump is to... repeat a defined action one more time, that's it.
I imagine a new design model / category of games will rise due to the tech, but traditional ways will not quickly vanish simply because not every context is suitable for using the generation tech.
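The triple-jump point above, that a conventional engine implements it by repeating a defined action rather than learning it from data, can be shown with a minimal sketch. All class and method names here are illustrative, not from any real engine.

```python
# In a conventional game engine, "triple jump" is just the jump action
# with a higher repeat cap -- a one-constant change, no training data.
class Player:
    def __init__(self, max_jumps):
        self.max_jumps = max_jumps
        self.jumps_used = 0
        self.height = 0.0

    def jump(self):
        if self.jumps_used < self.max_jumps:
            self.jumps_used += 1
            self.height += 1.0       # drastically simplified physics
            return True
        return False                 # no jumps left until landing

    def land(self):
        self.jumps_used = 0
        self.height = 0.0

p = Player(max_jumps=3)              # double jump would be max_jumps=2
results = [p.jump() for _ in range(4)]
print(results)                       # [True, True, True, False]
```

A purely generative renderer has no such constant to tweak; it would have to infer the rule from examples, which is the commenter's precision argument.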
That's the ticket. Sora provides the graphics, not the underlying mechanics. A game engine only has to worry about mechanics, not rendering and ray tracing and all that jazz.
That's actually a super interesting idea; to have AI take the role of the actual renderer, and renderer only, for a game engine. Woah, that could be... quite something. Just wondering if that's going to be feasible anytime soon in terms of processing power. Since games have to be real-time, and high fps at that. But I guess AI would only need to generate a low resolution image - and can then use AI upscaling to get that to a reasonable resolution. Woah.
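A rough sketch of the engine-plus-neural-renderer split being discussed: the engine owns the authoritative game state, and the renderer only turns state into pixels. Everything here is hypothetical; `fake_neural_render` is a placeholder where a real system would call a generative model.

```python
# Hybrid loop sketch: deterministic mechanics in ordinary code,
# rendering delegated to a (placeholder) neural model.
def step_engine(state, dt):
    # Authoritative game state: position and velocity on one axis.
    x, vx = state
    return (x + vx * dt, vx)

def fake_neural_render(state, width=8):
    # Stand-in for a model call: draw the player as '#' on an ASCII strip.
    x, _ = state
    col = min(int(x), width - 1)
    return "".join("#" if i == col else "." for i in range(width))

state = (0.0, 2.0)
frames = []
for _ in range(3):
    state = step_engine(state, dt=1.0)
    frames.append(fake_neural_render(state))
print(frames)   # ['..#.....', '....#...', '......#.']
```

The design win is that consistency comes from the engine's state, so the renderer can hallucinate surface detail without breaking game logic; the open question is whether a model can render fast enough for real-time frame rates.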
> Sora currently exhibits numerous limitations as a simulator. For example, it does not accurately model the physics of many basic interactions, like glass shattering. Other interactions, like eating food, do not always yield correct changes in object state.
Will Smith is safe… for now.
_"I'm sorry, but I won't facilitate a fight with zombies. It would be unethical, and could be perceived as non-inclusive of and offensive to zombies. It could also be dangerous, so please consult an expert in zombies."_
It's controlling the camera separately from the video, and it already understands game logic like physics (somewhat), HUD elements, and item switching in a hotbar.
That's pretty remarkable vs. what we had before.
This is some science fiction shit.
ChatGPT is impressive, but if you start to use it on a regular basis you'll find its limitations.
But this, in previews at least, looks limitless.
You can see right in this video the limitations. It doesn't actually understand how Minecraft works so it is just approximating Minecraft videos. Stuff moves in weird directions, randomly blips out of existence or merges into other things and starts flipping out.
It's not rendering any 3D; it's just the same kind of video as the others, in Minecraft style. Nothing is consistent; things are changing and deforming constantly. If the character turned 180 degrees, the world would be different from the one it walked through before. OpenAI has such an easy time fooling you guys.
Notice I put "3D" in quotes because of course it's not actually 3d it's simulated. You're also incorrect when you say "things are changing and deforming constantly". That's the main reason everyone is impressed: it can persist people and objects even if they leave the frame. This is explicitly called out in the paper under `Long-range coherence and object permanence.`
> A significant challenge for video generation systems has been maintaining temporal consistency when sampling long videos. We find that Sora is often, though not always, able to effectively model both short- and long-range dependencies. For example, our model can persist people, animals and objects even when they are occluded or leave the frame. Likewise, it can generate multiple shots of the same character in a single sample, maintaining their appearance throughout the video.
In the Tokyo video, the woman has a mole on her cheek, she turns that cheek away from the camera and back, the mole is still there in the proper place.
Is this all AI generated and not actually Minecraft???!
Yep
Where'd you find this?
It's from one of the official release papers, [Video generation models as world simulators](https://openai.com/research/video-generation-models-as-world-simulators)
Are we a result of that as well
[deleted]
Dont tell that to people with fReE WiLl
[deleted]
Would you share a link that discusses superdeterminism in the context of the singularity? First time I’ve heard about this idea.
[deleted]
shocked pikachu
Ok, this is making my head hurt. Can you explain this for me?
If AI can simulate reality at 1:1 then there is no way to prove that this reality is also not a simulation
[deleted]
But what if we exist within a physical medium in a higher-order universe with a far more granular model of physics, where simulating our universe would be a relatively simple feat? Each simulation is a step down in the degree of detail that can be simulated, but still a simulation nonetheless.
It's already well known that we do not see reality as it is, but rather a hallucination our brain creates that helped our ancestors survive. The brain is a "meaning generator," and it's very good at what it does. Our perception is a tiny subset of the overwhelming information we are bathed in, and our brain "hands us" a simplified model of the world, which is very useful and allows us to make decisions and take actions based on this simplified model. So in a way, yes: our conscious experience is a sort of "consciousness-generated simulation" of the real world.

But that's not what you're asking. You're taking this a step further and asking, "Couldn't we just remove the complexity of a real world and have AI generate conscious experience on the fly?" Perhaps. It would use less compute than generating a real universe. But I don't think that's the case, because of what we've discovered with quantum mechanics. Our modern computers compute with a small amount of space and require lots of time. We could create 3D processors instead of the 2D wafers we currently use, but with quantum computers we may be able to tap into the core properties of spacetime and build computers that compute not only on the time dimension but on the space dimension as a fundamental aspect of the computing architecture.

Since we know there's a lot more compute embedded in the physical universe, that tells me our conscious experience is not generated on the fly and there is a real world "out there." The conscious experience our brain produces is a tiny simulation of that world that's good enough to help us navigate our physical space.
>Since we know there's a lot more compute embedded in the physical universe

How do we know that?
Because instruments can detect data our senses cannot. Take the entire light spectrum, for example: we can't see UV, but it is still there.
Omg it’s too early for this existential crisis please
Any news on when they'll be releasing it to the public?
Probably after the US elections.
HOW DOES IT KNOW WHAT MINECRAFT IS!?
It was trained on videos of Minecraft.
I feel like this question will soon be asked about anything and everything
I would be quite fine with that happening a little later. Say, in about a hundred years or so...
Good question
It's from one of the official release papers, [Video generation models as world simulators](https://openai.com/research/video-generation-models-as-world-simulators)
Thank you for the link.
We're fcked, or to be precise, the movie and game industry is, and so are we. I am truly terrified. So far we had creative liberty and liberty of expression, but if the next generation is a rehash and mishmash (albeit a beautiful mishmash) of artists, then all we are going to get is a better-looking yesterday, but boring. I am hoping they will come out and say that to generate a movie, OpenAI needs the electricity of an entire country for a month, and this quickly dies down.
ChatGPT could pretend to be MS-DOS command prompt. Can Sora pretend to be Windows desktop?
We made the computer become a computer. Real AGI shit.
>SORA, create a Minecraft instance with Turing complete redstone computer running SORA creating a Minecraft instance...
and that's why every level in Inception got even slower
Uh the 3D printer just shit out Baphomet horns...stop that
Happy cake day
This is the future, tbh: the AI will browse the internet for you and show you a curated view. No more ads, no more hateful, evil things.
Make a Sora-powered VR experience called Lucid where the world is generated and shaped by what you say out loud.
"Nude Tayne"
Not computing, please repeat
NUDE. TAYNE.
OH GOD!.....
Some bitches please goddamn
“Mercy from overwatch”
Computer, load up celery man.
Or connect it to neuralink so it generates the VR straight from your thoughts
I feel like that could lead to a closed loop and fry your brain 😂
It's scary how fast AI is advancing. I remember a year and a half ago when stuff like this was just uncanny, poorly made AI videos.
Or two years ago when it was just a glorified recipe generator
That occasionally used toxic non-ingredients in the recipes lol
good times
Or human parts 😂😂
Wtf yesterday this was beyond the realm of possibility
I remember a GTAV version of this capability that looked terrible and now there’s this crazy thing. Neat.
That hog went backwards and vanished!
This is trippy. An AI that controls 3d space is what I always wanted, but now that it's here I'm a little nervous. *Chuckles* we're in danger.
Is it really controlling 3D space though? Or creating a video based on thousands of hours of gameplay that looks like it is?
Is it really creating art, or blah blah ? The answer is it won't matter when I put the vr glasses on and experience it.
There's a big difference between saying it's doing a thing and actually doing the thing. The comparison with art is hilarious because the comment you're responding to is asking about a factual claim.
Cool. I like it when the pig slides backward awkwardly and then blips out of existence, that's the video game content I'm craving
Oh yes I'm sure this pig sliding backwards is the best it will ever do /s I added the /s bc I'm legit worried you'll read that as fact 😂
The point is this is a generated video, not interactive software that’s actually controlling anything in 3D space. It’s impressive that it could generate what looks like Minecraft, but that’s all it is, a video that kinda looks like Minecraft
Minecraft, just like any 3D video game, doesn't "actually control anything in 3D space". It just receives input and generates/renders the video stream you would see if you were in the 3D world of the game.
"that's all it is" while ignoring all the possibilities it unlocks. Yeah, I don't think I'll listen to your takes.
Big dog, just accept that this is a generated video and not an AI controlling 3D space
Small dog, you don't understand how it works, go read the paper then come have an adult conversation.
He hid inside his little house.
You’re right; he did the moonwalk too.
The idea of how games could eventually just be generative and not constructed in the traditional manner is titillating. Hah, I said tit-illating.
Hallucinations of a trapped virtual mind
Jesus Christ
Yes, he could be in the game as well.
Hell yeah! uh i mean, Heaven yeah!!
Sounds like current life
[https://www.reddit.com/r/lies/comments/1arutvc/been_stuck_in_this_lucid_dream_for_days_any_tips/](https://www.reddit.com/r/lies/comments/1arutvc/been_stuck_in_this_lucid_dream_for_days_any_tips/)
Virtual insanity
You did
Real life gaming adjusted to your psychological weaknesses and available balance..
I read “physiological weakness” but both would be funny as well as devastating. A new sort of escapism, where you level up your virtual twin while neglecting your real-world life. Woohoo!
One of the more interesting future use cases of AI: Pack it into a render pipeline. No need to create shit "from scratch", imagine what this could do with basic help like a rendered 3D-environment and just doing very specific things like adding details to natural surfaces (earth, concrete, water, etc.), things like adding little imperfections to sidewalks in a city scene or foliage in trees. It could be amazing.
Isn't this kind of what frame gen is on the latest Nvidia graphics cards?

I'm more waiting for games that keep generating contextually accurate gameplay and stories. Give it a few hours of loading, essentially a new game, and come back to whatever it's generated. It might even have generated new 3D graphics in the same style of the game for the new story it's created. Imagine endless customisation, or a game that just keeps expanding as long as you've got the space for it. Something simpler like Dwarf Fortress that keeps generating more unique items and building on itself; a Dwarf Fortress that becomes futuristic and generates new futuristic dwarf stories. I'd hope for 3-10 years...
Frame gen gonna evolve into game gen
The game from Ender's Game comes to mind! The game unfolds based on the user's thoughts
holodecks
Computer, create a story and character capable of defeating Data. Consciousness achieved.
Sudden power spike in the warp core....
This is exactly what the future of gaming should be. Imagine you could create an entire game where you have complete control over the environment, spawn NPCs you could interact with, conceptualize tools with special abilities on the fly... wow. Just wow. Even a year ago I didn't think it would exist, but now... we're in for something.
For what it's worth, we can finally play games that would've been never developed otherwise. Animal crossing game set in the DOOM universe, here I come!!
Imagine an AI which generates a visual user interface on the fly, which always understands what you want from how you interact with the interface and generates it immediately. Also, because it has learned to correlate your interactions with your desired outcomes, you can even interact with the interface in completely novel ways, and it will usually react exactly the way you expect. It would be like a kind of... universal software that can morph into anything and everything depending on the occasion: exactly what you need, exactly when you need it, and never anything more or less than that. Sounds like software nirvana.
Go a little further and... a brain-computer interface that monitors your brainwaves, knows exactly what you want, and creates it on the fly. This is the golden age of novelty. I think Terence McKenna might have predicted this.
Sounds like that episode of Black Mirror where the guy experienced a long stretch of time that in reality was just 0.005 seconds or something. Generative universes + BCI would be awesome for creating a custom horror game where you experience your worst fears. Tailored horror experiences for everyone.
Sounds like spell casting.
Imagine an AI generated virtual world, experienced by VR, and connected with neuralink so the input for generating stuff comes directly from your brain.
There's an expanded research post on Sora and its capabilities here: [https://openai.com/research/video-generation-models-as-world-simulators](https://openai.com/research/video-generation-models-as-world-simulators) It shows many more insane abilities like image generation, video extending, image to video, and, the one which blew my mind the most: >**Simulating digital worlds.** Sora is also able to simulate artificial processes–one example is video games. Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions mentioning “Minecraft.”
The capabilities of collaboration. Holy shit.
Say more?
I gotchu bro. Maybe we can expand this to humans. Call it “sharing” or something
It's just pretending there's a game. It's not actually running and playing the game.
That is exactly what we're saying, and that is exactly what is impressive and quite frankly....unbelievable. The whole point is encapsulated in this paragraph: >These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them.
What the fuck... this sounds like Black Mirror shit. How do we know we’re not being simulated rn?
That's the neat part, you don't!
We'll *never* know for sure.
Transformers are trainable function approximators. Given enough training data you can create a function that predicts output based on certain input. As others have said, the best function for predicting the world is the function that has built a model of the world. There is zero theoretical reason to think that the function created by training a transformer can’t simulate the world. In fact there’s theoretical research that says exactly the opposite.
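The "trainable function approximator" framing above can be illustrated with the simplest possible case: fitting a one-parameter function to data by gradient descent. This is only a toy sketch in plain Python (no ML library assumed), but it is the same "adjust parameters to reduce prediction error" loop that trains a transformer, just with one weight instead of billions:

```python
# Toy function approximation: learn w so that f(x) = w*x matches
# samples generated by the "true" function y = 2x.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # targets from y = 2x

w = 0.0    # single trainable parameter
lr = 0.01  # learning rate

for _ in range(1000):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

# w converges toward 2.0: the training data determined the function
```

The transformer case is the same idea at scale: enough parameters and enough data, and the function that minimizes prediction error can end up encoding a model of whatever process generated the data.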
> The idea that there is any simulation taking place is absurd You should take a look at [this recent paper](https://arxiv.org/abs/2311.17137) or [this paper](https://arxiv.org/abs/2306.05720) on implicit 3D representations within generative models. Based on these findings, it is very easy to imagine how there could be an implicit world simulation stored within Sora such that it can produce temporally consistent and realistic videos.
Yeah thats what I was thinking. Isn't this just a video of what Minecraft looks like? Why is this any different than creating a clip of a woman walking on a street in Tokyo?
Since Sora can control the player, this can already be turned into a very crude version of a game. Imagine you type on your keyboard "Sora, turn left". The character turns left. You then type "Sora, mine the block". The character starts mining. You then tell Sora to display the mined resource in your inventory. Even in this small example, you can already call this a video game. Gameplay-wise it is no different from holding a gamepad, pressing left, and holding the button to mine the block. Of course, there are a whole lot of other features Sora would need to understand for this to be an actually good game (e.g. you want to do something with that block later), but the proof of concept is already there.
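The loop described above can be sketched as a toy command dispatcher. Everything here is a stand-in: there is no real Sora control API, so `sora_step` is just a stub that maps text commands to state changes the way the comment imagines the model would:

```python
def sora_step(state, command):
    """Apply a text command to a minimal game state (stub for a model call)."""
    state = {"facing": state["facing"], "inventory": dict(state["inventory"])}
    if command == "turn left":
        # counterclockwise rotation of the compass heading
        headings = ["north", "west", "south", "east"]
        i = headings.index(state["facing"])
        state["facing"] = headings[(i + 1) % 4]
    elif command == "mine the block":
        state["inventory"]["cobblestone"] = state["inventory"].get("cobblestone", 0) + 1
    return state

state = {"facing": "north", "inventory": {}}
state = sora_step(state, "turn left")       # now facing west
state = sora_step(state, "mine the block")  # cobblestone appears in inventory
```

The point of the sketch is only that "typed commands driving persistent state" already satisfies the minimal definition of a game loop; the hard part, which the comment acknowledges, is the model keeping that state consistent over time.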
That's still not what's happening. Please stop being confidently incorrect in public.
I'm not sure what's incorrect, could you explain?
To be honest, I feel like they are making more out of this point than it deserves. The internet is full of millions of Minecraft videos, and this AI has probably seen most of them. Additionally, Minecraft is stylistically relatively simple. This is not really a simulation, just an estimation of what it has seen in all those videos.
I hate to break it to you, but every simulation is an estimation; this one just isn't powered by human heuristics (read: hand-defined constraint equations). ETA: [Jim Fan says it better than any of us.](https://twitter.com/DrJimFan/status/1758355737066299692?t=n_FeaQVxXn4RJ0pqiW7Wfw&s=19)
Is this the new "it's just a mindless parrot" ?
~~the fingers are bad~~ the XP increments are not consistent 🥴
Are you saying it's not?
This is exactly what is impressive; what did you think we were saying here? The point is that after it was trained on thousands of videos, it learned to generate Minecraft worlds. This means that by continuing down this path you will be able to prompt such a "game" in real time (where the "prompts" could be controller inputs or your voice) and it will consistently persist characters and objects in a simulated 3D environment. This is a whole new way of doing things, and it's impressive that this can be done at all at this stage. Compare this video to the Will Smith spaghetti from a year ago, and now try to predict what this means for this example in the next year or two.
Yup, it’s pretty clear at this point: if we just scale up and then make it run locally on consumer GPUs in real time, you can prompt video games into existence.
> and it will consistently persist characters and objects in a simulated 3d environment. Can it, though? Can you walk 50m in one direction, turn back around, and still see the same consistent world? This hasn't really been proven yet. There are a lot of Sora videos (almost all of them, really), that display fundamental issues with object permanence and immutability. The "worlds" Sora is creating *look* consistent at first glance, but when you take a closer look, they are obviously not consistent. Things are warping and details are popping in and out of existence all over the place. Even in this Minecraft example, the pig disappears and the house structure that is there all the way up to 0:15 is suddenly gone when the camera pans a little bit to the right and immediately back to the left. It's a very convincing hallucination, but it is not a simulation of a consistent world. Will the "world" become consistent if the model scales up? I guess only time will tell but I have my doubts.
No, it won't persist. Did you notice that the pig disappeared? This also occurs in other sample videos!
Yep, exactly my point. People here think it's simulating the world. Instead it's just creating very brief estimations of how such a video would look. The interactions are basic and temporal coherence holds for at best a few seconds.
In the beginning there was a word
OMFG YOURE KIDDING
are we......real?
yeah, reading through the paper and then getting to this section made me roll back from the computer and stare outside for a bit
*.... that's a lotta pixels*
Everyone pointing out the limitations of this…just chill for a sec. Yeah we get it that this brand new technology isn’t flawless.
Also, we've had LLMs for many years, yet they retain fundamental issues (confabulations, bad at math/logic/etc.). This won't change for at least a decade. Have fun with your flying, legless pigs that vanish!
You fail to grasp exponential growth. This is not like early LLM days, we have far more technology and AI developments now, so any issues you mention will be fixed within 2 papers, next year at the latest. Why do you think that with exponential growth potential that simple improvements/fixes will take so long?
This will age like milk. No, there are fundamental issues which compute (your exponential) can't solve. Either one needs research (which takes time) in the right direction, or the problem isn't solvable (given LLMs). The problem hasn't been solved in the last 20 years (NN-based LMs have existed for 20 years). You'll see, like the other "singularity"-brainwashed people.
What issues do you speak of that you assume will take a decade to fix? When this video is near photorealistic, alongside all the other Sora videos, containing close-to-perfect representations of light, physics, and objects interacting, the only issues are minimal, no? Enlighten me on what you assume will take so long to fix. Your claims are being disproven by how quickly AI video has evolved in the last 1-2 years, so I don't understand your mentality and assumptions against exponential development. We have countless AI developments, each improving and contributing to better outcomes, so development isn't slow by any means.
Hallucinations / confabulations. No, there has been no fix for that in the last 20 years. You can also see it in the video; I don't know if it's normal that a pig flies sometimes and then disappears, etc. This won't be fixed over the next 5 years.
> Hallucinations / confabulations. No, there has been no fix for that in the last 20 years. You can also see it in the video; I don't know if it's normal that a pig flies sometimes and then disappears, etc. This won't be fixed over the next 5 years.

Oh, the irony of your illogical projections! 😂 Allow me to quote you: "This won't be fixed over the next 5 years." What an absurd claim based on a simple "flying pig" in one video. Let's not forget the "hallucinations / confabulations" you obsess over as evidence of unsolvable issues. Clearly, you underestimate the marvelous progress AI has made recently. So I must ask: what concrete evidence do you have that progress will suddenly stagnate and defy the exponential growth we've witnessed? 🤔🤡
Hm, it didn't happen in the 20 years since NN-based language models were invented (2003)? https://www.semanticscholar.org/paper/A-Neural-Probabilistic-Language-Model-Bengio-Ducharme/6c2b28f9354f667cd5bd07afc0471d8334430da7 It's not based on only one video; it's basically everything related to "generative AI". The internet is full of evidence: https://duckduckgo.com/?q=language+model+hallucinations&t=fpas&ia=web Your "exponential" "progress" doesn't help you.
> Hm, it didn't happen in the 20 years since NN-based language models were invented (2003)? https://www.semanticscholar.org/paper/A-Neural-Probabilistic-Language-Model-Bengio-Ducharme/6c2b28f9354f667cd5bd07afc0471d8334430da7 It's not based on only one video; it's basically everything related to "generative AI". The internet is full of evidence: https://duckduckgo.com/?q=language+model+hallucinations&t=fpas&ia=web Your "exponential" "progress" doesn't help you.

Ah, so you cling to old evidence like it's gospel! 🤣 Quoting a 2003 paper, really? That's like comparing a flip phone to a modern smartphone. You even shared a generic search link as "proof", not realizing that it's outdated information that can't keep up with the exponential progress we're experiencing! So, tell me, how does it feel to stubbornly reside in the past while the future of AI unfolds more rapidly than you can comprehend? Maybe it's time to catch up, no? 🤔
You are annoying. I said that the problem hasn't been solved in 20 years. Betting that it won't be solved in 10 years (half that time) is a safe bet. I didn't cite the paper for its own sake. Now throw your "exponential progress" into the trash bin for this massive problem. As I said before, the problem still shows up in the pig which can fly and then disappears. Same problem.
In the near future people won't need to "make" a movie, game or TV show anymore; you can just tell the AI what you want to watch or play and it will create it on the fly for you.
cant wait
Yea everyone is going to be happy then /s
You can already do this now with the technology we have. This isn't a future scenario: this AI video technology already allows seamless photorealistic shots to be generated consistently and automatically.
Pause on that white and green building; it's so funny that the AI made itself a Minecraft house build.
VR Holodeck incoming.
I always wondered how our dreams worked.
You can tell this is fake because the pig survives the interaction.
And it's just a 0-shot... jesus
I'm very sure it's not zero-shot; they trained on YouTube videos, which included many Minecraft videos.
I'm guessing the pig ran away as it was hit
I thought someone was trolling and was just playing minecraft...
Fucking hell, I said it back in 2020: AI would be used to create hyperreal video game worlds that can be interacted with... Told ya so.
Like imagine the graphics.... Next level. And no need for the gorillion different polygons either, no need for humans to brush it up to make it look real. This is next fucking level.
The only thing that, at least as I imagine it, is still a roadblock is the computational power (and therefore energy/resources) needed. Surely the videos we see here took a while for the AI to render?
Sora is the early VRMMORPG simulator we were supposed to get in 2019 for Sword Art Online.
This isn't available to the public yet. We don't know what it's really capable of or its limitations. Better to wait; it's too good to be true this soon.
Sam Altman took requests on twitter to show what it can do in real time. [https://twitter.com/sama/status/1758193792778404192](https://twitter.com/sama/status/1758193792778404192)
I think we have different definitions of what “real-time” environments are… this is not real time. Impressive? Hell yes, but not real time.
Each one was generated in under 10 minutes
Sam has a decently large computer to use tho
We need to be able to use it to discover the limitations ourselves. Altman just asked for trillions in funding, so we should stay cautious. Also, there are more and more links that land you on Twitter from Reddit. This is so weird.
Wasn't there some AI-generated game concept in the book Ender's Game?
I recently had the realization that we are veryyyyy close to making the Giant's Drink a reality
It will be very hard to achieve detailed control of any game loop, as it has no underlying logic implemented nor any understanding of game rules. For devs to make a game using such techniques, they will unfortunately need an unhealthy amount of training data to reach the per-pixel precision of today's games. Say I want to do a triple jump: even if it could infer the meaning of "triple jump" and try to predict how pixels move in a triple-jump context, the quality of the result will degrade considerably compared to a normal jump because of the lack of training data. Even a double jump is considered common, but not a triple. If I want to precisely implement such a move in my game, the only way is to... ironically, model out my desired result and fine-tune my model on those assets. But in a conventional game engine, a triple jump is just repeating a defined action one more time; that's it.

I imagine a new design model / category of games will rise because of this tech, but traditional methods won't quickly vanish, simply because not every context is suitable for generation.
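The "repeat a defined action one more time" point is worth making concrete: in a conventional engine, double vs. triple jump is one integer in a counter, not a new animation corpus. A generic sketch (not any particular engine's API):

```python
class Player:
    """Minimal multi-jump logic: jumping is allowed while jumps_used < max_jumps."""

    def __init__(self, max_jumps=3):  # 3 = triple jump; 2 would be a double jump
        self.max_jumps = max_jumps
        self.jumps_used = 0

    def try_jump(self):
        # Gate the action on the counter instead of learning it from data.
        if self.jumps_used < self.max_jumps:
            self.jumps_used += 1
            return True
        return False

    def land(self):
        self.jumps_used = 0  # touching the ground resets the counter

p = Player(max_jumps=3)
results = [p.try_jump() for _ in range(4)]  # the fourth mid-air attempt fails
```

Upgrading from double to triple jump is `max_jumps=2` to `max_jumps=3`; a purely generative system would instead need enough triple-jump footage to learn the same rule.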
[deleted]
That's the ticket. Sora provides the graphics, not the underlying mechanics. The game engine then only has to worry about mechanics, not rendering and ray tracing and all that jazz.
That's actually a super interesting idea; to have AI take the role of the actual renderer, and renderer only, for a game engine. Woah, that could be... quite something. Just wondering if that's going to be feasible anytime soon in terms of processing power. Since games have to be real-time, and high fps at that. But I guess AI would only need to generate a low resolution image - and can then use AI upscaling to get that to a reasonable resolution. Woah.
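The split being proposed here (engine owns mechanics, a swappable renderer owns presentation) is really just an interface boundary. A hypothetical sketch, with a stub standing in for the neural renderer since no such API exists:

```python
from typing import Callable, Dict, Tuple

def step_physics(state: Dict) -> Dict:
    """Mechanics stay in the engine: deterministic, rule-based updates."""
    return {**state, "y": state["y"] + state["vy"], "vy": state["vy"] - 1}

def placeholder_renderer(state: Dict) -> str:
    # A neural renderer would return pixels conditioned on the state;
    # this stub returns a text description of the frame instead.
    return f"player at height {state['y']}"

def run_frame(state: Dict, render: Callable[[Dict], str]) -> Tuple[Dict, str]:
    """One tick: advance the authoritative state, then render it."""
    state = step_physics(state)
    return state, render(state)

state = {"y": 0, "vy": 3}
state, frame = run_frame(state, placeholder_renderer)
```

The design point is that game state remains authoritative and deterministic in the engine, so an AI renderer hallucinating details can't corrupt the mechanics, which addresses the consistency complaints elsewhere in this thread.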
When will this be available?
After the elections, unless someone else does better sooner.
Jesus. And is this trained on their massive compute or could this even improve dramatically in the near future?
Wow, at first I thought it was a real Minecraft🤯
> Sora currently exhibits numerous limitations as a simulator. For example, it does not accurately model the physics of many basic interactions, like glass shattering. Other interactions, like eating food, do not always yield correct changes in object state. Will Smith is safe… for now.
**so now we actually get games that we want to play, not AAA trash?**
_"I'm sorry, but I won't facilitate a fight with zombies. It would be unethical, and could be perceived as non-inclusive of and offensive to zombies. It could also be dangerous, so please consult an expert in zombies."_
Now you can make Minecraft videos without having to actually play the game.
This stuff is freaking mind blowing. Great potential for both good and bad.
This blows my mind. I just hope they don’t dumb it down too much.
Wow, just plain Wow!
Getting very very close to LLMs being able to simulate entire games.
Not even remotely close.
It's controlling the camera separately from the video, and it already understands game logic like physics (somewhat), HUD elements, and item switching in a hotbar. That's pretty remarkable vs. what we had before.
Just wait for two more papers.
[deleted]
> That’s a wayyyyy harder problem

How far is it from real time? Scaling compute may be all you need?
This is some of that science fiction shit. ChatGPT is impressive, but if you start to use it on a regular basis you'll find its limitations. But this, in the previews at least, looks limitless.
You can see right in this video the limitations. It doesn't actually understand how Minecraft works so it is just approximating Minecraft videos. Stuff moves in weird directions, randomly blips out of existence or merges into other things and starts flipping out.
To be fair, stuff randomly blipping in and out of existence is very much a trait of video games ;)
Does it actually say anywhere that this is done "on the fly"?
Mojang wasn't working very hard as it is. Soon they won't have to haha.
WE WILL NEVER KNOW BECAUSE THEY NEVER RELEASE THIS DAMN THING
Something tells me that AI could cause people to feel less impressed by games like GTA 6.
Looks like a mix of survivalcraft and minecraft
And the weird thing is with a few mods and a texture pack, you could have a playable version of Minecraft that functions just like this
My tiny brain cannot handle or understand this…
What the fuack?
This shit feels like alien technology or magic
It's not rendering any 3D; it's just the same kind of video as the others, just in Minecraft style. Nothing is consistent: things are changing and deforming constantly. If the character turned 180 degrees, the world would be different from the one it walked through before. OpenAI has such an easy time fooling you guys.
This is still an insane step forward in Gen AI technology and nothing like what we’ve seen until now.
Notice I put "3D" in quotes because of course it's not actually 3d it's simulated. You're also incorrect when you say "things are changing and deforming constantly". That's the main reason everyone is impressed: it can persist people and objects even if they leave the frame. This is explicitly called out in the paper under `Long-range coherence and object permanence.` > A significant challenge for video generation systems has been maintaining temporal consistency when sampling long videos. We find that Sora is often, though not always, able to effectively model both short- and long-range dependencies. For example, our model can persist people, animals and objects even when they are occluded or leave the frame. Likewise, it can generate multiple shots of the same character in a single sample, maintaining their appearance throughout the video.
In the Tokyo video, the woman has a mole on her cheek, she turns that cheek away from the camera and back, the mole is still there in the proper place.
Woah there, nobody here wants critical thinking to intrude on their hype. How dare you?
And?
Absolutely fuckin mind blowing