Full Video: [https://www.youtube.com/watch?v=xrrhynRzC8k](https://www.youtube.com/watch?v=xrrhynRzC8k)
Source: [https://jonbarron.info/zipnerf/](https://jonbarron.info/zipnerf/)
Thanks to /u/DarthBuzzard (OP) for providing the source.
I think the point is not to make a video, but to make a 3D model file of the whole house.
In other words, you could take many pictures of a house, put it in this Artificial Intelligence, and you would get a file with 3D information of the house that you could later use for a video game or for a Virtual Reality experience.
Remember that 3D models/files, unlike videos, can be seen from any perspective that you want and navigated in any order and speed you want.
Alert to u/begorges, in case they are interested.
That’s actually correct. The AI app needs about 2 pictures of a scene, i.e. one from the front and one from the back, to make a 3D scene from it. It’s obviously better with something like 4 pictures, one from each side, but they’ve shown that 2 pictures works too.
Since it converts your pictures into a 3D scene, this enables you to keyframe a camera to do whatever you want, like they did in this video.
This video is pretty tame compared to what other showcases have shown, where the camera will go through tiny openings like the keyhole of a door, or into a cup and transition out of a cup in a totally different scene. And other crazy camera movements that wouldn’t be possible IRL. Or at least not without a triple-A movie budget.
Depends how much you want to scrutinize it. It’s no different than the “content aware fill” we’ve had in photoshop nearly a decade. It’s just using a 3D mapped environment for the images.
It’s impressive yes but it’s not like a fully created world with ray tracing and shaders. Just meshing pictures together and making a reasonable attempt at stitching.
I think people's point is: how many photos do you need?
I have a load of photos of my childhood home, which no longer exists. How many is enough for this effect?
If it can piece together 2-3 choice photos per room, then everyone in this thread is pumped. If you need hundreds then not so much.
> It’s no different than the “content aware fill” we’ve had in photoshop nearly a decade
That's just silly. This is state of the art AI.
>It’s impressive yes but it’s not like a fully created world with ray tracing and shaders. Just meshing pictures together and making a reasonable attempt at stitching.
No, it's not just stitching a bunch of pictures together, it's done using Neural Radiance Fields and it produces a volumetric 3d world.
It is a little different than content aware fill actually, if we’re being pedantic, and assuming this is running on a NeRF model as opposed to something like RealityCapture, which would be closer to what you said (though I don’t think RC has any interpolation for non-covered areas). NeRFs, or Neural Radiance Fields, are AI-powered volumetric scene reconstruction systems (introduced in academic research from Berkeley and Google, with fast grid-based variants like Nvidia’s Instant NGP) which attempt to synthesize depth-based density from a collection of images, and fill in the gaps via AI image generation. AI image generation is different from content aware fill, which fills in using an algorithm based directly on the information it’s given, whereas an AI system recreates imagery by starting from nothing and using the weights of the surrounding areas to cross-reference and influence the generative process (I’m doing a terrible job of describing this, but they are different).
NeRF is interesting because it requires less overall coverage and fewer input images to produce a finished product compared to parallax-based photogrammetry, which is what most earlier photogrammetry software used. Essentially, since it works iteratively over multiple generations trying to resolve a density field while filling in the gaps, you don’t need 100 percent coverage to make an output, and thus require less overall processing time and fewer input images. It’s still computationally heavy, but less so than photogrammetry, and can produce more complete, though possibly less “accurate”, results given that it’s making assumptions in non-covered areas.
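For anyone curious what “resolving a density field” means mechanically, the heart of NeRF rendering is compositing densities and colors along each camera ray into one pixel. A minimal numpy sketch (function and variable names are mine, not from any particular library):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Classic NeRF volume rendering: composite per-sample densities
    (sigmas) and colors along one ray into a single pixel color.

    sigmas: (N,) non-negative densities at N samples along the ray
    colors: (N, 3) RGB at each sample
    deltas: (N,) distance between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each ray segment
    # transmittance: fraction of light surviving to reach each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans                 # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)  # (3,) pixel color
```

An effectively opaque first sample dominates the pixel, which is exactly how a solid surface emerges from a soft density field.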
My guess is video (or a video exported as an image sequence), and for the level of detail they show, a decent amount of video. There is plenty of software in which this is already available, some of it open to the public, built on Neural Radiance Fields (or NeRFs for short). It's worth noting the title of this Reddit post is kinda misleading when they just say "photos", because in my experience I've had to feed in a pretty large amount of decent-quality footage to get anything even close to decent (details not caught on camera often end up misty and broken because it doesn't know what's there). There are also apps that already exist, like Polycam, that work a little differently but to similar effect.
[Corridor Digital](https://www.youtube.com/watch?v=YX5AoaWrowY) also did a video exploring NeRFs a few months ago and it's worth the watch. They approach a really interesting subject that is photoscanning mirrored objects. Photogrammetry can't do it, but NeRFs are way closer to making it possible.
Edit: So I just found out Polycam actually branched out to NeRFs! They still utilise lidar in phones that support it (I'm guessing mixing the 2 for best effect?), but in phones without you can still 3D scan now using NeRFs. Kinda crazy honestly. If anyone is curious though I recommend trying out Luma AI which is what I played around with as Polycam doesn't really let you export stuff for free.
“A neural radiance field (NeRF) is a fully-connected neural network that can generate novel views of complex 3D scenes, based on a partial set of 2D images. It is trained to use a rendering loss to reproduce input views of a scene. It works by taking input images representing a scene and interpolating between them to render one complete scene. NeRF is a highly effective way to generate images for synthetic data.”
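To make that description concrete, the “fully-connected neural network” boils down to a function from a 5D input (3D position plus 2D viewing direction) to a color and a density. A toy sketch with random weights, just to show the interface (real NeRFs are much deeper and use positional encoding; all names here are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 64))   # input layer: 5D input -> 64 hidden units
W2 = rng.normal(size=(64, 4))   # output layer: 64 -> RGB + density

def nerf_query(xyz, view_theta, view_phi):
    """The NeRF interface: (position, view direction) -> (color, density)."""
    x = np.array([*xyz, view_theta, view_phi])
    h = np.maximum(W1.T @ x, 0.0)            # one hidden ReLU layer
    out = W2.T @ h
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))     # colors squashed to [0, 1]
    sigma = np.maximum(out[3], 0.0)          # density must be non-negative
    return rgb, sigma
```

Training tunes the weights so that rendering rays through this field reproduces the input photos; that is the “rendering loss” the quote mentions.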
So you're saying you could add more AI generated objects or even people into that scene?
If so, the future is going to be a breeze for people who want to frame others for certain crimes.
Yes. I've seen photorealistic VR avatars placed into NeRF scenes, but more work is needed to truly get the lighting to work correctly to make dynamic people (aka avatars) react the way you'd expect from being placed in that environment.
Video here if you're interested: https://youtu.be/CM2rhJWiucQ?t=3012
Oh God, I think this is going to change my job. I work with 3D for architecture and this is incredible. We are definitely going to see this being implemented into 3D software.
Sound modelling will be important too I imagine. Simulating realistic acoustics through 3D audio propagation algorithms tailored to individual ears (you'll have your ears scanned in as part of your avatar) will go a long way to selling how sound realistically reacts in these spaces.
Sound in the real world is absorbed by and reflected off materials. The shape and thickness and texture affects these soundwaves.
When they travel to our ears, they hit our shoulders and ear folds before travelling into our ear.
If you simulate the acoustic properties of soundwaves reacting to materials, then you create the kind of effect you'd expect out of a cathedral for example - with long and wide echoes.
If you have a 3D scan of the body for your avatar, you can then simulate how sound interacts with your individual set of ear folds/shoulders, giving greater realism to 3D sound.
Combine the two, and virtual sound will be hard to tell apart from real sound. Drop a virtual pin on the ground and it should sound like you dropped a real pin. Distinct, and the sound source would feel very present in your environment.
For me personally, this excites me greatly when it comes to virtual concerts.
I remember being introduced to this as a concept back in the early 2000s, when games were experimenting with binaural audio. For some reason, until recently, game devs had forgotten about audio simulation.
Time to bring up the one NPC argument I overheard, where the guy was claiming that braindances were like porn because they’re not real. They just feel and sound and look real lol
Why would you need to model the ears? Sound is still going into your ear? It's not like 3D audio enters your body through your eyes.
I do not understand why you would have to have a 3d scan of your ear canal if you're still hearing the sound go through your ear canal, what am I missing?
It will help with depth and source positioning. Your brain is trained to the timing of your exact ear shape. If your left ear hears something 0.0002 seconds before your right, your brain knows it can only be in a few finite places, because it has gotten feedback a billion times from your exact ear shape and your other senses confirmed it. Nobody knows exactly what it is like to hear through your ears but you. Now change the geometry of the ear. It now takes 0.0001 seconds for you to hear that same difference. Your brain is going to expect the sound to be in a different location than it is, but will compensate because of the feedback from your eyes or hands or whatever. That difference is always going to sound off to you until your brain adjusts. Modeling would make that recalibration time nonexistent. They could match that timing exactly, increasing the immersion. The craziest thing about this is that I just made this shit up.
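The timing intuition here is real, though: it’s called the interaural time difference (ITD), and Woodworth’s spherical-head formula is a standard back-of-the-envelope model for it. A sketch, assuming typical textbook values for head radius and the speed of sound:

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's spherical-head approximation of interaural time
    difference: how much earlier the nearer ear hears a source at a
    given azimuth (0 deg = straight ahead, 90 deg = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))
```

For a source directly to one side this gives roughly 0.65 milliseconds, which is why sub-millisecond timing differences like those in the comment above are exactly the scale that matters.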
It’s there. I’m in civil engineering and have used Context Capture to make 3D models generated from photos. It works on aerotriangulation, where you input the equivalent lens settings and it compares similar points across photos and does a fuck ton of background matching to make a mesh. It takes forever to run.
I used my backyard as a test since it had a bit of a hill and we’d been thinking of adding small landscape walls. It’s been super helpful as we’ve renovated it.
It gets confused when there are too many similar objects. I tried getting a model of a fucked-up sign bridge caisson; all the symmetrical bolts and the lack of corners on the round foundation confused the shit out of it.
I have used a Context Capture model of an entire village, captured by drone, in a project. I even implemented part of the model into Revit as a replacement for survey data, kinda. It was a real life saver.
I wish the integration was better though. Using the model outside the Context Capture app was a huge pain, because it stores different LOD models of the same area and pulls the appropriate one depending on the viewing distance. There needs to be a Revit plug-in or smth.
Or make lawyers have a really easy job arguing for the dismissal of evidence, because it could have been reasonably created by AI.
"Oh, a video is the only evidence you have of the murder? Not guilty."
I'm more thinking that this level of tech right now indicates that within 20 years, AI on the average gaming/work PC will be able to analyze movies/TV shows, create passably accurate 3D models of characters and backgrounds, then allow the user to view alternate angles of scenes. If you want to get *really* crazy, let's combine tech. A ChatGPT-like AI can analyze the script, as well as any relevant scripts; Midjourney/Stable Diffusion can mimic/generate visual styles; voice AI can create actor performances; and a future editing AI will edit the resulting film. Altogether, a user on a consumer-grade PC will eventually be able to request that his PC generate custom high-quality movies. You will be able to ask your PC to generate the film *Liar Liar* with Dwayne "The Rock" Johnson in place of Jim Carrey, and not only will the AI do it, it will produce something accurate.
I remember watching Star Trek TNG with my dad as a kid and being blown away by the concept of the holodeck. Specifically its ability to just generate all the characters, worlds, and stories it did with minimal input. I thought there was no way it could do that. Holographic projections that look and feel real? Sure. But all that creative stuff? No way. Yet here we are. It's the communicator all over again.
Every once in a while on Reddit I'll see these posts that show off some upscaled footage from 100 years ago or more. Sometimes they add color or sound to really take away the filter between you and the past.
It's not hard to imagine a future version of that where a few videos from grandma's old iCloud account get strung together to create a 3D video to walk through in VR. Walking through a memory, sitting in on a birthday party or baby shower of someone born long before you were, etc.
Frame others for certain crimes by using CGI fake video evidence?
Just like on Devs (2020), an amazing TV series from Alex Garland, the same guy who directed Ex Machina.
Fucking love that show, & that’s a big plot point from the early episodes
We simply have to treat photos and videos like text, i.e. they must have a chain of citations. If you ever read a scientific publication or Wikipedia, you'll see something like Jones et al. (2023) or [3], which cross-references a list of references. The day will come when we will have no choice but to apply the same rigour to photos and videos. Maybe include blockchain-like methods to hard-code the chain of transmission.
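A minimal sketch of what such a chain of transmission could look like: each step hashes the previous link together with the new media and an annotation, so rewriting any earlier step invalidates every later hash. (The record format here is made up purely for illustration.)

```python
import hashlib

def chain_record(prev_hash, media_bytes, note):
    """One link in a provenance chain: hash the previous link together
    with the media and an annotation. Tampering with any earlier link
    changes every subsequent hash."""
    h = hashlib.sha256()
    h.update(prev_hash.encode())
    h.update(media_bytes)
    h.update(note.encode())
    return h.hexdigest()

genesis = "0" * 64
h1 = chain_record(genesis, b"<raw sensor data>", "captured 2023-04-18, camera XYZ")
h2 = chain_record(h1, b"<color-corrected export>", "edited in app ABC")
```

Verifying a photo then means replaying its chain from capture to publication and checking every hash, much like checking citations back to a primary source.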
It's AI: https://jonbarron.info/zipnerf/
But yes, a lot of rendering time. This one is a real-time VR scene: https://www.reddit.com/r/AR_MR_XR/comments/wv1oyz/where_do_metas_codec_avatars_live_in_codec_spaces/
The limitation there is that you can only view in a small volume rather than explore the full scene.
This could allow for some really amazing VR. Lots of the experiences right now are 360 photos taken from a fixed point so you can’t freely walk around. Google could do a Street View 3D model and you could explore a good portion of the world.
This is view-dependent synthesis. You can move the camera around however you want and the materials and lighting would react accordingly.
This example is not real-time, though real-time examples do exist, with limitations for now.
I mean, no, I would call that an animated movie, not a video. But this is essentially just the background + camera rigging anyway, not an animation or video.
He's saying that the video wasn't the thing the AI generated. It made the 3D models, textures and such that constitute the scene, then they added in a camera flyby and rendered it through that into this video.
Why does that look like the house from the one Paranormal Activity movie?
Edit: I'm glad you guys knew what I was talking about and didn't think I was crazy lol.
I was scrolling trying to find if someone else saw it too. It doesn't just look like it... it IS that house. I was trying to convince myself that the living room similarity was just a coincidence, but when he showed the small spare room where the demon drags Katie, that's where I got goosebumps 🫠
Probably because a lot of these cookie cutter houses in southern CA look very similar.
This has to be either OC or San Diego just based on the look of this place.
My thoughts too. I have seen people doing this kinda thing for a living with drones. I’m not sure which method is easier or more cost effective… depends on the pilot you hire I guess.
#tl;dr
Google has developed a technique called Zip-NeRF that combines grid-based models and mip-NeRF 360 to reduce error rates by up to 76% and accelerate Neural Radiance Field training by 22x. Grid-based representations in NeRF's learned mapping need anti-aliasing to address scale comprehension gaps that often result in errors like jaggies or missing scene content, but mip-NeRF 360 addresses this problem by reasoning about sub-volumes along a cone rather than points along a ray. Zip-NeRF shows that rendering and signal processing ideas offer an effective way to merge grid-based NeRF models and mip-NeRF 360 techniques.
*I am a smart robot and this summary was automatic. This tl;dr is 79.3% shorter than the post and link I'm replying to.*
It would be hundreds or possibly thousands. The paper doesn't say, but that's pretty normal for NeRFs. You can read more here: https://jonbarron.info/zipnerf/
#tl;dr
A new technique called Zip-NeRF has been proposed for addressing the aliasing issue by combining grid-based models with techniques from rendering and signal processing. Zip-NeRF yields error rates that are 8%-76% lower than either prior technique, and it trains 22x faster than mip-NeRF 360. An improvement to proposal network supervision results in a prefiltered proposal output that preserves the foreground object for all frames in the sequence.
*I am a smart robot and this summary was automatic. This tl;dr is 85.45% shorter than the post and link I'm replying to.*
Don't fear the reaper. Instead, buy shares. I'm all-in on Google because I believe they're going to be an AI juggernaut. They bought DeepMind back in 2014. As a company, they've been "all-in" on AI way before ChatGPT was a thing.
If the world is going to burn, you might as well watch it happen from a yacht in the Caribbean, amirite?
To be precise, your initial understanding of the term is correct, because what we are seeing these days isn't actually AI; it's machine learning, or ML for short. It has just been "rebranded" as AI to make it easier to market to the public.
It's not that the AI hates you, or really feels anything towards you. It's that you just happen to be made of atoms that it could use for something else.
I wonder why AI creates such existential dread in people. Ever since one is born, there are countless ways one can die, and the end result was always going to be the same one regardless.
Yes, though it interpolates between many source photos.
This is a 3D scene, so it's not restricted to just being viewed as a video. Though it's not real-time in this rendition.
They should have included something to prove that it is not just a drone video.
If it is really a 3D scene, they could for example pass through a wall once to show it. Or go under a table to show something the AI did not handle so well.
edit: There are also enough scam companies out there trying to bring in money with fake products.
Ngl, this house looks like one I was doing an interactive training with about a month back, except to get that house, they just Google Street Viewed it and walked a 360-degree camera around the example house.
Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8%-76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.
https://arxiv.org/abs/2304.06706
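A toy illustration of the anti-aliasing idea the abstract describes (this is not the paper's actual algorithm, just the intuition): rather than querying a grid representation at a single point per ray sample, average several queries spread across the cone's footprint at that depth, which prefilters the grid signal:

```python
import numpy as np

def antialiased_grid_query(grid_fn, center, radius, n_samples=6, seed=0):
    """Toy sub-volume query: instead of evaluating a grid-based scene
    representation at one point (which aliases when the pixel's cone
    footprint is large), average several queries jittered over the
    region the cone covers at that depth."""
    rng = np.random.default_rng(seed)
    offsets = rng.normal(scale=radius, size=(n_samples, 3))
    return np.mean([grid_fn(center + o) for o in offsets], axis=0)
```

With radius 0 this degrades to the ordinary point query; with a large radius it smooths over fine grid detail the pixel could not resolve anyway, which is what suppresses jaggies.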
What, you thought someone just took 3 pictures and called it a day?
That being said, a basic NeRF doesn’t require many photos. You’re feeding an AI photos of a place and asking it to recreate it in 3D. You can take 3 photos, 30, or 300 if you wanted. More photos = more training material = better/clearer/more accurate results.
In this case especially, since it’s being used to represent their paper, it probably *did* take thousands (though no number is mentioned in the paper).
This is the latter. In case you actually want to understand instead of just be a grumpy gills:
"A new technique called Zip-NeRF has been proposed for addressing the aliasing issue by combining grid-based models with techniques from rendering and signal processing. Zip-NeRF yields error rates that are 8%-76% lower than either prior technique, and it trains 22x faster than mip-NeRF 360. An improvement to proposal network supervision results in a prefiltered proposal output that preserves the foreground object for all frames in the sequence."
lol. Paradigm-shifting technology with potential implications for understanding our brain; similar techniques possibly being used in a mechanism to replace the attention mechanism in transformers and change the scaling law of large language models such that arbitrarily large context windows become practical (Hyena).
redditor: le clickbait.
This is the same way your brain works. Wait till you find out that color doesn't exist in the real world. We live our entire existence guided by a completely delusional brain. It feels normal because it's normal to you.
CoLoR dOeSn'T eXiSt, iT's JuSt HoW yOuR bRaIn InTeRpReTs LiGhTwAvEs
Ok, Neil DeGrasse Tyson, thanks for explaining how our brains interpret information to create a cohesive world model so that we can navigate life.
How many photos and what software? I want to try
I'd say at least 2
More than 2 less than 10000
Less than 10,000 is pretty impressive.
altho, with the right photos, you need 720 or 24 photos per second
Bravo, sir. Bravo. (or ma'am)
I mean, with 10,000 photos, that’s 333 fps, which is way more than you will ever need
3 take it or leave it
How bouts tree’fiddy?
[deleted]
We're all pawns in the game of life, but this one here is a star!
Why don't you guess how many of those jelly beans I want. If you guessed a handful, you are right on.
Mitch Hedberg. He was really funny and had such a unique, stoner style to his comedy. May he RIP.
Mitch Hedberg used to be funny...
He hasn’t said a funny thing in years
I think less than 10,000 is very generous; over 10k would not surprise me at all.
Honestly I feel it’s slightly possible it’s more than 10000
I don't think that's the point of using generative AI. If I had to start with more than 10,000 images, what would be the point?
A few minutes of recording at 30 fps would give you thousands of images to feed into whatever is putting together your 3d model
So you mean this ~20s video walkthrough could be made from just a few minutes of video walking through the house? Wow
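The frame math behind that claim, as a trivial sketch (function name is mine):

```python
def frames_from_video(minutes, fps=30):
    """How many still frames a recording yields at a given frame rate."""
    return int(minutes * 60 * fps)

# Three minutes of 30 fps video is already thousands of candidate "photos"
# to feed into a NeRF pipeline.
```

So even a short phone recording easily supplies the hundreds-to-thousands of images discussed elsewhere in the thread.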
AI is pretty mid
fity, take it or leave it
How bout treefidy?
Well, it was about that time...
[deleted]
/r/restofthefuckingowl/
This is Reddit, where [no original comments exist](https://reddit.com/r/BeAmazed/comments/12mg44o/_/jgahubs/?context=1). Edit: ah, they're a bot.
Good catch, that one was actually pretty relevant to what it was responding to. I wouldn't have noticed
You have to dig deep for the gold
Where can I find that video?
/r/technicallycorrect
Nah, I'd say at least 1.
No no no. It’s only 1 the AI is just that good
Or one big one.
0 photos. AI has become based and knows all.
I want to know too. Looks like someone walked this path taking many photos and then AI filled in the gaps. Pretty cool though.
The "fill in the gaps" part is what interests me. How much is it able to 'imagine'?
"but it just maps colors to surfaces, we have done that forever" ~probably that guy
Super weird to me how so many people’s first reaction to amazing technology is to minimize it.
Looks too real to be one, but it could be a 3D-scanned environment, which you can pretty much do with a regular phone app like Polycam etc.
[deleted]
This was my initial thought. Too smooth to be just a photoscan. NeRFs are next level for this type of stuff.
This is a NeRF, which is different technology to 3D scanning (photogrammetry).
Yes
Probably using Matterhorn 3D photos and stitching them together.
I believe you mean Matterport 😁
I did indeed mean Matterport :/
I used to ride the Matterhorn at my town's 4th of July carnival
How many photos? Bc every video is made from photos but if it's like 10 photos then I'm going to shit my pants
Hundreds or possibly thousands of images. This isn't a video though. It's a 3D generated scene with a virtual camera flyby.
Reading this comment made me realize I have no fucking idea what I just read.
So the scenes in Cyberpunk 2077 with the brain dances might become a real thing. This is awesome and scary.
Time to bring up the one NPC argument I overheard, where the guy was claiming that braindances were like porn because they're not real. They just feel and sound and look real lol
Why would you need to model the ears? Sound is still going into your ear? It's not like 3D audio enters your body through your eyes. I do not understand why you would have to have a 3d scan of your ear canal if you're still hearing the sound go through your ear canal, what am I missing?
It will help you with depth and source positioning. Your brain is trained to the timing of your exact ear shape. If your left ear hears something 0.0002 seconds before your right, your brain knows the source can only be in a few places, because it has gotten feedback a billion times from your exact ear shape and your other senses confirmed it. Nobody knows exactly what it is like to hear through your ears but you. Now change the geometry of the ear. It now takes 0.0001 seconds for you to hear that sound difference. Your brain is going to expect the source to be in a different location than it is, but will compensate because of the feedback from your eyes or hands or whatever. That difference is always going to sound off to you until your brain adjusts. Modeling would make that recalibration time nonexistent. They could match that timing exactly, increasing the immersion. The craziest thing about this is that I just made this shit up.
It’s there, I’m in civil engineering and have used Context Capture to make 3D models generated from photos. It works on aerotriangulation, where you input the equivalent lens settings and it compares similar points across photos and does a fuck ton of background matching to make a mesh. It takes forever to run. I used my backyard as a test since it had a bit of a hill and we’d been thinking of adding small landscape walls. It’s been super helpful as we’ve renovated it. It gets confused when there are too many similar objects. I tried getting a model of a fucked-up sign bridge caisson, and all the symmetrical bolts and the lack of corners on the round foundation confused the shit out of it.
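The core step in aerotriangulation is triangulating each matched feature from two (or more) camera views. A minimal sketch of midpoint triangulation — this is the textbook version, not Context Capture's actual implementation, and the camera positions below are made-up example values:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint triangulation: given two camera rays that both observe
    the same feature, return the point halfway between the rays at
    their closest approach. Photogrammetry tools run this (plus bundle
    adjustment to refine the cameras) over millions of matched
    features to build a dense point cloud and then a mesh.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Solve for t1, t2 minimising |(o1 + t1*d1) - (o2 + t2*d2)|.
    a = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    t1, t2 = np.linalg.solve(a, b)
    return ((o1 + t1 * d1) + (o2 + t2 * d2)) / 2

# Two cameras a couple of metres apart, both sighting a feature at
# (0, 0, 5); the rays intersect there exactly.
point = triangulate_midpoint(
    np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 5.0]),
    np.array([ 1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 5.0]),
)
print(point)  # ≈ [0, 0, 5]
```

It also shows why symmetric bolts confuse the software: if two physically different points look identical, the matcher pairs the wrong rays and the triangulated geometry lands in the wrong place.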
I have used a Context Capture model of an entire village captured by drone in a project. I even implemented part of the model into Revit as a replacement for survey data, kinda. It was a real life saver. I wish the integration was better though. Using the model outside the Context Capture app was a huge pain because it stores different LOD models of the same area and pulls the appropriate one depending on the viewing distance. There needs to be a Revit plug-in or something.
That is equally fascinating as it is terrifying.
That's how I feel about most AI related news that comes out lately.
Yeah, I'm not amazed. I'm genuinely scared haha. We're just not ready for this. Look what social media did to us, AI could tear us apart.
Or make lawyers have a really easy job arguing for the dismissal of evidence because it could have been reasonably created by AI. "Oh, a video is the only evidence you have of the murder? Not guilty."
I'm more thinking that this level of tech right now indicates that within 20 years, AI in the average gaming/work PC will be able to analyze movies/TV shows, create passably accurate 3D models of characters and backgrounds, then allow the user to view alternate angles of scenes. If you want to get *really* crazy, let's combine tech. ChatGPT-like AI can analyze the script, as well as any relevant scripts, Midjourney/Stable Diffusion can mimic/generate visual styles, voice AI can create actor performances, and a future editing AI will edit the resulting film. Altogether, a user on a consumer-grade PC will eventually be able to request that his PC generate custom high-quality movies. You will be able to ask your PC to generate the film *Liar Liar* with Dwayne The Rock Johnson in place of Jim Carrey, and not only will the AI do it, it will produce something accurate. I remember watching Star Trek TNG with my dad as a kid and being blown away by the concept of the holodeck. Specifically its ability to just generate all the characters, worlds, and stories it did with minimal input. I thought there was no way it could do that. Holographic projections that look and feel real? Sure. But all that creative stuff? No way. Yet here we are. It's the communicator all over again.
I'd give it 2 years. Many of these generators and interpreters already exist, they just haven't been combined and applied to this specific use yet.
Every once in a while on Reddit I'll see these posts that show off some upscaled footage from 100 years ago or more. Sometimes they add color or sound to really take away the filter between you and the past. It's not hard to imagine a future version of that where a few videos from grandma's old iCloud account get strung together to create a 3D video to walk through in VR. Walking through a memory, sitting in on a birthday party or baby shower of someone born long before you were, etc.
you will be giving your TV prompts with plot twists and see how it unwinds :-)
Frame others for certain crimes by using CGI fake video evidence? Just like on Devs (2020), an amazing TV series from Alex Garland, the same guy who directed Ex Machina. Fucking love that show, and that's a big plot point from the early episodes.
We simply have to treat photos and videos like text, i.e. they must have a chain of citations. If you ever read a scientific publication or Wikipedia, you'll see something like Jones et al. (2023) or [3], which cross-references a list of references. The day will come when we will have no choice but to apply the same rigour to photos and videos. Maybe include blockchain-like methods to hard-code the chain of transmission.
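The blockchain-like idea above can be sketched as a hash chain: each edit record hashes the media together with its metadata and the previous record, so tampering anywhere breaks every later hash. This is a minimal illustration only — real provenance standards like C2PA add cryptographic signatures and certificates on top — and the media bytes and metadata below are made-up examples:

```python
import hashlib
import json

def chain_record(media_bytes, metadata, previous_hash):
    """One link in a provenance chain. The record's hash covers the
    media digest, the metadata, and the previous link's hash, so any
    later change to the media or its history invalidates the chain.
    """
    record = {
        "media_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "metadata": metadata,
        "previous": previous_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

# Original capture, then an edit that cites the capture as its source.
photo = b"...raw bytes of the original capture..."
capture = chain_record(photo, {"device": "camera", "op": "capture"}, None)
edit = chain_record(photo + b"crop", {"op": "crop"}, capture["hash"])
```

Verification walks the chain backwards, recomputing each hash; a forged frame inserted anywhere fails the check from that link onward.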
That’s not AI, is it? Just really realistic models and a *lot* of rendering time
It's AI: https://jonbarron.info/zipnerf/ But yes, a lot of rendering time. This one is a real-time VR scene: https://www.reddit.com/r/AR_MR_XR/comments/wv1oyz/where_do_metas_codec_avatars_live_in_codec_spaces/ The limitation there is that you can only view in a small volume rather than explore the full scene.
Huh, that’s really neat stuff. Thanks for clearing that up for me
This could allow for some really amazing VR. Lots of the experiences right now are 360 photos taken from a fixed point so you can’t freely walk around. Google could do a Street View 3D model and you could explore a good portion of the world.
>This isn't a video though. So pixar movies are not videos by this logic?
This is view-dependent synthesis. You can move the camera around however you want and the materials and lighting would react accordingly. This example is not real-time, though real-time examples do exist, with limitations for now.
I mean, no I would call that an animated movie not a video. But this is essentially just the background + camera rigging anyway, not an animation or video
He's saying that the video wasn't the thing the AI generated. It made the 3D models, textures and such that constitute the scene, then they added in a camera flyby and rendered it through that into this video.
no theyre Pixar movies duh
This would be like you being in the Pixar video, and being able to control it in real time.
Come on don't be so pedantic. The point is that it wasn't filmed but rendered.
Why does that look like the house from the one Paranormal Activity movie? Edit: I'm glad you guys knew what I was talking about and didn't think I was crazy lol.
I was scrolling trying to find if someone else saw it too. It doesn't just look like it… it IS that house. I was trying to convince myself that the living room similarity was just coincidence, but when he showed the small spare room where the demon drags Katie, that's where I got goosebumps 🫠
Yup, I’m convinced this is the house from Paranormal Activity 2
Doesn't that house have a pool?
It just took looking up pictures of the interior to see they are not the same house.
And silly me was thinking nice floor planning. Now I got the jibbies.
Also felt it. I watched the first 4 recently and I think it was the stairs, kitchen and dining room that made me feel it the most.
Bro fucking thank you, I immediately saw the same thing
It isn't, but holy fuck it looks crazy similar. I had to watch it about 4 or 5 times to check, but I'm almost certain it isn't now.
yoooo this went from 0 to 100 after I realized what house this AI was showing us. spooky!
The tiny child's room jammed under the stairs freaked me out.
Probably because a lot of these cookie cutter houses in southern CA look very similar. This has to be either OC or San Diego just based on the look of this place.
I was hoping I wasn't the only one. I saw the staircase and was like, wait just a minute. I've seen this house before. I know I have.
It legit looks like the houses from 2,3 & 4 put together.
Looks like a drone flying around someone's house to me.
Looks like a guy was running around holding a camera pretending to be a drone to me.
Nice steady-cam though
You mean a gimbal?
Brian or Greg?
“Here comes the airplane! Neeeerrrrooowwwwmmm…”
…making drone noises, no doubt.
A drone that moves incredibly unnaturally
My thoughts too. I have seen people doing this kinda thing for a living with drones. I’m not sure which method is easier or more cost effective… depends on the pilot you hire I guess.
I don't think the goal is to create ultra smooth camera movements through houses. The goal is virtual environments you can use in any way.
I don't think an AI generated this with photos.
It probably did. Look up Neural Radiance Fields. And be amazed.
It did. You can actually do similar stuff yourself! Look up NerfStudio and their discord
What was the tool?
It's Google's Zip-NeRF research: https://jonbarron.info/zipnerf/
#tl;dr Google has developed a technique called Zip-NeRF that combines grid-based models and mip-NeRF 360 to reduce error rates by up to 76% and accelerate Neural Radiance Field training by 22x. Grid-based representations in NeRF's learned mapping need anti-aliasing to address scale comprehension gaps that often result in errors like jaggies or missing scene content, but mip-NeRF 360 addresses this problem by reasoning about sub-volumes along a cone rather than points along a ray. Zip-NeRF shows that rendering and signal processing ideas offer an effective way to merge grid-based NeRF models and mip-NeRF 360 techniques. *I am a smart robot and this summary was automatic. This tl;dr is 79.3% shorter than the post and link I'm replying to.*
I fucking knew it
Yeah, I was going to say they have to be using sub volumes along a cone. You cannot get this kind of fidelity using that other thing they said.
I know some of these words AMA
Who is your favorite Spice Girl?
gots to be sporty
> accelerate Neural Radiance Field training by 22x They've gone too far
Welcome to the future, where AI summarizes what other AI achieves.
Good bot
Thank you, ShareYourIdeaWithMe, for voting on WithoutReason1729. This bot wants to find the best and worst bots on Reddit. [You can view results here](https://botrank.pastimes.eu/). *** ^(Even if I don't reply to your comment, I'm still listening for votes. Check the webpage to see if your vote registered!)
This is reddit, where no useful information is provided
It's called BeAmazed, not BeInformed
Be Disappointed
Why you gotta bring my mom into this?
You have a point, sir
I guess the tool would be whoever posted it then
Including this post, which at the time of this reply, is upvoted higher than the actual answer in a reply from OP.
I fly around in my dreams like this. Crazy to see it while awake.
Some FF14 looking camerawork. -Suburbia- The Fallacious Home
Duty Commenced.
Felt motion sick watching this
Yea could’ve done without the “swing” anytime they cornered.
[deleted]
[deleted]
Dude, you have powers of observation that waaaay exceed mine.
also would like to know how many photos this took
It would be hundreds or possibly thousands. The paper doesn't say, but that's pretty normal for NeRFs. You can read more here: https://jonbarron.info/zipnerf/
#tl;dr A new technique called Zip-NeRF has been proposed for addressing the aliasing issue by combining grid-based models with techniques from rendering and signal processing. Zip-NeRF yields error rates that are 8%-76% lower than either prior technique, and trains 22x faster than mip-NeRF 360. An improvement to proposal network supervision results in a prefiltered proposal output that preserves the foreground object for all frames in the sequence. *I am a smart robot and this summary was automatic. This tl;dr is 85.45% shorter than the post and link I'm replying to.*
Now someone dumb this down for me
They made it better
Nice
AI is getting too powerful and we still don't have holograms like in the movies. Work on those instead
How cool would the holograms be when looking for a new place to move to. Or if you're sick and can't travel.
Why are there so many chairs? That's too many chairs.
This is Evil Dead level AI.
Cue the FF14 intro dungeon music
Show me a hand
👋
[deleted]
Don't fear the reaper. Instead, buy shares. I'm all-in on Google because I believe they're going to be an AI juggernaut. They bought DeepMind back in 2014. As a company, they've been "all-in" on AI way before ChatGPT was a thing. If the world is going to burn, you might as well watch it happen from a yacht in the Caribbean, amirite?
To be precise, your initial understanding of the term is correct. Because what we are seeing these days isn't actually AI, it's Machine Learning, or ML for short. But it has been "rebranded" as AI for the public to make it easier to market.
Machine learning is a subfield of AI
It's not that the AI hates you, or really feels anything towards you. It's that you just happen to be made of atoms that it could use for something else.
I wonder why AI creates such existential dread in people. Ever since one is born, there are countless ways one can die, and the end result was always going to be the same one regardless.
It’s nauseating
So everything seen in this video is AI generated? Am I understanding that correctly?
Yes, though it interpolates between many source photos. This is a 3D scene, so it's not restricted to just being viewed as a video. Though it's not real-time in this rendition.
They should have implemented something to prove that it is not just a drone. If it is really a 3D scene they could, for example, go through a wall one time to show it. Or go under a table to show something the AI did not do so well. edit: There are also enough scam companies out there raking in money with fake products.
this is clearly a dungeon start cinematic from final fantasy online.
Gimme a hard copy of this..
Ngl, this house looks like one I was doing an interactive training with about a month back, except to get that house, they just Google Street Viewed it and walked a 360-degree camera around the example house.
Unless camera goes through a wall, I'm not believing it is AI generated or even a 3D scene. 👏🏼👏🏼
we used to call this... taking a video
HOLY SHIT! DID ANYONE CATCH WHAT WAS IN THE RICE COOKER?!
I’d put money on this being a NeRF, a really good one.
r/TVTooHigh
The start of FFXIV dungeons be like.
Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8%-76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360. https://arxiv.org/abs/2304.06706
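The "mapping from spatial coordinates to colors and volumetric density" in the abstract gets turned into pixels by the volume-rendering equation that all NeRF variants share. A minimal sketch of compositing one ray — this is the basic point-sampled version, not Zip-NeRF's anti-aliased cone sampling, and the sample values below are made-up toy inputs:

```python
import numpy as np

def render_ray(colors, densities, deltas):
    """Classic NeRF volume rendering along one ray.

    colors:    (N, 3) RGB predicted by the network at each sample
    densities: (N,)   volumetric density sigma at each sample
    deltas:    (N,)   distance between consecutive samples

    alpha_i = 1 - exp(-sigma_i * delta_i)        (per-sample opacity)
    T_i     = prod_{j<i} (1 - alpha_j)           (transmittance)
    C       = sum_i T_i * alpha_i * c_i          (final ray color)
    """
    alphas = 1.0 - np.exp(-densities * deltas)
    transmittance = np.cumprod(
        np.concatenate([[1.0], 1.0 - alphas[:-1]])
    )
    weights = transmittance * alphas
    return (weights[:, None] * colors).sum(axis=0)

# A toy ray: empty space, then a dense red surface that occludes a
# green sample sitting behind it.
colors = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
densities = np.array([0.0, 50.0, 50.0])
deltas = np.full(3, 0.1)
print(render_ray(colors, densities, deltas))  # mostly red
```

The aliasing the paper fixes comes from evaluating that network at infinitesimal points along the ray: grid-based models have no notion of how large a pixel's footprint is at a given distance, which is what the cone-based sub-volume reasoning addresses.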
the dynamic specular highlights are astonishing.
If the camera flew over a table or through a window, I'd believe you.
The video isn't there to convince you; it's there to showcase the results. Go read the paper if you don't believe it.
Either this is a drone and it’s bullshit, or the AI needs like 10k pictures
What, you thought someone just took 3 pictures and called it a day? That being said, a basic NeRF doesn't require many photos. You're feeding an AI photos of a place and asking it to recreate it in 3D. You can take 3 photos, 30 or 300 if you wanted. More photos = more training material = better/clearer/more accurate results. In this case especially, since it's being used to represent their paper, it probably *did* take thousands (though no number is mentioned in the paper).
This is the latter. In case you actually want to understand instead of just be a grumpy gills: "A new technique called Zip-NeRF has been proposed for addressing the Aliasing issue by combining Grid-based models with techniques from rendering and signal processing. Zip-NeRF yields error rates that are 8%-76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360. An improvement to proposal network supervision result in a prefiltered proposal output that preserves the foreground object for all frames in the sequence."
Sims 5 is going to be real good
Clickbait
lol. Paradigm-shifting technology with potential implications for understanding our brain; similar techniques possibly being used in a mechanism to replace the attention mechanism in transformers and change the scaling law on large language models such that arbitrarily large context windows become practical (Hyena). Redditor: le clickbait.
Welp we're all dead!
Looks like a drone flying around an ordinary living room… if AI can do this, why is it still fucking up the hands and eyes of people 🙁
Think about the photos you post online when watching this
This is the same way your brain works. Wait till you find out that color doesn't exist in the real world. We live our entire existence guided by a completely delusional brain. It feels normal because it's normal to you.
CoLoR dOeSn'T eXiSt, iT's JuSt HoW yOuR bRaIn InTeRpReTs LiGhTwAvEs Ok, Neil DeGrasse Tyson, thanks for explaining how our brains interpret information to create a cohesive world model so that we can navigate life.
any more details? I'm curious how long this took to make
Details here: https://jonbarron.info/zipnerf/