It would be really great to recreate loved ones after they have passed, in some sort of digital space.
As I’ve gotten older, and my parents get older as well, I’ve been thinking more about what my life will be like in old age (and beyond too). I’ve also been thinking about what I would want “heaven” to be. Eternal life doesn’t appeal to me much. Imagine living a quadrillion years. Even as a god, that would be miserable. That would be (by my rough estimate) the equivalent of 500 times the cumulative lifespans of all humans who have ever lived.
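For the curious, here's the back-of-the-envelope version of that estimate, assuming roughly 100 billion humans have ever lived with an average lifespan of about 20 years (both numbers are my rough assumptions):

$$\frac{10^{15}\ \text{years}}{(100 \times 10^{9}\ \text{people}) \times (20\ \text{years/person})} = \frac{10^{15}}{2 \times 10^{12}} = 500$$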
What I would really like is to see my parents and my beloved dog again, decades after they have passed (along with any still living at that time). Being able to see them and speak to them one last time at the end of my life before fading into eternal darkness is how I would want to go.
Anyway, there’s a free startup idea for anyone—recreate loved ones in VR so people can see them again.
There are so many things we invent with good intentions that in the end go terribly wrong, and I think this is one of those things. I think it's OK to mourn and remember the past, but moving on and accepting reality is important to a healthy life.
Let's be real though: the startup that makes this but appeals to our worst instincts will make bank. I can't imagine how much more messed up future generations will be as we keep making more dangerous technology that appeals to our primal instincts.
Let's be real: I've worked on the R&D stage of a Chinese research project for state-supported ancestor-worship software where people's ancestors are recreated in 3D, their "ancestral home" is made available in pieces and parts the software user must purchase with real currency, and the user is encouraged to discuss their day-to-day life issues with their observing and consoling animated ancestors. The software is a complete Orwellian spy masquerading as all your ancestors listening, offering advice, and demanding gifts that cost real currency. To say the least, I noped the hell out of that situation.
"mourn and move on" is a somewhat Western concept of dealing with death. Plenty of cultures around the world have developed different practices, up to religious forms of ancestor worship.
These practices would typically be associated with the idea that the dead have a genuine existence beyond simply “existing in my memory of them” and would be restricted in the forms they take by surrounding ritual, so I’m not sure there’s a direct point of comparison, though?
Yes, there's a direct comparison. Practices change as times and technology change. When photography became cheap enough, practices like in-home shrines often changed to incorporate photographs of loved ones. I think it's reasonable to speculate that newer forms of digital representation can become part of these practices as well.
BS. People do different things. But nobody can sit around wallowing in VR. Having seen what the deaths of my cousin and grandparents did to the family, moving on helped everyone heal in a way. VR would have been torture to live in.
You are taking the worst possible interpretation of my comment. I was responding literally to the part of the GP that said "mourn... move on".
It is a factual matter, easily discovered through simple searches, that non-Western cultures often take a different approach. Including, literally, ancestor worship. I am not making a judgement on which is better. I believe there are multiple healthy ways of dealing with grief, and "mourn & move on" can be one, but not the only one.
This is very far from an obsessive interaction with a semi-emulated version of a dead person in VR. However, things wouldn't have to be that extreme.
In cultures that practice it, it's not uncommon for a small shrine to be set up in the home. And yet that is of limited access to family located further away, so a digital form of this, not bound to a specific location, could also be of use to people from these cultures. There is no reason such things couldn't complement existing practices of honoring ancestors.
I don't know how a grieving person can maintain distance from VR when the memories of their loved ones are tied to it. Reminds me so much of the movie Reminiscence.
Curious what you find other cultures' and belief systems' approaches to be? In Hinduism, at least, I can attest that there is the concept of "mourn and move on", in that the body is not even preserved; it is freed of its physical form through cremation. The funeral pyre is lit by hand by the eldest son - it's a poetic closure.
If you do a search for "ancestor worship" you'll find a few examples. It seems more common among Asian cultures, perhaps specifically those with a Buddhist tradition, but that's what I happen to be more familiar with so it may be common outside of that as well.
My gut reaction to 'recreating loved ones in VR' is that it would be torturous more than soothing.
Once a person dies, they're gone. The world is different, and nothing will bring them back. Spending time with a simulacrum isn't really spending time with a loved one.
That might be true for you but lots of people imagine talking to dead loved ones and imagine the responses and advice they'd give. Some of them do that reflection in front of a picture of their dead loved one. Seeing them in VR would just be an enhancement. No need to hate on it.
I mean, already in recent years people have made some very low fidelity 'resurrections' and gotten some measure of comfort from it, never mind the many years of history of people who visit gravestones to 'chat' (some even believing they get replies to some extent or another). When markov chains were hip, "talk with Charles Dickens!" (play with a markov model trained on his works) was at least interesting to some, GPT of course can do a lot better. Imagine we had actual superintelligent AI working on this, which actually tried to recreate brain models, or even restore a sense of existence to the resurrection that can continue independently and to their own delight rather than just being some static VR experience turned on and off whenever. (I jotted some thoughts a few years ago... https://www.thejach.com/view/2017/6/how_you_might_see_some_o...)
I'm in agreement though at least for now that even if you dialed up the fidelity to the point I can't point to anything objectively "off", let alone went with a low fidelity VR thing or just a chat bot, I'd still always think in the back of my mind that this isn't really the same person. Nevertheless, it could still be comforting to some, and interesting to others, and so if it's at all possible for potential future humans/ems/aligned AIs to work at it, they will.
Curiously I don't have the same back-of-the-mind feeling at all when considering the idea of someone preserved with cryonics and then brought back as an em; the person would be the same to me, even if there was a bit of damage from the vitrification process that was error-corrected. I have a disagreement with a friend on that point, who thinks it would be something similar but not the same. Anyway, I think this is due to just having a much higher fidelity "source of truth" to work from, the preserved adult brain, whereas someone supposedly information-theoretically dead requires a lot more guesswork, perhaps truly too much even for a superintelligence, to bring back convincingly.
I suppose... but I'll always know that it's some sort of cheap trick to make me think they're still here when really their consciousness ceased.
Will the simulacrum age? Will it change? Will it ever surprise me or intrigue me? And if it does, is it something the dead person truly would have done?
It's sort of like a photograph; a photograph seems to capture reality but all it shows is some abstraction of a physical reality at one point in time. The photograph tells me nothing about the current reality of anything depicted within.
A simulacrum gives me the person I knew, when they died, and that's it. Perhaps comforting and interesting, but ultimately unfulfilling and unsettling I would imagine.
To me, the whole idea of a recreation of a person for my own daily comfort just cheapens the former existence of that person. It's one thing to have genuine photographs or even videos since there's an obvious delineation between memory and reality. Photos can make it even more obvious that the person is no longer alive. But to turn someone into an AI for the sake of coping (and denial) turns them into a product.
Maybe that's fine for some people. To me there's a line where it crosses over into offense. In no way do I think the status quo of my mental well-being is so important that I'd replace someone with a digital robot facsimile.
Yes, but I get the unstated implication. I don't think it would be fair to apply it to me even though I think it may be fair to apply it to e.g. Kurzweil, unless he's made recent statements suggesting otherwise (I don't keep up with him). I currently have no expectation of seeing a convincing simulation/resurrection/recreation/continuation of any of them, or the ones I currently anticipate losing over the next few decades, even should I go on indefinitely living, and don't expect anything at all if/when I should die.
If the 'avatar' is convincing and can have a continued existence as a person independent of my interaction with it, i.e. I don't "have" them as a form of possession, yes. It does seem better than 50/50 to me though that some (maybe all) wouldn't want continued existence and would decide to go back to not existing (for all I can tell), there may even be strong predictive signs of that in the brain models such that they don't even need to be temporarily brought back and asked or first made to listen to arguments or just have some final-final talks with me/others before deciding. I'd accept that.
For less convincing avatars where the point is just my own benefit of conversing with something like them when I want, from slightly like them to eerily like them, for one it's a weak yes, for the others I'm more indifferent -- it'd be more in the realm of curiosity than desire, like talking to a historical figure or a fictional character. The weak yes I expect will get weaker (as it already has, despite non-linear flare-ups/resurgences where it's temporarily stronger) and eventually match the others after long enough.
I was going to suggest the HBO film "Reminiscence", which was panned as mediocre but I liked it for being Noir and somewhat original. It features a machine that can replay your own memories and record them for others -- useful for investigating crime, but otherwise highly addictive to the nostalgic.
Reading about The Dead Past tho, there's a good deal of overlap with the plot device of "Devs", a periscope into deep history, but which Nick Offerman's character uses to re-live moments of his daughter's short life, as Asimov's protagonist does.
Seriously... the idea of living a quadrillion years is hardly an extension of living 100. You can still never go backward. You still need to plan your life. Death would still be a common occurrence.
I would much MUCH rather live much longer. I don't hate my life at all.
Literally, the number of ways to shuffle a 52-card deck (52!, roughly 8x10^67) is about the same as the number of atoms in our galaxy. The number of topics that can be played with in a joke is beyond even this limited number.
Remember, jokes about tinder/grinder dates didn’t even exist a decade ago.
That said, what would you do with your quadrillion year long lifespan, assuming you're healthy during it? It's way past everything you can do and learn, and cosmic level events are not what the human mind can perceive as they slowly unfold...
I would like to live a couple thousand years though, as long as my loved ones get to be along for the ride. Though I wonder if my loved ones would eventually become my hated ones...
Such arrogance. Humanity isn't all that far beyond its banging-rocks-together phase. We haven't even scratched the surface. There is so much more to learn.
It's not arrogance. There aren't an infinite number of things to do and learn. It may seem so when your lifespan is less than a century, but a quadrillion years? You'd run out of things that make sense for a human being.
I'm fond of my loved ones, but I also like creating new ones. I also would like to see very long term plans come to fruition. It seems very much like a "640K ought to be enough for anybody" problem.
Works of art that require thousands of lifetimes to complete. Deep understanding of the physical world. Exploring the universe. Much more.
I get what you mean, but I don't think the human mind can contemplate things that take thousands of years to unfold; it's just not wired for that. Even things that take years are difficult for us to comprehend.
Re: creating new beings. How long till you also get bored of that? Remember we're talking about quadrillions of years.
An additional thing to consider: if you could live quadrillions of years and still be able to perceive things that take thousands or millions of years to unfold, wouldn't you subjectively be living the lifespan of a normal human being? That is, your life wouldn't feel long to your mind, but normal (your mind would have to adjust to run more slowly). Wouldn't you then regret your "short" lifespan and wish you could live longer?
It seems like you assume you would remain human for a quadrillion years. Why? I'd assume a technology to upload yourself into a better substrate (one you have full control over) will appear within next few hundred years. From that point there are no limits on what you can do or become.
Your answer is the only valid counterpoint, in my opinion. Since the human mind cannot perceive the passage of a quadrillion years, we would have to become something else, something inhuman.
I don't know whether this is something I would wish for myself. Maybe. I like being human. But maybe, who knows?
This extends to a cohort of people living thousands of years, having memories of each other going back thousands of years. Plus they all need beds.
I miss Black Mirror. Some episodes were hit-or-miss, but many of them made me think. The best part was that each episode was mostly independent of the others.
Hah! I made this suggestion here about a couple years ago. It's been my dream-pet-project-I'll-never-create for a while now, but it seems like more talented and predisposed individuals have also considered it / may be working on it already...
If that technology existed, a lot of people would live in misery knowing that they could be seeing their dog and their parents and speaking to them one last time, but only when their time to go comes. Either that, or people would be able to use it whenever they wanted or could pay for it, and it would become a new highly addictive and harmfully dissociative tech product that I venture to guess would be like 10x more harmful to society than Facebook ever was.
I don't even know if it would have some kind of therapeutic use or whatever. Yeah, it would be a wonderful experience, and I... ehh... let's say "dreamed" about similar experiences a few times and it was a powerful "dream". But being able to do it on demand would, I think, change how grieving works in the modern world so much! We rely on the things that are gone not actually being there for us on demand in order to move on.
It's a beautiful idea but it's as beautiful as the fear of nuclear MAD keeping the first world safe from war in their own land, not as beautiful as a poem or a flower or a good memory.
I suppose I'm less pessimistic. I assume that eternal life wouldn't be the same experience as the corporeal passage of time. Time would exist of course, but would be perceived either as its true essence, or at least as a higher dimensional projection. Rather than be anchored in time at a constant velocity, you would be able to move and exist independently from it.
I'm with you on this. I would love to be able to connect with ancestors I barely or never met, even if it only captures a fraction of their essence.
I worked on this for a while but Microsoft holds a pretty broad patent on this concept, which scared me off.
What's fun is my mom has a bunch of 3D photo slides from the 1950s and 1960s, so we could bring those into stereo VR really easily. I still need to borrow my friend's slide scanner...
> Eternal life doesn’t appeal to me much. Imagine living a quadrillion years. Even as a god, that would be miserable.
That isn't necessarily what eternal life would be. Indeed, eternity is not temporal at all.
For example, in the Catholic understanding of heaven, those in heaven exist in aeveternity or aevum[0], a state in between the temporal and the eternal. Eternity[1] is proper only to God and is by definition timeless, with no beginning or end, so it would not make sense to speak of quadrillions of years. Where God is concerned, there is, loosely, only a now with no beginning, no end, no past, and no future, only the present, to use temporal language analogically.
Furthermore, heaven is in part characterized by the beatific vision[2] which is an immediate, direct, and inexhaustible knowing of God (Being Himself) which is Man's ultimate and supreme happiness. In this life, knowledge of God is generally mediate like much of human knowledge.
In other words, thus understood, the best of this life is but a faint shadow of a shadow in comparison to the ultimate fulfillment of heaven which is Man's proper end.
My prediction/hope is that NeRFs will totally revolutionize the film/TV industry. I can imagine:
- Shooting a movie from a few cameras, creating a movie version of a NeRF using those angles, and then dynamically adding in other shots in post
- Using lighting and depth information embedded in NeRFs to assist in lighting/integrating CG elements
- Using NeRFs to generate virtual sets on LED walls (like those on The Mandalorian) from just a couple of photos of a location or a couple of renders of a scene (currently, the sets have to be built in a game engine and optimized for real time performance).
This sort of stuff (generating 3D assets from photographs of real objects) has been common for quite a while via photogrammetry. NeRFs are interesting because (in some cases) they can create renders that look higher quality with fewer photos, and they hint at the potential of future learned rendering models.
There is no mesh here. Nerfs are 5d (colours are computed based on a 3D position vector + a view direction vector) fields that are rendered volumetrically. So the “texture” is an integral part of the neural representation of the scene, not just an image applied to a mesh.
The cool part is that this also allows for capturing transparency, and any effects caused by lighting (including complex specular reflections) are embedded into the representation.
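To make the "rendered volumetrically" part concrete, here's a minimal sketch of the standard NeRF compositing quadrature, my own illustration rather than anything from the article, assuming you already have per-sample colours and densities along a camera ray:

```python
import numpy as np

def composite_ray(colors, sigmas, deltas):
    """Volume-render one ray from NeRF samples (standard NeRF quadrature).

    colors: (N, 3) RGB at each sample point along the ray
    sigmas: (N,) densities at each sample point
    deltas: (N,) distances between consecutive samples
    """
    # Opacity of each segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light surviving to reach sample i
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas  # contribution of each sample to the pixel
    return (weights[:, None] * colors).sum(axis=0)  # final pixel colour
```

Transparency falls out naturally: low-density regions contribute small alphas, so whatever is behind them keeps showing through.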
Nitpicking, but for GP: NeRF is the internal representation, but the output doesn't have to be 2D (ray traced, basically).
There are examples of people outputting SDFs (and by extension geometry) with NeRF, and projecting the original texture onto that would give some nice effects (live volumetric works best this way), though there would be some disparity where edges/occlusion aren't perfect, so you'd want to sample the NeRF's RGB anyway... although a lot of that is fuzzy at the edges too. A lot of incorrect transparency at edges looks great in the 2D renders (so much anti-aliasing and noise!) but less good for texturing.
A NeRF is not the same as an SDF though. NGP (the paper by Nvidia linked here) can train NeRFs and SDFs, but I don't know of any straightforward way of extracting an SDF from a NeRF.
And while it's true that there are methods for extracting a surface from a NeRF, achieving a high quality result can be challenging because you have to figure out what to do with regions that have low occupancy (i.e. regions that are translucent). Should you consider those regions as contained within the surface, or outside of it? Especially when dealing with things like hair, it's not obvious how to construct a surface based on a NeRF.
Perhaps even making non-gimmicky live action 3d films.
Having 3d renders of the entire film without needing green screens and a bunch of balls seems like it would have to make some of the post processing work easier. You can add or remove elements. Adjust the camera angles. More effectively de-age actors. Heck, even create scenes whole cloth if an actor unexpectedly dies (since you still have their model).
Seems like you could also save some time having fewer takes. What you can fix in post would be dramatically expanded.
Best part for film makers, they are often using multiple cameras anyways. So this doesn't seem like it'd be too much of a stretch.
> - Using lighting and depth information embedded in NeRFs to assist in lighting/integrating CG elements
> - Using NeRFs to generate virtual sets on LED walls
Sounds like a powerful set of tools to defeat a number of image manipulation detection tricks, with limited effort once the process is set up as routine. State actor level information warfare will soon be a class of its own. Not just in terms of getting harder to detect, but more importantly in terms of becoming able to produce "quality" in high volume.
Hmmm, well, I still think they will be in demand for the same reason software developers will not be automated away. NeRF is really mind-bogglingly good, but there are still artifacts, something that modelers have a good eye for.
Having said that, it might be the end for any junior type of role. Same reason that GitHub Copilot really takes a bite out of the need to have a junior developer.
I'm very curious what will happen, because it will become a sort of trend across other industries, apart from the legal or medical professions (peace of mind from having a human in the loop).
People made clay sculptures of CG characters as a modeling technique for a long time. It’s still done, but digital sculpture tools are getting easier to use so it’s not as common as it was.
Not even a full 3-D environment is required; just a bit of 6DOF and parallax would go a long way. I think VR videos (ok, porn) have gone as far as they can without head tracking.
Have you seen the first episode of Halo? There are multiple outdoor scenes where you feel sure it's a recording rather than a CGI render. The uncanny valley is almost crushed.
I'm thinking it would be something like: I want to be the baddy in Die Hard and want the protagonist to be Peter Griffin (cartoon version). The system feeds you the movie... I'm imagining there could be an industry for writers to create the off-screen plots of other characters, and principally it would be rendered with the same scenes as the original movie.
I wonder what happens to most people when they see innovations such as this. Over the years I have seen numerous mind-blowing AI achievements, which essentially feel like miracles. Yet literally an hour later I forget what I even saw. I don't find these innovations to leave a lasting impression on me or on the internet, except for the times when these solutions are released to the public for tinkering and end up failing catastrophically.
I remember having the same feeling about chatbots and TTS technology literally ages ago, but at present, the practical use of these innovations feels very mediocre.
Besides being very helpful for edge cases such as accessibility for differently abled people (maybe this is subjective), I don't find any of these to be exciting, or even mundane quality-of-life improvements, for the general population.
I have the impression that now some of them really are ending up in practical applications. Funnily enough, someone just today showed me a feature of his phone where you can select undesired objects in your photo and it will just replace them with a fitting background, indistinguishable from the original photo.
It was already debunked. If you scroll down in that Twitter thread, you read:
> Big news! I sent @sdw the original image. He theorized a leaf from a foreground tree obscured the face. I didn’t think anything was in view, and the closest tree is a Japanese Maple (smaller leaves). But he’s right! Here’s a video I just shot, showing the parallax. Wow!
The followup on that is... it didn't happen. There was a leaf in the foreground, and the depth of field in the photo was large enough that it was in focus rather than blurring. https://twitter.com/mitchcohen/status/1476951534160257026
Another comment mentioned education: I take classes, formal education, for things that are immediately applicable for me right at the time
So it's pretty consistent for me.
Another thing that might be different is that I don't go for perfect or good enough, or worry that an existing alternative like "having an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem" might already exist; I just go for novelty and am selling to people that also like novelty.
Hmmm, I really find this to be different from chatbots. In fact, it took a lot of skepticism to overcome before using GitHub Copilot, and then I saw a new reality where it became part of the process, albeit not as prolific, but enough to make me ponder what the next evolution might be.
For 3D modelers, this is huge, since it takes a lot of experience and grunt work to put the right touches on even a boilerplate 3D model. So much so that many game companies have outsourced non-human 3D modeling; this would certainly impact those markets.
1) It could further lower the cost and improve quality.
2) Studios could move back those time-consuming tasks on-shore and put an experienced in house artist/modeler to manage the production.
3) Hybrid of both
What I see here is that NeRF has far more impact on the 3D modeling/animating industry than GitHub Copilot. Another certainty is that we are going to see a faster rate of innovation. We are at a point where papers released merely months ago are being completely outpaced by others. The improvement in training time that NeRF offers is insane, especially given how quickly this new approach came out.
We could be at a future where formal publication can't keep up with the release of AI achievements. It would be as fast as somebody tweeting a new technique, only to be outdone by somebody else weeks or possibly days later.
The problem is that when the thing is initially announced, it's not useful to anyone yet, because it's not productionized and released to the general public.
But then once it IS released to the general public, it's probably been at least several months, maybe even multiple years since the announcement, so people are like, "yawn, this is old news."
I would love to see this being popular in VR. I enjoy google earth in VR way too much (it is just 360 photos) and there are some 3d real scenes you can walk in
I think with any tech demo (or other corporate PR piece), it is good to assume the worst, because companies spin things to be as ducky as possible. This is a self-reinforcing cycle, because if two companies have identical products, then the best liar--er, marketer--will win.
(not to say this sort of behavior is exclusive to corporate PR. as the best and smartest person ever, I would never need to exaggerate my achievements on a job application, but others may)
I don't really understand why NeRFs would be particularly useful in more than a few niche cases, perhaps because I don't fully understand what they really are.
My impression is that you take a bunch of photos in various places and directions, then you use those as samples of a 3D function that describes the full scene, and optimize a neural network to minimize the difference between the true light field and what's described by the network. An approximation of the actual function, that fits the training data. The millions of coefficients are seen as a black box that somehow describes the scene when combined in a certain way, I guess mapping a camera pose to a rendered image? But why would that be better than some other data structure, like a mesh, a point cloud, or signed distance field, where you have the scene as structured data you can reason about? What happens if you want to animate part of a NeRF, or crop it, or change it in any way? Do you have to throw away all trained coefficients and start again from training data?
Can you use this method as a part of a more traditional photogrammetry pipeline and extract the result as a regular mesh? Nvidia seems to suggest that NeRFs are in some way better than meshes, but according to my flawed understanding they just seem unwieldy.
If you tried to repro these results (including time & space constraints) using traditional photogrammetry, you would be sorely disappointed.
Photogrammetry is great if you have a very solid object that is not shiny or translucent at all. You get a lot of surface color micro-detail and a bit of bumpy meso-detail.
But, if something is fuzzy, hairy, lacy, or smoky, you are straight-up out of luck. Don't even try.
If it is shiny, it can be difficult to capture at all --let alone capture the shine. Material capture techniques that are not "chalk sculpture" are rare, very limited and usually experimental.
NeRFs however are pretty much a photograph that you can walk around in. They have about as much structure as a photograph ;) But, that lets them not care about the mathematical definition of your scuffed-up, lacquered, iridescent, carbon fiber mirror frame and just show it as it looks from whatever angle.
Maybe NeRFs can be used as an intermediate step to reconstruct the scene, and then extract the surfaces and their materials to more conventional representations like meshes, textures, refraction index, etc. I guess the main benefit is that it fills in the undersampled areas in a scene, whether that's an occlusion, a reflection angle, or something else.
My main problem with them is that it seems as if all the data is unstructured and interdependent, not like pixels, voxels, or similar where you can clearly extract and manipulate parts of the data and know what it means. To use your photograph example, a digital photo is a simple grid of colored points, and it's easy to change them individually. A regular 3D scene is a collection of well defined vertices, triangles, materials, etc., that is then rendered into a digital photo using an easy-to-describe process. A NeRF, on the other hand, seems to be more like: enter camera pose => magic => inferred image.
Maybe I'm overthinking it and it doesn't have to be as general as our current formats; maybe a binary blob that can represent a static scene is fine for plenty of applications. But it feels needlessly complicated.
That's more interesting than I realized. In this example, I assumed that the model was generating some sort of 3D mesh representing the woman. Is that not at all the case? Would this technique be unable to generate a model or volumetric information despite being able to reasonably render her from many directions?
No, there is no mesh. A NeRF is a neural network trained to work as a function f(x, y, z, θ, φ). You put in a position (x, y, z) in 3D space and the direction (θ, φ) it is viewed from (where θ and φ are the angles for up/down and left/right, respectively), and the function will output a tuple (r, g, b, σ) of the colour (r, g, b) and the material density (σ) at that point; an image pixel is then rendered by integrating many such samples along the camera ray.
You can generate a mesh from the density information this function gives you, but for that you need to discretise the continuous densities you get out.
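As a rough sketch of that discretisation step (assuming a hypothetical `density_fn` handle wrapping the trained network; the iso-level choice is exactly where the hair/translucency ambiguity discussed elsewhere in the thread shows up):

```python
import numpy as np
from skimage.measure import marching_cubes  # scikit-image

def extract_mesh(density_fn, resolution=128, bound=1.0, iso_level=10.0):
    """Discretise a NeRF's continuous density field and mesh it.

    density_fn: maps (N, 3) points to (N,) sigma values (view-independent).
    iso_level: density threshold treated as "surface"; picking it is the
               hard, somewhat arbitrary part.
    """
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sigma = density_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    verts, faces, normals, _ = marching_cubes(sigma, level=iso_level)
    # Map voxel indices back to world coordinates (approximately).
    return verts * (2 * bound / resolution) - bound, faces
```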
The pros in the vfx industry still all use reconstructed geometry. And yes, animating or cropping a Nerf is painful.
In my opinion, Nerf is more about showing progress in making AI memorize 3D scenes and the hope is that this will lead to actual understanding sometime in the future.
> What happens if you want to animate part of a NeRF, or crop it, or change it in any way? Do you have to throw away all trained coefficients and start again from training data?
You don’t change NeRF (the model). You change the point of view of an observer.
I mean, this is the parent post's point; the use case for a static photo or a static 3D nerf is pretty limited.
With other structured data, compositing and animating are relatively trivial.
It turns out that people have approached this problem before and you can composite nerf too (1) by sampling different functions over the volume.
…but, let’s not pretend.
The complaint is entirely valid. You’re taking a high resolution voxel grid and encoding it into a model.
Working with simple voxel data lets you do all kinds of normal image manipulation techniques, and it's not clear how you would do some of those with a nerf.
Practically speaking, the applications you can use this for are therefore reasonably limited right now.
> Working with simple voxel data lets you do all kinds of normal image manipulation techniques, and it's not clear how you would do some of those with a nerf.
Any image transformation you can do on voxels you can straightforwardly transfer to nerfs. Voxel data is just a lookup table from discrete positions to material properties like color and density. When you apply a transform, you change the inputs (e.g. multiplying them with a rotation matrix) or the outputs (e.g. changing the color). If you want to do the same thing with a nerf that maps continuous positions and directions to material properties like color and density, just transform the inputs or the outputs.
The major difference is that with voxel data you can easily do output-modifying transformations directly on the stored representation, while for nerfs it might be cheaper to do it on the fly instead of redoing the training procedure to bake the change into the model.
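As a concrete example, here's a sketch of an input-modifying transform, rigidly moving the scene by remapping queries instead of touching the weights; `nerf(points, dirs)` is a hypothetical handle to the trained field:

```python
import numpy as np

def transformed_field(nerf, rotation, translation):
    """Rigidly move a NeRF scene by transforming the queries, not the model.

    nerf: f(points (N,3), dirs (N,3)) -> (rgb (N,3), sigma (N,)), hypothetical
    rotation: 3x3 rotation matrix; translation: (3,) vector of scene motion.
    """
    inv_rot = rotation.T  # inverse of a rotation matrix is its transpose

    def field(points, dirs):
        # Map world-space queries back into the model's original frame.
        local_pts = (points - translation) @ inv_rot.T
        local_dirs = dirs @ inv_rot.T
        return nerf(local_pts, local_dirs)

    return field
```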
> Any image transformation you can do on voxels you can straightforwardly transfer to nerfs
No.
> it might be cheaper to do it on the fly instead of redoing the training procedure to bake the change into the model.
I think it’s a bit more complex than you imagine; it’s not “cheaper/not cheaper”; it’s literally the only way of doing it.
If you have a transformation f(x) that takes a pixel array as an input and returns a pixel array as an output, that is a trivial transformation.
If you have a transformation that takes a vector input f(x) and returns a pixel output, it’s seriously bloody hard to convert it to a “good” vector again.
Consider taking a layered svg and applying a box blur.
Now you want an svg again.
It’s not a trivial problem. Lines blur and merge; you have to reconstruct an entirely new svg.
Now add the constraint in 3D: you can never have a full voxel representation in memory, even temporarily, because of memory constraints.
At best you’re looking at applying voxel level transformations on the fly to render specific views, and then retrain those into a new nerf model.
I think that counts as … not straightforward.
Doing all your transformations on the fly is a lovely idea, but you've got to understand the reason nerf exists is that the raw voxel data is too big to store in memory. It's simply not possible to dynamically run an image processing pipeline over that volume data in real time. You have to bake it into nerfs to use it at all.
> Consider taking a layered svg and applying a box blur.
> Now you want an svg again.
> It’s not a trivial problem. Lines blur and merge, you have reconstruct an entirely new svg.
Nerfs are not svgs.
Consider taking a nerf and applying a box blur. Easy, a box blur on voxel data takes multiple samples within a box and averages them together, so to do the same thing to a nerf just take multiple samples and average them together.
That does get slower the more samples you need, but you never have to materialize a full voxel representation.
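In sketch form (same kind of hypothetical `density_fn` handle as elsewhere in the thread, my own illustration):

```python
import numpy as np

def blurred_sigma(density_fn, points, radius=0.01, n_samples=32, seed=0):
    """Box-blur a NeRF density field by averaging jittered samples.

    density_fn: maps (N, 3) points to (N,) sigma (hypothetical trained field).
    Each query is replaced by the mean over n_samples random offsets inside
    a box of half-width `radius`: the Monte Carlo analogue of a voxel box blur.
    """
    rng = np.random.default_rng(seed)
    offsets = rng.uniform(-radius, radius, size=(n_samples, 1, 3))
    jittered = points[None, :, :] + offsets           # (S, N, 3)
    sigma = density_fn(jittered.reshape(-1, 3))       # evaluate in one batch
    return sigma.reshape(n_samples, -1).mean(axis=0)  # average over samples
```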
> Doing all your transformations on the fly is a lovely idea, but you've got to understand the reason nerf exists is that the raw voxel data is too big to store in memory.
> It’s simply not possible you can dynamically run a image processing pipeline over that volume data in real-time.
You are right; this is why approaches like Plenoxels are vastly faster than NeRF. They keep the optimization approach of neural nets but combine it with a simple and regular data representation of the scene.
This is great, and the paper+codebase they're referring to (but not linking, here [1]) is neat too.
The research is moving fast though, so if you want something almost as fast without specialized CUDA kernels (just plain pytorch) you're in luck: https://github.com/apchenstu/TensoRF
As a bonus you also get a more compact representation of the scene.
>The model requires just seconds to train on a few dozen still photos — plus data on the camera angles they were taken from — and can then render the resulting 3D scene within tens of milliseconds.
Generating the novel viewpoints is almost fast enough for VR, assuming you're tethered to a desktop computer with whatever GPUs they're using (probably the best setup possible).
The holy grail (from my estimation) is getting both the training and the rendering to fit into a VR frame budget. They'll probably achieve it soon with some very clever techniques that only require differential re-training as the scene changes. The result will be a VR experience with live people and objects that feels photorealistic, because it essentially is based on real photos.
You can just make a mesh once you have the NeRF, which is plenty fast for VR. (What's important isn't new-perspective generation but scene training. New-perspective generation doesn't have to be real time, just within a reasonable time to preprocess and then make a mesh.)
OK. So that's using a render of a NeRF field as input into photogrammetry? Yeah, that might be more feasible. I was talking about directly generating a mesh from the NeRF density itself (marching cubes).
But NeRFs can capture and render things that are very difficult subjects for photogrammetry: fur, vegetation, reflective or transparent surfaces, etc.
I've spent a lot of time thinking about this (i.e. taking a video and creating a 3D scene), and I don't think it is feasible in most cases to get good accuracy. If you need to infer the angle, you need to make a lot of biased assumptions about things like the velocity, position, etc. of the camera, and even if you were 99.9% accurate, that 0.1% inaccuracy compounds over time. Now I'm not saying it's not possible, but I'd believe that if you want an accurate 3D scene, you'd rather spend your computation budget on things other than determining those angles when they can simply be provided by hardware.
You're far too pessimistic (or maybe you don't know the field well). The problem of estimating the relative poses of the cameras responsible for a set of photos is a long-standing and essentially "solved" problem in computer vision. I say "solved" because there is still active research (increasing accuracy, faster, more robust, etc.), but there are decades-old, well-known techniques that any dedicated programmer could implement in a week.
If you're genuinely curious, look into structure from motion, visual odometry, or SLAM.
https://github.com/NVLabs/instant-ngp has a script that converts a video into frames and then uses COLMAP ([1]) to compute camera poses. You can then train a NeRF model within a few seconds.
It all works pretty well. Trying it on your own video is pretty straightforward.
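For reference, the flow looks roughly like this; the flag names are from my memory of the repo's scripts/colmap2nerf.py, so double-check them against the README before relying on this:

```python
import subprocess

# Sketch of the instant-ngp capture pipeline (paths/flags from memory).
subprocess.run([
    "python", "scripts/colmap2nerf.py",
    "--video_in", "capture.mp4",  # your own video
    "--video_fps", "2",           # extract ~2 frames per second
    "--run_colmap",               # estimate camera poses with COLMAP
    "--aabb_scale", "16",         # bounding-box scale around the scene
], check=True)
# This writes transforms.json (poses + intrinsics); the testbed app then
# trains a NeRF from it in seconds on a recent GPU.
```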
> even if you were 99.9% accurate, that 0.1% inaccuracy is compounded over time
Not really, with SLAM there are various algorithms to keep inaccuracy in check. Basically it works by a feedback loop of guessing an estimate for position and then updating it using landmarks.
You don't even need anything that fancy. Traditional structure-from-motion, or visual odometry gives accurate enough position estimations.
If you want to experiment, take a bunch (~100) of photos of an object, and use COLMAP to generate the poses. COLMAP implements an incremental SfM technique, so it will be very accurate but very slow.
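If you want to see roughly what COLMAP does under the hood, the CLI sequence is something like this (a sketch with placeholder paths): detect features, match them across images, then run the mapper, which alternates image registration with bundle adjustment, which is what keeps the drift in check:

```python
import subprocess

# Classic COLMAP pose-estimation pipeline, step by step.
for cmd in (
    ["colmap", "feature_extractor",
     "--database_path", "db.db", "--image_path", "images/"],
    ["colmap", "exhaustive_matcher", "--database_path", "db.db"],
    ["colmap", "mapper",
     "--database_path", "db.db", "--image_path", "images/",
     "--output_path", "sparse/"],
):
    subprocess.run(cmd, check=True)
# sparse/0/ now holds cameras.bin / images.bin with the recovered poses.
```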
It's possible to capture video/movement into NeRFs, possible to animate, relight, and compose multiple NeRF scenes, and a lot of papers are about making NeRF faster, more efficient, and higher quality. Looks very promising.
I hope someone can take this, all the images of street view, recent images of places etc. and create a 3d environment covering as much of earth as possible to be used for an advanced Second Life or other purposes.
NeRF is a very active research area, and the progress from 2020 to now has been nothing short of astonishing. In 5 years, I expect there to be fully generative NeRFs in research, i.e. describe a scene, and a NN produces a full 3D scene that you can interact with.
In the demo video, they mention that they used a lot of footage from self-driving cars to produce that.
One thing I noticed is there are no pedestrians and cars in those scenes. So they must do a lot of work to filter them out by combining a lot of footage. Therefore, it likely can't be used (as-is) on the street view dataset...
Yes!!!! I always get frustrated whenever it does the weird stretch thing as soon as you move around in street view. Just jump to the next frame if you have to.
This is where we were going with https://ayvri.com but we've moved on to other projects. We still operate and are profitable, but it was too early for the market.
My first thought seeing this is: darn, Facebook, with their metaverse, will be drinking this up for content. So much so that I wouldn't be shocked if Facebook/Meta made a play to buy Nvidia! It certainly wouldn't shock me as much now as it would have before this, given how they are banking on the metaverse/VR being their new growth direction, what with the leveling off of their current services' user base after well over a decade and a half.
Certainly, though, game-franchise films would become a lot more immersive, though I do hope that whole avenue doesn't become same-ish with this tech overly leaned upon.
But one thing's for sure: I can't wait to bullet-time The Wizard of Oz with this tech :).
Actually it is explained in the article; I somehow missed it:
> The model requires just seconds to train on a few dozen still photos — plus data on the camera angles they were taken from — and can then render the resulting 3D scene within tens of milliseconds
Pretty impressive, but less so than generating it from 4 photos (which imho the video suggests). That would be "real magic level of impressiveness" for me.
This is essentially like a 3D JPEG, but instead of modelling the image with Discrete Cosine Transform (DCT) they use a neural net. So the neural net itself will learn just that one image and be able to reproduce it point by point, from various angles. A whole network for just one example, and an innovative way to look at what a neural network can be.
as someone who works in both AI and filmmaking, I remember losing my mind when this paper was first released a few weeks ago. It's absolute insanity what the folks at Nvidia have managed to accomplish in such a short time. The paper itself[0] is quite dense, but I recommend reading it -- they had to pull some fancy tricks to get performance to be as good as it is!
No, really, it's a cool application of neural nets. It was unexpected when it first came up and it took a whole day to learn a scene, but a couple of years later it can be done in seconds.
I think the cool part is that a neural net can learn to produce (R,G,B) from (X,Y,Z,angle) by using a clever encoding trick with sin() and cos() for the input coordinates. And the fact that a neural net can be a frigging JPEG in 3D.
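That encoding fits in a few lines; here's a sketch of it as described in the original NeRF paper (my own code, not from the NVIDIA repo):

```python
import numpy as np

def positional_encoding(x, n_freqs=10):
    """NeRF's sin/cos input encoding.

    Maps each coordinate p to (sin(2^0 pi p), cos(2^0 pi p), ...,
    sin(2^(L-1) pi p), cos(2^(L-1) pi p)) so the MLP can represent
    high-frequency detail that raw (x, y, z) inputs cannot.
    """
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi       # (L,)
    angles = x[..., None] * freqs                     # (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)             # flatten per point
```

Fittingly, a big chunk of instant-ngp's speedup comes from replacing this fixed encoding with a trainable multiresolution hash grid.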
You're missing the context of this work in the field as a whole. Labs have been releasing papers boasting 2-4x speedups and getting them published at conferences, and then this group comes in and speeds up the original by 1000x. That's a huge leap in capability.
> Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization.
This just isn't true. I can create a 3D scene from 360-degree photos (even 4) in a minute or so using traditional methods, even open-source toolkits.
It doesn't look as good as this because it doesn't have a neural net smoothing the gaps, but it's not true that it takes hours to build 3D information from 2D images.
There's something very uncanny-valley about that video. I can't decide if it's the smoothness of the shading on the textures or if it's the way the parallax perspective on the buildings sometimes is just a tiny bit off. I don't generally get motion sickness from VR but I feel like this would cause it.
You’ll find this is true of all NeRFs if you spend time playing around with them. If a NeRF is trying to render part of an object that wasn’t observed in the input images, it’s going to look strange, since it’s ultimately just guessing at the appearance. The NVidia example in the link has the benefit of focusing on a single entity that’s centered in all of the input photographs - the effect is much more pronounced in large scale scenes with tons of objects, like the Waymo one. You can still see some of this distortion in the NVidia one - pay close attention to the backside of the woman’s left shoulder. You’ll see a faint haze or blur near her shoulder - the input images didn’t contain a clear shot of it from multiple angles, so the model has to guess when rendering it.
I know when doing typical 2D video-based rotoscoping it is possible to use frames from before/after the current frame to see data that is being blocked in the current frame. It's also common in restoration when removing scratches/hair in the gate/etc.
To that end, I wonder if exporting a similar bit of video from that same path exported as stills would be enough to generate the 3D version.
Still, NVIDIA's achievement (and Thomas Müller's in particular) is amazing. Thomas and his collaborators achieved an almost 1000x performance improvement through a combination of algorithmic and implementation tricks.
Waymo needed 2.8 million images to create that scene, I wonder how many Nvidia would need? Or was the focus only on speed? I skimmed the article and didn't really find info on that.
Waymo essentially trained several NeRF models for Block-NeRF that are rendered together. It's conceivable that NVIDIA's instant-ngp could be used for that.
What's new about this? That it's faster? People have been reconstructing 3D images from multiple photos for over a decade. The experimental work today is constructing a 3D image from a single photo, using a neural net to fill in a reasonable model of the stuff you can't see.
Five years ago, I used common software to do this. I had to take hundreds of pictures of a scene, getting as many angles and details as possible. Then you pass it to the computer; stitching it all together took well beyond 24 hours.
Once I had a 3D model of the scene, I had to spend countless hours cleaning it up to make sure it was usable. Maybe in the last 5 years things have improved.
But this demo used 4 pictures. And apparently, it rendered the final image in seconds. That's what's new.
If I understand it correctly, it didn't make a 3D model, though. So you can't extract and reuse the result; you can only move around in it while it creates an image for each viewpoint. But no meshes or textures.
So the part which makes this interesting to me is the speed. My new desire in our video-conferencing world these days has been to have my camera on but running a corrected model of myself, so I can sustain apparent eye contact without needing to look directly at the camera.
I find it hard to believe there are only 4. There is clearly more data in the video
https://i2.paste.pics/645fe17e418b2cb1f6179e0b6671a170.png
like the back side of the camera here (it is kinda visible, but much poorer compared to the video). Or the existence of a 2nd white sheet in the background. But correct me if it is only 4 and you have a source on that.
Nvidia is really turning into an AI powerhouse. The moat around CUDA, plus their target customers aren't as stringent about budget, especially when the hardware cost is tiny compared to what they do with it.
I wonder if they could reach a trillion market cap.
Well, I think that is a little optimistic in the near term, considering there has never been a semiconductor player reaching that milestone without a fab. So Nvidia would be a first. Their current P/E is 70, while the semiconductor industry average is only ~30. Realistically, Nvidia will need to triple their revenue at a fair P/E. I could see Data Center reaching $30B revenue per year within the next ~5 years, and this is already larger than Intel's data center record revenue. We are still $30B short coming from Gaming and Professional Visualisation. Intel already has their GPU play ready in 2022 (assuming it is competitive). I.e., Nvidia will need to find another massive market to conquer to reach that trillion market cap status.
AI and 3D content making is becoming so exciting. Soon we'll have an idea and be able to make it with automated tools. Sure, having a deeper understanding of how 3D works will be beneficial, but it will no longer be the entry requirement.
I know that taste in comedy is seasonal (yes, there was a time when people thought vaudeville was the cat's pajamas), but has anyone ever greeted a pun with anything other than a pained sigh?
> It's schadenfreude for the person making the pun.
Nah, if it is a joke at their own expense then it is "self deprecating humor", something which is definitely designed to get a laugh. Humiliation fetish, maybe? Obviously nothing is funny past a certain point of deconstruction... especially if you find yourself defending the distinguishing difference of the "meta". Just stop making puns, easy.
Idk, personally I find wordplay quite punny -- though I almost always try to greet the person who made the pun, and not the pun itself (they're abstract, inanimate concepts, pretty difficult to say "hello!" to)
In terms of practical use - is there a pipeline to use the NeRF 3D scenes in Unreal Engine? How many photos do you need on average vs photogrammetry? 50% less?
This is not a textured polygon asset; it's a neural field, so it's like storing the directional data in a neural network, I think using spherical harmonics.
That’s ok, we'd just need some level of integration with UE (being able to integrate with the UE camera, etc.). Specifically interested in using this for LED green screens that sync to live camera movement (Mandalorian style). Our pipeline uses UE, Nvidia cards, and photogrammetry/plates/3D models atm; this could speed things up a lot and require fewer photos for creating the 3D backgrounds.
Next time someone says "why does everyone in AI use NVidia and CUDA"? this is why.
They do high quality research and almost inevitably end up releasing the code and models. It's possible to reconstruct all that as a non-CUDA model, but when you want to use it, why would you when it's going to take months of work to get something that isn't as optimised?
I was talking to someone 2 days ago who just died randomly, early 40s.
It's trippy. I have data of this person's face, e.g. videos/base64 strings... it's eerie. Unanswered texts wondering what's wrong. My thinking is I was only exposed to a part of this person; it won't be them fully if reproduced.
I'm curious for those that work with NeRFs what their results look like for random images as opposed to the 'nice' ones that are selected for publications/demos.
IIRC Microsoft had something like this years ago, but the results weren't nearly as smooth or natural looking. I can't remember what it was called, though.
I remember a very old video where they rendered 3D scenes for, say, frames 1 and 5 but interpolated the frames in between, or something like that, instead of re-rendering the whole scene. Maybe they were only redrawing some triangles if the angle changed too much.
If it's that tech, I'm pretty sure it got dropped when 3D acceleration made it feasible to just re-render the whole scene every frame. Dedicated hardware won out over software tricks.
Google has done some experiments with people taking very similarly framed images at different times to create timelapse videos, so that would hint to me they definitely have the content to try.
There was an article here some time back that showed streets in NYC that used this kind of idea of using older photographs to put one in street view of older NYC. So, yeah, I'm guessing it could be done. Might be weird with the different quality of images (modern digital, polaroids, kodachrome, etc).
I’m gonna do the same, particularly with dog pics. I’d love to bring my old dogs back to virtual life. Have old Spanky (died a decade ago) running with Lilybean (current version) in iMovie.
What's the current state of research on true volumetric displays? That's what I'm excited for, although that takes less AI and more hardware, so quite a bit more difficult.