I've been working on AR and related technologies for almost a decade, and I was part of the first handful of people working on Google Glass. Bottom line: I've seen a lot of promising AR technologies come and go.
My personal take on this is that they may indeed have some very good, if not revolutionary, display technology. However: the big, big obstacle to delivering credible AR is latency. Unlike VR, true see-through AR needs total latencies (device motion --> display photon hits the retina) of no more than 10-15 ms. The reason is that in see-through AR you're essentially competing against the human visual system in latency, and the HVS is very fast.
Moreover, the HVS is also extremely good at separating visual content into "layers". Whenever two things in your field of view don't move in perfect continuity with their surroundings (as is the case when AR content is overlaid with latency), your brain will immediately separate them from one another, creating the impression of layers and, in the case of see-through AR, breaking the AR illusion.
So right now I'm a semi-believer. Iff they can sort out the latency problem and deliver stable yet ultrafast tracking in a wide variety of conditions (also far from a trivial problem), then this has a bright future.
The first iteration of a good AR system could simply sidestep the latency issue by embracing layers.
Magic Leap should skip the fancy stuff (mixing virtual scenes with the real world), at least at first, and focus on the many other useful features of a great head-mounted display system - think mobile notifications, video calls, a web browser, etc.
It could easily replace smart watches and later cell phones and computer monitors without solving the latency issue.
Would it be possible to artificially delay the world by 15-ish ms? A person would have to wear a full headset (so it'd be more like VR than AR), but perhaps it could deliver a time-delayed view of the world only once the augmented pieces are ready to render.
Edit: you'd still have the motion-sickness challenge, but perhaps at least the 'layers', so-to-speak, wouldn't appear separately.
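For what it's worth, here's a minimal sketch of that delayed-compositing idea, assuming a pass-through (camera-based) headset; all the names and the 15 ms figure are placeholders of mine, not any real device API:

    import collections
    import time

    import numpy as np

    DELAY_S = 0.015                    # artificial delay applied to the real-world feed
    frame_queue = collections.deque()  # (capture_time, camera_frame) pairs, oldest first

    def render_ar_layer(capture_time):
        # Stand-in renderer: returns an RGBA overlay matching the capture timestamp.
        return np.zeros((480, 640, 4), dtype=np.uint8)

    def composite(frame, overlay):
        # Alpha-blend the AR overlay onto the (delayed) camera frame.
        alpha = overlay[..., 3:4] / 255.0
        return (frame * (1.0 - alpha) + overlay[..., :3] * alpha).astype(np.uint8)

    def push_camera_frame(frame):
        frame_queue.append((time.monotonic(), frame))

    def pop_displayable_frame():
        # Release a frame only once its delay has elapsed, so the AR layer rendered
        # for that same timestamp can be composited onto it; real and virtual content
        # then move together, at the cost of delaying the whole view.
        if frame_queue and time.monotonic() - frame_queue[0][0] >= DELAY_S:
            capture_time, frame = frame_queue.popleft()
            return composite(frame, render_ar_layer(capture_time))
        return None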
No. The important thing is keeping your sensory inputs in sync with your vestibular system. There were some research questions about hacking the vestibular system a few years ago.
But in VR we can have even lower latencies for synthetic content.
Because we have the head tracker's recent history, we use prediction on the pose trajectory and can effectively know where the head pose will be at the time the currently rendered frame will be displayed, and then use that predicted pose to render the scene. That type of optimization won't be possible with see-through VR or AR.
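A toy version of that prediction step, just to illustrate the idea; real trackers filter many samples (Kalman or similar) and work with full quaternion orientation, so this constant-velocity, yaw-only version is only a sketch:

    import numpy as np

    def predict_pose(t0, pos0, yaw0, t1, pos1, yaw1, t_display):
        # (t0, pos0, yaw0) and (t1, pos1, yaw1) are the two most recent tracker
        # samples (t1 > t0); t_display is when the frame will actually be lit up.
        dt = t1 - t0
        lin_vel = (pos1 - pos0) / dt      # m/s
        ang_vel = (yaw1 - yaw0) / dt      # rad/s
        lead = t_display - t1             # how far into the future we must predict
        return pos1 + lin_vel * lead, yaw1 + ang_vel * lead

    # Tracker samples 5 ms apart, frame reaches the display ~20 ms after the newest
    # sample: render with the predicted pose, not the latest measured one.
    pred_pos, pred_yaw = predict_pose(0.000, np.array([0.0, 1.6, 0.0]), 0.10,
                                      0.005, np.array([0.001, 1.6, 0.0]), 0.11,
                                      0.025)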
The second optimization is timewarp, where the rendered scene is distorted in screen space after the fact, based on post-render tracker data (just a few ms before display). I wonder if that type of optimization would create artifacts in AR.
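A rotation-only, small-angle sketch of what timewarp does; real implementations reproject with a homography on the GPU and handle both axes plus lens distortion, and none of the numbers here come from any actual headset:

    import numpy as np

    H_FOV_RAD = np.deg2rad(90.0)   # assumed horizontal field of view

    def timewarp_yaw(image, yaw_at_render, yaw_latest):
        # Shift the already-rendered frame sideways to compensate for the yaw change
        # measured between render time and just before scan-out. The sign of the
        # shift depends on your camera/axis convention.
        height, width = image.shape[:2]
        pixels_per_radian = width / H_FOV_RAD
        shift = int(np.clip((yaw_latest - yaw_at_render) * pixels_per_radian,
                            -width, width))
        warped = np.zeros_like(image)
        if shift >= 0:
            warped[:, shift:] = image[:, :width - shift]
        else:
            warped[:, :width + shift] = image[:, -shift:]
        return warped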
Since you're an expert: what about these videos is hard? The things that jumped out at me are:
1 - the robot moving behind the table leg (ie you have to do depth recognition of objects in the scene)
2 - the user's hand interacting with the artificial elements in the scene. Some code had to recognize a hand and figure out which element it was touching.
What strikes you as the hard parts of those videos besides the real-time requirement?
Well, the second video is a mock-up. In the first video, notice that a) the virtual objects are floating in space and b) the camera motion is very smooth. This is how they sidestep the "layering problem" in the video. The desk leg occluding the robot is probably done using a depth sensor.
These two things are non-trivial, but not particularly hard in themselves. Doing them at ultra-low latency, however, becomes quite a challenge. Doing anything at ultra-low latency is already a challenge, but especially so when what you're trying to do is run a deep neural net for entity recognition or gesture recognition.
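On the desk-leg point, a minimal sketch of depth-based occlusion, assuming you have a per-pixel real-world depth map from the sensor and a depth buffer for the rendered content (units and array layouts are my own assumptions):

    import numpy as np

    def composite_with_occlusion(real_rgb, real_depth, virt_rgba, virt_depth):
        # Draw a virtual pixel only where it is both non-transparent and closer to
        # the viewer than the measured real-world surface, so nearer real geometry
        # (e.g. a table leg) hides the virtual robot behind it.
        visible = (virt_rgba[..., 3] > 0) & (virt_depth < real_depth)
        out = real_rgb.copy()
        out[visible] = virt_rgba[..., :3][visible]
        return out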
Training an ANN is computationally intensive; using a trained ANN is not. No context switching for system calls, no memory management, just matrix math.
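To make the "just matrix math" point concrete, here's a toy fully connected forward pass; the sizes are arbitrary and the weights are random stand-ins for trained ones:

    import numpy as np

    rng = np.random.default_rng(0)
    # Pretend these came out of training; at inference time they are just constants.
    W1, b1 = rng.standard_normal((256, 784)), rng.standard_normal(256)
    W2, b2 = rng.standard_normal((10, 256)), rng.standard_normal(10)

    def infer(x):
        h = np.maximum(W1 @ x + b1, 0.0)   # hidden layer: matrix multiply + ReLU
        return W2 @ h + b2                 # output layer: matrix multiply + bias

    scores = infer(rng.standard_normal(784))   # one fixed-size input in, ten scores out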
Well, first you need to know which image regions to feed to the ANN, and that can involve some segmentation and pre-recognition; otherwise you're going to be evaluating the net at every feasible subwindow, and that's a LOT of matrix math. A very big GPU can help, but GPUs have latency of their own, and FPGAs at that performance level are inordinately expensive.
Done at scale, though, ASICs seem like the sure-to-work way.
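Back-of-envelope for the "LOT of matrix math" point; every number here is assumed for illustration, not measured:

    # Sliding-window evaluation over one camera frame.
    frame_w, frame_h = 640, 480
    win, stride = 64, 8                 # assumed window size and step, in pixels
    flops_per_window = 50e6             # assumed cost of a small CNN on a 64x64 crop
    windows = ((frame_w - win) // stride + 1) * ((frame_h - win) // stride + 1)
    gflops_per_frame = windows * flops_per_window / 1e9
    print(windows, "windows,", gflops_per_frame, "GFLOPs per frame before any pruning")

Under those assumptions that's roughly 3,900 windows and ~190 GFLOPs per frame, i.e. on the order of 10 TFLOP/s at 60 fps, which is why you either prune candidate regions first or throw dedicated silicon at it.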
I'd be very surprised if a modern CPU couldn't handle the task, especially if you were clever about detecting regions of interest, predicting head movement, and cache maintenance. But I'd also be surprised if they went to market with an x86 under the hood.
I remember reading a while ago about how smart TVs were using ANNs for upscaling, so it has been done at scale. rimshot
(1) TVs don't have strict latency requirements. I've heard latencies of 100 ms are common.
(2) Upscaling ANNs process a rather small image neighborhood radius, and the required processing power is on the order of O(r² * log r). If a minimally recognizable cat is 50x50 px and for upscaling you use a very large window of 16x16, that's 14 times the work already.
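Spelling out where that "14 times" comes from under the r² * log r scaling above (base-2 logs assumed):

    # Per-window cost ratio: 50 px recognition window vs 16 px upscaling window.
    from math import log2
    ratio = (50**2 * log2(50)) / (16**2 * log2(16))   # ~13.8, i.e. roughly 14x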
Latencies of 100 ms may be common because TVs don't have strict latency requirements.
16x16 is a very small window; I have no idea what they're using for TVs, but 128 isn't uncommon in post-production ANN upscaling. Also consider that ANNs have not received anywhere close to the level of optimization attention that compilers have, so there is a lot of potential slack to be taken up if real-time processing demands it.
1 - or have a premade 3D environment model and do accurate position tracking. Position tracking is a LOT easier to do in real time.
2 - bullshit CGI "this is how we hope it would look if it were real" demo
A few months ago their apparatus was one color only, stationary, and the size of a desk. Now all of a sudden it can be strapped to a camera and does color? Color me sceptical :(