The critical problem with optical body and hand tracking in VR

There is a critical problem with optical body and hand tracking in VR that has so far gone unaddressed - perspective. I'm not talking about how tired we all are of watching fish-eye lens videos. I mean we're only using one point of view. Current tracking solutions can't see around things.

Let's go back in time a bit to the announcement of the Kinect at E3 2009. As you may remember, we were initially promised one-to-one body tracking. People were really excited. If you punched, your avatar in the game would punch. If you dodged, your avatar in game would dodge.

E3 2009 promo video

But games that "harnessed the full power of the Kinect" never came, partially because the resolution of the IR cameras on the Kinect was not high enough. With the Kinect 2, Microsoft has significantly increased that resolution to provide much higher fidelity, and that has helped a lot. But there is still the problem of perspective.

If I have my body facing the Kinect it can get a pretty accurate picture of where all my joints are. In this gif the blue colored lines are the Kinect 2's tracking of my joints overlaid on top of the Kinect 2's video feed.

Facing Kinect Gif

As you can see, it does a very good job of tracking your body if you are directly facing it. That does allow for the kind of direct one-to-one body-to-avatar tracking experience shown in the initial demo. For a traditional gaming experience you're generally going to be facing the screen. But generally is not always, and that's where we run into trouble.

If I turn with the left side of my body facing the Kinect, it can no longer see the right side of my body. The Kinect is a stationary device. This means that it has a static view of the user. If you turn your body it can't circle around you to maintain an ideal tracking position. It has to attempt to use fancy prediction algorithms to guess where the parts of your body are that it can't see.

With VR this becomes a much larger problem. The beauty of virtual reality is that we get whole environments to explore and be immersed in. If the user turns around to look at something cool, suddenly the Kinect can't see the front of their body. For the user this manifests as arms / hands freezing in a static location in the best case, or flopping around randomly in the worst case. Here I've got the Kinect's view and tracking on the left. On the right is another camera pointed at me from the side to show you what I'm actually doing.

Not facing gif

You'll notice the Kinect barely recognizes a change at all. It has some nice prediction with my left arm, correctly guessing that its idle position is in front of my body. But when I move my arms up it has no way to track that and guesses that they stay stationary.

This kind of behavior in game would completely and immediately break immersion. And that's not even the worst of it. If you've got hand/body tracking then you're likely using it as some form of input too. So now, not only is the user violently aware of the limitations of their simulation, but they can't interact with it either.

Dynamic directional tracking with the Leap Motion

The Leap Motion is basically a tiny Kinect, except instead of doing whole body tracking, it specializes in finger tracking. The Leap Motion launched in early 2013, initially developed as a gesture-based system for anything from games to spreadsheets. The idea was that developers would invent whole new user interfaces for computing to capitalize on this new form of input. You set it up by placing it on your desk in front of the keyboard and plugging it in to USB.

Leap Motion on desk

Developers quickly discovered that interaction with a 2D screen using a 3D input device made zero sense. There were some cool demos but I never saw any applications that amounted to much more than fancy gimmicks. Conveniently, as the Leap Motion was failing to deliver us into the "future of computing" - VR started its comeback.

Having finger tracking in VR is absolutely amazing. I can't stress this enough. As our lord and savior Palmer Luckey mentions every time he gets a chance, the first thing users do when they put on a headset is look for their hands. Seeing them there is just awesome.

Like most of VR it's one of those things that is hard to express if you haven't tried it. But for as awesome as having your hands in VR is, having to keep your hands positioned over a tiny device - that you can't see - is terrible. One of the other drawbacks of the Leap Motion is that it has a very small range. This limits the position of your hands to directly in front of you in the environment. You have to be very conscious of where the sensor is so your hands maintain tracking. If you move your hands outside of these bounds then suddenly they disappear.

Some developer had a stroke of genius and realized that the majority of the time you're using your hands - you are also looking at them. And an elegant solution was born.

Leap Motion strapped to Oculus Rift

There are actual elegant solutions but they require a 3D printer or cost money. Duct tape works okay, though it does mess up the DK2's positional tracking.

Attaching the Leap Motion to the Rift greatly improves the perspective problem. Now, instead of a static perspective, you get a dynamic directional perspective. The device can virtually see what you see.

Moving while using Leap Motion

However

The Leap Motion seeing what you see isn't good enough. Even if we imagine the other issues of the Leap Motion (resolution, field of view, range) get resolved - there is still the issue of singular perspective. Prediction algorithms could be significantly improved, but they can still only do so much.

If I close my fist with the back of my hand facing the Leap Motion, it cannot see any of my fingers. If I then extend my fingers it can, at best, see little slivers of one section of my fingers. That's simply not enough data to create a full 3D model from.
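A rough way to put a number on "the back of my hand is facing the sensor" is the cosine between the palm's normal and the direction from the hand to the camera. The sketch below is only an illustration: `palm_facing` and its parameters are hypothetical names, not part of the Leap Motion SDK, and it assumes you already have a palm normal and positions in one shared coordinate frame.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def palm_facing(palm_normal: Vec3, palm_position: Vec3, camera_position: Vec3) -> float:
    """Cosine of the angle between the palm normal and the palm-to-camera direction.

    +1.0: the palm faces the camera directly.
    -1.0: the back of the hand faces the camera, so curled fingers are hidden.
    """
    normal = _normalize(palm_normal)
    to_camera = _normalize(
        tuple(c - p for c, p in zip(camera_position, palm_position))
    )
    return sum(a * b for a, b in zip(normal, to_camera))


# Hypothetical numbers: head-mounted sensor at the origin, hand 0.4 m in front
# of it, palm pointing away from the sensor (the clenched-fist case above).
score = palm_facing(
    palm_normal=(0.0, 0.0, 1.0),
    palm_position=(0.0, 0.0, 0.4),
    camera_position=(0.0, 0.0, 0.0),
)
fingers_likely_hidden = score < 0.0
```

A strongly negative score does not fix anything by itself. It just marks the frames where a single camera has almost nothing to work with.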

In the gif below you can see what I'm talking about. The left is the Leap Motion's IR camera view with the fingers it is tracking overlaid on top. The right is a side view of what my hand is doing. As I move my fingers the Leap Motion cannot see them and so does not track the changes in their position. It just leaves my fingers as a clenched fist. Bad news if I'm in VR trying to let go of an object I'm holding.

Gif of side by side leap view and side camera view

Supplementing the dynamic-directional camera of the Leap Motion with the DK2's position tracking camera would in theory give you more data. Adding another perspective would significantly help the tracking issues. However, the DK2's positional camera has a static position. Once you turn to the side, or your hand goes below the level of that camera, you run into the same issue as before - lack of data. Even in the example gif above, it's hard for the human eye to detect that I'm extending my fingers one by one, and an IR camera isn't going to do any better.

Just add lots of cameras!

As simple as this is, it's not a bad idea, technically. With three Kinect-style cameras positioned in a triangle around you, you now have three separate perspectives to look at the user from. If one camera can only see the back of your hand then it's nearly assured that at least one other camera can see the front. Combining the data from multiple perspectives makes for very accurate models.
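To make "combining the data" a bit more concrete, here is a minimal sketch of one possible approach: a confidence-weighted average of each joint across cameras. It assumes every camera's estimate has already been transformed into one shared world frame and that each estimate carries a 0 to 1 confidence value. `JointObservation`, `fuse_joint`, and the 0.2 threshold are made-up names and numbers for illustration, not anything from the Kinect SDK.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class JointObservation:
    """One camera's estimate of a single joint, in a shared world frame."""
    position: Vec3
    confidence: float  # 0.0 = occluded guess, 1.0 = clearly visible


def fuse_joint(observations: Sequence[JointObservation],
               min_confidence: float = 0.2) -> Optional[Vec3]:
    """Confidence-weighted average of the observations above the threshold.

    Returns None when no camera sees the joint well enough.
    """
    usable = [o for o in observations if o.confidence >= min_confidence]
    if not usable:
        return None
    total = sum(o.confidence for o in usable)
    return tuple(
        sum(o.confidence * o.position[axis] for o in usable) / total
        for axis in range(3)
    )


# Hypothetical right-hand estimates from three cameras (meters).
right_hand = [
    JointObservation((0.40, 1.10, 0.50), 0.9),   # clear view
    JointObservation((0.44, 1.10, 0.50), 0.9),   # clear view
    JointObservation((0.00, 0.90, 0.30), 0.05),  # occluded, camera is guessing
]
fused = fuse_joint(right_hand)  # approximately (0.42, 1.10, 0.50)
```

The occluded camera's guess is dropped instead of dragging the average toward a wrong position, which is the whole point of having the other two perspectives.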

Though, this is pretty cumbersome for a user.

Those Kinects have to be set up at about waist height. They also need power running to them. Then they need cables running to three USB3 ports on a desktop computer. You also have to position the Kinect 2 a minimum of 3 feet away from you for proper tracking. Imagine a 6 foot tracking radius for the user to be able to extend their arms inside of and you've got a space requirement of about 10 feet square. And that's not accounting for the user stepping forward at all. Tack on the $600 (pre tax) price tag of those three devices (without mounts) and you're arguably out of reach of the average VR enthusiast.
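One way to arrive at roughly that footprint, as a back-of-the-envelope sketch: assume the three sensors sit evenly spaced on a circle around the user, and read the 6 foot radius as about 3 feet of arm reach plus the 3 foot minimum sensor distance. Both of those are assumptions made for this example, and `triangle_footprint` is a hypothetical helper.

```python
import math
from typing import Tuple


def triangle_footprint(camera_radius_ft: float) -> Tuple[float, float]:
    """Bounding box (width, depth) of three sensors evenly spaced on a circle."""
    width = camera_radius_ft * math.sqrt(3)  # side length of the equilateral triangle
    depth = camera_radius_ft * 1.5           # distance from one sensor to the opposite side
    return width, depth


arm_reach_ft = 3.0            # assumed reach from the user's center
min_sensor_distance_ft = 3.0  # minimum Kinect 2 distance quoted above
width, depth = triangle_footprint(arm_reach_ft + min_sensor_distance_ft)
print(f"{width:.1f} ft x {depth:.1f} ft")  # prints "10.4 ft x 9.0 ft"
```

That is before adding any room for the user to step away from the center.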

No other solution

There really is no other solution to this issue. The data simply isn't there and the only way to get it is to add more sensors and average the data. Ignoring this problem or trying to get around it with shmancy algorithms is just going to make for a crappy user experience due to the inevitable loss of tracking.

The good news is that the concept of this type of multi-sensor setup makes a lot of sense for a system like the Virtuix Omni or the Cyberith Virtualizer. Though, there is still the issue of added cost to an already expensive product. And they may not look quite as slick with a few sensors sticking out on all sides. But it could provide real, 360-degree, full body and finger tracking. No other solution out there can claim that.
