Facebook Reveals Computer Vision Research That Could Hasten Development of AR Smartglasses

Just like Apple and Google, Facebook has been working to develop computer vision shortcuts designed to give mobile apps augmented reality superpowers.

And while Apple's ARKit and Google's ARCore use computer vision to estimate the position of horizontal and vertical surfaces, Facebook researchers now claim to have prototype mechanisms that can derive an object's 3D shape from its 2D image information.

If the research is as groundbreaking as it sounds, it could feed into Facebook's own smartglasses development, much as the aforementioned mobile toolkits from Apple and Google lay the foundation for their future AR wearables.

This week, Facebook research scientists Georgia Gkioxari, Shubham Tulsiani, and David Novotny published their findings on four new methods for 3D image recognition. The work will be presented at the International Conference on Computer Vision (ICCV) in Seoul, South Korea, which runs until Nov. 2.

Two of those methods concern identifying 3D objects from 2D images. Building on the Mask R-CNN model for segmenting the objects in an image (presented at ICCV in 2017), Mesh R-CNN infers the 3D shapes of those identified objects while compensating for occlusion, clutter, and other difficulties of real-world photos.

"Adding a third dimension to object detection systems that are robust against such complexities requires stronger engineering capabilities, and current engineering frameworks have hindered progress in this area," wrote the team in a blog post.

In addition, the team built another computer vision model that serves as an alternative and complement to Mesh R-CNN. The cheekily named C3DPO (Canonical 3D Pose Networks) system reconstructs 3D objects at large scale using only 2D keypoints, and it does so for 14 object categories, such as birds, people, and automobiles.
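To see why recovering 3D shape from 2D keypoints alone is hard, consider the toy sketch below. It is not C3DPO or any part of Facebook's code; it assumes a simple orthographic camera that discards depth, and the two shapes are made-up values chosen for illustration. Both shapes produce identical 2D keypoints, so a system that lifts keypoints back into 3D has to rely on knowledge learned about the object category to choose between them.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_orthographic(points: List[Point3D]) -> List[Point2D]:
    """Drop the depth (z) coordinate, as a simple orthographic camera would."""
    return [(x, y) for x, y, _z in points]


# Two hypothetical 3D shapes that differ only in depth (illustrative values).
shape_a: List[Point3D] = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]
shape_b: List[Point3D] = [(0.0, 0.0, 5.0), (1.0, 0.0, 0.5), (0.0, 1.0, -2.0)]

keypoints_a = project_orthographic(shape_a)
keypoints_b = project_orthographic(shape_b)

# The 2D keypoints match even though the 3D shapes do not.
print(keypoints_a == keypoints_b)
print(shape_a == shape_b)
```

The first comparison is true and the second is false, which is the ambiguity that any 2D-to-3D method must resolve.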

"Such reconstructions were previously unachievable mainly because of memory constraints with the previous matrix-factorization-based methods which, unlike our deep network, cannot operate in a 'minibatch' regime. Previous methods addressed the modeling of deformations by leveraging multiple simultaneous images and establishing correspondences between instantaneous 3D reconstructions, which requires hardware that's mostly found in special labs," wrote the team. "The efficiencies introduced by C3DPO makes it possible to enable 3D reconstruction in cases where employing hardware for 3D capture isn't feasible, such as with large-scale objects like airplanes."

The team also built a system for canonical surface mapping, which takes generic image collections and maps them to 3D shapes. This helps computer vision applications better understand common properties between different objects in an AR scene.

"For instance, if we train a system to learn the correct place to sit on a chair or where to grasp a mug, our representation can be useful the next time the system needs to understand where to sit on a different chair or how to grasp another mug," wrote the team. "Such tasks can not only help deepen our understanding of traditional 2D images and video content, but also enhance AR/VR experiences by transferring representations of objects."

Finally, VoteNet is an experimental 3D object detection network that can accurately understand a 3D point cloud using just geometric information rather than color images.
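As a rough picture of what a point-cloud detector consumes and produces, the sketch below takes a list of 3D points and returns one axis-aligned 3D bounding box per object class. It is only an illustration of the data shapes involved, not VoteNet itself: the class labels here are supplied by hand, whereas a real detector has to predict both the grouping and the class from geometry. The function name `bounding_boxes` and the sample points are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner)


def bounding_boxes(points: List[Point], labels: List[str]) -> Dict[str, Box]:
    """Return one axis-aligned 3D box per label from hand-labeled points."""
    grouped: Dict[str, List[Point]] = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)

    boxes: Dict[str, Box] = {}
    for label, pts in grouped.items():
        xs, ys, zs = zip(*pts)  # split into per-axis coordinate tuples
        boxes[label] = ((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))
    return boxes


# Hypothetical point cloud, as a depth camera might supply (x, y, z in meters).
points = [
    (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.2, 0.9),  # points on a chair
    (3.0, 3.0, 0.0), (4.0, 3.5, 0.5),                   # points on a table
]
labels = ["chair", "chair", "chair", "table", "table"]

boxes = bounding_boxes(points, labels)
for label, (low, high) in boxes.items():
    print(label, low, high)
```

Note that no color information appears anywhere in the input, only coordinates.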

"VoteNet has a simple design, compact model size, and high efficiency, with a speed of about 100 milliseconds for a full scene and a smaller memory footprint than previous methods designed for research," the team claims. "Our algorithm takes in 3D point clouds from depth cameras and returns 3D bounding boxes of objects with their semantic classes."

The latest innovations in 3D imaging from Facebook's research team echo a similar development from last year that was the result of a collaboration with University of Washington researchers at the Facebook-funded UW Reality Lab. That team-up resulted in Photo Wake-Up, a method for generating 3D animations from 2D images.

Some of Facebook's AR research is commercialized more quickly than the rest, so it may be some time before we see all of these innovations in action. Still, the company tends to move quickly when it has something worth introducing to the public. For example, using the Mask R-CNN model, Facebook researchers developed a body mask method that has already made its way into Facebook's Spark AR platform.

This latest research, though, could contribute to Facebook's ongoing development of AR smartglasses. Combined with its Live Maps AR cloud platform, the ability to identify 3D objects could eventually let Facebook's smartglasses emulate the real-world occlusion and spatial computing capabilities of the HoloLens 2 or the Magic Leap One.

Of course, the key is how all of this might eventually look in a housing that meets mainstream tastes in a way that would truly usher in the golden age of AR smartglasses.

Cover image via Facebook