Generating 3D Models in Mobile: Sony’s 3D Creator Made Me a Bobblehead

At a show like IFA, it’s easy to get wide-eyed about a flashy new feature that is being heavily promoted but might have limited use. Normally, something like Sony’s 3D Creator app would fall under this umbrella – a tool that creates a 3D wireframe model of someone’s head and shoulders and then applies a 4K texture over the top. What makes it worth writing about is the implementation.

Creating a 3D model from a single photo, without accompanying depth data, is normally difficult. Even with depth data, the camera only captures points directly in front of it – it says nothing about what is around the corner, which becomes a particular problem when generating a texture from the image data to fit the model. With multiple photos, identical points can be correlated between images (perhaps aided by the device’s internal x/y/z motion sensors), and the distances to those points can be measured to build a full depth map. Mapping the color data from each pixel onto its position in that depth map then allows the wireframe model to be textured.
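As a rough illustration of that point-correlation step, here is a minimal sketch (not Sony’s or Agisoft’s code) that uses OpenCV to match features between two overlapping photos and triangulate the matched points into 3D. The filenames and the two camera projection matrices are placeholders – recovering those camera poses from calibration and motion-sensor data is the hard part a real pipeline has to solve.

```python
import cv2
import numpy as np

# Two overlapping photos of the same subject (hypothetical filenames).
img1 = cv2.imread("head_left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("head_right.jpg", cv2.IMREAD_GRAYSCALE)

# Detect and describe features in each image, then correlate identical points.
orb = cv2.ORB_create(nfeatures=5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

# Pixel coordinates of the matched points in each image (2 x N arrays).
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).T
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).T

# P1 and P2 are the 3x4 camera projection matrices for the two views.
# In a real pipeline they come from calibration plus the estimated camera
# motion (e.g. from the phone's sensors); here they are placeholder values.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # reference camera
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # assumed second pose

# Triangulate: each matched pixel pair becomes one 3D point of a sparse cloud.
points_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous, 4 x N
points_3d = (points_h[:3] / points_h[3]).T              # N x 3 Euclidean points
print(f"Recovered {len(points_3d)} 3D points from {len(matches)} matches")
```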

For anyone who follows our desktop CPU coverage, we’ve actually been running a benchmark that does this for the last few years. Our test suite runs Agisoft Photoscan, which takes a set of high-quality images (usually 50+) of people, items, buildings, or landscapes, and builds a 3D textured model for use in displays, games, and anything else that wants a 3D model. This benchmark is computationally expensive, and Agisoft splits the work into four segments:

  1. Alignment
  2. Point Cloud Generation
  3. Mesh Building
  4. Texture Building/Skinning

Each of these segments has dedicated algorithms, and the goal is to compute them as fast as possible. Some of the algorithms are linear and rely heavily on single-thread performance, whereas others, such as Mesh Building, are highly parallel, which Agisoft implements via OpenCL. This allows any OpenCL-capable accelerator, such as a GPU, to increase the performance of this test. For low core count CPUs this is usually the longest part of the full benchmark, while higher core count parts run into other bottlenecks, such as memory or cache.
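For reference, those four segments map onto the stages exposed by Photoscan’s Python scripting API. The sketch below is an assumption-laden outline rather than our actual benchmark script: method names and enum values shift between Photoscan versions, so treat it as illustrative only.

```python
# Sketch of the four-stage pipeline via PhotoScan's Python API (names as in
# the 1.x releases; check your version's reference manual before relying on them).
import glob
import PhotoScan

doc = PhotoScan.app.document
chunk = doc.addChunk()
chunk.addPhotos(glob.glob("building_photos/*.jpg"))   # hypothetical image folder

# 1. Alignment: find and match feature points, estimate camera positions.
chunk.matchPhotos(accuracy=PhotoScan.HighAccuracy)
chunk.alignCameras()

# 2. Point cloud generation: densify the sparse cloud into per-pixel depth.
chunk.buildDenseCloud(quality=PhotoScan.MediumQuality)

# 3. Mesh building: turn the dense cloud into a wireframe surface.
chunk.buildModel(source=PhotoScan.DenseCloudData, face_count=PhotoScan.MediumFaceCount)

# 4. Texture building/skinning: project the photo pixels back onto the mesh.
chunk.buildUV(mapping=PhotoScan.GenericMapping)
chunk.buildTexture(blending=PhotoScan.MosaicBlending)

doc.save("building_model.psz")
```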

For our Agisoft run in those benchmarks, we use a set of fifty 4K images of a building. We have the algorithm select 50,000 points from each image and use those for the mesh building. We typically run it in OpenCL-off mode, as we are testing the CPU cores, although Ganesh has seen some minor speedup on this test with Intel’s dual-core U-series CPUs when enabling OpenCL. A high-end but low-power processor, such as the Core i5-7500T, takes nearly 1500 seconds, or 25 minutes, to run our test. We also see speedups from larger caches and higher DRAM frequency/lower latency, but major parts of the app rely almost exclusively on either single-thread or multi-thread performance.
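To mimic an “OpenCL off” run and get per-stage timings, a wrapper along these lines could be bolted onto the sketch above. The `gpu_mask` attribute is what Photoscan 1.x scripts commonly use to enable or disable GPU devices, but that detail (and whether every stage honors it) is an assumption worth verifying against your version.

```python
import time
import PhotoScan

# Assumed attribute in PhotoScan 1.x: a bitmask of enabled GPU devices.
# Setting it to 0 should force the CPU-only ("OpenCL off") path.
PhotoScan.app.gpu_mask = 0

def timed(label, fn, *args, **kwargs):
    """Run one pipeline stage and report its wall-clock time in seconds."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.1f} s")

chunk = PhotoScan.app.document.chunk   # chunk prepared as in the sketch above

timed("Alignment",        chunk.matchPhotos, accuracy=PhotoScan.HighAccuracy)
timed("Camera poses",     chunk.alignCameras)
timed("Point cloud",      chunk.buildDenseCloud)
timed("Mesh building",    chunk.buildModel)
timed("Texture/skinning", chunk.buildTexture)
```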

Sony’s way of creating the 3D head model involves panning the camera from one ear to the other, and then moving the camera around the head to capture finer detail and texture information. It does this all in-situ, computing on the fly and showing the results in real time on the screen as the scan is being done. The whole process takes a minute, which, compared to the method outlined above, is extremely quick. Of course, Sony’s implementation is limited to heads rather than arbitrary subjects such as buildings, and we were told by Sony that the models are limited to 50,000 polygons. During the demonstration I was given, I could see the software generating points on the head, and it was obvious the number of points was in the hundreds in total, rather than the thousands per static image, so there is a perceptible difference in quality. Even so, Sony’s implementation still gives a good visual output.

The smartphones from Sony that support this feature are the XZ series, which have Snapdragon 835 SoCs inside. Qualcomm is notoriously secretive about what is under the hood of its mobile chips, although features like the Hexagon DSP contained within the chip are announced. Sony would not state how the algorithms are implemented – whether they leverage a compute API on the Adreno GPU, a graphics API, the Kryo CPUs, or the special DSPs housed on the chip. It also leads to two different questions: do the algorithms work on other SoCs, and can other Snapdragon 835 smartphone vendors develop their own equivalent application?

Sony’s goal is to allow users to use their new facial model in applications that support personal avatars, or to export it to 3D printing formats for a real-world recreation of the user’s head. My mind instantly went to who would use something like this at scale: console players, specifically on Xbox and Nintendo devices, or in specific games such as NBA2k17. Given that Sony exists in the console space with its own PlayStation 4, one might expect it not to play along with competitors, although the smartphone department is a different business unit (and other Snapdragon 835 players do not have a potential conflict). I was told by the booth demonstrator that he doesn’t know of any collaboration, which is unfortunate, as I’d suspect this would be a good opening for the tool.

I’m probing for more information – from Sony on the algorithm and from Qualcomm on the hardware – because how the algorithm maps onto the hardware is something I find interesting given how we’ve tested desktop CPUs in the past. It also puts the challenge to other smartphone vendors using the Snapdragon 835 (or other SoCs) to see if this is a feature they might want to implement, or whether there are apps that will implement this feature regardless of hardware.
