If a picture is worth a thousand words, how much can a video tell? What about 100 simultaneous videos used to reconstruct every frame of life inside a 10,000 sq. ft. dome? Like other industries, the entertainment industry is being reshaped by AI, especially in the move toward AR/VR consumption. As the amount of data has grown, our models have become deeper, and reality has become decipherable. This talk will introduce recent deep learning advances in 3D vision, reconstruction, and shape understanding, with a focus on generative models for digitizing performances and scenes. We will then shift gears to an overview of such models in 3D and their progression across voxels, point clouds, meshes, graphs, and other 3D representations. Returning to our studio, we will discuss how to process such large volumes of visual data and present the challenges of scaling 10x beyond current capture platforms and more than 200x beyond state-of-the-art datasets. The talk will conclude with a sneak peek at upcoming VR/AR productions from the world's largest volumetric capture stage at Intel Studios, as an example of real-world use cases for these AI approaches.