Glossary

Definitions of key terms related to radiance fields and world generation.

  • 3D Gaussian Splatting (3DGS)

    A technique for real-time radiance field rendering that represents scenes using a sparse set of 3D Gaussians. This approach enables high-quality novel-view synthesis by optimizing Gaussian parameters to match input images, facilitating efficient rendering without the need for neural networks.

  • 3D Reconstruction

    The process of capturing the shape and appearance of real objects to create digital 3D models. This can be achieved through various methods, including photogrammetry, laser scanning, and computational algorithms that interpret 2D images to infer 3D structures.

  • 3D Scanning

    A technique used to capture the physical dimensions and shape of an object or environment by collecting data on its surface. This data is then used to create accurate digital 3D models for applications in industries like manufacturing, entertainment, and cultural heritage preservation.

  • Active Stereo

    An active stereo system enhances traditional stereo vision by projecting a known pattern (structured light) onto the scene to improve correspondence matching between images. This technique is particularly useful in environments with low texture or varying lighting conditions, as the projected pattern provides additional features for accurate depth estimation.

  • Anisotropic Gaussian

    A Gaussian function with direction-dependent properties, allowing for the representation of elongated or oriented features in 3D space. In 3DGS, anisotropic Gaussians help model complex scene geometries more accurately.
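
    As a sketch of one common parameterization (following the scale and rotation factorization used in 3D Gaussian Splatting; the example values are arbitrary), the covariance of an anisotropic Gaussian can be assembled like this:

    ```python
    import numpy as np

    def anisotropic_covariance(scales, rotation):
        """Build a 3x3 covariance Sigma = R S S^T R^T from per-axis scales and a rotation."""
        S = np.diag(scales)                      # anisotropic scaling along each local axis
        return rotation @ S @ S.T @ rotation.T

    # Example: a Gaussian stretched 10x along its local x-axis, with no rotation.
    sigma = anisotropic_covariance(np.array([1.0, 0.1, 0.1]), np.eye(3))
    print(sigma)
    ```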

  • Bundle Adjustment

    An optimization process in computer vision that refines camera parameters and 3D point positions simultaneously to minimize reprojection errors across multiple images, enhancing the accuracy of 3D reconstructions.
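
    Schematically (with generic notation, not tied to any particular solver), bundle adjustment minimizes the summed reprojection error over all cameras C_j and 3D points X_i, where pi(C_j, X_i) projects point i into camera j and x_ij is the observed 2D feature location:

    ```latex
    \min_{\{C_j\},\,\{X_i\}} \; \sum_{i,j} \left\| \pi(C_j, X_i) - x_{ij} \right\|^2
    ```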

  • Camera Calibration

    The process of estimating a camera’s intrinsic parameters (such as focal length and optical center) and extrinsic parameters (such as rotation and translation) to accurately map 3D points in the world to 2D points in an image. This is crucial for precise 3D reconstruction and measurements.
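
    In the standard pinhole model, calibration recovers the intrinsic matrix K and the extrinsics [R | t] that map a homogeneous world point to an image point, up to a scale factor s:

    ```latex
    s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
      = K \,[\, R \;|\; t \,]
        \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
    \qquad
    K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
    ```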

  • Density Field

    A density field describes how dense (or solid) each point of a 3D scene is. This is crucial for volume rendering, as it defines how light interacts with the scene at each point. In NeRF, the density field is learned alongside the radiance field to create realistic 3D representations.

  • Depth Estimation

    The process of determining the distance of objects from a viewpoint, often using stereo vision or monocular cues. Accurate depth estimation is crucial for tasks like 3D reconstruction and scene understanding.

  • Depth Map

    An image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint. Depth maps are used in various applications, including 3D reconstruction, to represent the spatial structure of a scene.

  • Depth Sensor

    A device that measures the distance between the sensor and objects within its field of view, producing depth maps that represent the 3D structure of a scene. Depth sensors employ various technologies, including stereo vision, structured light, and time-of-flight measurements, and are integral components in applications like 3D scanning, robotics, and augmented reality.

  • Differentiable Rendering

    A rendering technique that allows gradients to be computed with respect to scene parameters, enabling the integration of rendering processes into neural network training for tasks like inverse rendering and 3D reconstruction.

  • Epipolar Geometry

    The geometric relationship between two views of a 3D scene, describing the intrinsic projective geometry between them. Understanding epipolar geometry simplifies the search for corresponding points between images, which is fundamental in stereo vision and 3D reconstruction.
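
    For corresponding homogeneous image points x and x' in the two views, the relationship is captured by the fundamental matrix F (or the essential matrix for calibrated, normalized coordinates), which confines each potential match to a line in the other image:

    ```latex
    x'^{\top} F \, x = 0
    ```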

  • Fourier Feature Mapping

    A method that transforms spatial coordinates into higher-dimensional features using sine and cosine functions, enabling neural networks to capture high-frequency details in functions like those used in NeRFs.
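
    A common form (the random Fourier feature mapping of Tancik et al.; B is a matrix of frequencies, often sampled from a Gaussian) transforms an input coordinate v as:

    ```latex
    \gamma(v) = \big[ \cos(2\pi B v), \; \sin(2\pi B v) \big]^{\top}
    ```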

  • Free Viewpoint Video

    A video format that allows users to interactively change the viewing perspective of a scene, providing a more immersive experience. Techniques like NeRFs and 3DGS facilitate the creation of free viewpoint videos.

  • Gaussian Splatting

    A volume rendering technique that represents volumetric data using Gaussian functions, allowing for efficient and high-quality rendering of complex scenes without converting data into surface primitives.

  • Global Illumination

    A set of rendering techniques that simulate both direct and indirect lighting to produce realistic images by accounting for light interactions like reflections and refractions within a scene.

  • Gradient Descent

    An optimization algorithm that iteratively adjusts model parameters in the direction that most reduces the error, commonly used in training neural networks and optimizing 3D reconstructions.
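
    A minimal sketch of the update rule on a toy quadratic loss (the learning rate, step count, and loss are illustrative only):

    ```python
    import numpy as np

    def gradient_descent(grad_fn, theta, lr=0.1, steps=100):
        """Repeatedly step the parameters against the gradient of the loss."""
        for _ in range(steps):
            theta = theta - lr * grad_fn(theta)
        return theta

    # Toy example: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
    print(gradient_descent(lambda t: 2 * (t - 3.0), np.array([0.0])))  # approaches 3.0
    ```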

  • Homography

    A transformation that maps points from one plane to another, preserving straight lines. In 3D reconstruction, homographies relate two images of the same planar surface, aiding in tasks like image stitching and object recognition.
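
    The sketch below applies a hypothetical 3x3 homography H to a 2D point via homogeneous coordinates (the matrix entries are arbitrary, for illustration only):

    ```python
    import numpy as np

    def apply_homography(H, point):
        """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
        x = np.array([point[0], point[1], 1.0])
        y = H @ x
        return y[:2] / y[2]                       # divide out the homogeneous scale

    H = np.array([[1.0,   0.2,  5.0],
                  [0.0,   1.1, -3.0],
                  [0.001, 0.0,  1.0]])            # arbitrary example homography
    print(apply_homography(H, (10.0, 20.0)))
    ```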

  • Implicit Representation

    A continuous function that encodes a scene’s geometry or appearance, rather than discrete elements like meshes or voxels. For example, NeRF uses a neural network to implicitly model density and radiance at any point in space, allowing for high-resolution reconstructions. Implicit representations are compact, flexible, and ideal for neural scene synthesis.

  • Inverse Rendering

    The process of inferring scene properties such as geometry, materials, and lighting from observed images. By reversing the traditional rendering pipeline, inverse rendering enables the reconstruction of 3D scenes from 2D photographs, facilitating applications like augmented reality and scene understanding.

  • Latent Diffusion Models (LDM)

    A class of generative models that operate in a compressed latent space to produce high-quality images or 3D structures. By focusing on essential features, LDMs enhance computational efficiency, making them suitable for applications like text-to-image synthesis and 3D reconstruction.

  • Light Field

    A function that describes the amount of light traveling in every direction through every point in space. Capturing the light field of a scene allows for post-capture adjustments of focus and perspective, contributing to advanced 3D imaging techniques.

  • Marching Cubes

    An algorithm used to extract a polygonal mesh of an isosurface from a three-dimensional scalar field. Widely employed in medical imaging and scientific visualization, it facilitates the conversion of volumetric data into a surface representation.
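
    As an illustration, scikit-image ships an implementation that extracts a triangle mesh from a scalar volume at a chosen iso-level (the sphere volume below is a made-up example):

    ```python
    import numpy as np
    from skimage import measure

    # Signed values of a sphere of radius 20 sampled on a 64^3 grid.
    x, y, z = np.mgrid[-32:32, -32:32, -32:32]
    volume = np.sqrt(x**2 + y**2 + z**2) - 20.0

    # Extract the zero iso-surface as vertices and triangular faces.
    verts, faces, normals, values = measure.marching_cubes(volume, level=0.0)
    print(verts.shape, faces.shape)
    ```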

  • Mesh Reconstruction

    The process of creating a polygonal mesh that represents the surface of a 3D object or scene. This involves connecting points (vertices) with edges and faces to form a continuous surface, essential for rendering and analysis in computer graphics and 3D modeling.

  • Mip-NeRF

    An extension of Neural Radiance Fields that incorporates multiscale representations to address aliasing issues. By modeling scenes at various levels of detail, Mip-NeRF improves rendering quality, especially when dealing with zoomed-out views or distant objects.

  • Multi-View Reconstruction

    The process of recovering the 3D structure of a scene from multiple 2D images taken from different viewpoints. Techniques like Structure from Motion (SfM) and Multi-View Stereo (MVS) are used to align the images and extract depth and geometry. NeRF extends this idea using neural networks for more detailed reconstructions.

  • Multi-View Stereo (MVS)

    A technique in computer vision that reconstructs 3D structures from multiple images taken from different viewpoints. By identifying corresponding points across images, MVS estimates depth information, enabling the creation of detailed 3D models.

  • Neural Radiance Fields (NeRF)

    A method that uses neural networks to represent a scene’s volumetric radiance and density. By learning this representation from a set of images, NeRF enables the synthesis of novel views, contributing to advancements in 3D reconstruction and virtual reality.
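
    In the formulation from the original NeRF paper, the color of a camera ray r(t) = o + t d is obtained by volume rendering the learned density sigma and view-dependent color c, where T(t) is the accumulated transmittance:

    ```latex
    C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt,
    \qquad
    T(t) = \exp\!\left( -\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds \right)
    ```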

  • Neural Rendering

    The application of neural networks to generate or enhance images, often by learning complex scene representations. Neural rendering techniques enable tasks like novel view synthesis, image-based relighting, and the creation of photorealistic graphics.

  • Normal Mapping

    A technique in 3D graphics to simulate the lighting of bumps and dents on a surface without increasing the polygon count. By altering the surface normals, normal mapping adds detailed texture to 3D models, enhancing realism in rendering.

  • Novel View Synthesis

    The process of generating new images of a scene from viewpoints that were not included in the input data. NeRF excels at this task by learning the underlying 3D scene representation and rendering high-quality views from arbitrary angles.

  • Occupancy Grid Mapping

    A method used in robotics and 3D reconstruction to represent the environment as a grid, where each cell indicates the probability of occupancy. This probabilistic approach aids in building maps of unknown environments and is fundamental in Simultaneous Localization and Mapping (SLAM).

  • Opacity

    The amount of light that is blocked or absorbed by a point in a scene. In volume rendering, opacity is computed based on the density at sampled points and determines how much each point contributes to the final color of the pixel. High-opacity regions are typically solid, while low-opacity regions are transparent or semi-transparent.
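
    In the discrete quadrature used by NeRF-style renderers, the opacity of sample i with density sigma_i and spacing delta_i, and its contribution to the pixel color, are:

    ```latex
    \alpha_i = 1 - \exp(-\sigma_i \delta_i),
    \qquad
    C = \sum_i T_i\, \alpha_i\, \mathbf{c}_i,
    \qquad
    T_i = \prod_{j < i} (1 - \alpha_j)
    ```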

  • Overfitting

    This occurs when a model learns to replicate the training data too precisely, failing to generalize to new viewpoints or scenes. In the context of NeRF, overfitting might result in perfect reconstructions of training views but poor performance on novel views. Regularization techniques are often used to mitigate overfitting.

  • Photogrammetry

    The science of making measurements from photographs, especially for recovering the exact positions of surface points. In 3D reconstruction, photogrammetry is used to create models of real-world objects or environments by analyzing overlapping photographs taken from different angles.

  • Photometric Stereo

    A technique for estimating the surface normals of objects by observing them under different lighting conditions. This method captures fine surface details, beneficial for accurate 3D reconstruction.
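
    For a Lambertian surface observed under k known light directions l_i, each intensity constrains the albedo-scaled normal g = rho n, which can be recovered by least squares (a standard formulation; calibrated lighting is assumed):

    ```latex
    I_i = \rho\,(\mathbf{n} \cdot \mathbf{l}_i),
    \qquad
    \mathbf{g} = \arg\min_{\mathbf{g}} \sum_{i=1}^{k} \left( I_i - \mathbf{l}_i^{\top} \mathbf{g} \right)^2,
    \qquad
    \rho = \lVert \mathbf{g} \rVert, \;\; \mathbf{n} = \mathbf{g} / \lVert \mathbf{g} \rVert
    ```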

  • Photon Mapping

    A global illumination technique used in traditional rendering to approximate the behavior of light in a scene by tracing photons as they bounce off surfaces. While different from neural methods, it shares the goal of simulating realistic light transport.

  • Plenoptic Function

    A function that describes the intensity of light in a scene as a function of position, direction, wavelength, and time. Understanding the plenoptic function is fundamental in computational imaging and forms the basis for technologies like light field cameras.
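
    In its full form the plenoptic function is seven-dimensional; dropping wavelength and time yields the 5D version commonly used in light field work:

    ```latex
    L = P(x, y, z, \theta, \phi, \lambda, t)
    ```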

  • Point Cloud

    A collection of data points defined in a three-dimensional coordinate system, representing the external surface of objects or scenes. Point clouds are often generated by 3D scanners and are used in various applications, including 3D modeling and reconstruction.

  • Point-Based Rendering

    A rendering technique that uses points as the basic rendering primitive instead of polygons. This approach is particularly useful for rendering complex 3D models derived from point cloud data, such as those obtained from 3D scanning.

  • Positional Encoding

    A technique used to map low-dimensional input coordinates (e.g., 3D positions) to a higher-dimensional space to allow neural networks to learn high-frequency details. In NeRF, positional encoding transforms input positions and viewing directions into a more expressive feature space, enabling the model to represent fine details like sharp edges and texture. It’s critical for overcoming the limitations of standard neural networks.
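
    A minimal sketch of the sinusoidal encoding used in NeRF, mapping each coordinate to sines and cosines at exponentially increasing frequencies (the number of frequency bands is a tunable choice):

    ```python
    import numpy as np

    def positional_encoding(p, num_bands=10):
        """Map coordinates p of shape (..., D) to (..., 2 * num_bands * D) features."""
        p = np.asarray(p, dtype=np.float64)
        freqs = 2.0 ** np.arange(num_bands) * np.pi           # frequencies 2^k * pi
        angles = p[..., None] * freqs                         # shape (..., D, num_bands)
        features = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
        return features.reshape(*p.shape[:-1], -1)

    print(positional_encoding(np.array([[0.5, -0.2, 0.1]])).shape)  # (1, 60)
    ```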

  • Radiance Field

    A mathematical representation that describes the distribution of light (color and brightness) in a 3D space, including how it varies depending on the direction. It captures both the geometry and appearance of a scene by encoding spatial and directional dependencies of light. Popular radiance field representations include 3D Gaussian Splatting and NeRFs.

  • Ray Marching

    A rendering technique that traces rays through a scene to compute color and lighting by accumulating information along the ray’s path. Ray marching is commonly used in volume rendering and is integral to techniques like NeRFs.
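
    A minimal sketch of marching one ray through a field and compositing front to back (`field` here is a hypothetical callable returning density and RGB at a point; the toy scene is a fuzzy red ball):

    ```python
    import numpy as np

    def march_ray(origin, direction, field, near=0.0, far=4.0, num_samples=64):
        """Accumulate color along a ray by sampling a (density, rgb) field."""
        t_vals = np.linspace(near, far, num_samples)
        delta = t_vals[1] - t_vals[0]
        color, transmittance = np.zeros(3), 1.0
        for t in t_vals:
            sigma, rgb = field(origin + t * direction)   # density and color at the sample
            alpha = 1.0 - np.exp(-sigma * delta)         # opacity of this segment
            color += transmittance * alpha * rgb         # front-to-back compositing
            transmittance *= 1.0 - alpha
        return color

    # Toy field: a fuzzy red ball of radius 1 centered at the origin.
    field = lambda p: (5.0 if np.linalg.norm(p) < 1.0 else 0.0, np.array([1.0, 0.0, 0.0]))
    print(march_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]), field))
    ```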

  • Ray Sampling

    The process of tracing rays from a camera through a scene to gather information about the light and geometry at sampled points. These sampled points are used to compute the final rendered image. Efficient and accurate ray sampling is essential for training radiance fields and synthesizing novel views.

  • Ray Tracing

    A rendering technique that simulates the way light interacts with objects to produce realistic images. Ray tracing calculates the paths of rays of light as they travel through a scene, allowing for the simulation of effects like reflections, refractions, and shadows.

  • Regularization

    Methods used to constrain a model’s complexity and encourage it to generalize beyond the training data. Common techniques in radiance fields include sparsity constraints on density and smoothness priors. These help prevent overfitting and produce more realistic reconstructions.

  • Rendering

    The process of generating an image from a model by means of computer programs. Rendering is used in various applications, including video games, simulations, and movies, to visualize 3D models in a realistic or stylized manner.

  • Rendering Equation

    A mathematical model describing how light is emitted, absorbed, and scattered within a scene. It serves as the foundation for physically-based rendering methods, including those used in radiance fields. Solving the rendering equation enables realistic image synthesis.
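
    In Kajiya's standard form, the outgoing radiance at a surface point x in direction omega_o is the emitted radiance plus the incoming radiance integrated against the BRDF f_r over the hemisphere:

    ```latex
    L_o(x, \omega_o) = L_e(x, \omega_o)
      + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i
    ```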

  • Scene Optimization

    The process of adjusting the parameters of a radiance field model to minimize the error between predicted and actual images. This iterative process ensures that the model accurately learns the geometry and appearance of the scene.

  • Scene Representation

    The method used to encode the geometry, texture, and light properties of a 3D scene. Representations can be explicit (e.g., meshes, point clouds) or implicit (e.g., neural networks like NeRF). Implicit representations are compact and continuous, making them ideal for novel view synthesis.

  • Scene Representation Networks (SRNs)

    Neural networks designed to learn implicit representations of 3D scenes from images, enabling tasks like novel view synthesis and 3D reconstruction without explicit 3D supervision.

  • Shape-from-Shading

    A method that infers the 3D shape of a surface from the shading information in a single image. By analyzing the variations in brightness caused by lighting, this technique estimates the surface normals and reconstructs the object’s geometry, assuming known reflectance properties and lighting conditions.

  • Signed Distance Function (SDF)

    A function that returns the shortest distance from any point in space to the surface of a shape, with the sign indicating whether the point is inside or outside the shape. SDFs are used in 3D modeling and computer graphics for efficient surface representation.
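
    A minimal example: the SDF of a sphere, negative inside and positive outside (the sign convention varies between systems):

    ```python
    import numpy as np

    def sphere_sdf(p, center=np.zeros(3), radius=1.0):
        """Signed distance from point(s) p to the surface of a sphere."""
        return np.linalg.norm(np.asarray(p) - center, axis=-1) - radius

    print(sphere_sdf([0.0, 0.0, 0.0]))   # -1.0  (inside)
    print(sphere_sdf([2.0, 0.0, 0.0]))   #  1.0  (outside)
    ```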

  • Simultaneous Localization and Mapping (SLAM)

    A computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it. SLAM is essential in robotics and augmented reality for navigation and environment interaction.

  • Sparse Neural Representations

    Neural network models that use sparse data structures to efficiently represent complex scenes or objects, reducing computational requirements while maintaining high-quality reconstructions.

  • Splatting

    A volume rendering technique that projects volumetric data onto the image plane by distributing energy (or color) from data points onto nearby pixels, often using Gaussian kernels. Splatting is used for efficient rendering of volumetric data without explicit surface extraction.

  • Structure from Motion (SfM)

    A photogrammetric technique that reconstructs 3D structures from 2D image sequences by analyzing the motion of feature points across images. SfM is widely used in 3D reconstruction and computer vision to create 3D models from photographs.

  • Structured Light Scanning

    A 3D scanning technique that projects a known pattern of light (such as stripes or grids) onto an object and captures the deformation of this pattern with cameras. By analyzing these deformations, the system reconstructs the object’s surface geometry with high accuracy, making it suitable for applications in industrial inspection, reverse engineering, and biometrics.

  • Surface Reconstruction

    The process of creating a continuous surface from discrete data points, such as those obtained from 3D scanning or photogrammetry. Surface reconstruction is vital for generating usable 3D models from raw data, enabling applications in visualization, analysis, and manufacturing.

  • Texture Mapping

    The method of applying images (textures) to 3D models to add color, detail, and realism. Texture mapping enhances the visual appearance of 3D reconstructions by providing surface details without increasing geometric complexity.

  • Time-of-Flight (ToF) Camera

    A depth-sensing device that measures the time it takes for emitted light (usually infrared) to travel to an object and back to the sensor. By calculating this time delay, the camera determines the distance to various points in the scene, producing a depth map useful in applications like gesture recognition, robotics, and 3D mapping.
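
    The basic range relation: the round-trip delay of the emitted light gives the distance, with c the speed of light (continuous-wave ToF cameras recover the delay from a phase shift, but the relation is the same):

    ```latex
    d = \frac{c\, \Delta t}{2}
    ```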

  • Training Data

    The set of images used to fit a radiance field, typically captured from multiple viewpoints along with their corresponding camera poses. High-quality, diverse training data ensures the model learns an accurate representation of the scene; noisy or insufficient data can lead to poor reconstructions.

  • Triangulation

    A geometric method used to determine the location of a point by measuring angles from known points at either end of a fixed baseline. In 3D scanning, triangulation is employed by systems like stereo cameras and structured light scanners to calculate depth information based on the convergence of lines of sight from different viewpoints.
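
    For a rectified stereo pair, triangulation reduces to the familiar relation between depth Z, focal length f, baseline B, and disparity d:

    ```latex
    Z = \frac{f\, B}{d}
    ```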

  • View Dependency

    The way an object’s appearance (e.g., its shading or reflections) changes with the observer’s viewing angle. NeRF explicitly models view dependency to capture effects like specular highlights, making rendered images more realistic. This is achieved by incorporating the viewing direction as an input to the neural network.

  • Visual Hull

    A geometric representation of an object obtained by intersecting the silhouettes captured from multiple camera viewpoints. The visual hull provides an approximate 3D shape of the object, which can be refined with additional data for more accurate reconstruction.

  • Volume / Volumetric Rendering

    A set of techniques used to display three-dimensional volumetric data, allowing for the visualization of structures without explicit surface representation. Volumetric rendering is essential in medical imaging and scientific visualization.

  • Voxel

    A voxel, short for “volume element,” is the three-dimensional equivalent of a pixel. It represents a value on a regular grid in 3D space and is commonly used in volumetric data representation, such as in medical imaging and 3D modeling.

  • Voxel Grid

    A volumetric representation of 3D space that divides the space into uniform cubes called voxels (volume elements). Each voxel contains information about the presence or absence of material at that location, enabling the modeling of complex structures and environments in applications like medical imaging and 3D modeling.
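
    A minimal sketch of an occupancy-style voxel grid: points are quantized by a fixed cell size and the corresponding voxels are marked occupied (the resolution and extent here are arbitrary):

    ```python
    import numpy as np

    def voxelize(points, grid_size=64, voxel_size=0.1, origin=np.zeros(3)):
        """Mark voxels containing at least one point in a boolean grid."""
        grid = np.zeros((grid_size,) * 3, dtype=bool)
        idx = np.floor((points - origin) / voxel_size).astype(int)
        idx = idx[np.all((idx >= 0) & (idx < grid_size), axis=1)]   # keep in-bounds points
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
        return grid

    pts = np.random.rand(1000, 3) * 6.4        # random points across a 6.4-unit extent
    print(voxelize(pts).sum(), "occupied voxels")
    ```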

  • Warping

    A process in image processing and computer graphics that involves deforming or mapping one image to align with another. Warping is used in applications such as texture mapping, image registration, and morphing, where precise alignment of images or textures is required for accurate representation or analysis.

  • Z-Buffering

    Z-buffering, also known as depth buffering, is a computer graphics technique that manages image depth coordinates in 3D graphics to handle occlusion. It keeps track of the depth of every pixel on the screen to determine which objects are in front and should be visible.
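
    A minimal sketch of the per-pixel depth test: a fragment overwrites a pixel only when it is closer than the depth already stored (smaller z means closer in this convention):

    ```python
    import numpy as np

    def draw_fragment(depth_buffer, color_buffer, x, y, z, color):
        """Write a fragment only if it is nearer than what the z-buffer holds."""
        if z < depth_buffer[y, x]:
            depth_buffer[y, x] = z
            color_buffer[y, x] = color

    depth = np.full((4, 4), np.inf)            # start with every pixel "infinitely far"
    colors = np.zeros((4, 4, 3))
    draw_fragment(depth, colors, 1, 1, 2.0, np.array([0.0, 1.0, 0.0]))  # green, z = 2
    draw_fragment(depth, colors, 1, 1, 5.0, np.array([1.0, 0.0, 0.0]))  # red is behind, ignored
    print(colors[1, 1])                        # stays green
    ```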

  • Zero-Shot Learning

    Zero-shot learning is a machine learning paradigm where a model is capable of recognizing objects or performing tasks without having seen any previous examples during training. In 3D reconstruction, this can involve generating models of objects from categories that were not present in the training data.