Aditya Nisal
Email : anisal@wpi.edu
Description
Aim:
To render 3D scenes from 2D images using a fully connected deep neural network.
Methodology:
NeRF (Neural Radiance Fields):
INITIALIZATION:
- Define a fully connected neural network `f` with weights `W`.
NeRF uses a fully connected network because it represents the scene as a continuous volumetric function, unlike image-based methods that typically use convolutional networks.
`f` takes in a 5D coordinate (x, y, z, θ, φ), where (x, y, z) is the 3D position and (θ, φ) is the viewing direction.
Including the view direction lets the network model view-dependent effects such as specular reflections.
`f` outputs a color (RGB) and a volume density (σ) for the given 5D coordinate.
The color describes the light emitted or reflected at that point, while the density describes how much light is absorbed or blocked there.
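The network `f` described above can be sketched as a small NumPy MLP. This is a minimal illustration, not the actual NeRF architecture: the layer sizes are hypothetical, and the real model is much deeper and applies a positional encoding to the input before the first layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    # One (W, b) pair per layer; small random weights (illustrative init).
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def f(params, coord5d):
    # coord5d: (x, y, z, theta, phi)
    h = np.asarray(coord5d, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)    # ReLU hidden layers
    W, b = params[-1]
    out = h @ W + b                       # 4 outputs: RGB + density
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))  # sigmoid keeps color in [0, 1]
    sigma = np.maximum(out[3], 0.0)       # ReLU keeps density non-negative
    return rgb, sigma

params = init_mlp([5, 64, 64, 4])         # hypothetical layer sizes
rgb, sigma = f(params, (0.1, -0.2, 0.5, 0.3, 1.0))
```

The activation choices mirror the outputs' meaning: colors are bounded, so they pass through a sigmoid, while density only needs to be non-negative.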
FOR each training image `I` in the dataset:
Determine the camera position `C` and image resolution.
Each image in the dataset is associated with metadata that gives the camera's position and orientation.
FOR each pixel `p` in image `I`:
Compute the ray `R` originating from camera position `C` and passing through pixel `p`.
To simulate how light travels, we trace rays from the camera through each pixel into the scene.
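Computing the ray `R` for a pixel can be sketched as follows, assuming a simple pinhole camera with focal length `focal`, a camera-to-world rotation `R_c2w`, and camera position `C` (all names here are illustrative, not from the original):

```python
import numpy as np

def pixel_ray(i, j, H, W, focal, R_c2w, C):
    # Direction through pixel (i, j) in camera coordinates; by convention
    # here the camera looks down the -z axis.
    d_cam = np.array([(j - W / 2) / focal,
                      -(i - H / 2) / focal,
                      -1.0])
    d_world = R_c2w @ d_cam
    d_world /= np.linalg.norm(d_world)   # unit-length ray direction
    return C, d_world                    # origin and direction of ray R

# The center pixel of a 100x100 image maps straight down the optical axis.
origin, direction = pixel_ray(50, 50, H=100, W=100, focal=100.0,
                              R_c2w=np.eye(3), C=np.zeros(3))
```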
HIERARCHICAL SAMPLING along ray `R`:
Coarse Sample: Uniformly sample a few points `{P_coarse}` along ray `R`.
This initial sampling provides a rough estimate of where relevant scene information (like surfaces) might be located.
Query network `f` for colors and densities at `{P_coarse}`.
Fine Sample: Using the densities from the coarse pass, sample additional points `{P_fine}` concentrated where the density (and thus the contribution to the final color) is high.
Once we have a rough estimate from `{P_coarse}`, we can concentrate further samples in the regions that matter, which makes the process efficient.
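The coarse-then-fine scheme can be sketched like this. The fine pass uses inverse-CDF sampling over the coarse weights; as a simplification, fine samples here reuse coarse sample locations rather than interpolating within bins (the names and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_samples(near, far, n):
    # Stratified sampling: one uniform sample inside each of n even bins.
    edges = np.linspace(near, far, n + 1)
    return edges[:-1] + rng.random(n) * (edges[1:] - edges[:-1])

def fine_samples(t_coarse, weights, n):
    # Inverse-CDF sampling: draw more points where the coarse pass found
    # large density-derived contributions.
    w = weights + 1e-5                      # avoid a zero-mass CDF
    cdf = np.cumsum(w) / np.sum(w)
    u = rng.random(n)
    idx = np.searchsorted(cdf, u)
    return t_coarse[np.clip(idx, 0, len(t_coarse) - 1)]

t_c = coarse_samples(2.0, 6.0, 8)
w = np.array([0, 0, 0.1, 0.9, 0.9, 0.1, 0, 0])  # pretend coarse weights
t_f = fine_samples(t_c, w, 16)                  # clusters near depth ~4
```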
VOLUME RENDERING along ray `R`:
Initialize accumulated color `ACC_COLOR` as [0, 0, 0] (black) and accumulated transparency `ACC_TRANSPARENCY` as 1 (no light blocked yet).
We're simulating how light accumulates color as it travels through the volume.
FOR each point `P` in `{P_coarse} U {P_fine}` (from near to far along the ray):
Query network `f` for color `COLOR_P` and density `DENSITY_P` at point `P`.
Compute `ALPHA_P` = 1 - exp(-DENSITY_P * distance_to_next_point).
This is the opacity of the ray segment at `P`: how much of the remaining light it absorbs or scatters.
`ACC_COLOR += COLOR_P * ALPHA_P * ACC_TRANSPARENCY`
This accumulates the color contribution of point `P`, weighted by its opacity and by how much light has survived to reach it.
`ACC_TRANSPARENCY *= (1 - ALPHA_P)`
Update the transmittance: light passing `P` is attenuated by the fraction the segment does not absorb.
Pixel `p`'s color is set to `ACC_COLOR`.
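The rendering loop above follows the standard NeRF compositing rule, `alpha_i = 1 - exp(-sigma_i * delta_i)` with transmittance `T_i = prod_{j<i} (1 - alpha_j)`. A minimal sketch for a single ray:

```python
import numpy as np

def render_ray(colors, densities, distances):
    # colors: per-sample RGB; densities: sigma per sample;
    # distances: spacing delta to the next sample along the ray.
    acc_color = np.zeros(3)
    acc_transparency = 1.0                     # light starts unattenuated
    for color, sigma, delta in zip(colors, densities, distances):
        alpha = 1.0 - np.exp(-sigma * delta)   # opacity of this segment
        acc_color += acc_transparency * alpha * color
        acc_transparency *= (1.0 - alpha)      # remaining transmittance
    return acc_color

# Sanity check: a single fully opaque red sample should render pure red.
c = render_ray(colors=[np.array([1.0, 0.0, 0.0])],
               densities=[1e3], distances=[0.1])
```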
COMPUTE LOSS:
`LOSS` = Mean Squared Error between the rendered color of pixel `p` and its true color in image `I`.
This loss ensures the rendered image closely matches the actual photo, driving the network to learn correct scene representations.
BACKPROPAGATE through the neural network using `LOSS`.
Update weights `W` of the network using an optimizer, e.g., Adam.
We adjust the network weights to reduce the error in subsequent iterations.
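The photometric loss over a batch of rendered pixels is a plain mean squared error; in practice its gradient with respect to the weights `W` is computed by the framework's autodiff and applied with Adam. A small sketch with made-up values:

```python
import numpy as np

def mse_loss(rendered_rgb, true_rgb):
    # Mean squared error over all pixels and color channels.
    return np.mean((rendered_rgb - true_rgb) ** 2)

# Hypothetical rendered vs. ground-truth colors for two pixels.
rendered = np.array([[0.9, 0.1, 0.1], [0.2, 0.8, 0.1]])
target   = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
loss = mse_loss(rendered, target)
```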
AFTER sufficient training:
The network `f` can now be used to render novel views.
Given a new viewpoint, repeat the ray-casting, hierarchical-sampling, and volume-rendering steps above to synthesize the image.
The power of NeRF lies in its ability to generate photorealistic images from novel viewpoints not seen during training.
Figure 1. NeRF Architecture
Figure 2. Original Lego Image
Figure 3. 3D Reconstructed Model