
Fix inconsistent training results with RGBA/PNG images #1193

Open · wants to merge 1 commit into main

Conversation

@ndming commented on Mar 19, 2025

Issue summary

The training relies on PIL to resize the input images and extracts the resized alpha to mask the rendered image during training. Since PIL pre-multiplies the resized RGB with the resized alpha, the training produces different Gaussian points depending on whether the input gets resized or not. Moreover, the alpha channel extracted from PIL is not perfectly binarized, causing floaters around the edges. The issue has come up in #1039, #1121, and #1114, where training was done with either PNG images or a dataset containing masks in the 4th channel (preprocessed DTU, NeRF Synthetic).

The fix is self-contained in the PILtoTorch function. It checks whether the input is RGBA and, if so, manually masks the RGB channels with the alpha channel. The alpha channel is then discarded and the process continues as if the input were RGB, making the alpha multiplication step in the train script a no-op.

Details

In the current code, here's how a ground-truth RGBA image is treated during training:

  • The loaded image is resized to some resolution with PIL.Image.Image.resize in the PILtoTorch function.
  • The RGB channels of the result are extracted and become gt_image in train.py (via Camera.original_image).
  • The resized alpha channel is saved separately as alpha_mask. This mask is then multiplied with the rendered image in train.py, and the loss is computed between gt_image and the masked render (a paraphrased sketch of this flow is given after this list).
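
A paraphrased sketch of that flow (not the repository's exact code; the file name and resolution below are hypothetical, and the alpha_mask name follows the description above):

```python
import numpy as np
import torch
from PIL import Image

def PILtoTorch(pil_image, resolution):
    # Current behaviour: resize with PIL, then convert to a CxHxW float tensor in [0, 1].
    resized = pil_image.resize(resolution)
    tensor = torch.from_numpy(np.array(resized)).float() / 255.0
    if tensor.dim() == 3:
        return tensor.permute(2, 0, 1)
    return tensor.unsqueeze(-1).permute(2, 0, 1)

# In the training loop (paraphrased): the first three channels become gt_image,
# the fourth channel, if present, is kept as alpha_mask and multiplied into the render.
resized = PILtoTorch(Image.open("00030.png"), (800, 800))  # hypothetical input and resolution
gt_image = resized[:3, ...]
alpha_mask = resized[3:4, ...] if resized.shape[0] == 4 else None
# rendered = render(viewpoint_cam, gaussians, pipe, background)["render"]
# if alpha_mask is not None:
#     rendered = rendered * alpha_mask
# loss = l1_loss(rendered, gt_image) + ...
```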

If the input RGBA is actually resized in PILtoTorch (i.e., the resolution param differs from the image's own resolution), PIL pre-multiplies the resized RGB with the resized alpha:

[Figure: RGB before resize vs. RGB after resize]
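
A quick standalone check of this behaviour on your own Pillow version (not part of the repository): resize an RGBA image whose fully transparent pixels still carry non-zero RGB and inspect the result.

```python
import numpy as np
from PIL import Image

# 4x4 RGBA test image: pure red everywhere, but only the left-most column is opaque.
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[..., 0] = 255   # red channel
rgba[..., 3] = 255   # opaque ...
rgba[:, 1:, 3] = 0   # ... except columns 1-3, which are fully transparent
img = Image.fromarray(rgba, mode="RGBA")

resized = np.array(img.resize((2, 2), Image.BILINEAR))
print(np.array(img)[..., 0])  # red channel before resize: 255 everywhere
print(resized[..., 0])        # red channel after resize: check whether it was scaled by alpha
print(resized[..., 3])        # resized alpha: no longer strictly 0/255 along the boundary
```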

This creates two different scenarios:

  • If the image is not resized (no -r flag), the RGB ground truth is the original image without masking, and the saved alpha_mask is perfectly binarized.
  • If the image is resized, the RGB ground truth is masked, but the saved alpha_mask is distorted along edges.

Scenario 1: no resize

The Gaussian points undergo tension during training: the rendered image is masked before being fed into the loss, but the ground truth is the original, unmasked image:

[Figure: GT vs. render at iteration 7000 (image 00030)]

Scenario 2: RGBA is resized

The resized alpha_mask is not perfectly binarized along the edge due to interpolation. This imperfect mask is multiplied with the rendered image, causing floaters:

[Figure: GT vs. render with -r 2 at iteration 7000 (image 00030)]

The fix

To minimize the modification, when PILtoTorch encounters an RGBA input, we manually extract the alpha and mask the RGB channels, and the input becomes this new masked RGB. The remaining logic stays as-is, and the alpha multiplication step in train.py becomes a no-op.

[Figure: renders at iteration 7000 after the fix, without resizing and with -r 2 (image 00030)]
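
A minimal sketch of a PILtoTorch patched along these lines (paraphrased from the description above; the actual diff in this PR may differ in details):

```python
import numpy as np
import torch
from PIL import Image

def PILtoTorch(pil_image, resolution):
    # If the input has an alpha channel, mask the RGB channels ourselves and drop
    # the alpha, so the rest of the pipeline sees a plain RGB image and the alpha
    # multiplication in train.py becomes a no-op.
    if pil_image.mode == "RGBA":
        rgba = np.array(pil_image).astype(np.float32) / 255.0
        masked_rgb = rgba[..., :3] * rgba[..., 3:4]
        pil_image = Image.fromarray((masked_rgb * 255.0).astype(np.uint8), mode="RGB")
    # Unchanged logic: resize and convert to a CxHxW float tensor in [0, 1].
    resized = pil_image.resize(resolution)
    tensor = torch.from_numpy(np.array(resized)).float() / 255.0
    if tensor.dim() == 3:
        return tensor.permute(2, 0, 1)
    return tensor.unsqueeze(-1).permute(2, 0, 1)
```

Because the masking happens before the resize, the ground truth should be the same pre-multiplied image whether or not -r is passed.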

Test environment

  • Python 3.9
  • PyTorch 2.4.0
  • CUDA 12.4
  • MSVC 19.43

Notes

render.py might need to be fixed to export the masked GT (rather than the original RGB) when running on a model trained with the original resolution settings (no -r).
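
If that change is wanted, one possible shape of it (a rough sketch only; render.py's actual loop and naming may differ, and alpha_mask is the attribute referred to in this PR):

```python
import os
import torchvision

# Inside render.py's per-view loop (view, gts_path, idx come from that loop):
gt = view.original_image[0:3, :, :]
if getattr(view, "alpha_mask", None) is not None:
    # Mask the exported GT the same way train.py masks the render, so the two match.
    gt = gt * view.alpha_mask.to(gt.device)
torchvision.utils.save_image(gt, os.path.join(gts_path, "{0:05d}.png".format(idx)))
```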
