EndoGaussian-4D

Real-Time Deformable 4D Gaussian Splatting for Endoscopic Surgery

A research framework for reconstructing dynamic endoscopic scenes using deformable 3D Gaussian Splatting with HexPlane-encoded temporal deformation fields.

Architecture

G_t = G_0 + Δ_θ(t)

| Component | Description |
| --- | --- |
| G₀ | Canonical Gaussians {μ, q, s, α, SH} initialized via HGI |
| Δ_θ | HexPlane encoder → MLP decoder → per-Gaussian deltas |
| HexPlane | 6 feature planes (XY, XZ, YZ, XT, YT, ZT) × multi-resolution |
| Decoder | Shared backbone → 4 zero-initialized heads (Δμ, Δq, Δs, Δα) |
| Renderer | gsplat differentiable rasterization with absgrad |
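
To make the HexPlane row concrete, here is a minimal NumPy sketch of a single-resolution lookup: each of the six planes is bilinearly sampled at its coordinate pair and the six samples are fused into one feature vector. The plane resolution, the feature width, and the fusion by elementwise product are assumptions (the actual encoder is multi-resolution and may fuse differently), and `bilinear` and `hexplane_features` are hypothetical names, not functions from this repository.

```python
import numpy as np

# Index pairs into (x, y, z, t) for the planes XY, XZ, YZ, XT, YT, ZT.
PLANES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]

def bilinear(plane, a, b):
    """Sample an (R, R, C) feature plane at continuous coords a, b in [0, 1]."""
    res = plane.shape[0]
    fa, fb = a * (res - 1), b * (res - 1)
    a0, b0 = int(np.floor(fa)), int(np.floor(fb))
    a1, b1 = min(a0 + 1, res - 1), min(b0 + 1, res - 1)
    wa, wb = fa - a0, fb - b0
    top = (1 - wb) * plane[a0, b0] + wb * plane[a0, b1]
    bot = (1 - wb) * plane[a1, b0] + wb * plane[a1, b1]
    return (1 - wa) * top + wa * bot

def hexplane_features(planes, xyzt):
    """Fuse the six plane samples by elementwise product (assumed fusion rule)."""
    feat = np.ones(planes[0].shape[-1])
    for plane, (i, j) in zip(planes, PLANES):
        feat = feat * bilinear(plane, xyzt[i], xyzt[j])
    return feat

rng = np.random.default_rng(0)
planes = [rng.normal(size=(8, 8, 4)) for _ in PLANES]   # toy 8x8 planes, 4 channels
feat = hexplane_features(planes, np.array([0.25, 0.5, 0.75, 0.1]))
```

The resulting feature vector is what the decoder would map to per-Gaussian deltas; memory grows with the plane area rather than with a dense 4D grid.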

Target performance on the EndoNeRF benchmark: 37.9 dB PSNR, 0.97 SSIM, 195 FPS rendering, 2 min training.

Project Structure

endogaussian4d/
├── Dockerfile                   # CUDA 12.1 + PyTorch 2.3 + gsplat + COLMAP
├── requirements.txt             # Pip dependencies
├── __init__.py
│
├── models/
│   ├── __init__.py
│   ├── trainer.py               # EndoGaussianTrainer + HexPlane + Decoder
│   └── metrics.py               # PSNR, SSIM, LPIPS, D-SSIM, depth metrics
│
├── scripts/
│   ├── download_datasets.py     # Dataset registry + download + unified loader
│   ├── extract_poses.py         # Multi-stage SfM (COLMAP + Depth-Anything + PnP)
│   └── fast_fail_experiment.py  # Static 3DGS baseline + artifact taxonomy
│
├── configs/
│   ├── endonerf.yaml            # Default config for EndoNeRF dataset
│   ├── c3vd.yaml                # Config for C3VD dataset
│   └── fast_fail.yaml           # Config for fast-fail experiment
│
└── docs/
    ├── formalization.tex        # LaTeX mathematical formalization (CVPR-style)
    └── formalization.md         # Markdown version of the math

Week 1 Sprint Roadmap

Phase 1 (Days 1-2): Data Pipeline ✅

  • Dataset registry with 7 endoscopic datasets (EndoNeRF, EndoSLAM, C3VD, SCARED, StereoMIS, D4D, Hamlyn)
  • Unified EndoDataset loader supporting LLFF/C3VD/EndoSLAM formats
  • DatasetOrganizer for download, validation, and manifest generation
  • Multi-stage SfM pipeline (COLMAP sequential → exhaustive → Depth-Anything+PnP)
  • Holistic Gaussian Initialization via depth backprojection
  • Tool mask handling throughout the pipeline
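
As an illustration of initialization by depth backprojection with tool exclusion, the sketch below lifts a depth map to camera-space points under a pinhole model. The intrinsics (`fx`, `fy`, `cx`, `cy`), the mask convention (True marks instrument pixels), and the function name `backproject_depth` are assumptions for this example; the repository's own initializer may differ.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy, tool_mask=None):
    """Lift an (H, W) depth map to an (N, 3) camera-space point cloud."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]            # pixel rows (v) and columns (u)
    valid = depth > 0                    # drop pixels without a depth reading
    if tool_mask is not None:
        valid &= ~tool_mask              # assumed convention: True = instrument pixel
    z = depth[valid]
    x = (u[valid] - cx) * z / fx         # pinhole model: X = (u - cx) * Z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy frame: constant depth, with the two leftmost columns covered by a tool.
depth = np.full((4, 6), 2.0)
tool = np.zeros((4, 6), dtype=bool)
tool[:, :2] = True
points = backproject_depth(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0, tool_mask=tool)
```

Each surviving point would seed one canonical Gaussian; masked pixels contribute nothing, so instruments are never baked into G₀.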

Phase 2 (Days 3-4): Baseline & Fast-Fail ✅

  • Full EndoGaussianTrainer with HexPlane deformation
  • gsplat-based rendering with absgrad densification
  • Complete metrics suite (PSNR, SSIM, LPIPS, D-SSIM loss, 7 depth metrics)
  • Fast-fail experiment: static 3DGS on dynamic tissue
  • Artifact taxonomy (6 failure modes with physics explanations)
  • Automated artifact detection and structured reporting
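
For reference, PSNR from the metrics suite can be sketched in a few lines. This is a generic formulation, not a copy of `models/metrics.py`; the optional `mask` argument (True = pixel is evaluated) is an assumption about how tool pixels might be excluded.

```python
import numpy as np

def psnr(pred, target, mask=None, max_val=1.0):
    """Peak signal-to-noise ratio in dB, optionally restricted to mask == True."""
    err = (pred - target) ** 2
    if mask is not None:
        err = err[mask]                  # (H, W) mask selects pixels of an (H, W, 3) image
    mse = float(err.mean())
    if mse == 0.0:
        return float("inf")              # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.zeros((2, 2, 3))
pred = np.full((2, 2, 3), 0.1)           # uniform error of 0.1 on a [0, 1] scale
value = psnr(pred, target)
```

A uniform error of 0.1 gives an MSE of 0.01 and therefore 20 dB, which is a handy sanity check when wiring up evaluation.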

Phase 3 (Day 5): Mathematical Formalization ✅

  • LaTeX formalization of deformation field G_t = G₀ + Δ_θ(t)
  • HexPlane encoding with memory complexity analysis
  • Full loss derivation with physics motivation
  • Tool occlusion handling (3-stage masking)
  • Training schedule and adaptive density control
  • Loss → artifact mapping (which loss term fixes which artifact)
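
Written out per attribute, the deformation field might read as follows. The symbols H (HexPlane encoder) and D_θ (decoder) are notation introduced here for illustration. Leaving the SH coefficients static is inferred from the decoder having only four heads, and the purely additive quaternion update (with renormalization left implicit) is an assumption rather than something taken from docs/formalization.tex.

```latex
\begin{aligned}
(\Delta\mu,\ \Delta q,\ \Delta s,\ \Delta\alpha) &= D_\theta\big(H(\mu_0, t)\big) \\
\mu_t &= \mu_0 + \Delta\mu, \qquad q_t = q_0 + \Delta q, \\
s_t &= s_0 + \Delta s, \qquad \alpha_t = \alpha_0 + \Delta\alpha
\end{aligned}
```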

Quick Start

1. Environment Setup

# Using Docker (recommended)
docker build -t endogaussian4d .
docker run --gpus all -it endogaussian4d bash

# Or using pip
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

2. Download Data

# List available datasets
python scripts/download_datasets.py --list

# Download EndoNeRF (smallest, auto-downloadable, standard benchmark)
python scripts/download_datasets.py --datasets endonerf --data-root ./data

# Validate installation
python scripts/download_datasets.py --validate --data-root ./data

3. Extract Poses (if needed)

# Auto mode: tries COLMAP first, falls back to Depth-Anything + PnP
python scripts/extract_poses.py --input ./data/endonerf/cutting --mode auto

# Force learning-based pipeline (for texture-less tissue)
python scripts/extract_poses.py --input ./data/endonerf/cutting --mode depth_pnp --hgi

4. Run Fast-Fail Experiment

# Benchmark static 3DGS on dynamic tissue (documents failure modes)
python scripts/fast_fail_experiment.py \
    --data ./data/endonerf/cutting \
    --output ./experiments/fast_fail \
    --max-iters 1000

5. Train EndoGaussian-4D

from models.trainer import EndoGaussianTrainer, EndoGaussianConfig
from scripts.download_datasets import EndoDataset

# Load dataset
dataset = EndoDataset("./data/endonerf/cutting")
points, colors = dataset.get_point_cloud(subsample=0.001)

# Configure and train
config = EndoGaussianConfig(
    total_iters=3000,
    warmup_iters=1000,
    output_dir="./output",
    experiment_name="cutting_v1",
)

trainer = EndoGaussianTrainer(config)
trainer.initialize_from_point_cloud(points, colors, cameras=[])
# trainer.train(dataset)  # Requires GPU + torch Dataset wrapper

Dataset Summary

| Dataset | Sequences | Resolution | Depth | Poses | Access |
| --- | --- | --- | --- | --- | --- |
| EndoNeRF | 2 | 320×256 | ✓ GT | ✓ | Auto (Dropbox) |
| C3VD | 22 | 675×540 | ✓ GT | ✓ | Request form |
| SCARED | 7 | 1280×1024 | ✓ GT | ✓ | Registration |
| StereoMIS | 11 | 640×480 | Stereo | ✗ | Auto (Zenodo) |
| EndoSLAM | 35 | 640×480 | Synthetic | ✓ | Auto (GitHub) |
| D4D | 98 | 640×480 | ✓ GT | ✓ | Auto (DOI) |
| Hamlyn | 20 | 360×288 | ✗ | ✗ | Registration |

Recommended for initial experiments: EndoNeRF (smallest, auto-download, standard benchmark, includes tool masks)

Key Design Decisions

  1. gsplat over diff-gaussian-rasterization: Modern API, absgrad support, maintained by nerfstudio team
  2. HexPlane over pure MLP: 37.8 vs 34.8 PSNR, 6× faster training
  3. Zero-initialized decoder heads: Identity deformation at init → stable training
  4. Multi-stage SfM: COLMAP first (accurate), Depth+PnP fallback (robust)
  5. Tool masking in 3 stages: Init exclusion + loss masking + densification exclusion
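
The effect of zero-initialized heads can be checked with a toy decoder: whatever the backbone outputs, heads with zero weights and biases emit zero deltas, so the deformed Gaussians equal the canonical ones at the start of training. Layer sizes, the ReLU backbone, and all names below are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
HEADS = {"d_mu": 3, "d_q": 4, "d_s": 3, "d_alpha": 1}   # output width per head
FEAT, HIDDEN = 16, 32

backbone_w = rng.normal(scale=0.1, size=(FEAT, HIDDEN))  # shared backbone, random init
head_w = {k: np.zeros((HIDDEN, d)) for k, d in HEADS.items()}  # zero-initialized heads
head_b = {k: np.zeros(d) for k, d in HEADS.items()}

def decode(features):
    """Map (N, FEAT) features to one delta array per head."""
    hidden = np.maximum(features @ backbone_w, 0.0)      # ReLU
    return {k: hidden @ head_w[k] + head_b[k] for k in HEADS}

features = rng.normal(size=(5, FEAT))
deltas = decode(features)

mu_0 = rng.normal(size=(5, 3))
mu_t = mu_0 + deltas["d_mu"]                             # identity deformation at init
```

Because only the heads start at zero, the backbone still receives gradients through them after the first update, so the deformation can grow away from identity as training proceeds.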

Loss Function Summary

| Term | Weight | Purpose | Artifact Fixed |
| --- | --- | --- | --- |
| L1 | 1-λ₁=0.8 | Pixel reconstruction | General quality |
| D-SSIM | λ₁=0.2 | Structural similarity | Specular artifacts |
| Depth (SI) | λ₂=0.1 | Geometric accuracy | Floaters |
| Smooth | λ₃=0.01 | Temporal coherence | Ghosting, tearing |
| TV | λ₄=0.001 | HexPlane regularization | Scale bloat |
| Tool mask | n/a | Exclude instruments | Tool smearing |
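
A sketch of how these weights might combine into a single objective is given below. The weights are the ones listed in the table; the way the tool mask enters (masked pixels are simply dropped from the L1 mean) and the helper names `total_loss` and `masked_l1` are assumptions, and the individual terms are passed in as already-computed scalars.

```python
WEIGHTS = {"lambda1": 0.2, "lambda2": 0.1, "lambda3": 0.01, "lambda4": 0.001}

def total_loss(l1, d_ssim, depth, smooth, tv, w=WEIGHTS):
    """Weighted sum of the five loss terms."""
    photometric = (1.0 - w["lambda1"]) * l1 + w["lambda1"] * d_ssim
    return (photometric
            + w["lambda2"] * depth
            + w["lambda3"] * smooth
            + w["lambda4"] * tv)

def masked_l1(pred, target, tool_mask):
    """Mean absolute error over pixels not covered by an instrument."""
    kept = [abs(p - t) for p, t, m in zip(pred, target, tool_mask) if not m]
    return sum(kept) / len(kept) if kept else 0.0

# Three toy pixel values; the middle one lies under a tool and is ignored.
l1 = masked_l1([0.5, 0.9, 0.2], [0.4, 0.1, 0.2], [False, True, False])
loss = total_loss(l1=l1, d_ssim=0.3, depth=0.5, smooth=1.0, tv=2.0)
```

In the toy call, the large error under the tool (0.8) never reaches the objective, which is the mechanism that keeps instruments from smearing into the tissue reconstruction.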

References

  • Kerbl et al. "3D Gaussian Splatting." SIGGRAPH 2023.
  • Liu et al. "EndoGaussian." arXiv:2401.12561, 2024.
  • Huang et al. "Endo-4DGS." MICCAI 2024.
  • Cao & Johnson. "HexPlane." CVPR 2023.
  • Wang et al. "EndoNeRF." MICCAI 2022.
  • Ye et al. "gsplat." arXiv:2409.06765, 2024.

License

Research use only. See individual dataset licenses for data terms.
