Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration

AGI Lab, Westlake University
*Corresponding author
arXiv Preprint

TL;DR: Fast3Dcache is a training-free framework that accelerates 3D geometry synthesis by exploiting voxel stabilization patterns, achieving up to a 27.12% speedup and a 54.8% reduction in FLOPs with negligible quality loss.

Abstract

Diffusion models have achieved impressive generative quality across modalities like 2D images, videos, and 3D shapes, but their inference remains computationally expensive due to the iterative denoising process. While recent caching-based methods effectively reuse redundant computations to speed up 2D and video generation, directly applying these techniques to 3D diffusion models can severely disrupt geometric consistency.

In 3D synthesis, even minor numerical errors in cached latent features accumulate, causing structural artifacts and topological inconsistencies. To overcome this limitation, we propose Fast3Dcache, a training-free, geometry-aware caching framework that accelerates 3D diffusion inference while preserving geometric fidelity. Our method introduces a Predictive Caching Scheduler Constraint (PCSC), which dynamically determines cache quotas according to voxel stabilization patterns, and a Spatiotemporal Stability Criterion (SSC), which selects stable features for reuse based on velocity magnitude and an acceleration criterion.

Comprehensive experiments show that Fast3Dcache achieves up to a 27.12% speed-up and a 54.8% reduction in FLOPs, with minimal degradation in geometric quality: Chamfer Distance increases by 2.48% and F-Score decreases by 1.95% relative to the vanilla baseline.

Methodology

Pipeline Overview

Figure 3: Overview of the Fast3Dcache three-stage acceleration strategy.

Our approach segments the inference process into three strategic phases: (1) Full Sampling to establish initial geometric stability, (2) Dynamic Caching using PCSC and SSC to skip redundant computations, and (3) CFG-Free Refinement with an aggressive fixed-ratio schedule.
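
The three-phase split can be pictured as a simple mapping from step index to phase. The sketch below is illustrative only: the names Phase and phase_for_step, and the default step counts full_steps and refine_steps, are assumptions made for this example and are not values taken from the paper.

```python
from enum import Enum


class Phase(Enum):
    FULL_SAMPLING = "full_sampling"
    DYNAMIC_CACHING = "dynamic_caching"
    CFG_FREE_REFINEMENT = "cfg_free_refinement"


def phase_for_step(step: int, num_steps: int,
                   full_steps: int = 5, refine_steps: int = 7) -> Phase:
    """Map a 0-based denoising step to one of the three phases.

    full_steps and refine_steps are hypothetical placeholders, not
    settings reported by the authors.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    if full_steps + refine_steps > num_steps:
        raise ValueError("phase lengths exceed the number of steps")
    if step < full_steps:
        # Phase 1: every token is computed, nothing is cached.
        return Phase.FULL_SAMPLING
    if step < num_steps - refine_steps:
        # Phase 2: PCSC sets the quota, SSC picks the tokens to reuse.
        return Phase.DYNAMIC_CACHING
    # Phase 3: fixed-ratio caching without classifier-free guidance.
    return Phase.CFG_FREE_REFINEMENT


schedule = [phase_for_step(s, 25) for s in range(25)]
```

With these placeholder settings and 25 steps, the first 5 steps fall into full sampling, the next 13 into dynamic caching, and the last 7 into refinement.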

Key Observation: Voxel Stabilization

We observe a distinct three-phase stabilization pattern in 3D generation. As denoising proceeds, an increasing number of voxel locations become static. Our method leverages this observation to design a dynamic caching mechanism.
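
One way to measure this kind of stabilization is to compare voxel states between consecutive denoising steps and count how many locations did not change. The sketch below assumes a flattened binary occupancy grid per step. That representation, the function name static_fraction, and the toy data are assumptions for illustration, not the authors' measurement procedure.

```python
def static_fraction(prev_occ, curr_occ):
    """Fraction of voxel locations whose occupancy is unchanged between two steps."""
    if len(prev_occ) != len(curr_occ) or not curr_occ:
        raise ValueError("grids must be non-empty and of equal size")
    same = sum(1 for p, c in zip(prev_occ, curr_occ) if p == c)
    return same / len(curr_occ)


# Toy trajectory: a flattened 8-voxel occupancy grid at four consecutive steps.
trajectory = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
]

# One value per pair of consecutive steps.
fractions = [static_fraction(a, b) for a, b in zip(trajectory, trajectory[1:])]
```

In this toy example the static fraction rises from 0.625 to 0.875 to 1.0, which mirrors the qualitative trend described above.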

Predictive Caching Scheduler Constraint (PCSC)

PCSC dynamically determines the cache quota at each timestep. It predicts the number of stable voxels and sets the caching ratio accordingly, following the logarithmic decay pattern observed in the intermediate phase.
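
As a rough sketch of how a quota could follow a logarithmic decay, the example below models the number of still-changing voxels as initial_changing - decay_rate * ln(step - start_step + 1) and caches the rest. The functional form, the function names, and all parameter values are assumptions chosen for illustration. The page does not give the exact PCSC formula.

```python
import math


def predicted_changing(step, start_step, initial_changing, decay_rate):
    """Hypothetical log-decay model for the number of still-changing voxels."""
    if step < start_step:
        raise ValueError("step precedes the start of the caching phase")
    value = initial_changing - decay_rate * math.log(step - start_step + 1)
    return max(0, round(value))


def cache_quota(step, start_step, total_voxels, initial_changing, decay_rate):
    """Number of voxels allowed to reuse cached features at this step."""
    changing = min(total_voxels,
                   predicted_changing(step, start_step, initial_changing, decay_rate))
    return total_voxels - changing


# Placeholder settings: 4096 voxels, caching phase covering steps 5..17.
quotas = [cache_quota(s, 5, 4096, 1000, 300.0) for s in range(5, 18)]
```

Under this model the quota grows monotonically as fewer voxels are predicted to change, and it never exceeds the total voxel count.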

Spatiotemporal Stability Criterion (SSC)

SSC selects stable tokens for reuse. It evaluates voxel stability from two perspectives:

  • Velocity Magnitude: Reflects how much a voxel's latent representation changes.
  • Acceleration Criterion: Measures how stable that velocity is, via its rate of change between steps.

By jointly considering both, SSC accurately identifies features corresponding to geometrically converged regions.
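
The two criteria above can be combined into a per-token score. The sketch below takes per-token velocity vectors at two consecutive steps, scores each token by its velocity magnitude plus a weighted change in velocity, and keeps the lowest-scoring tokens up to a given quota. The additive scoring rule, the accel_weight parameter, and the function names are assumptions for illustration, not the authors' exact criterion.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def stability_scores(v_prev, v_curr, accel_weight=1.0):
    """Lower score = more stable: small velocity and small change in velocity."""
    scores = []
    for vp, vc in zip(v_prev, v_curr):
        speed = l2_norm(vc)
        accel = l2_norm([c - p for p, c in zip(vp, vc)])
        scores.append(speed + accel_weight * accel)
    return scores


def select_stable(v_prev, v_curr, quota, accel_weight=1.0):
    """Indices of the `quota` most stable tokens, in ascending index order."""
    scores = stability_scores(v_prev, v_curr, accel_weight)
    order = sorted(range(len(scores)), key=scores.__getitem__)
    return sorted(order[:quota])


# Toy data: four tokens with 2-D velocities at the previous and current step.
v_prev = [[0.0, 0.1], [1.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
v_curr = [[0.0, 0.1], [0.9, 1.2], [0.0, 0.5], [0.01, 0.0]]
stable = select_stable(v_prev, v_curr, quota=2)
```

Here tokens 0 and 3 are selected: both move slowly and their velocity barely changes. Token 2 has a moderate speed but a large change in direction, so the acceleration term excludes it.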

Comparisons

We compare Fast3Dcache against standard TRELLIS and modality-aware methods such as RAS. Results are shown for the following settings.

1. Fast3Dcache with τ = 8 (speedup ↑27.12%; FLOPs ↓54.83%)

2. TeaCache + Fast3Dcache (speedup 3.41×)

3. EasyCache + Fast3Dcache (speedup 10.33×)

Quantitative Performance Metrics

Table 1. Quantitative comparison on TRELLIS and DSO frameworks.

| Framework | Acceleration method | Throughput (iter/s) ↑ | FLOPs (T) ↓ | CD ↓ | F-Score ↑ |
|---|---|---|---|---|---|
| TRELLIS | vanilla | 0.5055 | 244.2 | 0.0686 | 54.8244 |
| TRELLIS | RAS (sample ratio 25%) | 0.6337 | 125.1 | 0.0867 | 40.2769 |
| TRELLIS | RAS (sample ratio 12.5%) | 0.6177 | 125.8 | 0.0846 | 43.9622 |
| TRELLIS | Fast3Dcache (τ = 3) | 0.5850 | 142.4 | 0.0697 | 54.0900 |
| TRELLIS | Fast3Dcache (τ = 5) | 0.6344 | 121.3 | 0.0712 | 53.5003 |
| TRELLIS | Fast3Dcache (τ = 8) | 0.6426 | 110.3 | 0.0703 | 53.7528 |
| DSO | vanilla | 0.3496 | 244.2 | 0.0687 | 54.8350 |
| DSO | RAS (sample ratio 25%) | 0.4341 | 125.0 | 0.0805 | 46.4990 |
| DSO | RAS (sample ratio 12.5%) | 0.4047 | 125.8 | 0.0820 | 45.5584 |
| DSO | Fast3Dcache (τ = 3) | 0.3955 | 146.5 | 0.0698 | 54.0451 |
| DSO | Fast3Dcache (τ = 5) | 0.4114 | 126.0 | 0.0711 | 53.5506 |
| DSO | Fast3Dcache (τ = 8) | 0.4071 | 115.4 | 0.0704 | 53.5487 |

Table 2. Full quantitative results of Fast3Dcache combined with modality-agnostic methods.

| Method | Throughput (iter/s) ↑ | CD ↓ | F-Score ↑ |
|---|---|---|---|
| Vanilla | 0.51 | 0.0686 | 54.8244 |
| TeaCache | 1.45 (2.84×) | 0.0705 | 53.5567 |
| TeaCache + ours | 1.74 (3.41×) | 0.0701 | 53.9420 |
| EasyCache | 1.95 (3.82×) | 0.0692 | 54.5051 |
| EasyCache + ours | 5.27 (10.33×) | 0.0694 | 54.7722 |
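
The headline percentages and speedup factors can be recomputed directly from the table entries. The short check below uses only values listed in Tables 1 and 2. The helper name relative_change and the dictionary layout are illustrative choices.

```python
def relative_change(baseline, value):
    """Signed percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# Table 1, TRELLIS rows: vanilla vs. Fast3Dcache (tau = 8).
trellis_vanilla = {"throughput": 0.5055, "flops": 244.2, "cd": 0.0686, "f_score": 54.8244}
fast3d_tau8 = {"throughput": 0.6426, "flops": 110.3, "cd": 0.0703, "f_score": 53.7528}

changes = {
    key: round(relative_change(trellis_vanilla[key], fast3d_tau8[key]), 2)
    for key in trellis_vanilla
}

# Table 2: speedup factor = method throughput / vanilla throughput.
vanilla_throughput = 0.51
throughputs = {
    "TeaCache": 1.45,
    "TeaCache + ours": 1.74,
    "EasyCache": 1.95,
    "EasyCache + ours": 5.27,
}
speedups = {name: round(t / vanilla_throughput, 2) for name, t in throughputs.items()}
```

The recomputed changes are +27.12% throughput, -54.83% FLOPs, +2.48% Chamfer Distance, and -1.95% F-Score, and the recomputed speedups are 2.84×, 3.41×, 3.82×, and 10.33×, all matching the reported figures.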

BibTeX


@misc{yang2025fast3dcachetrainingfree3dgeometry,
      title={Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration}, 
      author={Mengyu Yang and Yanming Yang and Chenyi Xu and Chenxi Song and Yufan Zuo and Tong Zhao and Ruibo Li and Chi Zhang},
      year={2025},
      eprint={2511.22533},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.22533}, 
}