- NeRFPlayer
- D-NeRF
- Zip-NeRF
- LERF
- LERF-TOGO
- GARField
- LangSplat
- Tensor4D
- DepthSplat: Connecting Gaussian Splatting and Depth
- DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering
- MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
- L3DG: Latent 3D Gaussian Diffusion
- Differentiable Robot Rendering
- Object Pose Estimation Using Implicit Representation For Transparent Objects
- Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices
- Diffusion Models in 3D Vision: A Survey
- Magnituder Layers for Implicit Neural Representations in 3D
- NeRF-enabled Analysis-Through-Synthesis for ISAR Imaging of Small Everyday Objects with Sparse and Noisy UWB Radar Data
- EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View
- 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting
- 4-LEGS: 4D Language Embedded Gaussian Splatting
- Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting
- Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting
- GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance information
- LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images
- SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection
- Gaussian Splatting Visual MPC for Granular Media Manipulation
- GS^3: Efficient Relighting with Triple Gaussian Splatting
- 3D Gaussian Splatting in Robotics: A Survey
- FreeNeRF
- InstantSplat
- EmerNeRF
- DistillNeRF
- JacobiNeRF
- Efficient Geometry-aware 3D Generative Adversarial Networks
- MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
- ActAnywhere
- Instruct-NeRF2NeRF
- GenN2N
- LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
- Sort-free Gaussian Splatting via Weighted Sum Rendering
- VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points
- EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
- 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors
- PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting
- 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
- Fully Explicit Dynamic Gaussian Splatting
- SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
- Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
- E-3DGS: Gaussian Splatting with Exposure and Motion Events
- AG-SLAM: Active Gaussian Splatting SLAM
- Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis
- Real-time 3D-aware Portrait Video Relighting
- Few-shot NeRF by Adaptive Rendering Loss Regularization
- FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
- Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions
- 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
- SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
- NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
- MuRF: Multi-Baseline Radiance Fields
- NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
- Mip-Splatting: Alias-free 3D Gaussian Splatting
- Factor Fields: A Unified Framework for Neural Fields and Beyond
- LaRa: Efficient Large-Baseline Radiance Fields
- GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond
- GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
- 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting
- Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes
- High-quality Surface Reconstruction using Gaussian Surfels
- PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
- Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
- Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting
- SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
- LargeSpatialModel: End-to-end Unposed Images to Semantic 3D
- RaDe-GS: Rasterizing Depth in Gaussian Splatting
- GaussianPro: 3D Gaussian Splatting with Progressive Propagation
- Trim 3D Gaussian Splatting for Accurate Geometry Representation
- PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
- Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
- GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization
- GS-IR: 3D Gaussian Splatting for Inverse Rendering
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
- Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
- MVSNeRF
- GeoNeRF
- SlowFast Networks for Video Recognition
- A Closer Look at Spatiotemporal Convolutions for Action Recognition
- Semantic Gaussians
- A Survey on 3D Gaussian Splatting
- Photo Tourism: Exploring Photo Collections in 3D
- Multi-View Stereo for Community Photo Collections
- Light Field Rendering
- The Lumigraph
- Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
- Deep blending for free-viewpoint image-based rendering
- Deferred neural rendering: Image synthesis using neural textures
- DeepVoxels: Learning Persistent 3D Feature Embeddings
- Neural Point-Based Graphics
- ADOP: Approximate Differentiable One-Pixel Point Rendering
- GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
- FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
- Scaffold-gs: Structured 3d gaussians for view-adaptive rendering
- COLMAP-Free 3D Gaussian Splatting
- Fsgs: Real-time fewshot view synthesis using gaussian splatting
- DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
- MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
- Corgs: Sparse-view 3d gaussian splatting via co-regularization
- pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
- Splatter image: Ultra-fast single-view 3d reconstruction
- Compact 3d gaussian representation for radiance field
- Hac: Hash-grid assisted context for 3d gaussian splatting compression
- Reducing the memory footprint of 3d gaussian splatting
- Compressed 3d gaussian splatting for accelerated novel view synthesis
- Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces
- Relightable gaussian codec avatars
- Mirror-3dgs: Incorporating mirror reflections into 3d gaussian splatting
- GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
- Language embedded 3d gaussians for open-vocabulary scene understanding
- Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields
- FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
- Gaussian grouping: Segment and edit anything in 3d scenes
- Segment any 3d gaussians
- Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle
- Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
- Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
- Gaussian Splatting in Style
- DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark
- TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
- Dual-Camera Smooth Zoom on Mobile Phones
- Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
- CoGS: Controllable Gaussian Splatting
- GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time
- Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
- Neural parametric gaussians for monocular non-rigid object reconstruction
- Control4d: Efficient 4d portrait editing with text
- Dynmf: Neural motion factorization for real-time dynamic view synthesis with 3d gaussian splatting
- Swags: Sampling windows adaptively for dynamic 3d gaussian splatting
- Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis
- An efficient 3d gaussian representation for monocular/multi-view dynamic scenes
- Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction
- 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
- DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
- Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
- A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets
- CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
- VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
- On Scaling Up 3D Gaussian Splatting Training
- HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes
- GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting
- MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes
- GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
- SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
- Gaussian Splatting SLAM
- Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
- High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization
- Image Quality Assessment: From Error Visibility to Structural Similarity
- The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
- Fast Dynamic Radiance Fields with Time-Aware Neural Voxels
- HexPlane: A Fast Representation for Dynamic Scenes
- Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction
- Forward Flow for Novel View Synthesis of Dynamic Scenes
- VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
- Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
- MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
- GaussReg: Fast 3D Registration with Gaussian Splatting
- GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
- Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
- AutoInt: Automatic Integration for Fast Neural Volume Rendering
- Neural Sparse Voxel Fields
- Learned Initializations for Optimizing Coordinate-Based Neural Representations
- DeRF: Decomposed Radiance Fields
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
- Space-time Neural Irradiance Fields for Free-Viewpoint Video
- Neural Radiance Flow for 4D View Synthesis and Video Processing
- STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
- Portrait Neural Radiance Fields from a Single Image
- NeRV: Neural Representations for Videos
- NeRD: Neural Reflectance Decomposition from Image Collections
- Neural Reflectance Fields for Appearance Acquisition
- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
- Object-Centric Neural Scene Rendering
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
- Neural Scene Graphs for Dynamic Scenes
- INeRF: Inverting Neural Radiance Fields for Pose Estimation
- Dense Depth Priors for Neural Radiance Fields from Sparse Input Views
- MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
- Single-View View Synthesis with Multiplane Images
- FWD: Real-time Novel View Synthesis with Forward Warping and Depth
- Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation
- RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
- GRF: Learning a General Radiance Field for 3D Representation and Rendering
- IBRNet: Learning Multi-View Image-Based Rendering
- Depth-supervised NeRF: Fewer Views and Faster Training for Free
- Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
- Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
- NeROIC: Neural Rendering of Objects from Online Image Collections
- CG-NeRF: Conditional Generative Neural Radiance Fields
- Neural 3D Video Synthesis from Multi-view Video
- TöRF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis
- HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
- Weakly Supervised 3D Open-vocabulary Segmentation
- ScanQA: 3D Question Answering for Spatial Scene Understanding
- IQA: Visual Question Answering in Interactive Environments
- SimVQA: Exploring Simulated Environments for Visual Question Answering
- Visual Language Maps for Robot Navigation
- Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
- In-Place Scene Labelling and Understanding with Implicit Scene Representation
- Decomposing NeRF for Editing via Feature Field Distillation
- ConceptFusion: Open-set Multimodal 3D Mapping
- Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
- Language-driven Semantic Segmentation
- Emerging Properties in Self-Supervised Vision Transformers
- Panoptic Lifting for 3D Scene Understanding with Neural Fields
- OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
- Learning Transferable Visual Models From Natural Language Supervision
- CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
- Learning to Prompt for Vision-Language Models
- Deep Stereo using Adaptive Thin Volume Representation with Uncertainty Awareness
- Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching
- MVSNet: Depth Inference for Unstructured Multi-view Stereo
- A Theory of Shape by Space Carving
- Shape and Motion from Image Streams under Orthography: A Factorization Method
- Structure-from-Motion Revisited
- InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering
- The Platonic Representation Hypothesis
- A Survey on Multimodal Large Language Models
- DINOv2: Learning Robust Visual Features without Supervision
- SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
- Improving 2D Feature Representations by 3D-Aware Fine-Tuning
- OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding
- ICARUS: A Specialized Architecture for Neural Radiance Fields Rendering
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
- Anisotropic Fourier Features for Neural Image-Based Rendering and Relighting
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
- Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
- OpenMask3D: Open-Vocabulary 3D Instance Segmentation
- OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views
- SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
- Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
- AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
- 3D Segmentation of Humans in Point Clouds with Synthetic Data
- Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
- Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes
- Mix3D: Out-of-Context Data Augmentation for 3D Scenes
- 4D-StOP: Panoptic Segmentation of 4D LiDAR using Spatio-temporal Object Proposal Generation and Aggregation
- Splatter a Video: Video Gaussian Representation for Versatile Processing
- Continuous and Discrete Wavelet Transforms
- Discrete Cosine Transform
- Implicit Neural Representations with Periodic Activation Functions
- Compositional Pattern Producing Networks: A Novel Abstraction of Development
- COIN++: Neural Compression Across Modalities
- COIN: COmpression with Implicit Neural representations
- Compression with Bayesian Implicit Neural Representations
- Implicit Neural Representations for Image Compression
- Single Image Defocus Deblurring via Implicit Neural Inverse Kernels
- Signal Processing for Implicit Neural Representations
- Revisiting Implicit Neural Representations in Low-Level Vision
- Learning Continuous Image Representation with Local Implicit Image Function
- Single Image Super-Resolution via a Dual Interactive Implicit Neural Network
- Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs
- WIRE: Wavelet Implicit Neural Representations
- Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
- NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
- ACORN: Adaptive Coordinate Networks for Neural Scene Representation
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes
- LSQ+: Improving Low-bit Quantization through Learnable Offsets and Better Initialization
- SoundStream: An End-to-End Neural Audio Codec
- Split Hierarchical Variational Compression
- Practical Lossless Compression with Latent Variables using Bits Back Coding
- Point-NeRF: Point-based Neural Radiance Fields
- HNeRV: A Hybrid Neural Representation for Videos
- E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
- Boosting Neural Representations for Videos with a Conditional Decoder
- 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
- Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
- GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
- Text-to-3D using Gaussian Splatting
- Drivable 3D Gaussian Avatars
- Density Modeling of Images using a Generalized Normalization Transformation
- Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules
- ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding
- Learned Image Compression with Mixed Transformer-CNN Architectures
- End-to-end Optimized Image Compression
- Variational Image Compression with a Scale Hyperprior
- Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression
- Joint Autoregressive and Hierarchical Priors for Learned Image Compression
- COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec
- Low-complexity Overfitted Neural Image Codec
- Entropy Coding of Unordered Data Structures
- Occupancy Networks: Learning 3D Reconstruction in Function Space
- 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
- 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
- EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction
- NERV++: An Enhanced Implicit Neural Video Representation
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
- VideoLCM: Video Latent Consistency Model
- Generating 3D-Consistent Videos from Unposed Internet Photos
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
- Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
- MixConv: Mixed Depthwise Convolutional Kernels
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Handbook of Image and Video Processing
- Layered Neural Atlases for Consistent Video Editing
- Consistent Video Depth Estimation
- Representing Moving Images with Layers
- Deformable Sprites for Unsupervised Video Decomposition
- Space-Time Correspondence as a Contrastive Random Walk
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
- GenDeF: Learning Generative Deformation Field for Video Generation
- MFT: Long-Term Tracking of Every Pixel
- Layered Neural Rendering for Retiming People in Video
- INVE: Interactive Neural Video Editing
- Tracking Everything Everywhere All at Once
- RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
- Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
- Diffusion Models Trained with Large Data Are Transferable Visual Models
- Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time
- MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
- VidToMe: Video Token Merging for Zero-Shot Video Editing
- Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories
- CoTracker: It is Better to Track Together
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
- SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
- Neural Trajectory Fields for Dynamic Novel View Synthesis
- DynIBaR: Neural Dynamic Image-Based Rendering
- Trajectory Space: A Dual Representation for Nonrigid Structure from Motion
- q-Bernstein Polynomials and Bézier Curves
- B-spline Curves and Surfaces
- Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
- Density Estimation Using Real NVP
- Shape of Motion: 4D Reconstruction from a Single Video
- 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
- Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
- 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
- DiffGS: Functional Gaussian Splatting Diffusion
- SUDS: Scalable Urban Dynamic Scenes
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos
- HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion
- TAVA: Template-free Animatable Volumetric Actors
- High-Resolution Image Synthesis with Latent Diffusion Models
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
- Scalable Diffusion Models with Transformers
- All are Worth Words: A ViT Backbone for Diffusion Models
- Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections
- Denoising Diffusion Probabilistic Models
- Taming Transformers for High-Resolution Image Synthesis
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- A Survey on Video Diffusion Models
- Diffusion Models and Representation Learning: A Survey
- Diffusion Models: A Comprehensive Survey of Methods and Applications
- From Sora What We Can See: A Survey of Text-to-Video Generation
- VDT: General-purpose Video Diffusion Transformers via Mask Modeling
- Latte: Latent Diffusion Transformer for Video Generation
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
- VideoComposer: Compositional Video Synthesis with Motion Controllability
- Imagen Video: High Definition Video Generation with Diffusion Models
- Diffusion Models Beat GANs on Image Synthesis
- Video Diffusion Models
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
- Neural Network Parameter Diffusion
- WonderWorld: Interactive 3D Scene Generation from a Single Image
- Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Videos
- Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects
- Adding Conditional Control to Text-to-Image Diffusion Models
- IC-Light
- Tutorial on Variational Autoencoders
- Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
- DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
- DUSt3R: Geometric 3D Vision Made Easy
- MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification
- StopThePop: Sorted Gaussian Splatting for View-Consistent Real-Time Rendering
- MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
- The Scene Language: Representing Scenes with Programs, Words, and Embeddings
- Autoregressive Image Generation without Vector Quantization
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence
- SAMPart3D: Segment Any Part in 3D Objects
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
- EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
- GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling
- CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
- Image Neural Field Diffusion Models
- Neural Gaussian Scale-Space Fields
- NICE: Non-linear Independent Components Estimation
- AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
- Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
- Motion Prompting: Controlling Video Generation with Motion Trajectories
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
- Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
- SimDA: Simple Diffusion Adapter for Efficient Video Generation
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
- Make Pixels Dance: High-Dynamic Video Generation
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
- LaMD: Latent Motion Diffusion for Video Generation
- Video Probabilistic Diffusion Models in Projected Latent Space
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
- Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
- Generative Image Dynamics
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
- REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
- DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
- FeatUp: A Model-Agnostic Framework for Features at Any Resolution
- Ref-GS: Directional Factorization for 2D Gaussian Splatting
- 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
- Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
- MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field
- Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps
- Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
- LRM: Large Reconstruction Model for Single Image to 3D
- CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner
- 3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models
- You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
- Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
- Evaluating Multiview Object Consistency in Humans and Image Models
- MeshArt: Generating Articulated Meshes with Structure-guided Transformers
- Flow Matching for Generative Modeling
- Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields
- Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale
- Structured 3D Latents for Scalable and Versatile 3D Generation
- Learning Interactive Real-World Simulators