  • NeRFPlayer
  • D-NeRF
  • Zip-NeRF
  • LERF
  • LERF-TOGO
  • GARField
  • LangSplat
  • Tensor4D
  • DepthSplat: Connecting Gaussian Splatting and Depth
  • DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering
  • MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
  • L3DG: Latent 3D Gaussian Diffusion
  • Differentiable Robot Rendering
  • Object Pose Estimation Using Implicit Representation For Transparent Objects
  • Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices
  • Diffusion Models in 3D Vision: A Survey
  • Magnituder Layers for Implicit Neural Representations in 3D
  • NeRF-enabled Analysis-Through-Synthesis for ISAR Imaging of Small Everyday Objects with Sparse and Noisy UWB Radar Data
  • EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View
  • 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting
  • 4-LEGS: 4D Language Embedded Gaussian Splatting
  • Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting
  • Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting
  • GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance information
  • LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images
  • SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection
  • Gaussian Splatting Visual MPC for Granular Media Manipulation
  • GS^3: Efficient Relighting with Triple Gaussian Splatting
  • 3D Gaussian Splatting in Robotics: A Survey
  • FreeNeRF
  • InstantSplat
  • EmerNeRF
  • DistillNeRF
  • JacobiNeRF
  • Efficient Geometry-aware 3D Generative Adversarial Networks
  • MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
  • ActAnywhere
  • Instruct-NeRF2NeRF
  • GenN2N
  • LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
  • Sort-free Gaussian Splatting via Weighted Sum Rendering
  • VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points
  • EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
  • 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors
  • PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting
  • 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
  • Fully Explicit Dynamic Gaussian Splatting
  • SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
  • Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
  • E-3DGS: Gaussian Splatting with Exposure and Motion Events
  • AG-SLAM: Active Gaussian Splatting SLAM
  • Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis
  • Real-time 3D-aware Portrait Video Relighting
  • Few-shot NeRF by Adaptive Rendering Loss Regularization
  • FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
  • Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions
  • 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
  • SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
  • NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
  • MuRF: Multi-Baseline Radiance Fields
  • NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
  • Mip-Splatting: Alias-free 3D Gaussian Splatting
  • Factor Fields: A Unified Framework for Neural Fields and Beyond
  • LaRa: Efficient Large-Baseline Radiance Fields
  • GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond
  • GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction
  • 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting
  • Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes
  • High-quality Surface Reconstruction using Gaussian Surfels
  • PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
  • Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
  • Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting
  • SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
  • Large Spatial Model: End-to-end Unposed Images to Semantic 3D
  • RaDe-GS: Rasterizing Depth in Gaussian Splatting
  • GaussianPro: 3D Gaussian Splatting with Progressive Propagation
  • Trim 3D Gaussian Splatting for Accurate Geometry Representation
  • PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
  • Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
  • GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization
  • GS-IR: 3D Gaussian Splatting for Inverse Rendering
  • No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
  • Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
  • MVSNeRF
  • GeoNeRF
  • SlowFast Networks for Video Recognition
  • A Closer Look at Spatiotemporal Convolutions for Action Recognition
  • Semantic Gaussians
  • A Survey on 3D Gaussian Splatting
  • Photo Tourism: Exploring Photo Collections in 3D
  • Multi-View Stereo for Community Photo Collections
  • Light Field Rendering
  • The Lumigraph
  • Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
  • Deep Blending for Free-Viewpoint Image-Based Rendering
  • Deferred Neural Rendering: Image Synthesis Using Neural Textures
  • DeepVoxels: Learning Persistent 3D Feature Embeddings
  • Neural Point-Based Graphics
  • ADOP: Approximate Differentiable One-Pixel Point Rendering
  • GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
  • FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
  • Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
  • COLMAP-Free 3D Gaussian Splatting
  • FSGS: Real-Time Few-Shot View Synthesis Using Gaussian Splatting
  • DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
  • MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
  • CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
  • pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
  • Splatter Image: Ultra-Fast Single-View 3D Reconstruction
  • Compact 3D Gaussian Representation for Radiance Field
  • HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
  • Reducing the Memory Footprint of 3D Gaussian Splatting
  • Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
  • GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
  • Relightable Gaussian Codec Avatars
  • Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting
  • GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
  • Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
  • Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
  • FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
  • Gaussian Grouping: Segment and Edit Anything in 3D Scenes
  • Segment Any 3D Gaussians
  • Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
  • Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
  • Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
  • Gaussian Splatting in Style
  • DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark
  • TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
  • Dual-Camera Smooth Zoom on Mobile Phones
  • Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
  • CoGS: Controllable Gaussian Splatting
  • GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time
  • Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
  • Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
  • Control4D: Efficient 4D Portrait Editing with Text
  • DynMF: Neural Motion Factorization for Real-Time Dynamic View Synthesis with 3D Gaussian Splatting
  • SWAGS: Sampling Windows Adaptively for Dynamic 3D Gaussian Splatting
  • GauFRe: Gaussian Deformation Fields for Real-Time Dynamic Novel View Synthesis
  • An Efficient 3D Gaussian Representation for Monocular/Multi-View Dynamic Scenes
  • Motion-Aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction
  • 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
  • DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
  • Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
  • HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
  • A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets
  • CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
  • VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
  • On Scaling Up 3D Gaussian Splatting Training
  • HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes
  • GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting
  • MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes
  • GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
  • SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
  • Gaussian Splatting SLAM
  • Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
  • High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization
  • Image Quality Assessment: From Error Visibility to Structural Similarity
  • The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
  • Fast Dynamic Radiance Fields with Time-Aware Neural Voxels
  • HexPlane: A Fast Representation for Dynamic Scenes
  • Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction
  • Forward Flow for Novel View Synthesis of Dynamic Scenes
  • VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
  • Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
  • MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
  • GaussReg: Fast 3D Registration with Gaussian Splatting
  • GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
  • Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
  • AutoInt: Automatic Integration for Fast Neural Volume Rendering
  • Neural Sparse Voxel Fields
  • Learned Initializations for Optimizing Coordinate-Based Neural Representations
  • DeRF: Decomposed Radiance Fields
  • Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
  • Space-time Neural Irradiance Fields for Free-Viewpoint Video
  • Neural Radiance Flow for 4D View Synthesis and Video Processing
  • STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering
  • Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video
  • Portrait Neural Radiance Fields from a Single Image
  • NeRV: Neural Representations for Videos
  • NeRD: Neural Reflectance Decomposition from Image Collections
  • Neural Reflectance Fields for Appearance Acquisition
  • pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
  • Object-Centric Neural Scene Rendering
  • GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
  • Neural Scene Graphs for Dynamic Scenes
  • INeRF: Inverting Neural Radiance Fields for Pose Estimation
  • Dense Depth Priors for Neural Radiance Fields from Sparse Input Views
  • MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
  • Single-View View Synthesis with Multiplane Images
  • FWD: Real-time Novel View Synthesis with Forward Warping and Depth
  • Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation
  • RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
  • GRF: Learning a General Radiance Field for 3D Representation and Rendering
  • IBRNet: Learning Multi-View Image-Based Rendering
  • Depth-supervised NeRF: Fewer Views and Faster Training for Free
  • Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
  • Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
  • NeROIC: Neural Rendering of Objects from Online Image Collections
  • CG-NeRF: Conditional Generative Neural Radiance Fields
  • Neural 3D Video Synthesis from Multi-view Video
  • TöRF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis
  • HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
  • Weakly Supervised 3D Open-vocabulary Segmentation
  • ScanQA: 3D Question Answering for Spatial Scene Understanding
  • IQA: Visual Question Answering in Interactive Environments
  • SimVQA: Exploring Simulated Environments for Visual Question Answering
  • Visual Language Maps for Robot Navigation
  • Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
  • In-Place Scene Labelling and Understanding with Implicit Scene Representation
  • Decomposing NeRF for Editing via Feature Field Distillation
  • ConceptFusion: Open-set Multimodal 3D Mapping
  • Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
  • Language-driven Semantic Segmentation
  • Emerging Properties in Self-Supervised Vision Transformers
  • Panoptic Lifting for 3D Scene Understanding with Neural Fields
  • OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
  • Learning Transferable Visual Models From Natural Language Supervision
  • CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
  • Learning to Prompt for Vision-Language Models
  • Deep Stereo using Adaptive Thin Volume Representation with Uncertainty Awareness
  • Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching
  • MVSNet: Depth Inference for Unstructured Multi-view Stereo
  • A Theory of Shape by Space Carving
  • Shape and Motion from Image Streams under Orthography: A Factorization Method

  • Structure-from-Motion Revisited

  • InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering

  • The Platonic Representation Hypothesis

  • A Survey on Multimodal Large Language Models

  • DINOv2: Learning Robust Visual Features without Supervision

  • SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

  • Improving 2D Feature Representations by 3D-Aware Fine-Tuning

  • OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

  • ICARUS: A Specialized Architecture for Neural Radiance Fields Rendering

  • Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

  • Anisotropic Fourier Features for Neural Image-Based Rendering and Relighting

  • GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

  • Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

  • OpenMask3D: Open-Vocabulary 3D Instance Segmentation

  • OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-wise Features and Rendered Novel Views

  • SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

  • Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds

  • AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

  • 3D Segmentation of Humans in Point Clouds with Synthetic Data

  • Mask3D: Mask Transformer for 3D Semantic Instance Segmentation

  • Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes

  • Mix3D: Out-of-Context Data Augmentation for 3D Scenes

  • 4D-StOP: Panoptic Segmentation of 4D LiDAR using Spatio-temporal Object Proposal Generation and Aggregation

  • Splatter a Video: Video Gaussian Representation for Versatile Processing

  • Continuous and Discrete Wavelet Transforms

  • Discrete Cosine Transform

  • Implicit Neural Representations with Periodic Activation Functions

  • Compositional Pattern Producing Networks: A Novel Abstraction of Development

  • COIN++: Neural Compression Across Modalities

  • COIN: COmpression with Implicit Neural representations

  • Compression with Bayesian Implicit Neural Representations

  • Implicit Neural Representations for Image Compression

  • Single Image Defocus Deblurring via Implicit Neural Inverse Kernels

  • Signal Processing for Implicit Neural Representations

  • Revisiting Implicit Neural Representations in Low-Level Vision

  • Learning Continuous Image Representation with Local Implicit Image Function

  • Single Image Super-Resolution via a Dual Interactive Implicit Neural Network

  • Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs

  • WIRE: Wavelet Implicit Neural Representations

  • Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

  • NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions

  • ACORN: Adaptive Coordinate Networks for Neural Scene Representation

  • Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes

  • LSQ+: Improving low-bit quantization through learnable offsets and better initialization

  • SoundStream: An End-to-End Neural Audio Codec

  • Split Hierarchical Variational Compression

  • Practical Lossless Compression with Latent Variables using Bits Back Coding

  • Point-NeRF: Point-based Neural Radiance Fields

  • HNeRV: A Hybrid Neural Representation for Videos

  • E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

  • Boosting Neural Representations for Videos with a Conditional Decoder

  • 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

  • Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

  • GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

  • Text-to-3D using Gaussian Splatting

  • Drivable 3D Gaussian Avatars

  • Density Modeling of Images using a Generalized Normalization Transformation

  • Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

  • ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding

  • Learned Image Compression with Mixed Transformer-CNN Architectures

  • End-to-end Optimized Image Compression

  • Variational Image Compression with a Scale Hyperprior

  • Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression

  • Joint Autoregressive and Hierarchical Priors for Learned Image Compression

  • COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec

  • Low-complexity Overfitted Neural Image Codec

  • Entropy Coding of Unordered Data Structures

  • Occupancy Networks: Learning 3D Reconstruction in Function Space

  • 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

  • 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes

  • EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction

  • NERV++: An Enhanced Implicit Neural Video Representation

  • T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

  • VideoLCM: Video Latent Consistency Model

  • Generating 3D-Consistent Videos from Unposed Internet Photos

  • ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

  • Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

  • MixConv: Mixed Depthwise Convolutional Kernels

  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

  • Handbook of Image and Video Processing

  • Layered Neural Atlases for Consistent Video Editing

  • Consistent Video Depth Estimation

  • Representing Moving Images with Layers

  • Deformable Sprites for Unsupervised Video Decomposition

  • Space-Time Correspondence as a Contrastive Random Walk

  • CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

  • GenDeF: Learning Generative Deformation Field for Video Generation

  • MFT: Long-Term Tracking of Every Pixel

  • Layered Neural Rendering for Retiming People in Video

  • INVE: Interactive Neural Video Editing

  • Tracking Everything Everywhere All at Once

  • RAFT: Recurrent All-Pairs Field Transforms for Optical Flow

  • Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

  • Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

  • Diffusion Models Trained with Large Data Are Transferable Visual Models

  • Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time

  • MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

  • VidToMe: Video Token Merging for Zero-Shot Video Editing

  • Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories

  • CoTracker: It is Better to Track Together

  • TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

  • SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

  • Neural Trajectory Fields for Dynamic Novel View Synthesis

  • DynIBaR: Neural Dynamic Image-Based Rendering

  • Trajectory Space: A Dual Representation for Nonrigid Structure from Motion

  • q-Bernstein Polynomials and Bézier Curves

  • B-Spline Curves and Surfaces

  • Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

  • Density Estimation Using Real NVP

  • Shape of Motion: 4D Reconstruction from a Single Video

  • 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

  • Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

  • 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

  • Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

  • Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

  • Make-A-Video: Text-to-Video Generation without Text-Video Data

  • LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis

  • Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction

  • DiffGS: Functional Gaussian Splatting Diffusion

  • SUDS: Scalable Urban Dynamic Scenes

  • Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis

  • BANMo: Building Animatable 3D Neural Models from Many Casual Videos

  • HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion

  • TAVA: Template-Free Animatable Volumetric Actors

  • High-Resolution Image Synthesis with Latent Diffusion Models

  • Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

  • PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

  • Scalable Diffusion Models with Transformers

  • All are Worth Words: A ViT Backbone for Diffusion Models

  • Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections

  • Denoising Diffusion Probabilistic Models

  • Taming Transformers for High-Resolution Image Synthesis

  • The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

  • A Survey on Video Diffusion Models

  • Diffusion Models and Representation Learning: A Survey

  • Diffusion Models: A Comprehensive Survey of Methods and Applications

  • From Sora What We Can See: A Survey of Text-to-Video Generation

  • VDT: General-purpose Video Diffusion Transformers via Mask Modeling

  • Latte: Latent Diffusion Transformer for Video Generation

  • VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

  • VideoComposer: Compositional Video Synthesis with Motion Controllability

  • Imagen Video: High Definition Video Generation with Diffusion Models

  • Diffusion Models Beat GANs on Image Synthesis

  • Video Diffusion Models

  • Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

  • Neural Network Parameter Diffusion

  • WonderWorld: Interactive 3D Scene Generation from a Single Image

  • Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Videos

  • Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects

  • Adding Conditional Control to Text-to-Image Diffusion Models

  • IC-Light

  • Tutorial on Variational Autoencoders

  • Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction

  • HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

  • SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

  • VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

  • SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

  • DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

  • DUSt3R: Geometric 3D Vision Made Easy

  • MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

  • StopThePop: Sorted Gaussian Splatting for View-Consistent Real-Time Rendering

  • MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

  • The Scene Language: Representing Scenes with Programs, Words, and Embeddings

  • Autoregressive Image Generation without Vector Quantization

  • PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence

  • SAMPart3D: Segment Any Part in 3D Objects

  • Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

  • EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis

  • GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling

  • CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

  • Image Neural Field Diffusion Models

  • Neural Gaussian Scale-Space Fields

  • NICE: Non-linear Independent Components Estimation

  • AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

  • Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

  • Motion Prompting: Controlling Video Generation with Motion Trajectories

  • Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

  • Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

  • Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

  • LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

  • Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

  • SimDA: Simple Diffusion Adapter for Efficient Video Generation

  • VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

  • Make Pixels Dance: High-Dynamic Video Generation

  • Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

  • MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

  • Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

  • LaMD: Latent Motion Diffusion for Video Generation

  • Video Probabilistic Diffusion Models in Projected Latent Space

  • Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

  • Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

  • Generative Image Dynamics

  • Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

  • SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

  • REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents

  • DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

  • InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

  • FeatUp: A Model-Agnostic Framework for Features at Any Resolution

  • Ref-GS: Directional Factorization for 2D Gaussian Splatting

  • 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

  • Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

  • MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field

  • Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps

  • Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

  • LRM: Large Reconstruction Model for Single Image to 3D

  • CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

  • 3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models

  • You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

  • Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

  • Evaluating Multiview Object Consistency in Humans and Image Models

  • MeshArt: Generating Articulated Meshes with Structure-guided Transformers

  • Flow Matching for Generative Modeling

  • Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields

  • Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale

  • Structured 3D Latents for Scalable and Versatile 3D Generation

  • Learning Interactive Real-World Simulators