
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision (ICLR 2025 Spotlight)

Zihui Zhang, Yafei Yang, Hongtao Wen, Bo Yang

Overview

We propose an unsupervised framework that separates learning objectness from searching for objects in 3D scenes.

With the aid of an embodied agent, our method segments objects in 3D scenes.

1. Environment

Installing dependencies

### CUDA 11.3  GCC 9.4
conda env create -f env.yml
source activate GrabS

pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps

sudo apt-get install libopenblas-dev
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas

cd ../pointnet2
python setup.py install
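
As a quick sanity check (a minimal sketch; nothing beyond the packages installed above is assumed), you can confirm that the compiled dependencies import correctly:

# check_env.py -- minimal import check for the main dependencies installed above
import torch
import MinkowskiEngine as ME
import detectron2

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)
print("detectron2:", detectron2.__version__)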

Install superpoint dependencies

We also create SPG superpoints on S3DIS and the synthetic data to assist training, so please compile the following dependencies.

conda install -c anaconda boost
conda install -c omnia eigen3
conda install eigen

CONDAENV=YOUR_CONDA_ENVIRONMENT_LOCATION ## e.g. /home/zihui/anaconda3/envs/GrabS
cd partition/ply_c
cmake . -DPYTHON_LIBRARY=$CONDAENV/lib/libpython3.9.so -DPYTHON_INCLUDE_DIR=$CONDAENV/include/python3.9 -DBOOST_INCLUDEDIR=$CONDAENV/include -DEIGEN3_INCLUDE_DIR=$CONDAENV/include/eigen3
make
cd ..
cd cut-pursuit
mkdir build
cd build
cmake .. -DPYTHON_LIBRARY=$CONDAENV/lib/libpython3.9.so -DPYTHON_INCLUDE_DIR=$CONDAENV/include/python3.9 -DBOOST_INCLUDEDIR=$CONDAENV/include -DEIGEN3_INCLUDE_DIR=$CONDAENV/include/eigen3
make
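
If the build succeeds, the compiled modules should be importable after adding the build directories to the Python path. A minimal sketch, assuming the module names and build layout follow the SPG convention (libply_c and libcp; adjust the paths if your layout differs):

# check_superpoint_build.py -- run from the repository root
import sys

# Paths assume the build layout produced by the commands above (SPG convention)
sys.path.append("partition/ply_c")
sys.path.append("partition/cut-pursuit/build/src")

import libply_c  # compiled in partition/ply_c
import libcp     # compiled in partition/cut-pursuit/build

print("libply_c and libcp imported successfully")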

2. Data Preparation

ShapeNet

We conduct chair segmentation on the ScanNet and S3DIS datasets. To train an object-centric network for chairs, we reuse the SDF data (link) from EFEM.

In addition, we create a synthetic dataset and segment it into multiple categories. To collect the training data for the multi-class object-centric network, we first download the watertight meshes from link, then follow EFEM to install and use GAPS to compute the ground-truth SDF. The watertight mesh folder should be manually reorganized as follows:

ONet_data
└── 02691156
|   └── 1a04e3eab45ca15dd86060f189eb133.off
|   └── 1a6ad7a24bb89733f412783097373bdc.off
|   └── ...
└── 02828884
...
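
After reorganizing, a minimal sketch (the root ONet_data follows the tree above; adjust it to your location) can confirm that each synset folder contains .off meshes:

# check_onet_layout.py -- count .off meshes per synset folder
import os

root = "ONet_data"  # adjust to where you placed the watertight meshes
for synset in sorted(os.listdir(root)):
    folder = os.path.join(root, synset)
    if os.path.isdir(folder):
        n_off = sum(f.endswith(".off") for f in os.listdir(folder))
        print(f"{synset}: {n_off} .off meshes")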

After compiling GAPS and downloading the watertight ShapeNet data, we can run the following command to compute the ground-truth SDF:

python cal_gaps.py

The well-prepared multi-class data can also be downloaded directly from Dropbox or Baidu Netdisk (extraction code: qdts). Some categories are split into two archives due to their size; these need to be manually decompressed and merged.

ScanNet

We exactly follow Mask3D to preprocess the ScanNet dataset. Download the ScanNet dataset from here, uncompress the folder, and move it to data/scannet/raw. Following Mask3D, we also build superpoints by applying Felzenszwalb and Huttenlocher's graph-based image segmentation algorithm to the test scenes with the default parameters. Please download the ScanNet tool (link) and go into ScanNet/Segmentor to build it by running make (or create makefiles for your system using cmake). This will create a segmentator binary. Finally, go back outside the ScanNet directory and run the segmentator:

./run_segmentator.sh your_scannet_trainval_path ## e.g. ./data/scannet/raw/scans
./run_segmentator.sh your_scannet_test_path ## e.g. ./data/scannet/raw/scans_test
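
The segmentator writes one segment file per scan; as a minimal sketch (the *.segs.json naming is an assumption based on Mask3D's pipeline, so adjust the pattern if your output naming differs), you can confirm the files were created:

# check_segments.py -- count segmentator outputs per scan directory
import glob
import os

raw_root = "data/scannet/raw/scans"  # also check data/scannet/raw/scans_test
seg_files = glob.glob(os.path.join(raw_root, "*", "*.segs.json"))
n_scans = len(os.listdir(raw_root))
print(f"{len(seg_files)} segment files found for {n_scans} scans")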

With the superpoint files in place, we can run the preprocessing code:

python preprocessing/scannet_preprocessing.py

S3DIS

The S3DIS dataset can be found here. Download the file named "Stanford3dDataset_v1.2_Aligned_Version.zip", uncompress it, and move it to data/s3dis_align/raw. Two manual fixes are required: there is an error in line 180389 of Area_5/hallway_6/Annotations/ceiling_1.txt that needs to be corrected by hand, and copy_Room_1.txt in Area_6/copyRoom_1 must be renamed to copyRoom_1.txt.
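
If you prefer to script these fixes, a minimal sketch (assuming the default data/s3dis_align/raw location) renames the mislabeled file and prints the faulty line so it can be corrected by hand:

# fix_s3dis.py -- helper for the two manual fixes described above
import os

raw = "data/s3dis_align/raw"

# 1) Rename copy_Room_1.txt to copyRoom_1.txt in Area_6/copyRoom_1
src = os.path.join(raw, "Area_6", "copyRoom_1", "copy_Room_1.txt")
dst = os.path.join(raw, "Area_6", "copyRoom_1", "copyRoom_1.txt")
if os.path.exists(src):
    os.rename(src, dst)

# 2) Print line 180389 of ceiling_1.txt for inspection before hand-correction
path = os.path.join(raw, "Area_5", "hallway_6", "Annotations", "ceiling_1.txt")
with open(path, "rb") as f:  # read as bytes in case the line holds a non-ASCII character
    for i, line in enumerate(f, start=1):
        if i == 180389:
            print(repr(line))
            break

After applying these fixes, run the commands below to begin preprocessing: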

python preprocessing/s3dis_preprocessing.py
python prepare_superpoints/initialSP_prepare_s3dis_SPG.py

Synthetic Scenes

Download our data from Google Drive and put it under data/sys_scene/processed, then run the command below:

python prepare_superpoints/initialSP_prepare_sys_SPG.py

Data Structure

After the downloading and preprocessing above, the data structure should be:

data
└── scannet
|   └── raw
|   └── processed
└── s3dis_align
|   └── raw
|   └── processed
|   └── SPG_0.05
└── sys_scene
|   └── processed
|   └── SPG_0.01
└── chairs
|   └── 03001627
|   └── 03001627_dep
└── GAPS_SDF
|   └── 02691156
|   └── 02828884
|   └── ...
└── shapenet_splits
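
Before training, a minimal sketch (paths follow the tree above) can confirm that the expected folders exist:

# check_data_tree.py -- verify the expected data layout from the tree above
import os

expected = [
    "data/scannet/processed",
    "data/s3dis_align/processed",
    "data/s3dis_align/SPG_0.05",
    "data/sys_scene/processed",
    "data/sys_scene/SPG_0.01",
    "data/chairs/03001627",
    "data/GAPS_SDF",
    "data/shapenet_splits",
]
for path in expected:
    status = "ok" if os.path.exists(path) else "MISSING"
    print(f"{path}: {status}")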

3. Object-Centric Network Training

We have two versions of the object-centric network in the paper. The first one is for chair segmentation on ScanNet and S3DIS. Before training it, run the following command to construct the augmentation data:

# Prepare point clouds for more categories as augmentation data in ./data/other_cls_data/
python create_aug_data.py

The chair SDF is trained as follows:

# Train the rotation estimation part, this will produce a ckpt in ./objnet/chair/pos/
CUDA_VISIBLE_DEVICES=0 python train_vae_chair.py --stage="pos"

# Train the SDF in VAE version, this will produce a ckpt in ./objnet/chair/vae/
CUDA_VISIBLE_DEVICES=0 python train_vae_chair.py --stage="vae"

# Diffusion version (optional): our latent diffusion model operates in the VAE
# feature space, so the VAE above must be trained first.
# ddpm
CUDA_VISIBLE_DEVICES=0 python train_ddpm_chair.py
# or rectflow 
CUDA_VISIBLE_DEVICES=0 python train_rectflow_chair.py

The second object-centric network is for multi-category segmentation on our synthetic scenes and is trained as follows:

# Train the rotation estimation part, this will produce a ckpt in ./objnet/multi-cate/pos/
CUDA_VISIBLE_DEVICES=0 python train_vae_multiclass.py --stage="pos"

# Train the SDF in VAE version, this will produce a ckpt in ./objnet/multi-cate/vae/
CUDA_VISIBLE_DEVICES=0 python train_vae_multiclass.py --stage="vae"

# diffusion model (optional): 
CUDA_VISIBLE_DEVICES=0 python train_ddpm_multiclass.py

4. Object Segmentation Network Training

ScanNet

The well-trained object-centric model for chairs is saved in ./objnet/chair/vae/, ./objnet/chair/ddpm/, or ./objnet/chair/rectflow/ by default. The segmentation model on ScanNet can then be trained by:

# Train the segnet by VAE SDF
CUDA_VISIBLE_DEVICES=0 python train_seg_scannet.py

# Train the segnet by ddpm
CUDA_VISIBLE_DEVICES=0 python train_ddpmseg_scannet.py
# or train it by rectflow
CUDA_VISIBLE_DEVICES=0 python train_rectflowseg_scannet.py

S3DIS

In our main experiments, we conduct a cross-dataset validation that uses the segmentation model trained on ScanNet to evaluate on S3DIS. For example, to evaluate on S3DIS Area 5:

# ScanNet to S3DIS eval
CUDA_VISIBLE_DEVICES=0 python train_seg_s3dis.py --use_sp=False --cross_test=True --cross_test_ckpt=your_ckpt # e.g. 'ckpt_segnet/scannet_VAE_chair/checkpoint_450.tar'

Optional: train segmentation models on S3DIS. Training on S3DIS is not reported in our paper, but it can be done by:

# Train the segnet by VAE SDF
CUDA_VISIBLE_DEVICES=0 python train_seg_s3dis.py

# Train the segnet by Diffusion SDF
CUDA_VISIBLE_DEVICES=0 python train_ddpmseg_s3dis.py

Synthetic Dataset

The well-trained multi-category object-centric models are saved in ./objnet/multi-cate/vae/ or ./objnet/multi-cate/diff/ by default. We can train a segmentation model on the synthetic dataset by simply running:

# Train the segnet by VAE SDF
CUDA_VISIBLE_DEVICES=0 python train_seg_sys.py

# Train the segnet by Diffusion SDF
CUDA_VISIBLE_DEVICES=0 python train_ddpmseg_sys.py

5. Model Checkpoints

We also provide well-trained checkpoints for ScanNet and the synthetic dataset on Google Drive. Note that the checkpoints used for cross-dataset evaluation on S3DIS are also trained on ScanNet.
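
To inspect a downloaded checkpoint before training or evaluation, here is a minimal PyTorch sketch (the path is the example from Section 4; the exact key layout inside the archive may differ):

# inspect_ckpt.py -- print the top-level keys of a downloaded checkpoint
import torch

ckpt_path = "ckpt_segnet/scannet_VAE_chair/checkpoint_450.tar"  # example path from Section 4
ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    for key in ckpt:
        print(key)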