How can I get the labels for each segmentation result? #53

Griseo-Kaslana opened this issue Apr 11, 2025 · 2 comments

Thank you for your excellent work!
I successfully ran the visualization demo with my own data (ReplicaCAD) using the following command:
CUDA_VISIBLE_DEVICES=0 python vis_demo/stream_demo.py --data_root ./data/office2 --config ./configs/ESAM-E_CA/ESAM-E_online_stream.py --checkpoint ./work_dirs/ESAM_online_3rscan_CA_test/ESAM_CA_online_epoch_128.pth --online_vis

However, I now have a question: How can I get the labels for each segmentation result? For example, identifying whether a segmented object is a "tomato" or an "apple." What should I do?

[Image attached]

xuxw98 (Owner) commented Apr 13, 2025

Hi,
Thanks for your interest. We are glad that you successfully ran ESAM on your own data! You are currently using the _CA model, which is class-agnostic, so ESAM only performs online 3D instance segmentation without semantic labels. To obtain labels, you can use the provided model without the _CA suffix, or train one yourself.

Besides, if you want to perform open-vocabulary classification, you can refer to here, where we describe how to combine ESAM with OpenMask3D for this purpose.
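
For reference, a minimal sketch of how the per-instance class indices from a non-_CA checkpoint could be turned into readable names. The field names instance_labels / instance_scores and the truncated class list below are illustrative assumptions, not ESAM's exact output format; inspect the real result object produced by the demo script and adjust accordingly.

# Hypothetical post-processing sketch -- not ESAM's exact API. Assumes
# `result` is a dict with integer class indices under 'instance_labels'
# and confidence scores under 'instance_scores'.

# Category list the checkpoint was trained on (e.g. ScanNet200); only a few
# entries are shown here for illustration.
CLASS_NAMES = ['wall', 'chair', 'floor', 'table', 'door']

def instance_label_names(result, score_thr=0.3):
    """Return the class name of every predicted instance above score_thr."""
    names = []
    for idx, score in zip(result['instance_labels'], result['instance_scores']):
        if score >= score_thr:
            names.append(CLASS_NAMES[int(idx)])
    return names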

HeSongy commented Apr 22, 2025

Hello, I was running online_demo.py with the Replica room0 dataset, but it seems the merging process failed. Did you happen to modify the config file or any other part of the code? Thanks a lot!

[Image attached]


This is my ESAM-E_online_stream.py config file; I only changed num_instance_classes and score_thr in test_cfg:

_base_ = [
    'mmdet3d::_base_/default_runtime.py',
    'mmdet3d::_base_/datasets/scannet-seg.py'
]
custom_imports = dict(imports=['oneformer3d'])

num_instance_classes = 100
num_semantic_classes = 200
num_instance_classes_eval = 100
use_bbox = True
voxel_size = 0.02

model = dict(
    type='ScanNet200MixFormer3D_Stream',
    data_preprocessor=dict(type='Det3DDataPreprocessor_'),
    voxel_size=voxel_size,
    num_classes=num_instance_classes_eval,
    query_thr=0.5,
    backbone=dict(
        type='Res16UNet34C',
        in_channels=3,
        out_channels=96,
        config=dict(
            dilations=[1, 1, 1, 1],
            conv1_kernel_size=5,
            bn_momentum=0.02)),
    memory=dict(type='MultilevelMemory', in_channels=[32, 64, 128, 256], queue=-1, vmp_layer=(0,1,2,3)),
    pool=dict(type='GeoAwarePooling', channel_proj=96),
    decoder=dict(
        type='ScanNetMixQueryDecoder',
        num_layers=3,
        share_attn_mlp=False, 
        share_mask_mlp=False,
        temporal_attn=False,
        # the last mp_mode should be "P"
        cross_attn_mode=["", "SP", "SP", "SP"], 
        mask_pred_mode=["SP", "SP", "P", "P"],
        num_instance_queries=0,
        num_semantic_queries=0,
        num_instance_classes=num_instance_classes,
        num_semantic_classes=num_semantic_classes,
        num_semantic_linears=1,
        in_channels=96,
        d_model=256,
        num_heads=8,
        hidden_dim=1024,
        dropout=0.0,
        activation_fn='gelu',
        iter_pred=True,
        attn_mask=True,
        fix_attention=True,
        objectness_flag=False,
        bbox_flag=use_bbox),
    merge_head=dict(type='MergeHead', in_channels=256, out_channels=256, norm='layer'),
    merge_criterion=dict(type='ScanNetMergeCriterion_Fast', tmp=True, p2s=False),
    criterion=dict(
        type='ScanNetMixedCriterion',
        num_semantic_classes=num_semantic_classes,
        sem_criterion=dict(
            type='ScanNetSemanticCriterion',
            ignore_index=num_semantic_classes,
            loss_weight=0.5),
        inst_criterion=dict(
            type='MixedInstanceCriterion',
            matcher=dict(
                type='SparseMatcher',
                costs=[
                    dict(type='QueryClassificationCost', weight=0.5),
                    dict(type='MaskBCECost', weight=1.0),
                    dict(type='MaskDiceCost', weight=1.0)],
                topk=1),
            bbox_loss=dict(type='AxisAlignedIoULoss'),
            loss_weight=[0.5, 1.0, 1.0, 0.5, 0.5],
            num_classes=num_instance_classes,
            non_object_weight=0.1,
            fix_dice_loss_weight=True,
            iter_matcher=True,
            fix_mean_loss=True)),
    train_cfg=None,
    test_cfg=dict(
        # TODO: a larger topK may be better
        topk_insts=40,
        inscat_topk_insts=200,
        inst_score_thr=0.01,     # modified from 0.3
        pan_score_thr=0.04,
        npoint_thr=10,
        obj_normalization=True,
        sp_score_thr=0.02,
        nms=True,
        matrix_nms_kernel='linear',
        stuff_classes=[0, 1],
        merge_type='learnable_online'))

color_mean = (
    0.47793125906962 * 255,
    0.4303257521323044 * 255,
    0.3749598901421883 * 255)
color_std = (
    0.2834475483823543 * 255,
    0.27566157565723015 * 255,
    0.27018971370874995 * 255)

# dataset settings
train_pipeline = None
test_pipeline = None

train_dataloader = None
val_dataloader = None
test_dataloader = None

val_evaluator = None
test_evaluator = None

optim_wrapper = None

# learning rate
param_scheduler = None

custom_hooks = None
default_hooks = None

# training schedule for 1x
train_cfg = None
val_cfg = None
test_cfg = None
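
As a side note, one way to double-check which values are actually in effect after editing the file is to load it with mmengine and print the fields of interest. A minimal sketch, assuming the file is saved at configs/ESAM-E_CA/ESAM-E_online_stream.py and that mmdet3d is installed so the mmdet3d:: base configs can be resolved:

# Minimal sketch: load the edited config and print the test-time settings
# that were changed above. Requires mmengine and mmdet3d (for the mmdet3d:: bases).
from mmengine.config import Config

cfg = Config.fromfile('configs/ESAM-E_CA/ESAM-E_online_stream.py')
print(cfg.num_instance_classes)           # expected: 100
print(cfg.model.test_cfg.inst_score_thr)  # expected: 0.01
print(cfg.model.test_cfg.merge_type)      # expected: 'learnable_online'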
