How can I get the labels for each segmentation result? #53

Griseo-Kaslana opened this issue Apr 11, 2025 · 2 comments

Thank you for your excellent work!
I successfully ran the visualization demo with my own data (ReplicaCAD) using the following command:
CUDA_VISIBLE_DEVICES=0 python vis_demo/stream_demo.py --data_root ./data/office2 --config ./configs/ESAM-E_CA/ESAM-E_online_stream.py --checkpoint ./work_dirs/ESAM_online_3rscan_CA_test/ESAM_CA_online_epoch_128.pth --online_vis

However, I now have a question: How can I get the labels for each segmentation result? For example, identifying whether a segmented object is a "tomato" or an "apple." What should I do?

[Image attached]

xuxw98 (Owner) commented Apr 13, 2025

Hi,
Thanks for your interest. We are glad that you successfully ran ESAM on your own data! You are currently using the _CA model, which is class-agnostic, so ESAM only performs online 3D instance segmentation without semantic labels. To obtain labels, you can use the provided model without the _CA suffix, or train one yourself.

Besides, if you want to perform open-vocabulary classification, you can refer to here, where we describe how to combine ESAM with OpenMask3D for this purpose.
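
For reference, a minimal sketch of how the per-instance class indices from a non-_CA checkpoint could be turned into readable names. The field names instance_labels / instance_scores and the truncated class list below are illustrative assumptions, not ESAM's exact output format; inspect the real result object produced by the demo script and adjust accordingly.

# Hypothetical post-processing sketch -- not ESAM's exact API. Assumes
# `result` is a dict with integer class indices under 'instance_labels'
# and confidence scores under 'instance_scores'.

# Category list the checkpoint was trained on (e.g. ScanNet200); only a few
# entries are shown here for illustration.
CLASS_NAMES = ['wall', 'chair', 'floor', 'table', 'door']

def instance_label_names(result, score_thr=0.3):
    """Return the class name of every predicted instance above score_thr."""
    names = []
    for idx, score in zip(result['instance_labels'], result['instance_scores']):
        if score >= score_thr:
            names.append(CLASS_NAMES[int(idx)])
    return names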

HeSongy commented Apr 22, 2025

Hello, I was running online_demo.py with the Replica room0 dataset, but it seems the merging process failed. Did you happen to modify the config file or any other part of the code? Thanks a lot!

[Image attached]


This is my ESAM-E_online_stream.py config file; I only changed num_instance_classes and score_thr in test_cfg:

_base_ = [
    'mmdet3d::_base_/default_runtime.py',
    'mmdet3d::_base_/datasets/scannet-seg.py'
]
custom_imports = dict(imports=['oneformer3d'])

num_instance_classes = 100
num_semantic_classes = 200
num_instance_classes_eval = 100
use_bbox = True
voxel_size = 0.02

model = dict(
    type='ScanNet200MixFormer3D_Stream',
    data_preprocessor=dict(type='Det3DDataPreprocessor_'),
    voxel_size=voxel_size,
    num_classes=num_instance_classes_eval,
    query_thr=0.5,
    backbone=dict(
        type='Res16UNet34C',
        in_channels=3,
        out_channels=96,
        config=dict(
            dilations=[1, 1, 1, 1],
            conv1_kernel_size=5,
            bn_momentum=0.02)),
    memory=dict(type='MultilevelMemory', in_channels=[32, 64, 128, 256], queue=-1, vmp_layer=(0,1,2,3)),
    pool=dict(type='GeoAwarePooling', channel_proj=96),
    decoder=dict(
        type='ScanNetMixQueryDecoder',
        num_layers=3,
        share_attn_mlp=False, 
        share_mask_mlp=False,
        temporal_attn=False,
        # the last mp_mode should be "P"
        cross_attn_mode=["", "SP", "SP", "SP"], 
        mask_pred_mode=["SP", "SP", "P", "P"],
        num_instance_queries=0,
        num_semantic_queries=0,
        num_instance_classes=num_instance_classes,
        num_semantic_classes=num_semantic_classes,
        num_semantic_linears=1,
        in_channels=96,
        d_model=256,
        num_heads=8,
        hidden_dim=1024,
        dropout=0.0,
        activation_fn='gelu',
        iter_pred=True,
        attn_mask=True,
        fix_attention=True,
        objectness_flag=False,
        bbox_flag=use_bbox),
    merge_head=dict(type='MergeHead', in_channels=256, out_channels=256, norm='layer'),
    merge_criterion=dict(type='ScanNetMergeCriterion_Fast', tmp=True, p2s=False),
    criterion=dict(
        type='ScanNetMixedCriterion',
        num_semantic_classes=num_semantic_classes,
        sem_criterion=dict(
            type='ScanNetSemanticCriterion',
            ignore_index=num_semantic_classes,
            loss_weight=0.5),
        inst_criterion=dict(
            type='MixedInstanceCriterion',
            matcher=dict(
                type='SparseMatcher',
                costs=[
                    dict(type='QueryClassificationCost', weight=0.5),
                    dict(type='MaskBCECost', weight=1.0),
                    dict(type='MaskDiceCost', weight=1.0)],
                topk=1),
            bbox_loss=dict(type='AxisAlignedIoULoss'),
            loss_weight=[0.5, 1.0, 1.0, 0.5, 0.5],
            num_classes=num_instance_classes,
            non_object_weight=0.1,
            fix_dice_loss_weight=True,
            iter_matcher=True,
            fix_mean_loss=True)),
    train_cfg=None,
    test_cfg=dict(
        # TODO: a larger topK may be better
        topk_insts=40,
        inscat_topk_insts=200,
        inst_score_thr=0.01,     # modified from 0.3
        pan_score_thr=0.04,
        npoint_thr=10,
        obj_normalization=True,
        sp_score_thr=0.02,
        nms=True,
        matrix_nms_kernel='linear',
        stuff_classes=[0, 1],
        merge_type='learnable_online'))

color_mean = (
    0.47793125906962 * 255,
    0.4303257521323044 * 255,
    0.3749598901421883 * 255)
color_std = (
    0.2834475483823543 * 255,
    0.27566157565723015 * 255,
    0.27018971370874995 * 255)

# dataset settings
train_pipeline = None
test_pipeline = None

train_dataloader = None
val_dataloader = None
test_dataloader = None

val_evaluator = None
test_evaluator = None

optim_wrapper = None

# learning rate
param_scheduler = None

custom_hooks = None
default_hooks = None

# training schedule for 1x
train_cfg = None
val_cfg = None
test_cfg = None
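
As a side note, one way to double-check which values are actually in effect after editing the file is to load it with mmengine and print the fields of interest. A minimal sketch, assuming the file is saved at configs/ESAM-E_CA/ESAM-E_online_stream.py and that mmdet3d is installed so the mmdet3d:: base configs can be resolved:

# Minimal sketch: load the edited config and print the test-time settings
# that were changed above. Requires mmengine and mmdet3d (for the mmdet3d:: bases).
from mmengine.config import Config

cfg = Config.fromfile('configs/ESAM-E_CA/ESAM-E_online_stream.py')
print(cfg.num_instance_classes)           # expected: 100
print(cfg.model.test_cfg.inst_score_thr)  # expected: 0.01
print(cfg.model.test_cfg.merge_type)      # expected: 'learnable_online'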
