Did the classification module help? #6

Open

chenyzh28 opened this issue Feb 23, 2019 · 18 comments

Comments

@chenyzh28 commented Feb 23, 2019

Thanks for your work! I removed the classification module and its related loss, and the performance is about 77%. I wonder whether you have run experiments with the classification loss (it seems to serve as a guide for segmentation, provided it is not detached from the preceding part).
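For context, here is a minimal sketch of the kind of joint objective being discussed. All names (seg_logits, cls_logits, lambda_cls, etc.) are illustrative assumptions, not taken from this repository:

```python
import torch.nn as nn

# Assumed shapes: seg_logits (N, C, H, W), seg_labels (N, H, W),
# cls_logits (N, C-1), cls_labels (N, C-1) as multi-hot image-level labels.
seg_criterion = nn.CrossEntropyLoss(ignore_index=255)
cls_criterion = nn.BCEWithLogitsLoss()

def joint_loss(seg_logits, seg_labels, cls_logits, cls_labels, lambda_cls=1.0):
    """Segmentation loss plus an auxiliary image-level classification loss."""
    seg_loss = seg_criterion(seg_logits, seg_labels)
    cls_loss = cls_criterion(cls_logits, cls_labels)
    # Removing the classification branch amounts to returning seg_loss alone.
    return seg_loss + lambda_cls * cls_loss
```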

@chezhizhong

How did you do that? I tried the same thing and got the following error: /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [452,0,0] Assertion `t >= 0 && t < n_classes` failed. Could you give some advice? Thank you!

@chenyzh28
Author

This problem may be caused by the maximum value in the labels exceeding n_classes. Have you pre-processed the VOC data following https://www.sun11.me/blog/2018/how-to-use-10582-trainaug-images-on-DeeplabV3-code/ ?
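A quick way to check for that: a minimal sketch that scans the label PNGs for out-of-range values, assuming the masks are single-channel/palette PNGs under a hypothetical SegmentationClassAug directory (adjust the path to your own layout):

```python
import glob
import numpy as np
from PIL import Image

n_classes = 21      # Pascal VOC: background + 20 object classes
ignore_label = 255  # white border pixels, usually ignored by the loss

for path in glob.glob('VOC2012/SegmentationClassAug/*.png'):
    label = np.array(Image.open(path))
    values = np.unique(label)
    bad = values[(values >= n_classes) & (values != ignore_label)]
    if bad.size:
        print(path, 'has out-of-range label values:', bad)
```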

@chezhizhong

Thanks for your reply! I ran my code again and got a new error at this line: pixel_acc += mask_pred.max(dim=1)[1].data.cpu().eq(mask_labels.squeeze(1).cpu()).float().mean(). I checked the shapes of the two tensors, and they are identical.
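For reference, the quoted expression run in isolation with random CPU tensors; the shapes here are assumptions for illustration, and the expression itself runs fine when the prediction and label shapes agree:

```python
import torch

# Assumed shapes: mask_pred (N, C, H, W) network output, mask_labels (N, 1, H, W) targets.
n_classes, h, w = 21, 8, 8
mask_pred = torch.randn(2, n_classes, h, w)
mask_labels = torch.randint(0, n_classes, (2, 1, h, w))

# Fraction of pixels where the argmax prediction matches the label.
pixel_acc = mask_pred.max(dim=1)[1].data.cpu().eq(
    mask_labels.squeeze(1).cpu()).float().mean()
print(pixel_acc.item())
```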

@chenyzh28
Author

What is the error message?

@chezhizhong

Traceback (most recent call last):
File "train.py", line 205, in
train(epoch, optimizer, training_loader)
File "train.py", line 131, in train
pixel_acc += mask_pred.max(dim=1)[1].data.cpu().eq(mask_labels.squeeze(1).cpu()).float().mean()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/generic/THCTensorCopy.cpp:70

@chenyzh28
Author

This is because your GPU does not have enough memory for the training process. You can decrease the batch size and try again.

@chezhizhong

Sorry to bother you again. I got the error again: /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [228,0,0] Assertion `t >= 0 && t < n_classes` failed. How can I resolve this error? How should the maximum value be set?

@chenyzh28
Author

You can follow this link https://www.sun11.me/blog/2018/how-to-use-10582-trainaug-images-on-DeeplabV3-code/ and reconstruct your dataset.

@zhenmafan7

I'm sorry to bother you. I want to ask about this line in dataset.py:
"labels = np.load('/home/liekkas/DISK2/jian/PASCAL/VOC2012/cls_labels.npy')[()]"
I can't find cls_labels.npy. What should I do to solve this problem? (We are both Chinese, but I still have to ask for help in my poor English, which is exhausting.)

@chenyzh28
Author

Those are the image-level classification labels, which the author did not provide. You can remove the classification branch and comment out that line so the dataset no longer returns the classification labels.
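A minimal sketch of what the modified dataset.py could look like, assuming a standard VOC directory layout; the class name, directory names, and helpers below are hypothetical, not the repository's actual code:

```python
import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class VOCSegOnlyDataset(Dataset):
    """Hypothetical sketch: segmentation-only dataset, cls_labels.npy not loaded."""

    def __init__(self, root, image_ids):
        self.root = root
        self.image_ids = image_ids
        # cls_labels.npy is not shipped with the repository, so skip loading it:
        # self.labels = np.load(os.path.join(root, 'cls_labels.npy'))[()]

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, index):
        name = self.image_ids[index]
        image = np.array(Image.open(os.path.join(self.root, 'JPEGImages', name + '.jpg')))
        mask = np.array(Image.open(os.path.join(self.root, 'SegmentationClassAug', name + '.png')))
        # cls_label = self.labels[name]  # removed together with the branch
        return image, mask
```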

@zhenmafan7

I know this is a bit silly, but I still want to ask: which part is the classification branch? TAT

@chenyzh28
Author

The classifier module (not mask_classifier) is the classification branch; you can just delete all the related parts.
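In code terms, the change roughly amounts to keeping only the segmentation head. A minimal sketch; only the names classifier and mask_classifier come from the comment above, everything else (layer types, shapes) is an illustrative assumption:

```python
import torch.nn as nn

class SegOnlyHead(nn.Module):
    """Hypothetical sketch: keep mask_classifier, remove the classifier branch."""

    def __init__(self, in_channels, n_classes=21):
        super().__init__()
        self.mask_classifier = nn.Conv2d(in_channels, n_classes, kernel_size=1)
        # self.classifier = nn.Linear(in_channels, n_classes - 1)  # removed branch

    def forward(self, features):
        mask_pred = self.mask_classifier(features)
        # cls_pred = self.classifier(features.mean(dim=(2, 3)))    # removed branch
        return mask_pred  # only the segmentation logits remain
```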

@zhenmafan7

Thank you, I understand it in theory now; in practice... I'll just figure it out slowly, slowly (it's really hard, TAT).
Thank you!

@zhenmafan7

Sorry to bother you again. I removed the classification branch as you suggested and trained, then ran eval.py; I added a print statement to see the test results (without it I could not see any output, so I am not sure whether I changed the code incorrectly). The printed results are:
Length of test set:1449 Each_cls_IOU:{'background': 0.0, 'aeroplane': 0.0, 'bicycle': 0.0, 'bird': 0.0, 'boat': 0.0, 'bottle': 0.0, 'bus': 0.0, 'car': 0.0, 'cat': 0.0, 'chair': 0.0, 'cow': 0.0, 'diningtable': 0.0, 'dog': 0.0, 'horse': 0.0, 'motorbike': 0.0, 'person': 18.41086525189786, 'pottedplant': 0.0, 'sheep': 0.0, 'sofa': 0.0, 'train': 0.0, 'tvmonitor': 0.0} mIOU:0.8767 PA:5.03% loss_ic0.000000
This is strange. Could I compare my changes against your modified files?

@LeeThrzz commented May 8, 2019

Hello, did you manage to get this open-source code running? I recently started trying it as well and ran into the same problem as you. I could not find an email address on your profile page; may I contact you?

@zhenmafan7

That was the result I got after training, and I have not tried again since. My profile page now lists my email, so you can contact me there. As for this codebase, I have given up on it, so I am not sure whether I can still be of help.

@FantasyJXF

If you use torch.nn.CrossEntropyLoss, setting ignore_index may help.

Pascal VOC uses 255 as the ignore label (the white border around objects); if your labels contain such a border, just pass that border ID as ignore_index.
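A minimal sketch of that suggestion; the tensor shapes here are illustrative:

```python
import torch
import torch.nn as nn

# 21 Pascal VOC classes; 255 marks the white border and is skipped by the loss.
criterion = nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(2, 21, 8, 8)          # (N, C, H, W)
labels = torch.randint(0, 21, (2, 8, 8))   # (N, H, W), values in [0, 20]
labels[:, 0, :] = 255                      # e.g. a border row of ignored pixels
loss = criterion(logits, labels)
print(loss.item())
```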

@LiouCZ commented May 15, 2020

Can the loss from that classification branch actually improve performance? What is it for?
