Loss is zero while training ViTPose Base with custom dataset #138


Open
MaxRondelli opened this issue Jun 5, 2024 · 3 comments

Comments

@MaxRondelli

MaxRondelli commented Jun 5, 2024

I am trying to fine-tune ViTPose Base (trained on COCO 256x192) with a custom dataset. From the very beginning of training, my losses are already zero.

2024-06-05 17:09:38,939 - mmpose - INFO - Epoch [1][1/18] lr: 2.376e-10, eta: 14 days, 5:07:40, time: 682.635, data_time: 2.816, heatmap_loss: 0.0000, acc_pose: 0.0000, loss: 0.0000, grad_norm: 0.0000

While debugging, I've seen that the target tensor is composed entirely of zeros: target.any() returns False, and the losses object is {'heatmap_loss': tensor(0., grad_fn=<MulBackward0>), 'acc_pose': 0.0}.

The images are all in the images folder. My train.json and val.json follow this format (as described in the documentation):

```json
[
    {
        "image_file": "100-0.png",
        "image_size": [ ... ],
        "bbox": [ ... ],
        "keypoints": [ ... ],
        ...
    }
]
```
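As a quick sanity check, a short script like the following can flag entries with missing keys or with all visibility flags set to zero, which would produce exactly this kind of all-zero heatmap target. This is a minimal sketch: the flat `x, y, v` keypoint layout and the key names are assumptions based on the format above.

```python
import json

def validate_annotations(entries):
    """Return a list of problems found in ViTPose-style annotation entries."""
    problems = []
    for i, entry in enumerate(entries):
        # Every entry needs these keys; a missing one can yield empty targets.
        for key in ("image_file", "image_size", "bbox", "keypoints"):
            if key not in entry:
                problems.append(f"entry {i}: missing '{key}'")
        # Assuming a flat [x, y, v, x, y, v, ...] layout, the visibility flags
        # are every third value. If all flags are 0, the target heatmap is
        # all zeros and the loss stays at 0.0.
        vis = entry.get("keypoints", [])[2::3]
        if vis and not any(v > 0 for v in vis):
            problems.append(f"entry {i}: all keypoints invisible")
    return problems

# Demo with an inline entry whose visibility flags are all zero.
demo = [{"image_file": "100-0.png", "image_size": [192, 256],
         "bbox": [0, 0, 50, 80], "keypoints": [10, 20, 0, 30, 40, 0]}]
print(validate_annotations(demo))  # → ["entry 0: all keypoints invisible"]
```

To check a real file, replace the inline `demo` list with `json.load(open("train.json"))`.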

Does anyone know why this happens? Can anyone suggest documentation or a tutorial for fine-tuning a network on a custom dataset? I've seen some overlapping and contradictory information between the ViTPose and MMPose docs.

Thank you in advance.

@Logancreator

Hi @MaxRondelli,

I noticed that you closed this issue, which makes me think you might have already resolved it. Could you share some insights or suggestions on how you tackled it?

Regards,

@MaxRondelli
Author

Hi @Logancreator,

Actually, I closed the issue after a long time because I found another solution that doesn't use ViTPose.

I only closed it because I wasn't getting any feedback from the community. I could reopen it, though; it might be helpful.

Best,

@MaxRondelli MaxRondelli reopened this Mar 19, 2025
@KevinChan1799

@MaxRondelli @Logancreator
Hi, I ran into the same situation during training, but when I went back over my custom dataset (COCO format) I found the problem. I had annotated the keypoint data with labelme and converted it to COCO format with a conversion script, but the script did not write the "area" field. Below is the COCO annotation that trained correctly after my fix, for reference:

```json
{
    "id": 56,
    "image_id": 57,
    "category_id": 1,
    "iscrowd": 0,
    "bbox": [430.59523809523813, 282.49999999999994, 1455.9523809523812, 989.2857142857144],
    "area": 1440352.891156463,
    "segmentation": [[430.59523809523813, 282.49999999999994, 1886.5476190476193, 1271.7857142857144]],
    "keypoints": [606.7857142857144, 476.54761904761904, 1,
                  1856.7857142857144, 713.452380952381, 1,
                  1716.309523809524, 746.7857142857142, 1,
                  1656.7857142857144, 558.6904761904761, 1,
                  1573.4523809523812, 902.7380952380952, 1,
                  1512.7380952380954, 968.2142857142857, 1,
                  778.2142857142859, 614.6428571428571, 1,
                  763.9285714285716, 1001.5476190476189, 1,
                  738.9285714285716, 1214.6428571428573, 1,
                  693.6904761904763, 1003.9285714285714, 1,
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  0, 0, 0, 0, 0, 0, 0, 0, 0],
    "num_keypoints": 10
}
```

After correcting the "area" field, my program computed the AP and AR values correctly, so you may want to check whether your annotation file is correct.
That is my solution; I hope it helps!
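If the "area" field is missing from an annotation file, it can be backfilled from the bounding box. This is a minimal sketch using bbox width x height as the area; note that COCO formally defines "area" as the segmentation area, so bbox area is only a common approximation.

```python
def fill_missing_area(coco):
    """Add an 'area' field (bbox width * height) to COCO annotations that lack it.

    Returns the number of annotations that were fixed. Mutates `coco` in place.
    """
    fixed = 0
    for ann in coco.get("annotations", []):
        if not ann.get("area"):
            # COCO bbox layout is [x, y, width, height].
            x, y, w, h = ann["bbox"]
            ann["area"] = w * h
            fixed += 1
    return fixed

# Demo on a minimal COCO-style dict (hypothetical values).
coco = {"annotations": [{"id": 56, "image_id": 57, "category_id": 1,
                         "bbox": [430.6, 282.5, 1456.0, 989.3]}]}
print(fill_missing_area(coco))  # → 1
print(coco["annotations"][0]["area"])
```

For a real dataset, load the file with `json.load`, run `fill_missing_area`, and write the result back with `json.dump`.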
