This is quite strange, I haven't seen this behaviour before. Is it possible that self.embed(y) is receiving values greater than the number of classes in the dataset? That seems to be a particularly common failure case that produces this error.
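A quick way to rule this out is to scan the labels before they reach the embedding and report anything outside the valid range. The sketch below is only illustrative: the helper name `find_bad_labels` and the choice of 1000 classes are assumptions, not part of the repository. With a tensor of labels you could pass `y.tolist()`.

```python
def find_bad_labels(labels, n_classes):
    """Return the sorted distinct labels that fall outside [0, n_classes)."""
    return sorted({int(v) for v in labels if not 0 <= int(v) < n_classes})


# Hypothetical batch of labels; with a torch tensor, pass y.tolist() instead.
example_labels = [0, 17, 999, 1000, -1]
bad = find_bad_labels(example_labels, n_classes=1000)
print(bad)  # → [-1, 1000]
```

An empty list means every label is a valid index for an embedding table with `n_classes` rows; anything else would explain the assert.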
Otherwise you could try running with the environment variable CUDA_LAUNCH_BLOCKING=1 set (if you haven't already). It makes kernel launches synchronous, so the stack trace points at the operation that actually failed.
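If changing the launch command is inconvenient, the variable can also be set from Python. This is a minimal sketch, assuming it is placed at the very top of the entry script: the variable has to be in the environment before CUDA is initialised, so it is set before `torch` is imported.

```python
import os

# Set this before CUDA is initialised; the safest place is before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and the rest of the training code) after this point
```

Kernel launches then run synchronously, which slows training down, so it is best used only while debugging.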
Hello,
I am using ImageNet 64x64 and running the code with the following command:
python BigGAN-PyTorch/train.py --dataset I64_hdf5 --parallel --shuffle --num_workers 8 --batch_size 128 --num_G_accumulations 1 --num_D_accumulations 1 --num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 --G_attn 32 --D_attn 32 --G_nl relu --D_nl relu --SN_eps 1e-8 --BN_eps 1e-5 --adam_eps 1e-8 --G_ortho 0.0 --G_init xavier --D_init xavier --G_eval_mode --G_ch 32 --D_ch 32 --ema --use_ema --ema_start 2000 --test_every 5000 --save_every 1000 --num_best_copies 5 --num_save_copies 2 --seed 0 --which_best FID --num_iters 200000 --num_epochs 1000 --embedding inceptionv3 --density_measure gaussian --retention_ratio 50
and I get this error:
  File "train.py", line 229, in <module>
    main()
  File "train.py", line 226, in main
    run(config)
  File "train.py", line 184, in run
    metrics = train(x, y)
  File "/BigGAN-PyTorch/train_fns.py", line 42, in train
    split_D=config['split_D'])
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 140, in forward
    return self.module(*inputs, **kwargs)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/BigGAN-PyTorch/BigGAN.py", line 443, in forward
    D_out = self.D(D_input, D_class)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/BigGAN-PyTorch/BigGAN.py", line 403, in forward
    out = out + torch.sum(self.embed(y) * h, 1, keepdim=True)
RuntimeError: CUDA error: device-side assert triggered
Interestingly, when I create a "mini dataset" by randomly selecting 500 images per label from the original ImageNet dataset, the code runs fine. What could be the problem, and how can I solve it?