
RuntimeError: CUDA error: operation not supported when calling cusparseCreate(handle) #14

Open
zzidlezz opened this issue Jun 18, 2024 · 4 comments


@zzidlezz

When I run the following command:

python train_gcond_transduct.py --dataset cora --nlayers=2 --lr_feat=1e-4 --gpu_id=0 --lr_adj=1e-4 --r=0.5

the testing phase after 400 training epochs fails with the following error:

File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 43, in forward
Epoch 350, loss_avg: 0.15287520616782013
Epoch 400, loss_avg: 0.15193571531805686
Traceback (most recent call last):
File "E:\数据蒸馏代码\GCond-main\train_gcond_transduct.py", line 57, in
agent.train()
File "E:\数据蒸馏代码\GCond-main\gcond_agent_transduct.py", line 271, in train
res.append(self.test_with_val())
File "E:\数据蒸馏代码\GCond-main\gcond_agent_transduct.py", line 99, in test_with_val
model.fit_with_val(feat_syn, adj_syn, labels_syn, data,
File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 255, in fit_with_val
self._train_with_val(labels, data, train_iters, verbose)
File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 289, in _train_with_val
output = self.forward(feat_full, adj_full_norm)
File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 100, in forward
x = layer(x, adj)
File "D:\ProgramData\Anaconda3\envs\graph_cond\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 43, in forward
output = torch.spmm(adj, support)
RuntimeError: CUDA error: operation not supported when calling cusparseCreate(handle)

After some searching, this error is said to be related to the CUDA version, but my environment matches the one you provided. Could you suggest a solution?

My environment configuration is as follows:

Package Version


ase 3.22.1
certifi 2022.12.7
charset-normalizer 3.3.2
colorama 0.4.6
cycler 0.11.0
Cython 0.29.14
deeprobust 0.2.4
fonttools 4.38.0
gensim 3.8.3
googledrivedownloader 0.4
h5py 3.8.0
idna 3.7
imageio 2.31.2
importlib-metadata 4.13.0
isodate 0.6.1
Jinja2 3.1.4
joblib 1.3.2
kiwisolver 1.4.5
littleutils 0.2.2
llvmlite 0.39.1
MarkupSafe 2.1.5
matplotlib 3.5.3
networkx 2.6.3
numba 0.56.4
numpy 1.21.6
ogb 1.3.0
outdated 0.2.2
packaging 24.0
pandas 1.3.5
Pillow 9.5.0
pip 22.3.1
protobuf 4.24.4
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-louvain 0.16
pytz 2024.1
PyWavelets 1.3.0
rdflib 6.3.2
requests 2.31.0
scikit-image 0.19.3
scikit-learn 1.0.2
scipy 1.7.3
setuptools 65.6.3
six 1.16.0
smart-open 7.0.4
tensorboardX 2.6.2.2
texttable 1.7.0
threadpoolctl 3.1.0
tifffile 2021.11.2
torch 1.7.1+cu110
torch-cluster 1.5.9
torch-geometric 1.6.3
torch-scatter 2.0.7
torch-sparse 0.6.8
torch-spline-conv 1.2.1
torchaudio 0.7.2
torchvision 0.8.2+cu110
tqdm 4.66.4
typing_extensions 4.7.1
urllib3 2.0.7
wheel 0.38.4
wincertstore 0.2
wrapt 1.16.0
zipp 3.15.0

@ChandlerBang
Owner

Hmmmm, we only tested the code on Linux/macOS environments, so it could be an issue specific to the Windows system.

I would suggest testing torch.spmm() in a separate file and trying to adjust the PyTorch or CUDA versions.
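For instance, a minimal standalone script along these lines (the shapes and values are only illustrative, not taken from the repo) should hit the same cusparseCreate error if the CUDA/cuSPARSE setup is at fault:

import torch

# Sanity check: make sure a CUDA device is actually visible to this PyTorch build.
assert torch.cuda.is_available(), "no CUDA device visible to PyTorch"
device = torch.device("cuda:0")

# Tiny sparse adjacency matrix and dense feature matrix, mirroring the failing call
# in models/gcn.py: output = torch.spmm(adj, support)
indices = torch.tensor([[0, 1, 2], [1, 2, 0]], device=device)
values = torch.ones(3, device=device)
adj = torch.sparse_coo_tensor(indices, values, (3, 3))
support = torch.randn(3, 4, device=device)

out = torch.spmm(adj, support)  # raises the cusparseCreate error if cuSPARSE is broken
print(out)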

@zzidlezz
Author

Thank you for your reply. I noticed that some of your subsequent papers directly cite the results from the original paper when comparing. If my PyTorch or CUDA versions are different, can I also directly cite your results?

@rockcor

rockcor commented Jun 24, 2024

Thank you for your reply. I noticed that some of your subsequent papers directly cite the results from the original paper when comparing. If my PyTorch or CUDA versions are different, can I also directly cite your results?

Your torch-sparse 0.6.8 may not be installed properly. You may install its CUDA version with:
pip install torch_sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
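(For the torch 1.7.1 + CUDA 11.0 setup listed above, that would be something like pip install torch_sparse -f https://data.pyg.org/whl/torch-1.7.1+cu110.html, assuming a matching wheel index exists for that combination.)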

I can run it on Windows (CUDA 11.3) with the following environment:

deeprobust==0.2.10
matplotlib==3.5.2
networkx==2.8
numpy==1.24.4
ogb==1.3.6
PyGSP==0.5.1
scikit_learn==1.3.0
scipy==1.13.1
sortedcontainers==2.4.0
torch==1.12.1
torch_geometric==2.5.3
torch_scatter==2.0.9
torch_sparse==0.6.16+pt112cu113
tqdm==4.64.0

@ChandlerBang
Owner

Thank you for your reply. I noticed that some of your subsequent papers directly cite the results from the original paper when comparing. If my PyTorch or CUDA versions are different, can I also directly cite your results?

Yeah, as long as you are working on the same setting, referencing my results should be fine. Thanks.
