-
@NicPy4 does it distribute the data when you don't use checkpointing? We will be deprecating the checkpointing feature in favor of KeOps.
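For reference, switching to a KeOps-backed kernel is roughly a one-line change in the model definition. A minimal sketch, assuming the optional pykeops dependency is installed (the RBF kernel choice and the rest of the model below are placeholders, not code from this thread):

```python
import gpytorch

# Minimal sketch: a KeOps-backed kernel avoids materializing the full
# n x n kernel matrix in GPU memory. Assumes pykeops is installed;
# the kernel choice and the surrounding model are placeholders.
class KeOpsGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.keops.RBFKernel()  # KeOps counterpart of RBFKernel
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```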
-
I am trying to use the MultiDeviceKernel for a time-series forecast. My data has ~100,000 samples and one input feature. To start with, I just used the ExactGP with MultiDeviceKernel example from the GPyTorch repository (https://docs.gpytorch.ai/en/latest/examples/02_Scalable_Exact_GPs/Simple_MultiGPU_GP_Regression.html), but instead of the protein data I used my own data. I have 8 GPUs (NVIDIA A100 40 GB or NVIDIA Tesla V100 32 GB) and I work on a supercomputer at my university.
According to the paper linked from the example, running this set-up with the provided code should be no problem. However, I always run into 'CUDA out of memory', and more specifically it is caused by imbalanced memory usage: when I watch the memory usage with 'nvidia-smi -l', I can see that GPU 0 is at ~99% utilization while the rest sit at around 30-40%. At some point this makes the code stop. This is my code:
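My script follows the tutorial closely; roughly, the relevant part looks like the sketch below, with random placeholder data standing in for my series and the checkpointing/preconditioner settings taken from the tutorial:

```python
import torch
import gpytorch

# Sketch of the set-up from the linked tutorial (placeholder data and
# hyperparameters; my real script only differs in how the data is loaded).
n_devices = torch.cuda.device_count()
output_device = torch.device("cuda:0")

train_x = torch.randn(18000, 1).to(output_device)  # stand-in for my time series
train_y = torch.randn(18000).to(output_device)

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, n_devices):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        base_covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # Distribute the kernel computation across all visible GPUs,
        # gathering the results on output_device (GPU 0).
        self.covar_module = gpytorch.kernels.MultiDeviceKernel(
            base_covar_module, device_ids=range(n_devices), output_device=output_device
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(output_device)
model = ExactGPModel(train_x, train_y, likelihood, n_devices).to(output_device)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train()
likelihood.train()

checkpoint_size = 10000  # kernel partition size; 0 means no partitioning
with gpytorch.beta_features.checkpoint_kernel(checkpoint_size), \
        gpytorch.settings.max_preconditioner_size(100):
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
```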
To test the program, I run it on only 4 GPUs with less data (averaged over half an hour instead of a data point every 5 minutes). The memory usage while running the program with ~18,000 training samples and a kernel partition size of 29 looks like this:
and I get this error message:
When I try to run it on the full set with ~100,000 points, I get this error message:
Traceback (most recent call last):
  File "main_regression.py", line 192, in <module>
    train_model()
  File "main_regression.py", line 181, in train_model
    torch_model = reg_model.Exact_gp(x_train, y_train, x_test, y_test)
  File "/cluster/home/niclasfl/PMLaks_opt/Regression/RegressionModel.py", line 358, in Exact_gp
    model, likelihood = train(train_x, train_y,
  File "/cluster/home/niclasfl/PMLaks_opt/Regression/RegressionModel.py", line 293, in train
    loss = closure()
  File "/cluster/home/niclasfl/PMLaks_opt/Regression/RegressionModel.py", line 288, in closure
    loss = -mll(output, train_y)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/module.py", line 31, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 64, in forward
    res = output.log_prob(target)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/distributions/multivariate_normal.py", line 193, in log_prob
    inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/operators/_linear_operator.py", line 1642, in inv_quad_logdet
    preconditioner, precond_lt, logdet_p = self._preconditioner()
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/operators/added_diag_linear_operator.py", line 114, in _preconditioner
    self._piv_chol_self = self._linear_op.pivoted_cholesky(rank=max_iter)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/operators/_linear_operator.py", line 1850, in pivoted_cholesky
    res, pivots = func(self.representation_tree(), rank, error_tol, *self.representation())
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/functions/_pivoted_cholesky.py", line 72, in forward
    row = apply_permutation(matrix, pi_m.unsqueeze(-1), right_permutation=None).squeeze(-2)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/utils/permutation.py", line 79, in apply_permutation
    return to_dense(matrix.__getitem__((*batch_idx, left_permutation.unsqueeze(-1), right_permutation.unsqueeze(-2))))
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py", line 25, in wrapped
    output = method(self, *args, **kwargs)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py", line 426, in __getitem__
    return super().__getitem__(index)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/operators/_linear_operator.py", line 2692, in __getitem__
    res = self._get_indices(new_row_index, new_col_index, *new_batch_indices)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/operators/_linear_operator.py", line 407, in _get_indices
    base_linear_op = self._getitem(_noop_index, _noop_index, *batch_indices)._expand_batch(final_shape)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/linear_operator/operators/_linear_operator.py", line 380, in _expand_batch
    return self.repeat(*batch_repeat, 1, 1)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py", line 25, in wrapped
    output = method(self, *args, **kwargs)
  File "/cluster/home/niclasfl/.local/lib/python3.8/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py", line 381, in repeat
    x2 = self.x2.repeat(*batch_repeat, col_repeat, 1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.09 GiB (GPU 0; 39.44 GiB total capacity; 28.12 GiB already allocated; 10.77 GiB free; 28.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
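For completeness, the allocator option the error message points to can be set through an environment variable before the first CUDA allocation; a minimal sketch (the value 128 is just an example, and it targets fragmentation rather than the imbalance itself):

```python
import os

# Assumption: set before importing torch so the CUDA caching allocator
# picks the option up; 128 MB is an arbitrary example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
```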
And it doesn't matter how large the partition size is. I hope someone can help me with this memory allocation imbalance.
Thanks!