Support of custom allocator for atoms_positions, atoms_total_forces and atoms_new_colvar_forces #784

Closed
HanatoK opened this issue Mar 23, 2025 · 3 comments

Comments

@HanatoK
Member

HanatoK commented Mar 23, 2025

I would like to avoid extra data copying when implementing the interface for GPU-resident NAMD. Currently #783 holds an intermediate buffer for transposition and copies the data from that buffer into atoms_positions, atoms_total_forces and atoms_new_colvar_forces. The CUDA kernel cannot write to atoms_positions, atoms_total_forces and atoms_new_colvar_forces directly because they are not allocated in page-locked memory managed by the CUDA runtime. To solve this, a custom allocator could be used for them:

#include <cstddef>   // size_t
#include <iostream>  // std::cerr (debug prints below)
#include <memory>    // std::allocator
#include <new>       // std::bad_alloc
#include <utility>   // std::forward
#include <vector>    // std::vector

#if defined(COLVARS_CUDA)
#include <cuda_runtime.h>
#endif

#if defined(COLVARS_CUDA)
  template <typename T>
  class CudaHostAllocator {
  public:
    using value_type = T;

    CudaHostAllocator() = default;

    template<typename U>
    constexpr CudaHostAllocator(const CudaHostAllocator<U>&) noexcept {}

    friend bool operator==(const CudaHostAllocator&, const CudaHostAllocator&) { return true; }
    friend bool operator!=(const CudaHostAllocator&, const CudaHostAllocator&) { return false; }

    T* allocate(size_t n) {
      T* ptr = nullptr;
      // Allocate page-locked (pinned) host memory that is also mapped into the device address space
      if (cudaHostAlloc(reinterpret_cast<void**>(&ptr), n * sizeof(T), cudaHostAllocMapped) != cudaSuccess) {
        std::cerr << "BAD ALLOC!" << std::endl;
        throw std::bad_alloc();
      }
      std::cerr << "CudaHostAllocator: allocate at " << ptr << std::endl;
      return ptr;
    }
    void deallocate(T* ptr, size_t /* n */) noexcept {
      cudaFreeHost(ptr);
    }
    template<typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        new(p) U(std::forward<Args>(args)...);
    }

    template<typename U>
    void destroy(U* p) noexcept {
        p->~U();
    }
  };
#endif

#if defined(COLVARS_CUDA)
  template <typename T>
  using allocator_type = CudaHostAllocator<T>;
#else
  template <typename T>
  using allocator_type = std::allocator<T>;
#endif

/// \brief Current three-dimensional positions of the atoms
std::vector<cvm::rvector, allocator_type<cvm::rvector>> atoms_positions;
/// \brief Most recent total forces on each atom
std::vector<cvm::rvector, allocator_type<cvm::rvector>> atoms_total_forces;
/// \brief Forces applied from colvars, to be communicated to the MD integrator
std::vector<cvm::rvector, allocator_type<cvm::rvector>> atoms_new_colvar_forces;

However, using this allocator would require pervasive changes throughout the Colvars code, which currently uses plain std::vector<T> everywhere.
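One way to keep that churn somewhat contained (a minimal sketch only, not what #783 ends up doing) would be a project-wide alias, so declaration sites differ only in the container type; any function that takes a plain std::vector<T>& would still have to be migrated to the alias, which is exactly the pervasive change mentioned above:

// Hypothetical alias; the name colvars_host_vector is not part of the Colvars code base.
template <typename T>
using colvars_host_vector = std::vector<T, allocator_type<T>>;

// Declarations then only change their container type:
colvars_host_vector<cvm::rvector> atoms_positions;
colvars_host_vector<cvm::rvector> atoms_total_forces;
colvars_host_vector<cvm::rvector> atoms_new_colvar_forces;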

@HanatoK
Member Author

HanatoK commented Mar 24, 2025

Actually, after thinking about it again, I am not quite sure this is necessary. If Colvars finishes #655, it might not be needed.

@HanatoK
Member Author

HanatoK commented Mar 26, 2025

After testing, I found that it is still necessary to support allocating the vectors in host-pinned memory managed by the CUDA runtime for better performance. This has been done in b830dca and d8dd54f.
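For context (a minimal sketch, not taken from b830dca or d8dd54f): page-locked host memory is what allows cudaMemcpyAsync to be truly asynchronous and overlap with kernel execution, e.g.

// d_positions, num_atoms and stream are hypothetical names used only for illustration.
std::vector<cvm::rvector, CudaHostAllocator<cvm::rvector>> atoms_positions(num_atoms);
cudaMemcpyAsync(d_positions, atoms_positions.data(),
                num_atoms * sizeof(cvm::rvector),
                cudaMemcpyHostToDevice, stream);
// With pageable memory the runtime first stages the data through an internal pinned
// buffer, serializing the copy; with pinned memory the DMA transfer can start directly.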

@HanatoK
Member Author

HanatoK commented May 13, 2025

Done in #783.

@HanatoK HanatoK closed this as completed May 13, 2025