Support of custom allocator for atoms_positions, atoms_total_forces and atoms_new_colvar_forces #784

Closed
HanatoK opened this issue Mar 23, 2025 · 3 comments

Comments

@HanatoK
Member

HanatoK commented Mar 23, 2025

I would like to avoid extra data copying when implementing the interface for GPU-resident NAMD. Currently #783 holds an intermediate buffer for transposition and copies the data from that buffer into atoms_positions, atoms_total_forces and atoms_new_colvar_forces. The CUDA kernel cannot write to atoms_positions, atoms_total_forces and atoms_new_colvar_forces directly because they are not allocated in page-locked memory managed by the CUDA runtime. To solve this, a custom allocator could be used for them:

#include <cstddef>   // size_t
#include <iostream>  // std::cerr (debug prints below)
#include <memory>    // std::allocator
#include <new>       // std::bad_alloc
#include <utility>   // std::forward
#include <vector>    // std::vector

#if defined(COLVARS_CUDA)
#include <cuda_runtime.h>
#endif

#if defined(COLVARS_CUDA)
  template <typename T>
  class CudaHostAllocator {
  public:
    using value_type = T;

    CudaHostAllocator() = default;

    template<typename U>
    constexpr CudaHostAllocator(const CudaHostAllocator<U>&) noexcept {}

    friend bool operator==(const CudaHostAllocator&, const CudaHostAllocator&) { return true; }
    friend bool operator!=(const CudaHostAllocator&, const CudaHostAllocator&) { return false; }

    T* allocate(size_t n) {
      T* ptr = nullptr;
      // Allocate page-locked (pinned) host memory that is also mapped into the device address space
      if (cudaHostAlloc(reinterpret_cast<void**>(&ptr), n * sizeof(T), cudaHostAllocMapped) != cudaSuccess) {
        std::cerr << "BAD ALLOC!" << std::endl;
        throw std::bad_alloc();
      }
      std::cerr << "CudaHostAllocator: allocate at " << ptr << std::endl;
      return ptr;
    }
    void deallocate(T* ptr, size_t /* n */) noexcept {
      cudaFreeHost(ptr);
    }
    template<typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        new(p) U(std::forward<Args>(args)...);
    }

    template<typename U>
    void destroy(U* p) noexcept {
        p->~U();
    }
  };
#endif

#if defined(COLVARS_CUDA)
  template <typename T>
  using allocator_type = CudaHostAllocator<T>;
#else
  template <typename T>
  using allocator_type = std::allocator<T>;
#endif

/// \brief Current three-dimensional positions of the atoms
std::vector<cvm::rvector, allocator_type<cvm::rvector>> atoms_positions;
/// \brief Most recent total forces on each atom
std::vector<cvm::rvector, allocator_type<cvm::rvector>> atoms_total_forces;
/// \brief Forces applied from colvars, to be communicated to the MD integrator
std::vector<cvm::rvector, allocator_type<cvm::rvector>> atoms_new_colvar_forces;

However, using this allocator would require pervasive changes throughout the Colvars code, which currently uses plain std::vector<T> everywhere.
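One way to keep that churn somewhat contained (a minimal sketch only, not what #783 ends up doing) would be a project-wide alias, so declaration sites differ only in the container type; any function that takes a plain std::vector<T>& would still have to be migrated to the alias, which is exactly the pervasive change mentioned above:

// Hypothetical alias; the name colvars_host_vector is not part of the Colvars code base.
template <typename T>
using colvars_host_vector = std::vector<T, allocator_type<T>>;

// Declarations then only change their container type:
colvars_host_vector<cvm::rvector> atoms_positions;
colvars_host_vector<cvm::rvector> atoms_total_forces;
colvars_host_vector<cvm::rvector> atoms_new_colvar_forces;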

@HanatoK
Member Author

HanatoK commented Mar 24, 2025

Actually, after thinking about it again, I am not quite sure this is necessary. If Colvars finishes #655, it might not be needed.

@HanatoK
Member Author

HanatoK commented Mar 26, 2025

After testing, I found that it is still necessary to support allocating the vectors in host-pinned memory managed by the CUDA runtime for better performance. This has been done in b830dca and d8dd54f.
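For context (a minimal sketch, not taken from b830dca or d8dd54f): page-locked host memory is what allows cudaMemcpyAsync to be truly asynchronous and overlap with kernel execution, e.g.

// d_positions, num_atoms and stream are hypothetical names used only for illustration.
std::vector<cvm::rvector, CudaHostAllocator<cvm::rvector>> atoms_positions(num_atoms);
cudaMemcpyAsync(d_positions, atoms_positions.data(),
                num_atoms * sizeof(cvm::rvector),
                cudaMemcpyHostToDevice, stream);
// With pageable memory the runtime first stages the data through an internal pinned
// buffer, serializing the copy; with pinned memory the DMA transfer can start directly.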

@HanatoK
Member Author

HanatoK commented May 13, 2025

Done in #783.

@HanatoK HanatoK closed this as completed May 13, 2025