I am trying to create a dashboard on Azure for GPU monitoring.
I followed the steps mentioned and updated the fields in the start_gpu_data_collector.sh file with the required details. When I tried to execute the script, it threw the error below:
gethostbyname("::1") failed.
I then updated the script and executed it as follows:
./gpu_data_collector.py -tis $INTERVAL_SECS -dfi $DCGM_FIELD_IDS > /tmp/gpu_data_collector.log
However, I don't see any GPU Monitor custom fields created in the Log Analytics workspace.
I am curious: what modification did you make to the script to overcome the gethostbyname error?
Are you using this GPU monitoring script to monitor the GPU activity of SLURM jobs? That is the default behavior, so if you do not have a SLURM job running, no GPU monitoring data will be sent to Azure Monitor.
If you would like all processes on the nodes to be monitored for GPU activity (even if they are not associated with a SLURM job) and the data to be sent to Azure Monitor, add the -fgm command-line argument.
I have made some corrections to the start-up and shutdown scripts (start_gpu_data_collector.sh, stop_gpu_data_collector.sh); see #584.
If you still do not see any data being sent to Log Analytics (Custom Logs), please send me the stdout/stderr (/tmp/gpu_data_collector.log) from when you execute the Python script.
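As a rough sketch of the two suggestions above, the invocation from the original post could be extended with -fgm and with stderr redirected into the same log file. The values given for INTERVAL_SECS and DCGM_FIELD_IDS are placeholders only (the real ones, and the exact format -dfi expects, come from start_gpu_data_collector.sh), and the LOG_FILE variable and cmd array are names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): take the real ones from start_gpu_data_collector.sh.
INTERVAL_SECS=10
DCGM_FIELD_IDS="203,252"
LOG_FILE=/tmp/gpu_data_collector.log

# Same command as in the original post, plus -fgm so GPU activity is
# collected even when no SLURM job is running on the node.
cmd=(./gpu_data_collector.py -tis "$INTERVAL_SECS" -dfi "$DCGM_FIELD_IDS" -fgm)
echo "Command: ${cmd[*]}"

# Run only if the collector is present in the current directory.
# "2>&1" sends stderr to the same file as stdout, so the log holds both.
if [ -x ./gpu_data_collector.py ]; then
    "${cmd[@]}" > "$LOG_FILE" 2>&1
else
    echo "gpu_data_collector.py not found or not executable in $(pwd)"
fi
```

Without the 2>&1 redirection, as in the original command, Python tracebacks and other error output go to the terminal rather than to /tmp/gpu_data_collector.log, so the log file alone may not show why no data reaches the workspace.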