Commit e5243e4

iwatake2222, pre-commit-ci[bot], ito-san authored
feat(system_monitor): check UDP network errors (#9538)
* feat(system_monitor): generalize logic for /proc/net/snmp
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* feat(system_monitor): add UDP buf errors check
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* fix calculation for errors per unit time at the first time
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* style(pre-commit): autofix
* organize code
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* style(pre-commit): autofix
* fix warnings
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* remove unnecessary fmt::format
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* organize code for metrics from /proc/net/snmp
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* style(pre-commit): autofix
* separate ROS 2 parameters from constructor
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* suppress log
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* style(pre-commit): autofix
* fix bugprone-fold-init-type
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
* dummy commit to kick workflows
  Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>

---------

Signed-off-by: takeshi.iwanari <take.iwiw2222@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ito-san <57388357+ito-san@users.noreply.github.com>
1 parent 160e47b commit e5243e4

6 files changed: +333 -140 lines changed

system/system_monitor/README.md

+1
@@ -77,6 +77,7 @@ Every topic is published in 1 minute interval.
 | | Network Usage |||| Notification of usage only, normally error not generated. |
 | | Network CRC Error |||| Warning occurs when the number of CRC errors in the period reaches the threshold value. The number of CRC errors that occur is the same as the value that can be confirmed with the ip command. |
 | | IP Packet Reassembles Failed |||| |
+| | UDP Buf Errors |||| |
 | NTP Monitor | NTP Offset |||| |
 | Process Monitor | Tasks Summary |||| |
 | | High-load Proc[0-9] |||| |

system/system_monitor/config/net_monitor.param.yaml

+2
@@ -7,3 +7,5 @@
 crc_error_count_threshold: 1
 reassembles_failed_check_duration: 1
 reassembles_failed_check_count: 1
+udp_buf_errors_check_duration: 1
+udp_buf_errors_check_count: 1

system/system_monitor/docs/ros_parameters.md

+2
@@ -69,6 +69,8 @@ net_monitor:
 | crc_error_count_threshold | int | n/a | 1 | Generates warning when count of CRC errors during CRC error check duration reaches a specified value or higher. |
 | reassembles_failed_check_duration | int | sec | 1 | IP packet reassembles failed check duration. |
 | reassembles_failed_check_count | int | n/a | 1 | Generates warning when count of IP packet reassembles failed during IP packet reassembles failed check duration reaches a specified value or higher. |
+| udp_buf_errors_check_duration | int | sec | 1 | UDP buf errors check duration. |
+| udp_buf_errors_check_count | int | n/a | 1 | Generates warning when count of UDP buf errors during udp_buf_errors_check_duration reaches a specified value or higher. |
 
 ## <u>NTP Monitor</u>
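
The two new parameters describe a windowed threshold: a warning is raised when the growth of the error counter over the last `udp_buf_errors_check_duration` seconds reaches `udp_buf_errors_check_count`. The following is a minimal sketch of that idea, not the code from this commit. It assumes the counter is sampled once per second, and `WindowedCounterCheck`, `update`, and `value_per_unit_time` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <deque>
#include <numeric>

// Hypothetical helper (not the class added by this commit): tracks how much a
// monotonically increasing counter grew over the last `check_duration` samples.
class WindowedCounterCheck
{
public:
  WindowedCounterCheck(unsigned int check_duration, unsigned int check_count)
  : check_duration_(check_duration), check_count_(check_count)
  {
  }

  // Feed the latest counter reading. Returns true when the growth summed over
  // the window reaches the threshold, i.e. when a warning should be raised.
  bool update(uint64_t current_value)
  {
    // On the first sample there is no previous reading, so the delta is 0
    // rather than the whole value accumulated since boot. A counter that goes
    // backwards is also treated as a delta of 0.
    const uint64_t delta =
      (has_last_ && current_value >= last_value_) ? current_value - last_value_ : 0;
    last_value_ = current_value;
    has_last_ = true;

    // Keep only the most recent `check_duration_` deltas.
    queue_.push_back(delta);
    while (queue_.size() > check_duration_) {
      queue_.pop_front();
    }

    // The initial value is a uint64_t so the sum is accumulated in 64 bits.
    value_per_unit_time_ = std::accumulate(queue_.begin(), queue_.end(), uint64_t{0});
    return value_per_unit_time_ >= check_count_;
  }

  uint64_t value_per_unit_time() const { return value_per_unit_time_; }

private:
  unsigned int check_duration_;
  unsigned int check_count_;
  bool has_last_{false};
  uint64_t last_value_{0};
  uint64_t value_per_unit_time_{0};
  std::deque<uint64_t> queue_;
};
```

With the default values from `net_monitor.param.yaml` (duration 1, count 1), this sketch would report a warning on any sample where the counter increased since the previous one.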

system/system_monitor/docs/topics_net_monitor.md

+20
@@ -106,3 +106,23 @@
 | --------------------------------------- | --------------- |
 | total packet reassembles failed | 0 |
 | packet reassembles failed per unit time | 0 |
+
+## <u>UDP Buf Errors</u>
+
+/diagnostics/net_monitor: UDP Buf Errors
+
+<b>[summary]</b>
+
+| level | message |
+| ----- | -------------- |
+| OK | OK |
+| WARN | UDP buf errors |
+
+<b>[values]</b>
+
+| key | value (example) |
+| -------------------------------- | --------------- |
+| total UDP rcv buf errors | 0 |
+| UDP rcv buf errors per unit time | 0 |
+| total UDP snd buf errors | 0 |
+| UDP snd buf errors per unit time | 0 |

system/system_monitor/include/system_monitor/net_monitor/net_monitor.hpp

+96 -22
@@ -81,6 +81,93 @@ struct CrcErrors
   unsigned int last_rx_crc_errors{0}; //!< @brief rx_crc_error at the time of the last monitoring
 };
 
+/**
+ * @brief /proc/net/snmp information
+ */
+class NetSnmp
+{
+public:
+  enum class Result {
+    OK,
+    CHECK_WARNING,
+    READ_ERROR,
+  };
+
+  /**
+   * @brief Constructor
+   * @param [in] node node using this class.
+   */
+  explicit NetSnmp(rclcpp::Node * node);
+
+  /**
+   * @brief Constructor
+   */
+  NetSnmp() = delete;
+
+  /**
+   * @brief Copy constructor
+   */
+  NetSnmp(const NetSnmp &) = delete;
+
+  /**
+   * @brief Copy assignment operator
+   */
+  NetSnmp & operator=(const NetSnmp &) = delete;
+
+  /**
+   * @brief Move constructor
+   */
+  NetSnmp(const NetSnmp &&) = delete;
+
+  /**
+   * @brief Move assignment operator
+   */
+  NetSnmp & operator=(const NetSnmp &&) = delete;
+
+  /**
+   * @brief Set parameters for check
+   * @param [in] check_duration the value for check_duration
+   * @param [in] check_count the value for check_count
+   */
+  void set_check_parameters(unsigned int check_duration, unsigned int check_count);
+
+  /**
+   * @brief Find index in `/proc/net/snmp`
+   * @param [in] protocol Protocol name (the first column string). e.g. "Ip:" or "Udp:"
+   * @param [in] metrics Metrics name. e.g. "ReasmFails"
+   */
+  void find_index(const std::string & protocol, const std::string & metrics);
+
+  /**
+   * @brief Check metrics
+   * @param [out] current_value the value read from snmp
+   * @param [out] value_per_unit_time the increase of the value during the duration
+   * @return the result of check
+   */
+  Result check_metrics(uint64_t & current_value, uint64_t & value_per_unit_time);
+
+private:
+  /**
+   * @brief Read value from `/proc/net/snmp`
+   * @param [in] index_row row in `/proc/net/snmp`
+   * @param [in] index_col col in `/proc/net/snmp`
+   * @param [out] output_value retrieved value
+   * @return execution result
+   */
+  bool read_value_from_proc(
+    unsigned int index_row, unsigned int index_col, uint64_t & output_value);
+
+  rclcpp::Logger logger_; //!< @brief logger gotten from user node
+  unsigned int check_duration_; //!< @brief check duration
+  unsigned int check_count_; //!< @brief check count threshold
+  unsigned int index_row_; //!< @brief index for the target metrics in /proc/net/snmp
+  unsigned int index_col_; //!< @brief index for the target metrics in /proc/net/snmp
+  uint64_t current_value_; //!< @brief the value read from snmp
+  uint64_t last_value_; //!< @brief the value read from snmp at the last monitoring
+  uint64_t value_per_unit_time_; //!< @brief the increase of the value during the duration
+  std::deque<unsigned int> queue_; //!< @brief queue that holds the delta of the value
+};
+
 namespace local = boost::asio::local;
 
 class NetMonitor : public rclcpp::Node
@@ -150,6 +237,12 @@ class NetMonitor : public rclcpp::Node
    */
   void check_reassembles_failed(diagnostic_updater::DiagnosticStatusWrapper & status);
 
+  /**
+   * @brief Check UDP buf errors
+   * @param [out] status diagnostic message passed directly to diagnostic publish calls
+   */
+  void check_udp_buf_errors(diagnostic_updater::DiagnosticStatusWrapper & status);
+
   /**
    * @brief Timer callback
    */
@@ -273,18 +366,6 @@ class NetMonitor : public rclcpp::Node
    */
   void close_connection();
 
-  /**
-   * @brief Get column index of IP packet reassembles failed from `/proc/net/snmp`
-   */
-  void get_reassembles_failed_column_index();
-
-  /**
-   * @brief get IP packet reassembles failed
-   * @param [out] reassembles_failed IP packet reassembles failed
-   * @return execution result
-   */
-  bool get_reassembles_failed(uint64_t & reassembles_failed);
-
   diagnostic_updater::Updater updater_; //!< @brief Updater class which advertises to /diagnostics
   rclcpp::TimerBase::SharedPtr timer_; //!< @brief timer to get Network information
 
@@ -307,16 +388,9 @@ class NetMonitor : public rclcpp::Node
   unsigned int crc_error_check_duration_; //!< @brief CRC error check duration
   unsigned int crc_error_count_threshold_; //!< @brief CRC error count threshold
 
-  std::deque<unsigned int>
-    reassembles_failed_queue_; //!< @brief queue that holds count of IP packet reassembles failed
-  uint64_t last_reassembles_failed_; //!< @brief IP packet reassembles failed at the time of the
-                                     //!< last monitoring
-  unsigned int
-    reassembles_failed_check_duration_; //!< @brief IP packet reassembles failed check duration
-  unsigned int
-    reassembles_failed_check_count_; //!< @brief IP packet reassembles failed check count threshold
-  unsigned int reassembles_failed_column_index_; //!< @brief column index of IP Reassembles failed
-                                                 //!< in /proc/net/snmp
+  NetSnmp reassembles_failed_info_; //!< @brief information of IP packet reassembles failed
+  NetSnmp udp_rcvbuf_errors_info_; //!< @brief information of UDP rcv buf errors
+  NetSnmp udp_sndbuf_errors_info_; //!< @brief information of UDP snd buf errors
 
   /**
    * @brief Network connection status messages
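
The `NetSnmp` header above locates a metric by protocol name and metric name. In `/proc/net/snmp`, each protocol occupies two consecutive lines: a line of column names and a line of values, both starting with the protocol prefix (for example `Udp:`). The sketch below illustrates that lookup under stated assumptions; it is not the implementation from this commit. `split_fields` and `read_snmp_metric` are hypothetical names, and the function takes any `std::istream` so it can be tried without the real file. Unlike the class in the diff, which stores a row and column index and reads the value separately, this sketch does the lookup and the read in a single pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Split a line into whitespace-separated fields.
std::vector<std::string> split_fields(const std::string & line)
{
  std::istringstream iss(line);
  std::vector<std::string> fields;
  std::string field;
  while (iss >> field) {
    fields.push_back(field);
  }
  return fields;
}

// Hypothetical helper: returns the value of `metric` (e.g. "RcvbufErrors") for
// `protocol` (e.g. "Udp:") from text in the /proc/net/snmp layout, or
// std::nullopt when the protocol or metric is missing or not a number.
std::optional<uint64_t> read_snmp_metric(
  std::istream & input, const std::string & protocol, const std::string & metric)
{
  std::string header_line;
  std::string value_line;
  // Read the file two lines at a time: column names, then values.
  while (std::getline(input, header_line) && std::getline(input, value_line)) {
    const auto names = split_fields(header_line);
    const auto values = split_fields(value_line);
    if (names.empty() || names.front() != protocol) {
      continue;
    }
    // Column 0 is the protocol prefix, so metrics start at column 1.
    for (std::size_t col = 1; col < names.size() && col < values.size(); ++col) {
      if (names[col] == metric) {
        try {
          return std::stoull(values[col]);
        } catch (const std::exception &) {
          return std::nullopt;
        }
      }
    }
    return std::nullopt;
  }
  return std::nullopt;
}
```

On a Linux host, passing a `std::ifstream` opened on `/proc/net/snmp` with `"Udp:"` and `"RcvbufErrors"` or `"SndbufErrors"` would yield the totals that the new diagnostic reports, assuming the kernel exposes those columns.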
