monitor_collector_main core dumps when started in a container #171
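The cause of this core dump seems to be that every NIC inside the container has a name starting with ib. Currently, 3FS needs a NIC whose name does not start with ib in order to establish RDMA connections; otherwise the service fails to start.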
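One way to test this observation from inside the container is to list the interfaces that are up, carry an IPv4 address and have a name that does not start with ib. Below is a minimal C++ sketch built on getifaddrs(3). The "starts with ib" rule is the heuristic from the comment above, not a check taken from the 3FS source, and the function names are hypothetical.

```cpp
#include <ifaddrs.h>
#include <net/if.h>
#include <sys/socket.h>

#include <cassert>
#include <string>
#include <vector>

// Heuristic from the comment above (an assumption, not a documented 3FS rule):
// an interface whose name starts with "ib" is treated as an IPoIB interface.
bool looksLikeIpoib(const std::string& name) {
  return name.rfind("ib", 0) == 0;  // true only for a prefix match at index 0
}

// Hypothetical helper: names of interfaces that are UP, have an IPv4 address,
// are not loopback and do not look like IPoIB. A name appears once per IPv4
// address assigned to it. Returns an empty list if getifaddrs() fails.
std::vector<std::string> nonIbIpv4Interfaces() {
  std::vector<std::string> names;
  ifaddrs* list = nullptr;
  if (getifaddrs(&list) != 0) return names;
  for (const ifaddrs* it = list; it != nullptr; it = it->ifa_next) {
    if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET) continue;
    if (!(it->ifa_flags & IFF_UP) || (it->ifa_flags & IFF_LOOPBACK)) continue;
    if (!looksLikeIpoib(it->ifa_name)) names.emplace_back(it->ifa_name);
  }
  freeifaddrs(list);
  return names;
}
```

An empty result from nonIbIpv4Interfaces() would mean the container only exposes ib-named interfaces (plus lo), which is the situation described above.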
The container is started in the host's network namespace, and TCP NICs are present:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
I also tried configuring the IP address on bond0.
The core dump occurs at the following line, in Result MonitorCollectorServer::beforeStart():
monitorCollectorOperator_ = std::make_unique(config_.monitor_collector());
I'm hitting the same core dump. Does anyone know how to fix it?
Take a look at this: #178 (comment)
@vsxen Thanks, that worked. However, mgmtd also fails to start, even though initialization succeeded.
For that, check tail -f /var/log/3fs/mgmtd_main.log. I found that mgmtd_main also opens port 9000, which conflicted with my local SSH port. I fixed it by changing the port in mgmtd_main.toml to 9001.
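A conflict like this can be detected before starting mgmtd_main by checking whether the port can still be bound. Below is a minimal IPv4-only sketch; tcpPortIsFree is a hypothetical helper, not part of 3FS. It tries to bind the port on all local addresses and reports whether that succeeded.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Returns true if a TCP socket can bind `port` on all IPv4 addresses.
// false usually means another process (e.g. an sshd moved to that port)
// already holds it; it is also false for ports < 1024 without privileges.
bool tcpPortIsFree(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return false;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(port);
  // No SO_REUSEADDR: the bind must fail if anything else owns the port.
  bool ok = bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
  close(fd);  // release the port again immediately
  return ok;
}
```

For example, tcpPortIsFree(9000) returning false on the host would point to a clash such as the SSH one described above.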
@vsxen This doesn't look like the same issue: it still core dumps after removing the bond NIC.
[2025-03-18T07:15:04.921363415+00:00 monitor_collect: 2655 IBDevice.cc:169 INFO] ibdev2netdev: mlx5_0 port 1 ==> eth0 (Up)
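The log line above shows ibdev2netdev mapping mlx5_0 port 1 to eth0, so in this setup the RDMA device is no longer tied to a bond. To compare such mappings across hosts or containers, a small parser for the "<ibdev> port <n> ==> <netdev> (<state>)" format seen in these logs can help. This is a hypothetical helper sketched for illustration, not the parser in 3FS's IBDevice.cc.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// One parsed ibdev2netdev line, e.g. "mlx5_bond_0 port 1 ==> bond0 (Up)".
struct IbNetdevMapping {
  std::string ibdev;
  int port = 0;
  std::string netdev;
  bool up = false;
};

// Parses a single line; returns std::nullopt if it does not follow the
// "<ibdev> port <n> ==> <netdev> (<state>)" layout.
std::optional<IbNetdevMapping> parseIbdev2netdevLine(const std::string& line) {
  std::istringstream in(line);
  IbNetdevMapping m;
  std::string portWord, arrow, state;
  if (!(in >> m.ibdev >> portWord >> m.port >> arrow >> m.netdev >> state)) {
    return std::nullopt;  // too few fields or a non-numeric port
  }
  if (portWord != "port" || arrow != "==>") return std::nullopt;
  m.up = (state == "(Up)");
  return m;
}
```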
Environment:
Host OS: Anolis OS release 8.8 (amd64), GNU libc 2.28, kernel 5.10
Container: ubuntu:22.04
Container launch command:
docker run -it --network=host --name 3fs3 --device=/dev/infiniband:/dev/infiniband -v /etc/libibverbs.d:/etc/libibverbs.d --cap-add=NET_RAW --cap-add=IPC_LOCK --cap-add=CAP_NET_ADMIN --privileged 3fs:v1 /bin/bash
Inside the container:
show_gids
DEV          PORT  INDEX  GID                                      IPv4         VER  DEV
mlx5_bond_0  1     0      fe80:0000:0000:0000:0ac0:ebff:fe5a:2008               v1   bond0
mlx5_bond_0  1     1      fe80:0000:0000:0000:0ac0:ebff:fe5a:2008               v2   bond0
mlx5_bond_0  1     2      0000:0000:0000:0000:0000:ffff:0ac7:2516  192.168.1.2  v1   bond0.600
mlx5_bond_0  1     3      0000:0000:0000:0000:0000:ffff:0ac7:2516  192.168.1.2  v2   bond0.600
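In this table, indexes 0 and 1 are link-local (fe80::) GIDs on bond0, while indexes 2 and 3 are IPv4-mapped GIDs (::ffff:a.b.c.d) on the VLAN interface bond0.600; VER distinguishes RoCE v1 from v2. The following sketch, with a hypothetical function name, recovers the dotted IPv4 address from a GID printed in show_gids format (eight 4-digit hex groups).

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Returns the IPv4 address embedded in an IPv4-mapped GID
// ("0000:...:0000:ffff:XXXX:YYYY"), or std::nullopt for any other GID.
std::optional<std::string> ipv4FromGid(const std::string& gid) {
  std::vector<unsigned> groups;
  std::stringstream ss(gid);
  std::string part;
  while (std::getline(ss, part, ':')) {
    if (part.empty() || part.size() > 4) return std::nullopt;
    for (unsigned char c : part) {
      if (!std::isxdigit(c)) return std::nullopt;
    }
    groups.push_back(static_cast<unsigned>(std::stoul(part, nullptr, 16)));
  }
  if (groups.size() != 8) return std::nullopt;
  for (int i = 0; i < 5; ++i) {
    if (groups[i] != 0) return std::nullopt;  // first 80 bits must be zero
  }
  if (groups[5] != 0xffff) return std::nullopt;  // then the ::ffff marker
  // The last 32 bits are the IPv4 address, high byte first.
  std::ostringstream out;
  out << (groups[6] >> 8) << '.' << (groups[6] & 0xff) << '.'
      << (groups[7] >> 8) << '.' << (groups[7] & 0xff);
  return out.str();
}
```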
Starting the service:
/opt/3fs/bin/monitor_collector_main --cfg /opt/3fs/etc/monitor_collector_main.toml
[2025-03-14T01:04:43.799468495+00:00 monitor_collect:81240 IBDevice.cc:169 INFO] ibdev2netdev: mlx5_bond_0 port 1 ==> bond0 (Up)
[2025-03-14T01:04:43.799533255+00:00 monitor_collect:81240 IBDevice.cc:186 INFO] ibdev2netdev parsed: mlx5_bond_0 => bond0
[2025-03-14T01:04:43.799691852+00:00 monitor_collect:81240 IfAddrs.h:102 INFO] Get ifaddr of bond0.600, addr 192.168.1.2, subnet 192.168.1.0/24, up true
[2025-03-14T01:04:43.802160844+00:00 monitor_collect:81240 IBDevice.cc:386 WARNING] IfAddr of mlx5_bond_0:1 -> bond0 not found, maybe running in container!
[2025-03-14T01:04:43.802173396+00:00 monitor_collect:81240 IBDevice.cc:441 CRITICAL] IBDevice mlx5_bond_0:1 can't set zone by IP, fallback to UNKNOWN
[2025-03-14T01:04:43.802249521+00:00 monitor_collect:81240 IBDevice.cc:367 INFO] IBDevice mlx5_bond_0 add active port 1, linklayer ETHERNET, addrs , zones UNKNOWN, RoCE v2 GID 0:0:0:0:0:0:0:0:0:0:ff:ff:a:c7:25:16
[2025-03-14T01:04:43.802260518+00:00 monitor_collect:81240 IBDevice.cc:256 INFO] IBDevice add mlx5_bond_0, id 0, 1 available ports
[2025-03-14T01:04:43.803790460+00:00 IBManager:81267 EventLoop.cc:116 INFO] EventLoop::loop() started.
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] Folly log json configure: {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "categories": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] ".": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "level": "INFO",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "inherit": true,
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "propagate": "NONE",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "handlers": [
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "normal",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "err",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "fatal"
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] ]
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] },
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "handlers": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "normal": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "type": "file",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "options": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "path": "/var/log/3fs/monitor_collector_main.log",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "async": "true",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "rotate": "true",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "max_files": "10",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "max_file_size": "104857600",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "rotate_on_open": "false"
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] },
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "err": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "type": "file",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "options": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "level": "ERR",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "path": "/var/log/3fs/monitor_collector_main-err.log",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "async": "false",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "rotate": "true",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "max_files": "10",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "max_file_size": "104857600",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "rotate_on_open": "false"
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] },
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "fatal": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "type": "stream",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "options": {
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "level": "FATAL",
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] "stream": "stderr"
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.803936435+00:00 monitor_collect:81240 LogConfig.cc:96 INFO] }
[2025-03-14T01:04:43.804024793+00:00 monitor_collect:81240 OnePhaseApplication.h:87 INFO] LogConfig: {"categories":{".":{"level":"INFO","inherit":true,"propagate":"NONE","handlers":["normal","err","fatal"]}},"handlers":{"normal":{"type":"file","options":{"path":"/var/log/3fs/monitor_collector_main.log","async":"true","rotate":"true","max_files":"10","max_file_size":"104857600","rotate_on_open":"false"}},"err":{"type":"file","options":{"level":"ERR","path":"/var/log/3fs/monitor_collector_main-err.log","async":"false","rotate":"true","max_files":"10","max_file_size":"104857600","rotate_on_open":"false"}},"fatal":{"type":"stream","options":{"level":"FATAL","stream":"stderr"}}}}
Segmentation fault (core dumped)
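The WARNING and CRITICAL lines in this log show an address mismatch before the crash: ibdev2netdev maps mlx5_bond_0 to bond0, but the IPv4 address 192.168.1.2 is configured on the VLAN sub-interface bond0.600, so no address is found for bond0 and the port's zone falls back to UNKNOWN. The sketch below illustrates such a lookup with a VLAN fallback. The function name, the (name, address) input format and the fallback rule are assumptions for illustration; they are not 3FS behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// (interface name, IPv4 address) pairs, e.g. collected with getifaddrs().
using IfAddrList = std::vector<std::pair<std::string, std::string>>;

// Looks up the address of `netdev`. An exact name match wins; otherwise a
// VLAN sub-interface named "<netdev>.<digits>" is accepted (hypothetical).
std::optional<std::string> findAddrForNetdev(const IfAddrList& addrs,
                                             const std::string& netdev) {
  for (const auto& [name, addr] : addrs) {
    if (name == netdev) return addr;
  }
  const std::string prefix = netdev + ".";
  for (const auto& [name, addr] : addrs) {
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }
    const std::string vid = name.substr(prefix.size());
    if (std::all_of(vid.begin(), vid.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; })) {
      return addr;  // e.g. "bond0.600" for netdev "bond0"
    }
  }
  return std::nullopt;
}
```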
gdb:
#0 0x00001494f414391f in make_request (pid=81294, fd=30) at ../sysdeps/unix/sysv/linux/check_pf.c:147
147 ../sysdeps/unix/sysv/linux/check_pf.c: No such file or directory.
[Current thread is 1 (Thread 0x1494b96f8640 (LWP 81343))]
(gdb) bt full
#0 0x00001494f414391f in make_request (pid=81294, fd=30) at ../sysdeps/unix/sysv/linux/check_pf.c:147
__result =
result_len = 0
nladdr = {nl_family = 16, nl_pad = 0, nl_pid = 0, nl_groups = 0}
buf = '\000' <repeats 2468 times>...
seen_ipv6 =
result_cap = 32
req = {nlh = {nlmsg_len = 20, nlmsg_type = 22, nlmsg_flags = 769, nlmsg_seq = 1741914359, nlmsg_pid = 0}, g = {rtgen_family = 0 '\000'}, pad = "\000\000"}
done =
seen_ipv4 =
result = 0x0
buf_size = 4096
iov = {iov_base = 0x1494b92ddd80, iov_len = 4096}
result =
result_len =
result_cap =
req =
nladdr =
PRETTY_FUNCTION =
buf_size =
buf =
iov =
out_fail =
done =
seen_ipv4 =
seen_ipv6 =
out =
__result =
msg =
read_len =
nlmh =
__result =
ifam =
rta =
len =
local =
address =
info =
__a =
#1 __check_pf (seen_ipv4=seen_ipv4@entry=0x1494b92defd6, seen_ipv6=seen_ipv6@entry=0x1494b92defd7, in6ai=in6ai@entry=0x1494b92defe8,
in6ailen=in6ailen@entry=0x1494b92deff0) at ../sysdeps/unix/sysv/linux/check_pf.c:329
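Frame #0 is inside glibc's __check_pf, which is reached from getaddrinfo (for example when AI_ADDRCONFIG is set). The req variable shows the request it sent on a NETLINK_ROUTE socket: nlmsg_type 22 is RTM_GETADDR and nlmsg_flags 769 is NLM_F_REQUEST | NLM_F_DUMP. To check whether the same address dump works in this container independently of 3FS, a minimal sketch that sends that request and counts the returned addresses follows. It mirrors the request only, not glibc's implementation, and dumpAddrs is a hypothetical name.

```cpp
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Sends an RTM_GETADDR dump request and counts IPv4/IPv6 addresses in the
// reply. Returns true only if the dump ended with NLMSG_DONE.
bool dumpAddrs(int& ipv4, int& ipv6) {
  ipv4 = ipv6 = 0;
  int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
  if (fd < 0) return false;

  struct {
    nlmsghdr nlh;
    rtgenmsg g;
  } req{};
  req.nlh.nlmsg_len = sizeof(req);  // 20 bytes incl. padding, as in frame #0
  req.nlh.nlmsg_type = RTM_GETADDR;
  req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
  req.nlh.nlmsg_seq = 1;
  req.g.rtgen_family = AF_UNSPEC;  // all address families

  sockaddr_nl kernel{};
  kernel.nl_family = AF_NETLINK;  // nl_pid 0 addresses the kernel
  if (sendto(fd, &req, sizeof(req), 0, reinterpret_cast<sockaddr*>(&kernel),
             sizeof(kernel)) < 0) {
    close(fd);
    return false;
  }

  alignas(nlmsghdr) char buf[8192];
  bool ok = false, done = false;
  while (!done) {
    ssize_t len = recv(fd, buf, sizeof(buf), 0);
    if (len <= 0) break;
    for (nlmsghdr* nh = reinterpret_cast<nlmsghdr*>(buf); NLMSG_OK(nh, len);
         nh = NLMSG_NEXT(nh, len)) {
      if (nh->nlmsg_type == NLMSG_DONE) { ok = true; done = true; break; }
      if (nh->nlmsg_type == NLMSG_ERROR) { done = true; break; }
      if (nh->nlmsg_type != RTM_NEWADDR) continue;
      auto* ifa = static_cast<ifaddrmsg*>(NLMSG_DATA(nh));
      if (ifa->ifa_family == AF_INET) ++ipv4;
      else if (ifa->ifa_family == AF_INET6) ++ipv6;
    }
  }
  close(fd);
  return ok;
}
```

If this standalone dump also misbehaves inside the container, the problem is more likely in the container's netlink/network setup than in monitor_collector_main itself.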
@echaozh