Encounter error: 6004(MgmtdClient::RoutingInfoNotReady) #175

Open
izxl007 opened this issue Mar 14, 2025 · 5 comments

Comments

@izxl007

izxl007 commented Mar 14, 2025

root@host-214:/3fs# /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.234.5.197:8000"]' "set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml"
Encounter error: 6004(MgmtdClient::RoutingInfoNotReady)
The same command on the same OS succeeds on one node and fails on another.
Has anyone else run into this?

@Icedroid

Icedroid commented Mar 15, 2025

I'm hitting the same problem. How did you fix it?

# /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.3.25:8000"]' "list-nodes"
Encounter error: 6004(MgmtdClient::RoutingInfoNotReady)

The errors in mgmtd_main-err.log are:

[2025-03-15T10:06:53.201396617+00:00 SvrConn0:  228 IBSocket.cc:738 CRITICAL] IBSocket RDMA://localhost.localdomain/mlx5_0:1/953 accept timeout, port {mlx5_1:1, ifaddrs [], zones [UNKNOWN], ETHERNET, ACTIVE}
[2025-03-15T10:06:53.201434547+00:00 SvrConn0:  228 Listener.cc:218 ERROR] IBSocket RDMA://localhost.localdomain/mlx5_0:1/953 still in ACCEPTED state after wait 15s
[2025-03-15T10:07:02.929891558+00:00 SvrConn1:  229 IBSocket.cc:738 CRITICAL] IBSocket RDMA://localhost.localdomain/mlx5_0:1/954 accept timeout, port {mlx5_1:1, ifaddrs [], zones [UNKNOWN], ETHERNET, ACTIVE}
[2025-03-15T10:07:02.929936771+00:00 SvrConn1:  229 Listener.cc:218 ERROR] IBSocket RDMA://localhost.localdomain/mlx5_0:1/954 still in ACCEPTED state after wait 15s
[2025-03-15T10:07:04.138540139+00:00 SvrConn1:  229 IBSocket.cc:738 CRITICAL] IBSocket RDMA://localhost.localdomain/mlx5_0:1/955 accept timeout, port {mlx5_2:1, ifaddrs [], zones [UNKNOWN], ETHERNET, ACTIVE}
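The accept-timeout lines all share one shape, so they can be tallied per local RDMA port. Below is a minimal bash sketch, assuming the log format matches the lines quoted above (the pattern is inferred from this thread, not from 3FS documentation); `summarize_accept_timeouts` is a hypothetical helper name. An empty `ifaddrs []`, as in every line here, would be consistent with a port that has no IP address attached, but that reading is an assumption.

```shell
#!/usr/bin/env bash
# Summarize IBSocket "accept timeout" lines from a 3FS mgmtd error log.
# ASSUMPTION: lines look like the samples in this thread, i.e. they contain
#   "... accept timeout, port {<dev>:<port>, ifaddrs [<addrs>], ..."
# summarize_accept_timeouts is a hypothetical helper, not part of 3FS.
summarize_accept_timeouts() {
  local log="$1"
  # Keep only accept-timeout lines, reduce each to "<dev>:<port> <addrs>",
  # then count per port and mark ports whose ifaddrs list was empty.
  grep 'accept timeout, port {' "$log" \
    | sed -E 's/.*port \{([^,]+), ifaddrs \[([^]]*)\].*/\1 \2/' \
    | awk '{ n[$1]++; if (NF == 1) empty[$1] = 1 }
           END { for (p in n) printf "%s timeouts=%d ifaddrs=%s\n", p, n[p], ((p in empty) ? "none" : "set") }' \
    | sort
}

# Run only when a log path is given on the command line.
if [[ $# -gt 0 ]]; then
  summarize_accept_timeouts "$1"
fi
```

For example, `bash summarize.sh /path/to/mgmtd_main-err.log` (script name and path are placeholders). On the three accept-timeout lines above it reports two timeouts on `mlx5_1:1` and one on `mlx5_2:1`, both with `ifaddrs=none`.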

@izxl007

izxl007 commented Mar 15, 2025

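It's most likely a network problem. Rebooting the node fixed it for me.
3FS is very sensitive to the network, so avoid changing the network configuration casually.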

@Icedroid

@izxl007 My deployment is containerized, so restarting didn't help.

@feeyman

feeyman commented Mar 28, 2025

192.168.3.25

Hi, is 192.168.3.25 the address of the IB NIC? When I deploy, mgmtd_main's port 8000 always ends up bound to the management network address. Have you seen this?
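One way to check whether an address such as 192.168.3.25 lives on an RDMA-capable NIC or on the management network: find the interface that owns the address, then see whether that interface's PCI device exposes an RDMA device. This is a sketch assuming `ip` from iproute2 and the usual Linux sysfs layout `/sys/class/net/<ifname>/device/infiniband/<dev>`; virtual, bonded, or containerized interfaces may not expose that path. `iface_for_ip` and `rdma_devs_for_iface` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the interface owning IPv4 address $1.
# Reads `ip -o -4 addr show` output on stdin; field 2 is the interface,
# field 4 is "<addr>/<prefix>".
iface_for_ip() {
  awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) { print $2; exit } }'
}

# Print the RDMA devices backed by interface $1 (empty output: not an RDMA NIC).
# ASSUMPTION: sysfs exposes /sys/class/net/<if>/device/infiniband/<dev>;
# the optional second argument overrides the sysfs root (useful for testing).
rdma_devs_for_iface() {
  local sysnet="${2:-/sys/class/net}" d
  for d in "$sysnet/$1/device/infiniband"/*; do
    if [ -e "$d" ]; then
      basename "$d"
    fi
  done
}
```

Usage: `ifname=$(ip -o -4 addr show | iface_for_ip 192.168.3.25)`, then `rdma_devs_for_iface "$ifname"`. Empty output from the second call suggests the address is not on an RDMA NIC.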

@izxl007

izxl007 commented Apr 1, 2025

3FS's behavior depends on the order of the NICs at deployment time. If the NIC order changes afterwards, the services may fail to start.
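Since the NIC layout seen at deployment seems to matter, it may help to record which interface each RDMA device maps to when the cluster is deployed and compare that record before restarting services. The mapping is only one proxy for "NIC order". This bash sketch assumes the sysfs layout `/sys/class/infiniband/<dev>/device/net/<ifname>`. The helper names and the snapshot file are hypothetical, and an unchanged mapping does not rule out other network changes.

```shell
#!/usr/bin/env bash
# Print one "<rdma-dev> <netdev>" pair per line, in sysfs glob order.
# ASSUMPTION: /sys/class/infiniband/<dev>/device/net/<ifname> exists for
# each RDMA device; the optional first argument overrides the sysfs root.
snapshot_rdma_map() {
  local root="${1:-/sys/class/infiniband}" dev net
  for dev in "$root"/*; do
    for net in "$dev"/device/net/*; do
      if [ -e "$net" ]; then
        printf '%s %s\n' "${dev##*/}" "${net##*/}"
      fi
    done
  done
}

# Compare a saved snapshot with the current mapping.
# Prints a unified diff and returns non-zero if anything changed.
check_rdma_map() {
  local saved="$1" root="${2:-/sys/class/infiniband}"
  diff -u "$saved" <(snapshot_rdma_map "$root")
}
```

At deploy time, something like `snapshot_rdma_map > /etc/3fs-rdma-map.snapshot` (hypothetical path); later, `check_rdma_map /etc/3fs-rdma-map.snapshot` before starting the services flags any device whose interface has changed.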
