Unrecognized import path error in Py Integration tests #43044

Open
oakrizan opened this issue Mar 5, 2025 · 4 comments
oakrizan commented Mar 5, 2025

Discovered while backporting #42825 on Feb 24. Backports to 8.16 and 8.17 for (x-pack/)metricbeat were failing with:

module/mongodb/mongodb.go:27:2: unrecognized import path "go.mongodb.org/mongo-driver": https fetch: Get "https://go.mongodb.org/mongo-driver?go-get=1": dial tcp: lookup go.mongodb.org on 127.0.0.53:53: no such host
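For context (this is an illustration, not part of the original report): for a vanity import path like go.mongodb.org/mongo-driver, the go tool first resolves the host via DNS and then fetches https://go.mongodb.org/mongo-driver?go-get=1 to read the go-import meta tag; the DNS step is what fails here. A minimal sketch reproducing those two steps outside of mage:

```go
// Illustration only: the two steps the go command performs for a vanity
// import path. Neither this file nor these calls are part of the CI setup.
package main

import (
	"fmt"
	"net"
	"net/http"
)

func main() {
	// Step 1: resolve the host. This is the step that fails in CI with
	// "lookup go.mongodb.org on 127.0.0.53:53: no such host".
	addrs, err := net.LookupHost("go.mongodb.org")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved:", addrs)

	// Step 2: fetch the ?go-get=1 page whose <meta name="go-import"> tag
	// tells the go command where the actual repository lives.
	resp, err := http.Get("https://go.mongodb.org/mongo-driver?go-get=1")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```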

During analysis, it was discovered that these errors had been occurring since Feb 14 on base 8.16 and since Feb 17 on base 8.17:
8.16: https://buildkite.com/elastic/beats-metricbeat/builds/13988#019505f4-9df4-4e14-bca0-29ec89fcc370/112-322
8.17: https://buildkite.com/elastic/beats-xpack-metricbeat/builds/11665#0195137d-710d-4390-bdc9-b2cca3d28ea3/105-370

What was tried:

  1. Rebuilt beats images in case something was broken during the weekly image build: https://buildkite.com/elastic/vm-images-platform-ingest/builds/556
    After triggering builds, some tests succeeded and some failed again. Retries succeeded or failed without any discernible pattern.
  2. Added dig go.mongodb.org and service systemd-resolved status commands to x-pack/metricbeat: Python Integration Tests (Module): https://buildkite.com/elastic/beats-xpack-metricbeat/builds/12049#01953e84-c117-47b4-924f-b9db384580f8/144-151
service systemd-resolved status
cd x-pack/metricbeat
mage pythonIntegTest
; <<>> DiG 9.18.30-0ubuntu0.22.04.1-Ubuntu <<>> go.mongodb.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56393
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;go.mongodb.org.			IN	A
;; AUTHORITY SECTION:
mongodb.org.		300	IN	SOA	ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
;; Query time: 7 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Tue Feb 25 19:18:41 UTC 2025
;; MSG SIZE  rcvd: 136
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2025-02-25 19:17:58 UTC; 43s ago
       Docs: man:systemd-resolved.service(8)
             man:org.freedesktop.resolve1(5)
             https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
             https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
   Main PID: 391 (systemd-resolve)
     Status: "Processing requests..."
      Tasks: 1 (limit: 19178)
     Memory: 9.5M
        CPU: 93ms
     CGroup: /system.slice/systemd-resolved.service
             └─391 /lib/systemd/systemd-resolved

Based on the dig output, the DNS query completed without errors (status NOERROR) but returned no ANSWER section. The AUTHORITY section showed that ns-cloud-c1.googledomains.com is authoritative for mongodb.org, which might be the root cause, but there is no proof of that here.
systemd-resolved seems to be working as expected.
3. Compared mongodb and other dependency versions between 8.17 and 8.x; no differences that could affect the behaviour were found.
4. Retried the failed steps multiple times; occasionally the tests were successful. The next day the problem was no longer detected on the 8.16 and 8.17 base branches.

It is worth mentioning that 8.18 and 8.x did not face the import path issue, so a problem with the beats image itself can be excluded.

oakrizan self-assigned this Mar 6, 2025
oakrizan commented Mar 6, 2025

Compared the .buildkite dir contents of the 8.16 and 8.17 branches with main - there are differences that affect the Py Integration tests.
Checked that no specific DNS configuration is applied to the beats custom VM images.

@shmsr found a related issue: aws/aws-k8s-tester#577 (comment), which pointed to a possible DNS issue, since the failing lookups go through 127.0.0.53:53 (the systemd-resolved stub resolver). Related PR with problem debugging: #43084
Local:


Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
go.mongodb.org  canonical name = glb.mongodb.com.
Name:   glb.mongodb.com
Address: 52.21.89.200
Name:   glb.mongodb.com
Address: 54.175.147.155
Name:   glb.mongodb.com
Address: 52.206.222.245

In VM:


Server:		8.8.8.8
Address:	8.8.8.8#53
Non-authoritative answer:
*** Can't find go.mongodb.org: No answer

So, using 8.8.8.8 also did not work.
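For illustration only (not something run as part of this investigation), the same comparison can be made from Go itself, bypassing the 127.0.0.53 stub and asking a specific upstream server directly, similar to the nslookup above. The host name and resolver address are the ones from this thread; everything else in the sketch is an assumption:

```go
// Illustration only: ask 8.8.8.8 directly from Go, bypassing the
// systemd-resolved stub on 127.0.0.53, analogous to
// `nslookup go.mongodb.org 8.8.8.8`.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	r := &net.Resolver{
		PreferGo: true, // use the pure-Go resolver so Dial below is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			// Send the query to a fixed upstream instead of /etc/resolv.conf.
			return d.DialContext(ctx, network, "8.8.8.8:53")
		},
	}
	addrs, err := r.LookupHost(context.Background(), "go.mongodb.org")
	fmt.Println("addrs:", addrs, "err:", err)
}
```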

oakrizan commented Mar 6, 2025

Found that core image builds for Ubuntu 2204 have been failing since Wed, Feb 12: https://buildkite.com/elastic/ci-vm-images/builds/8546, and they are still failing as of this date.

Will try to add CoreDNS to the image.
Related PR: https://github.com/elastic/ci-agent-images/pull/1297
VM Image builds: https://buildkite.com/elastic/vm-images-platform-ingest/builds?branch=fix-dns-error
Beats test PR: #43091

Added CoreDNS with the Google resolver address 8.8.8.8, which resulted in the same error: https://buildkite.com/elastic/beats-xpack-metricbeat/builds/12536#01956c6a-75c4-47c6-9a98-f8520ca9ea29/126-391
After changing to 1.1.1.1, the step seems to be consistently successful: 1, 2, 3
Running the same step on AWS was successful as well: https://buildkite.com/elastic/beats-xpack-metricbeat/builds/12554
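The actual CoreDNS configuration lives in https://github.com/elastic/ci-agent-images/pull/1297; the snippet below is only a guess at what a minimal Corefile forwarding everything to 1.1.1.1 could look like (plugin selection and ordering are assumptions, not copied from that PR):

```
.:53 {
    errors
    log
    cache 30
    forward . 1.1.1.1
}
```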

A potentially relevant issue was found in the Go repo: golang/go#18588. There is no solid proof that the current situation is caused by too many open files, but if it is, then the change to 1.1.1.1 and the successful runs on AWS are just a coincidence.
Tried setting a higher ulimit: https://github.com/elastic/ci-agent-images/pull/1297/commits/0a73987bbc640acfc919152ff4a93e2a26e991c7
The test failed with the same error: https://buildkite.com/elastic/beats-xpack-metricbeat/builds/12566#01956dae-c137-4b04-ba4f-5122ecdc7c99/152-414
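If the golang/go#18588 hypothesis (lookups failing under file-descriptor pressure) were the cause, one cheap check is to print the limit the test process actually sees, to confirm that a raised ulimit really propagates through the Buildkite agent. A hedged sketch, not something that was run as part of this debugging:

```go
// Illustration only: print the open-file limit the process actually sees,
// to sanity-check whether a raised ulimit reaches the test environment.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("RLIMIT_NOFILE: soft=%d hard=%d\n", lim.Cur, lim.Max)
}
```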

oakrizan commented Mar 7, 2025

On March 3, no issue was detected while running the Py Integration tests on a default beats image: https://buildkite.com/elastic/beats-xpack-metricbeat/builds?branch=debug-go-imports

Beats Ubuntu 2204 was last successfully built on Feb 27: https://buildkite.com/elastic/vm-images-platform-ingest/builds/557#01954522-c485-4494-b164-cc734a272b56

Since the error currently cannot be reproduced, further investigation and changes are hardly possible.
