Failure to collect hang dump from dotnet test running on arm64 MacOS in helix #5196

Open
marcpopMSFT opened this issue Jan 21, 2025 · 1 comment
Labels
bug Something isn't working

Comments

Member

marcpopMSFT commented Jan 21, 2025

Description

The SDK is hitting timeouts in our arm64 test leg. To track them down, we switched to running the tests with dotnet test --blame-hang so that hang dumps would be collected from Helix. However, we have not been able to get any dumps collected and saved to Helix.

console.d86c9d4a.log

Configuration

I do not know whether the error is specific to arm64 macOS, but that is where it reproduced.

Regression?

Unknown

Other information

The test team asked us to report the issue here: dotnet/sdk#45520 (comment)

dotnet test command used:
dotnet test Microsoft.NET.Build.Tests.dll -e HELIX_WORK_ITEM_TIMEOUT=02:00:00 -e DOTNET_SDK_TEST_EXECUTION_DIRECTORY=/private/tmp/helix/working/96B108B4/w/A86508FF/e/testExecutionDirectory --results-directory ./ --logger trx --logger 'console;verbosity=detailed' --blame-hang --blame-hang-timeout 15m --filter <list of tests> --

Output from the log file:

[xUnit.net 00:21:37.57] Microsoft.NET.Build.Tests: [Long Running Test] 'Microsoft.NET.Build.Tests.GivenThatWeWantToVerifyProjectReferenceCompat.Project_reference_compat(referencerTarget: "netstandard1.4", testIDPostFix: "Full", rawDependencyTargets: "netstandard1.0 netstandard1.1 netstandard1.2 netst"···, restoreSucceeds: True, buildSucceeds: True)', Elapsed: 00:21:08
[createdump] Gathering state for process 22368 
[createdump] Target process is alive
[createdump] thread_get_state(627f7) FAILED (os/kern) invalid argument (4)
[createdump] Failure took 13ms
The active test run was aborted. Reason: Test host process crashed : [createdump] thread_get_state(627f7) FAILED (os/kern) invalid argument (4)
[createdump] Failure took 13ms

Data collector 'Blame' message: The specified inactivity time of 15 minutes has elapsed. Collecting hang dumps from testhost and its child processes.
Data collector 'Blame' message: Data collector caught an exception of type 'System.IO.FileNotFoundException': 'Collect dump was enabled but no dump file was generated.'. More details: Blame: Collecting hang dump failed with error...
Results File: /private/tmp/helix/working/96B108B4/w/A86508FF/e/_dci-macm2-build-013_2025-01-13_18_59_49.trx

Attachments:
  /private/tmp/helix/working/96B108B4/w/A86508FF/e/57527c31-fa14-4d1f-ab93-ddde833bffa9/Sequence_86717a59614b4b9cbd87c41593080a68.xml
Test Run Aborted.
Total tests: Unknown
     Passed: 43
 Total time: 21.9121 Minutes

The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
The test running when the crash occurred: 
Microsoft.NET.Build.Tests.GivenThatWeWantToVerifyProjectReferenceCompat.Project_reference_compat

This test may, or may not be the source of the crash.
+ export _commandExitCode=1
+ _commandExitCode=1
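The "(os/kern) invalid argument (4)" text at the end of the createdump failure appears to be the Mach error string for the kern_return_t value 4, which <mach/kern_return.h> defines as KERN_INVALID_ARGUMENT. If the same failure has to be checked across many Helix console logs, a small shell sketch like the one below can extract those lines and name the code. The function names are hypothetical and are not part of the SDK, Helix, or createdump. The lookup table only covers codes 0 to 5, and the script assumes each failure line starts with "[createdump]" and ends with "(<code>)", as in the log above.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of the SDK or Helix tooling):
# list createdump failures in a console log and name the Mach error code.

# Map a kern_return_t value to its name from <mach/kern_return.h>.
# Only codes 0-5 are listed; anything else is reported as unknown.
decode_kern_return() {
  case "$1" in
    0) echo "KERN_SUCCESS" ;;
    1) echo "KERN_INVALID_ADDRESS" ;;
    2) echo "KERN_PROTECTION_FAILURE" ;;
    3) echo "KERN_NO_SPACE" ;;
    4) echo "KERN_INVALID_ARGUMENT" ;;
    5) echo "KERN_FAILURE" ;;
    *) echo "UNKNOWN($1)" ;;
  esac
}

# For each line that starts with "[createdump]" and contains " FAILED ",
# print "<call>: <KERN_NAME> (<code>)", using the trailing "(<code>)" value.
scan_createdump_failures() {
  grep -E '^\[createdump] .* FAILED ' "$1" | while IFS= read -r line; do
    call=$(printf '%s\n' "$line" | sed -E 's/^\[createdump] ([A-Za-z_]+)\(.*/\1/')
    code=$(printf '%s\n' "$line" | sed -E 's/.*\(([0-9]+)\)[[:space:]]*$/\1/')
    echo "$call: $(decode_kern_return "$code") ($code)"
  done
}

# Usage (log name taken from the attachment in this issue):
# scan_createdump_failures console.d86c9d4a.log
```

For the log above, this would report a single line: thread_get_state: KERN_INVALID_ARGUMENT (4). The "Test host process crashed : [createdump] ..." echo is skipped because it does not start with "[createdump]".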
Member

hoyosjs commented Jan 21, 2025

That invalid argument error is an odd one here. It might mean the thread died before createdump could read its state; otherwise, there's no suspicious parameter.
