Skip to content

Segfaults in Windows 2022 MSVC CI job #304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Dobiasd opened this issue Jun 9, 2024 · 14 comments
Closed

Segfaults in Windows 2022 MSVC CI job #304

Dobiasd opened this issue Jun 9, 2024 · 14 comments

Comments

@Dobiasd
Copy link
Owner

Dobiasd commented Jun 9, 2024

https://github.com/Dobiasd/FunctionalPlus/actions/runs/9435686090/job/25989531257

97% tests passed, 11 tests failed out of 423
Total Test time (real) =   2.88 sec
The following tests FAILED:
	  2 - benchmark_example (SEGFAULT)
	  3 - benchmark_input_output_args (SEGFAULT)
	307 - queue_test - full (SEGFAULT)
	364 - side_effects_test - execute_parallelly (SEGFAULT)
	368 - side_effects_test - execute_parallelly_atomic (SEGFAULT)
	369 - side_effects_test - for_each (serial and parallel (SEGFAULT)
	384 - transform_test - transform (SEGFAULT)
	385 - transform_test - reduce (SEGFAULT)
	386 - transform_test - keep_if_parallelly (SEGFAULT)
	389 - transform_test - transform_reduce_parallelly (SEGFAULT)
	390 - transform_test - transform_reduce_1_parallelly (SEGFAULT)

Looks like something thread-related.

@pthom
Copy link
Contributor

pthom commented Jun 11, 2024

Hello Dobias,

I tried to tackle this issue. Unfortunately I failed, and I cannot study it more in detail because I'm stuck on other projects.

However, I have some details which might be of interest to you.

  1. I cannot reproduce this on my computer, using Visual Studio 2022 Preview for ARM
image

The test will pass without any issue if I compile for x64 (I have an issue when compiling for ARM64, see below)

  1. Inconsistencies in the CI:
  • On Windows:
    I'm a bit confused about this error because it is a bit inconsistent: I swear I saw the CI work, then fail for no good reason.

  • on macOS:
    the CI works 70% of the times, but can fail: see this run for example.

  1. Inconsistencies when compiling for Windows ARM:
  • on Windows ARM, running ci_build.sh will fail with the following message:
99% tests passed, 1 tests failed out of 423

Total Test time (real) =   4.63 sec

The following tests FAILED:
         52 - container_common_test - interweave (Failed)
Errors while running CTest

pascal@PASCALTHOMEA90C MINGW64 /f/dvp/OpenSource/FunctionalPlus (fix_win_ci_segfault)
$ ./script/ci_build.sh 

when trying to diagnose this on my side, I see that it works perfectly in debug mode, But I can reproduce the issue when using "RelWithDebInfo"

I'm not sure this is related to the seg fault you're experiencing but let me show you what I saw:

TEST_CASE("container_common_test - interweave")
{
    using namespace fplus;
    REQUIRE_EQ(interweave(IntVector({ 1, 3 }), IntVector({ 2, 4 })), IntVector({ 1, 2, 3, 4 }));
    REQUIRE_EQ(interweave(IntVector({ 1, 3, 5, 7 }), IntVector({ 2, 4 })), IntVector({ 1, 2, 3, 4, 5, 7 }));

    // This test fails in release mode (but will work in Debug mode), when compiling for windows ARM
    //REQUIRE_EQ(unweave(IntVector({ 0, 1, 2, 3 })), std::make_pair(IntVector({ 0, 2 }), IntVector({ 1, 3 })));   
    
    // Let's show the result
    auto u = unweave(IntVector({ 0, 1, 2, 3 }));
    std::cout << fplus::show(u) << std::endl;     // will show ([1, 3], [0, 2]) instead of ([0, 2], [1, 3])

    //REQUIRE_EQ(unweave(IntVector({ 0, 1, 2, 3, 4 })), std::make_pair(IntVector({ 0, 2, 4 }), IntVector({ 1, 3 })));
}

I don't know if this is related.


PS: I might contact you by email in the next days because I'm working hard on a new project which could be interesting for researchers. I'm looking for feedback, and I felt you might have an interesting feedback about it (if you have time)

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 11, 2024

Wow, thanks a lot for the investigation! ❤️

The flakiness of the macOS CI should be fixed by this commit, which I just pushed. ✅

The problem with the Windows tests, however, is still a riddle to me. 🧐


And, sure, looking forward to your email. 👍

@pthom
Copy link
Contributor

pthom commented Jun 11, 2024

The interweave issue is probably unrelated, and due to a compiler bug.

See https://stackoverflow.com/questions/78608330/postfix-increment-compiler-bug-on-visual-studio-2022-arm-preview

You should probably move the ++ increment at the end of the loop.

@pthom
Copy link
Contributor

pthom commented Jun 11, 2024

You are probably fighting against the limitation of the GitHub runners. Maybe some limitations in threading.

I just rented a visual studio machine on Azure to test it. With Visual Studio Professional 2022 17.10 on x86 64bits / Azure, the tests do run fine.

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 12, 2024

Thanks a lot for the interweave fix! 😍

Regarding the Windows tests: If they cause so much trouble and additional complexity, maybe we should just drop them completely?
Edit: Meh, I now remember there were quite a few problems with MSVC, which required code changes in fplus. Not having test coverage for those would also not be too nice. : /

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 12, 2024

On a test branch, I reproduced the segfault with a minimal example:

TEST_CASE("show_versions - std_thread")
{
    auto t = std::thread([]() { });
    t.join();
}

TEST_CASE("show_versions - std_async")
{
    auto handle = std::async(std::launch::async, []() { return 1; });
    handle.get();
}

Windows 2022 MSVC result:

...
2/3 Test #2: show_versions - std_thread .......   Passed    0.17 sec
3/3 Test #3: show_versions - std_async ........***Exception: SegFault  0.29 sec

@pthom
Copy link
Contributor

pthom commented Jun 12, 2024

On a test branch, I reproduced the segfault with a minimal example:

Wow, well done!

The next step to be able to gather more feedback from other users would be to set up a very simple repository.
It would contain:

#include ...
int main()
{
    auto handle = std::async(std::launch::async, []() { return 1; });
    handle.get();
}

and it would compile and run this in CI. What do you think? I think I will try it (or you can if you are impatient to see the result: I cannot do it before this afternoon).

@pthom
Copy link
Contributor

pthom commented Jun 12, 2024

The advantage of this version is that if it does not fail, we could then add doctest, and see if the combination doctest + thread is the cause of the issue

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 12, 2024

Good idea, thanks, and no hurry. ☺️

@pthom
Copy link
Contributor

pthom commented Jun 12, 2024

Ok, it can be reproduced with the simplest possible async code!

See https://github.com/pthom/study_ci_thread_fault

And https://stackoverflow.com/questions/78612314/github-runners-issue-with-visual-studio-2022-segmentation-fault-with-the-simple

@pthom
Copy link
Contributor

pthom commented Jun 12, 2024

Look at the comments on this SO question:

https://stackoverflow.com/questions/78612314/github-runners-issue-with-visual-studio-2022-segmentation-fault-with-the-simple

I think the bug was fixed - under my feet - by GitHub while I was reporting it to Stack Overflow.
You may want to rerun the CI, it worked on my side

I am not lucky these days with compiler and platform bugs 🙃

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 12, 2024

Cool minimal example! Perhaps your great report made somebody fix the bug (silently).

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 12, 2024

Ah, no, it seems the work on the other issue happened before.

Yeah, not lucky indeed. 😬 We would have solved the problem too by simply doing nothing for a few days. 🦥

@Dobiasd
Copy link
Owner Author

Dobiasd commented Jun 12, 2024

Closing this issue, since the CI runner is fixed. ✔️

Thanks again for the nice collaboration. Situations like this (especially with long-term contributors like you) are what make maintaining an open-source project worthwhile for me. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants