[GR-57064] Lazy virtual thread JFR registration and epoch generation caching #9570

roberttoyonaga · 2024-08-26T18:46:24Z

Summary

In Hotspot, JFR registers virtual threads lazily, when they emit an event. Currently, in SubstrateVM, we register virtual threads eagerly every time they are mounted. This is wasteful if many virtual threads mount/unmount while emitting few events. It is especially costly because SubstrateVM uses a global mutex to protect the "thread constant pool repository", and this mutex must be acquired every time we register a thread or even just check its registration status. In Hotspot, there is no such global "thread constant pool repository" and no global synchronization. Instead, a thread "checkpoint event" is written lazily to thread-local JFR buffers when needed.

This PR does 2 things:

1. Make virtual thread registration lazy instead of eager: register virtual threads only when they need to emit an event, not upon mounting.
2. Avoid locking the global "thread constant pool repo" mutex every time we need to check virtual thread registration status. Do this by caching the "epoch generation" in the Target_java_lang_VirtualThread object (similar to Target_java_lang_Thread.jfrExcluded). Before locking the thread repo mutex, first compare the cached "epoch generation" with the current one, in case the virtual thread is already registered. A very similar technique is used in Hotspot to avoid writing thread checkpoint events unless necessary.
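The cached-generation check in the second point can be illustrated with a small stand-alone sketch. This is not the SubstrateVM code: ThreadRepoSketch, VThread, registerIfNeeded, changeEpoch and the lockedRegistrations counter are hypothetical names invented for illustration, and the sketch does not model uninterruptible code or safepoints. It only shows that a thread whose cached generation matches the current one skips the lock.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the thread constant pool repository; not SubstrateVM code.
final class ThreadRepoSketch {
    // Hypothetical stand-in for a virtual thread that caches the generation it was registered in.
    static final class VThread {
        final long id;
        long jfrGeneration = 0; // 0 means "never registered"

        VThread(long id) {
            this.id = id;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Set<Long> registeredIds = new HashSet<>();
    private volatile long epochGeneration = 1; // incremented on every epoch change
    int lockedRegistrations = 0; // demo counter: how often the locked slow path ran

    void registerIfNeeded(VThread thread) {
        // Fast path: the cached generation matches, so the thread is already
        // registered for the current epoch and the global lock is not needed.
        if (thread.jfrGeneration == epochGeneration) {
            return;
        }
        // Slow path: register under the global lock and refresh the cache.
        lock.lock();
        try {
            registeredIds.add(thread.id);
            thread.jfrGeneration = epochGeneration;
            lockedRegistrations++;
        } finally {
            lock.unlock();
        }
    }

    // Assumption: an epoch change discards the registrations of the old epoch.
    void changeEpoch() {
        lock.lock();
        try {
            registeredIds.clear();
            epochGeneration++;
        } finally {
            lock.unlock();
        }
    }

    boolean isRegistered(long id) {
        lock.lock();
        try {
            return registeredIds.contains(id);
        } finally {
            lock.unlock();
        }
    }
}
```

Calling registerIfNeeded twice in the same epoch takes the lock once; after changeEpoch the cached value is stale, so the next call registers again.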

Results

Scenario 1: Many virtual thread mounts/unmounts while not emitting JFR events.
Used test app 1. The new lazy method takes ~1105ms to complete, while the old eager approach takes ~1140ms.

Scenario 2: Many JFR events emitted concurrently by many virtual threads.
Used test app 2. The registration with cached epoch generation checking takes ~42.600s to complete, while registration with global locking takes ~73.650s.

Related issue: #9536

christianhaeubl

Thanks for the PR, I added a few comments.

christianhaeubl · 2024-09-02T13:57:16Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/SubstrateJVM.java

+     * eagerly when started and at chunk rotations.
+     */
+    @Uninterruptible(reason = "Epoch should not change while checking generation.")
+    private static void maybeRegisterVirtualThread(Thread thread) {


Doesn't this break once we leave uninterruptible code?

we start in epoch 0

thread A calls getThreadId(...), which registers a vthread for epoch 0 in maybeRegisterVirtualThread

thread A leaves the uninterruptible code and gets blocked at a safepoint

thread B changes the epoch from 0 to 1

after the safepoint, thread A continues execution and uses the thread id that was returned by getThreadId(...) even though the vthread was only registered for epoch 0

One example where I think that this might happen: JavaMonitorQueuedSynchronizer.ConditionNode.notifierJfrTid

I think the situation you describe is actually ok. Once the epoch changes, new event emissions from "thread A" will fail the epoch generation check, and registration will be redone. Since getThreadId() is called for every event emission, the important thing is that gathering event info with getThreadId() and writing the event data to the JFR buffer happen within the same block of uninterruptible code, which is indeed the case (e.g., all emit0 methods).

Calls to getThreadId() outside of the event emission path (e.g., for JavaMonitorQueuedSynchronizer.ConditionNode.notifierJfrTid) result in unnecessary registration checks, but shouldn't break anything.

Ok, so you essentially assume that getThreadId() will be called again for thread A once the thread ID is actually used in a JFR event. However, I don't think that this assumption is valid. Here is one example:

Thread B blocks at ConditionNode.

Thread A notifies thread B and sets ConditionNode.notifierJfrTid.

The JFR epoch changes.

Once thread B wakes up, it uses the value of ConditionNode.notifierJfrTid (i.e., the ID of thread A) when emitting a JavaMonitorWaitEvent. However, thread A is not registered for the current epoch.
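The four steps above can be replayed deterministically in a small sketch. It is a simplified model, not SubstrateVM code: StaleTidSketch, changeEpoch, canResolve and replayScenario are hypothetical names, and the assumption that an epoch change discards the previous registrations is mine.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of lazy registration; only the names from the discussion
// (getThreadId, notifierJfrTid) mirror the real code, everything else is invented.
final class StaleTidSketch {
    private final Set<Long> poolForCurrentEpoch = new HashSet<>();

    // Lazy registration happens as a side effect of asking for the id.
    long getThreadId(long tid) {
        poolForCurrentEpoch.add(tid);
        return tid;
    }

    // Assumption: registrations made for the old epoch do not carry over.
    void changeEpoch() {
        poolForCurrentEpoch.clear();
    }

    // True if an event written now that references tid could be resolved.
    boolean canResolve(long tid) {
        return poolForCurrentEpoch.contains(tid);
    }

    static boolean replayScenario() {
        StaleTidSketch jfr = new StaleTidSketch();
        long threadA = 7L;
        // Thread A notifies thread B and stores its own id in the condition node.
        long notifierJfrTid = jfr.getThreadId(threadA);
        // The JFR epoch changes while thread B is still blocked.
        jfr.changeEpoch();
        // Thread B wakes up and writes the stored id into its event.
        return jfr.canResolve(notifierJfrTid);
    }
}
```

In this model replayScenario() returns false: the stored id is written without thread A ever being registered for the new epoch.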

Ok, I see. I mistakenly thought we always used JfrNativeEventWriter.putThread(JfrNativeEventWriterData, Thread) when dealing with threads. But in that case we use the plain TID directly.

I think we'll have to switch to always using putThread whenever dealing with the JFR Thread type. What do you think of this?

The downside is that we'll have to remember to give careful consideration to emitting JFR Thread data.
I'm not sure there's a better place to put the registration check though. And we'll need to check at some point in order to copy Hotspot and do lazy registration.

I've just pushed a commit to update all the places we write thread IDs as JFR event data to use either putThread(Thread) or putThread(long). This should ensure that threads are always registered if they get referenced from event data. The downside is that we need to remember to use putThread.

Another unfortunate thing is that we can only do checks with putThread(Thread) because with only the TID we have no way of knowing whether the thread is virtual and cannot access Target_java_lang_VirtualThread.jfrGeneration.
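A rough sketch of why the check is only possible with a Thread reference follows. PutThreadSketch and VThread are hypothetical and the real JfrNativeEventWriter signatures differ. In particular, what the real putThread(long) does beyond writing the id is not shown in this thread; here it is assumed to perform no check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event writer; illustrates the idea only, not the SubstrateVM API.
final class PutThreadSketch {
    // Hypothetical stand-in for a virtual thread with a cached generation field.
    static final class VThread {
        final long id;
        long jfrGeneration = 0; // 0 means "never registered"

        VThread(long id) {
            this.id = id;
        }
    }

    private final Set<Long> pool = new HashSet<>();      // threads registered in this epoch
    private final List<Long> buffer = new ArrayList<>(); // thread ids written as event data
    private long epochGeneration = 1;

    // With the thread object at hand, registration can be checked at the write site.
    void putThread(VThread thread) {
        if (thread.jfrGeneration != epochGeneration) {
            pool.add(thread.id);
            thread.jfrGeneration = epochGeneration;
        }
        buffer.add(thread.id);
    }

    // With only the id there is no jfrGeneration to compare against
    // (assumption: no check is performed here).
    void putThread(long tid) {
        buffer.add(tid);
    }

    // Assumption: an epoch change starts with an empty pool and buffer.
    void changeEpoch() {
        pool.clear();
        buffer.clear();
        epochGeneration++;
    }

    // True if every thread id written in this epoch is also registered in it.
    boolean allReferencesResolvable() {
        return pool.containsAll(buffer);
    }
}
```

Writing through putThread(VThread) after an epoch change keeps every reference resolvable, while writing the same id through putThread(long) does not, which is the limitation described above.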

christianhaeubl · 2024-09-02T13:59:39Z

substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/traceid/JfrTraceIdEpoch.java

@@ -43,6 +43,7 @@ public class JfrTraceIdEpoch {
    private static final long EPOCH_1_BIT = 0b10;

    private boolean epoch;


epoch can probably be removed (i.e., can be computed from the least significant bit of epochGeneration).

Ok, good idea. I'll remove that.
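The suggestion amounts to something like the following minimal sketch. EpochFromGenerationSketch and its methods are hypothetical and are not the actual JfrTraceIdEpoch implementation; the starting value of 0 is an assumption chosen to match "we start in epoch 0".

```java
// Hypothetical sketch: derive the boolean epoch from a generation counter
// instead of storing both.
final class EpochFromGenerationSketch {
    private long epochGeneration = 0; // assumed to be incremented on every epoch change

    void changeEpoch() {
        epochGeneration++;
    }

    // The epoch flips on every change, so it equals the least significant bit.
    boolean currentEpoch() {
        return (epochGeneration & 1L) == 1L;
    }

    boolean previousEpoch() {
        return !currentEpoch();
    }
}
```

Because the counter and the flag always change together, keeping only the counter removes the risk of the two getting out of sync.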

roberttoyonaga · 2024-09-04T16:26:33Z

Truffle gate error seems to be unrelated

christianhaeubl · 2025-04-23T11:28:08Z

Thanks!

I did one pass over the PR and opened #11070. For now, I think it is better/safer if we still register all virtual threads eagerly. At the moment, we have a few JFR events where we only store the thread id. We would need to change all that code so that we keep more information around, which in my opinion is not worth the effort at the moment.

roberttoyonaga · 2025-04-23T14:46:07Z

I did one pass over the PR and opened #11070. For now, I think it is better/safer if we still register all virtual threads eagerly. At the moment, we have a few JFR events where we only store the thread id. We would need to change all that code so that we keep more information around, which in my opinion is not worth the effort at the moment.

Yes, that sounds fine to me as well. Most of the improvement I measured came from avoiding global locking, not from the lazy registration, anyway.

lazy vthread registration

43b82b1

oracle-contributor-agreement bot added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) Aug 26, 2024

roberttoyonaga marked this pull request as ready for review August 26, 2024 19:29

roberttoyonaga requested a review from christianhaeubl August 26, 2024 19:36

roberttoyonaga added native-image redhat-interest native-image-jfr labels Aug 26, 2024

christianhaeubl changed the title ~~Lazy virtual thread JFR registration and epoch generation caching~~ [GR-57064] Lazy virtual thread JFR registration and epoch generation caching Aug 27, 2024

christianhaeubl reviewed Sep 2, 2024

roberttoyonaga added 2 commits September 4, 2024 11:07

remove epoch boolean

e33f0ac

style

c1e4ab0

Use putThread to avoid races.

22add5c

graalvmbot mentioned this pull request Apr 23, 2025

[GR-57064] Skip unnecessary JFR registrations for virtual threads. #11070

Merged

graalvmbot merged commit 8cd35ad into oracle:master Apr 24, 2025
1 check passed
