Add option to include all threads in dotnet-stack reports #5193

Open
andredasilvapinto opened this issue Jan 20, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@andredasilvapinto

andredasilvapinto commented Jan 20, 2025

Background and Motivation

We monitor the number of process threads via Process.GetCurrentProcess().Threads.Count and have noticed a continuous increase in this metric, which seems to indicate a potential thread leak. To investigate, I tried dotnet-stack, but the number of thread stacks it prints is much smaller than what is reported by the Threads.Count metric or by ps -T -p <PID> on Linux.

The documentation for report says:

Prints the stack trace for each thread in the target process.

which doesn't seem to be correct.

By looking at the (short) thread names, I can see that for example none of the ".NET Server GC" or ".NET BGC" are reported, but there are more cases. It is difficult to know exactly which threads are missing as dotnet-stack does not include the thread name in its output (something that would also be very useful and is present in the JVM equivalent jstack).

It would be useful to provide an option to output the exhaustive list of all threads in the process.
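
For reference, the OS-level numbers can also be read straight from /proc on Linux. The following is a minimal sketch rather than anything dotnet-stack provides: the function names and the PROC_ROOT override are hypothetical, and it assumes a Linux /proc layout where every entry under task/ is one OS thread with its short name in comm.

```shell
#!/bin/sh
# Linux only. PROC_ROOT is a hypothetical override (useful for testing);
# it defaults to the real /proc.

# Print the number of OS threads (managed and native) of process $1.
count_os_threads() {
    proc_root="${PROC_ROOT:-/proc}"
    # Each entry under /proc/<pid>/task is one OS thread.
    ls "$proc_root/$1/task" | wc -l | tr -d ' '
}

# Print a histogram of short thread names of process $1, most frequent first.
thread_name_histogram() {
    proc_root="${PROC_ROOT:-/proc}"
    # comm holds the short (truncated) thread name, one per thread.
    cat "$proc_root/$1"/task/*/comm | sort | uniq -c | sort -nr
}
```

Calling count_os_threads with the target PID should give roughly the same number as ps -T -p <PID>, and thread_name_histogram shows which thread names dominate.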

Proposed Feature

Add option to include all process threads in the dotnet-stack report output

@andredasilvapinto andredasilvapinto added the enhancement New feature or request label Jan 20, 2025
@hoyosjs
Member

hoyosjs commented Jan 21, 2025

There are two parts to this:

  • The threads you pointed out are native-only threads, and dotnet-stack reports only managed threads. This is likely something to call out in the documentation. dotnet-stack is unlikely to ever report native code execution.
  • You want to get the name of threads in the reports, and that sounds possible. I'd just need to check the permissions needed to run the API.

@andredasilvapinto
Author

andredasilvapinto commented Jan 22, 2025

The total counts from

ps -T -p <PID> | awk '{for (i=5; i<=NF; i++) printf "%s ", $i; print ""}' | sort | uniq -c | sort -nr

are significantly higher than the total number of threads listed in the dotnet-stack output.

I don't think the lack of native threads is the only reason behind the difference. The total count of threads with names that look like managed threads is higher than what is included in dotnet-stack.

I wonder if there is a way to clearly identify which ones of those are managed or native-only.
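
One rough way to split them, sketched here under assumptions, is to diff two ID lists: every OS thread ID of the process (decimal, for example the SPID column of ps -T or the entries of /proc/<PID>/task) against the OS thread IDs that a managed-thread listing reports (hex). The function name and the two input files are hypothetical, and how the managed IDs are extracted depends on the tool that produced them.

```shell
#!/bin/sh
# Print OS thread IDs that are NOT in the managed list, i.e. native-only candidates.
#   $1: file with all OS thread IDs, decimal, one per line
#   $2: file with managed OS thread IDs, hex (with or without 0x), one per line
native_only_tids() {
    all_file="$1"
    managed_hex_file="$2"
    managed_dec=$(mktemp)
    # Convert each hex ID to decimal so both lists use the same notation.
    while IFS= read -r hex; do
        hex="${hex#0x}"
        if [ -n "$hex" ]; then
            printf '%d\n' "0x$hex"
        fi
    done < "$managed_hex_file" > "$managed_dec"
    # -v inverts the match, -x matches whole lines only, -F uses fixed strings.
    grep -vxFf "$managed_dec" "$all_file"
    rm -f "$managed_dec"
}
```

Because the two listings are taken at different moments, threads that start or exit in between will show up as false differences.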

@andredasilvapinto
Author

andredasilvapinto commented Jan 22, 2025

I couldn't find a way to split them, so the best I could do was

# dotnet-stack report --process-id <PID> | grep "Thread (" | wc -l
269

vs

# dotnet-dump collect -p <PID> --type Mini -o mini.dmp
# dotnet-dump analyze mini.dmp
> clrthreads
ThreadCount:      389
UnstartedThread:  0
BackgroundThread: 377
PendingThread:    0
DeadThread:       6
Hosted Runtime:   no
...

269 vs 389. The commands ran within a few seconds of each other.

For what it's worth, threads returns 858 entries. So I assume there were 469 native-only threads? That seems like a lot.

@andredasilvapinto
Author

It seems that a big part of the difference in values is caused by threads with no stack frames. As per clrstack -all:

OS Thread Id: 0xa0
        Child SP               IP Call Site

If I exclude these from the clrstack -all output, the difference is significantly smaller: 269 vs 287. The remaining ones seem to be related to System.Threading.PortableThreadPool+WorkerThread, so I assume some dynamic thread creation in the worker thread pool.

I don't know what the threads with no clr stack represent.
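
The exclusion above can be reproduced with a small awk pass over a saved clrstack -all listing. This is a sketch that assumes the layout shown above (an "OS Thread Id:" line, a "Child SP ... Call Site" header, then zero or more frame lines); the function name is hypothetical.

```shell
#!/bin/sh
# Summarize a saved `clrstack -all` listing ($1): count threads that have at
# least one frame and threads whose stack is empty.
summarize_clrstack() {
    awk '
        /^[[:space:]]*OS Thread Id:/ {
            # A new thread block starts: close the previous one first.
            if (in_thread) { if (frames > 0) nonempty++; else empty++ }
            in_thread = 1; frames = 0; next
        }
        # Inside a block, every non-blank line except the column header is a frame.
        in_thread && NF > 0 && $0 !~ /Child SP/ { frames++ }
        END {
            if (in_thread) { if (frames > 0) nonempty++; else empty++ }
            printf "with_frames=%d empty=%d\n", nonempty, empty
        }
    ' "$1"
}
```

The with_frames value is the number to compare against the thread count in the dotnet-stack report.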

@hoyosjs
Member

hoyosjs commented Jan 22, 2025

LLDB/WinDbg could give us hints - we'd need to see what things like clrstack -f tell us. Generally though, if stacks have no managed frames there won't be much for dotnet-stack to report. dotnet-stack reasons at the level of running managed code, so async tasks and empty stacks are not very interesting to it.

@andredasilvapinto
Author

Thanks. It seems they are all related to GC. Here is a stacktrace:

OS Thread Id: 0x100
        Child SP               IP Call Site
00007E0228FF8BB0 00007FAE7D7E0117 libc.so.6!__GI___futex_abstimed_wait_cancelable64 + 231 at nptl/nptl/futex-internal.c:57
00007E0228FF8BB0 00007FAE7D7E00E9 libc.so.6!__GI___futex_abstimed_wait_cancelable64 + 185 at nptl/nptl/futex-internal.c:57
00007E0228FF8BB0 00007FAE7D7E00E9 libc.so.6!__GI___futex_abstimed_wait_cancelable64 + 185 at nptl/nptl/futex-internal.c:57
00007E0228FF8BF0 00007FAE7D7E2A41 libc.so.6!pthread_cond_wait@@GLIBC_2.3.2 + 529 at nptl/nptl/pthread_cond_wait.c:506
00007E0228FF8BF0 00007FAE7D7E2970 libc.so.6!pthread_cond_wait@@GLIBC_2.3.2 + 320 at nptl/nptl/pthread_cond_wait.c:558
00007E0228FF8CD0 00007FAE7D59CC62 libcoreclr.so!GCEvent::Impl::Wait(unsigned int, bool) + 210 at /__w/1/s/src/coreclr/gc/unix/events.cpp:179
00007E0228FF8D20 00007FAE7D3FA288 libcoreclr.so!SVR::gc_heap::bgc_thread_function() + 104 at /__w/1/s/src/coreclr/gc/gc.cpp:39188
00007E0228FF8D70 00007FAE7D3FA20E libcoreclr.so!SVR::gc_heap::bgc_thread_stub(void*) + 30 at /__w/1/s/src/coreclr/gc/gc.cpp:37163
00007E0228FF8D70 00007FAE7D2D3EA4 libcoreclr.so!(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) + 116 at /__w/1/s/src/coreclr/vm/gcenv.ee.cpp:1441
00007E0228FF8D70 00007FAE7D2D3E41 libcoreclr.so!(anonymous namespace)::CreateSuspendableThread(void (*)(void*), void*, char16_t const*)::$_0::__invoke(void*) + 17 at /__w/1/s/src/coreclr/vm/gcenv.ee.cpp:1425
00007E0228FF8DB0 00007FAE7D5D0FBE libcoreclr.so!CorUnix::CPalThread::ThreadEntry(void*) + 510 at /__w/1/s/src/coreclr/pal/src/include/pal/thread.hpp:468
00007E0228FF8E60 00007FAE7D7E3AC3 libc.so.6!start_thread + 755 at nptl/nptl/pthread_create.c:442
00007E0228FF8F00 00007FAE7D875850 libc.so.6!__clone3 + 48 at sysdeps/unix/sysv/linux/x86_64/clone3.S:83

There are 96 threads in this situation. It seems there is one BGC thread per core, as this machine has 96 cores. I'm not sure this is the optimal number of background GC threads. In the JVM it is possible to define the number of background GC threads. I didn't find such an option in .NET.

@hoyosjs
Member

hoyosjs commented Feb 11, 2025

Sorry @andredasilvapinto - I thought I had replied to this. There's a GCHeapCount setting in the GC configuration options. Additionally, there's a newer GC mode, DATAS, that adapts the heap count dynamically to pressure.
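
As a sketch of how these settings might be applied through environment variables: the names DOTNET_gcServer, DOTNET_GCHeapCount and DOTNET_GCDynamicAdaptationMode are taken from the .NET GC configuration documentation as I understand it, so check them against the runtime version in use. The helper function and the value 16 are only illustrative. The environment variable form of the heap count is read as hexadecimal, which the helper accounts for. With Server GC, lowering the heap count should in turn lower the number of GC threads, though that is an assumption to verify.

```shell
#!/bin/sh
# Convert a decimal heap count to the hexadecimal form expected by the
# DOTNET_GCHeapCount environment variable.
gc_heap_count_hex() {
    printf '%x\n' "$1"
}

# Option 1: Server GC capped at 16 heaps (16 decimal is 10 in hex).
export DOTNET_gcServer=1
export DOTNET_GCHeapCount="$(gc_heap_count_hex 16)"

# Option 2 (assumes .NET 8 or later): let DATAS pick the heap count dynamically.
# Uncomment instead of setting a fixed heap count.
# export DOTNET_GCDynamicAdaptationMode=1
```

The same settings can be expressed in runtimeconfig.json (System.GC.HeapCount, decimal there), which avoids the hex conversion.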
