Debugging BEAM Applications

Introduction

This chapter goes into the various methods for finding and fixing bugs without disrupting the services in operation. We will explore the techniques, tools, and frameworks that aid in testing and debugging your code. We’ll also shed light on some common bug sources, such as deadlocks, mailbox overflow, and memory issues, providing guidance on identifying and resolving these problems.

Debugging is the process of identifying and eliminating errors, or "bugs," from software. While Erlang offers step-by-step debugging tools like the Debugger, the most effective debugging methods often rely on Erlang’s tracing facilities. These facilities will be thoroughly discussed in Chapter [CH-Tracing]. In this chapter we will also touch on system-level tracing with DTrace and SystemTap.

This chapter also explores the concept of "Crash Dumps," which are human-readable text files generated by the Erlang Runtime System when an unrecoverable error occurs, such as running out of memory or reaching an emulator limit. Crash Dumps are invaluable for post-mortem analysis of Erlang nodes, and you will learn how to interpret and understand them.

In addition to these topics, this chapter will also discuss different testing methodologies, including EUnit and Common Test, which are crucial for ensuring the reliability and robustness of your code. The importance of mocking in testing will be examined, along with its best practices.

You will become acquainted with the "let it crash" principle and the ways to effectively implement it within your system. You’ll gain insights into the workings of exceptions and supervisor tree design.

By the end of this chapter, you’ll be equipped with the knowledge to systematically test your system and its individual components. You will be able to identify common mistakes and problems, and perhaps even pick up some debugging philosophy along the way.

Debugging Philosophy

Debugging is an essential part of software development, and in Erlang, it takes on a unique approach due to the language’s fault-tolerant design. Rather than focusing solely on preventing failures, Erlang encourages a reactive debugging philosophy—detecting, diagnosing, and recovering from errors effectively. Debugging in Erlang involves leveraging systematic approaches, analyzing failures in production, and continuously improving code quality by learning from mistakes.

Systematic Approaches to Debugging

A structured approach to debugging can significantly reduce the time and effort required to identify and resolve issues. Debugging in Erlang follows a methodical process that involves observation, isolation, and testing.

1. Reproduce the Problem

Before fixing a bug, you need to reproduce it consistently. Some techniques for reproducing issues in Erlang systems include:

  • Running the system with detailed logging (lager, logger).

  • Using tracing tools like dbg or recon to capture function calls and message passing.

  • Simulating failure scenarios with controlled test environments.

Example: Enabling tracing to inspect function calls in a module:

dbg:tracer().
dbg:p(all, c).
dbg:tpl(my_module, my_function, []). % Trace all calls to my_function

If the issue occurs sporadically, running the system under load with tools like PropEr or Common Test can help uncover race conditions.
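As an illustration, a minimal PropEr property for a hypothetical encode/decode pair might look as follows; my_module:encode/1 and my_module:decode/1 are placeholders for the code under test, and running proper:quickcheck(prop_encode_decode_roundtrip()) exercises it with generated input:

-module(my_module_props).
-include_lib("proper/include/proper.hrl").

%% Property: decoding an encoded term yields the original term.
%% my_module:encode/1 and my_module:decode/1 are hypothetical functions.
prop_encode_decode_roundtrip() ->
    ?FORALL(Term, any(),
            my_module:decode(my_module:encode(Term)) =:= Term).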

2. Isolate the Faulty Component

Once the issue is reproducible, the next step is isolating the problem to a specific module, process, or function:

  • Check process message queues using process_info(Pid, messages). A long message queue could indicate a performance bottleneck.

  • Inspect ETS tables and memory usage: ets:info(my_table, size). erlang:memory().

  • Use selective tracing to focus only on processes related to the issue: dbg:p(self(), [m]). % Trace only the current process

By isolating the faulty component, you narrow the scope of debugging and avoid unnecessary distractions.

3. Analyze Logs and Crash Dumps

Logs and crash dumps provide valuable information about system failures. When an Erlang node crashes, it generates an erl_crash.dump file containing details such as:

  • The reason for the crash (e.g., memory exhaustion, infinite loops, deadlocks).

  • Process states at the time of failure.

  • The call stack of the crashing process.

Example: Checking a crash dump’s memory usage section:

=memory
total: 2147483648
processes: 1807483648
ets: 107374182
binary: 32212254
code: 5242880

If process memory is abnormally high, it could indicate a memory leak.

To inspect such a crash dump interactively, use the Crashdump Viewer:

crashdump_viewer:start().
4. Use Debugging Tools Effectively

Erlang provides powerful runtime debugging tools to analyze system behavior:

  • Observer GUI (observer:start()) – Interactive process monitoring.

  • dbg and recon – Low-level tracing and inspection.

  • SystemTap or DTrace – Kernel-level profiling for advanced debugging.

Using the right tool for the job prevents unnecessary code modifications and speeds up debugging.

5. Verify the Fix and Write Regression Tests

Once the bug is identified and fixed, ensure it does not reappear:

  • Write regression tests in Common Test or EUnit (see the sketch following this list).

  • Run property-based tests (PropEr, QuickCheck) to verify edge cases.

  • Test in a staging environment before deploying to production.
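A minimal EUnit regression test might look like this; my_module:parse/1 and the expected result are hypothetical placeholders for the function and behavior covered by the fix:

-module(my_module_tests).
-include_lib("eunit/include/eunit.hrl").

%% Regression test for the bug that was just fixed.
empty_input_returns_error_test() ->
    ?assertEqual({error, empty}, my_module:parse(<<>>)).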

Learning from Mistakes and Improving Code Quality

Every bug presents an opportunity to improve the codebase and prevent future issues. Erlang’s philosophy of resilience and self-healing extends to how developers handle mistakes and refine their systems.

1. Conducting Post-Mortems

After fixing a critical bug, analyze why it happened and how to prevent it. A post-mortem analysis should answer:

  • What was the root cause of the issue?

  • How did it impact the system?

  • How can similar bugs be prevented?

If a process crashed due to an unexpected message, ensure message filtering is robust:

%% Catch-all clause: silently ignore unexpected messages instead of crashing.
handle_info(_Unexpected, State) ->
    {noreply, State}.
2. Improving Logging and Observability

Many issues arise due to insufficient logging and monitoring. Improving system observability includes:

  • Using structured logging (lager, logger) with log levels: logger:log(info, "User logged in: ~p", [UserId]).

  • Implementing real-time monitoring: recon:bin_leak(10). % Detects potential memory leaks.

Better logging helps detect anomalies before they escalate into major failures.

3. Enhancing Code Readability and Maintainability

Well-structured code is easier to debug. Following Erlang best practices improves maintainability:

  • Use clear function names (handle_request/1 instead of do_it/1).

  • Follow the OTP design principles (gen_server, supervisor).

  • Write modular code to make debugging easier.

Example: Instead of complex nested case statements:

case Result of
    {ok, Data} -> process(Data);
    {error, _} -> handle_error()
end.

Use pattern matching for clarity:

process_request({ok, Data}) -> process(Data);
process_request({error, _}) -> handle_error().
4. Implementing Fail-Fast Mechanisms

Erlang’s Let It Crash philosophy means processes should fail quickly when an error occurs instead of propagating invalid state.

Example: Enforcing fail-fast behavior with guards:

handle_request({ok, Data}) when is_list(Data) ->
    process(Data);
handle_request(_) ->
    exit(bad_request).

Fail-fast mechanisms prevent silent failures and make debugging easier.

5. Learning from Open Source Erlang Systems

Many production-grade Erlang applications are open source. Studying their debugging practices provides valuable insights:

  • RabbitMQ – Uses structured logging and monitoring tools.

  • MongooseIM – Implements extensive tracing.

  • Riak – Employs distributed fault recovery techniques.

Exploring these projects improves debugging skills and enhances system design knowledge.

The Usual Suspects: Common Sources of Bugs

Software systems often exhibit recurring types of failures that can impact stability and performance. In Erlang, despite its design for fault tolerance, certain categories of bugs appear frequently. This section explores some of the most common sources of issues in Erlang applications, including deadlocks, mailbox overflow, and memory issues. Understanding these problems and learning how to diagnose and resolve them can help in writing more reliable and efficient Erlang programs.

Deadlocks

Deadlocks occur when two or more processes are waiting for each other to release resources, leading to a state where no progress can be made. This is a common problem in concurrent systems, including those built with Erlang’s lightweight processes.

Deadlocks in Erlang typically arise due to:

  • Circular dependencies: Two processes each waiting for a resource held by the other.

  • Misused locks: When using gen_server or gen_fsm, incorrect ordering of message handling can lead to deadlocks.

  • Blocking calls inside gen_server: Calling gen_server:call/2 within a handle_call/3 callback can cause the process to block indefinitely.

To identify deadlocks:

  • Process inspection: Use observer:start(). or process_info(Pid, status). to check for stuck processes.

  • Tracing with dbg: Enable function call tracing to determine where processes are waiting indefinitely.

  • Message queue analysis: If a process is waiting for a message that never arrives, check its mailbox using process_info(Pid, messages).

Use timeouts in blocking operations:

gen_server:call(Server, Request, Timeout).

Setting a reasonable timeout prevents indefinite blocking.

Use asynchronous calls (gen_server:cast/2) or monitor messages (erlang:monitor/2) to avoid blocking.

Ensure that all locks are acquired in a consistent order across processes to prevent cyclic dependencies.

Implement periodic checks that monitor process status and forcefully restart deadlocked processes.
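As a sketch of the non-blocking style, a gen_server can hand potentially blocking work to a short-lived helper process and answer the caller later with gen_server:reply/2, instead of making a blocking gen_server:call/2 from inside handle_call/3 (slow_backend:lookup/1 is a hypothetical blocking call):

handle_call({lookup, Key}, From, State) ->
    %% Let a helper do the blocking work; this server keeps serving requests.
    spawn_link(fun() ->
                       Result = slow_backend:lookup(Key),
                       gen_server:reply(From, Result)
               end),
    {noreply, State}.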

Mailbox Overflow

Erlang’s message-passing model allows processes to receive messages asynchronously via mailboxes. However, if a process accumulates messages faster than it can process them, the mailbox can grow indefinitely, leading to high memory consumption or crashes.

There are some common causes and symptoms of message overflows:

  • Slow message processing: A gen_server that takes too long to handle requests can lead to unprocessed messages piling up.

  • Excessive message generation: Processes sending frequent messages without checking backpressure.

  • Unprocessed out-of-band messages: Failure to handle messages that arrive outside of calls and casts in gen_server:handle_info/2.

Symptoms include:

  • Increasing memory usage (process_info(Pid, memory).)

  • Long process message queues (process_info(Pid, message_queue_len).)

  • Unresponsive processes that appear idle but are overloaded.

Preventing and Resolving Mailbox Overflow Issues

Monitor message queue length:

process_info(Pid, message_queue_len).

Use monitoring tools to trigger alerts when queues grow beyond a threshold.
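A minimal sketch of such a check, polling a process and warning when its queue exceeds a threshold (the interval and threshold are arbitrary example values, and the function is assumed to live in a small monitoring module):

check_queue(Pid, Threshold) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len > Threshold ->
            logger:warning("Process ~p has ~p queued messages", [Pid, Len]);
        _ ->
            ok
    end,
    %% Check again in five seconds.
    timer:apply_after(5000, ?MODULE, check_queue, [Pid, Threshold]).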

Rate-limiting senders

  • Use backpressure mechanisms, such as asking for explicit acknowledgments before sending more messages.

  • Implement flow control: Instead of blindly sending messages, a producer can check the consumer’s load.

Use selective receive properly

Avoid patterns like:

receive {specific_message, Data} -> process(Data) end.

which ignores other pending messages, causing an ever-growing mailbox. An exception to this rule is when you use the Ref-trick for an RPC-style send and receive, as sketched below. See [Ref-Trick] for more information.
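A sketch of the Ref-trick: the caller tags the request with a fresh monitor reference, so the selective receive only waits for the reply (or the DOWN message) belonging to this particular request. The {request, ...} and {reply, ...} message shapes are assumed conventions between caller and server:

call(Server, Request) ->
    Ref = erlang:monitor(process, Server),
    Server ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Result} ->
            erlang:demonitor(Ref, [flush]),
            Result;
        {'DOWN', Ref, process, Server, Reason} ->
            {error, Reason}
    after 5000 ->
            erlang:demonitor(Ref, [flush]),
            {error, timeout}
    end.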

Offload heavy computation:

  • Offload expensive operations to worker processes instead of doing them in the main process loop.

  • Use gen_server:reply/2 to respond to messages asynchronously after processing.

Memory Issues

Erlang’s memory model relies on per-process heaps, garbage collection, and a binary allocator. While designed for efficiency, improper memory usage can lead to performance degradation.

Memory leaks in Erlang often stem from:

  • Long-lived processes accumulating state: ETS tables, large lists, or unprocessed messages.

  • Unbounded message queues: Processes that receive but never consume messages.

  • Binary data accumulation: Large binaries can cause high memory fragmentation.

How to detect memory leaks

Check individual process memory usage:

process_info(Pid, memory).

Use observer:start(). and navigate to the "Processes" tab to identify processes consuming excessive memory.
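If the recon library is available, the same information can be obtained from the shell; for example, listing the ten processes that currently use the most memory:

recon:proc_count(memory, 10).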

Inspect memory allocator statistics with the recon_alloc module, for example:

recon_alloc:memory(allocated_types).
Managing Binary Memory Usage

Large binaries are managed separately from process heaps using reference counting. Issues arise when:

  • Processes hold onto binary references longer than needed.

  • Unused large binaries remain due to delayed garbage collection.

Solutions:

Convert large binaries to smaller chunks:

binary:split(BigBinary, <<"\n">>).

Force garbage collection:

erlang:garbage_collect(Pid).

This reclaims memory used by binaries once the process no longer references them. It can be important for relaying processes that no longer use the binaries but still hold references to them. Remember that large binaries are reference counted and can be shared across processes.

Monitor binary memory allocation:

erlang:memory(binary).
Optimizing Memory Usage in Erlang Systems

Erlang provides several system flags that control heap allocation behavior.

min_heap_size (Minimum Process Heap Size)

  • Defines the initial heap size for a newly created process.

  • Helps avoid frequent heap expansions if a process is expected to handle large amounts of data.

  • Default is typically 233 words, but increasing it slightly (e.g., 256 or 512) can improve performance for processes that grow quickly.

Example Usage You can configure this setting for a process using:

spawn_opt(fun() -> my_function() end, [{min_heap_size, 512}]).

or apply it globally via:

erl +hms 512

This ensures that all new processes start with a heap of at least 512 words, reducing the need for frequent heap expansions.

min_bin_vheap_size (Minimum Binary Virtual Heap Size)

  • Controls the virtual heap size for reference-counted binaries (binaries > 64 bytes).

  • Helps optimize memory allocation for processes dealing with large binary data.

  • Default is 46422 words, but for binary-heavy workloads you might tune it considerably higher, as in the example below.

spawn_opt(fun() -> handle_large_binaries() end, [{min_bin_vheap_size, 100000}]).

This ensures the process starts with enough binary heap space, preventing frequent reallocations.

Optimize full-sweep garbage collection thresholds (fullsweep_after).
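For example, a process expected to hold on to binaries can be spawned with a low fullsweep_after value so that full-sweep collections run more often (the value 10 and the function relay_loop/0 are illustrative only):

spawn_opt(fun() -> relay_loop() end, [{fullsweep_after, 10}]).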

Use ETS efficiently

  • Regularly clean up unused entries to avoid memory bloat.

  • Prefer set tables over bag or ordered_set unless necessary.

Be mindful of passing around large terms that are long-lived and shared. Instead of sending large terms between processes, use references (e.g., store the large data in ETS or a database and send only a key), as sketched below.
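A minimal sketch of that approach, assuming a public ETS table named large_blobs created elsewhere with ets:new(large_blobs, [named_table, public, set]): the producer stores the large term once and sends only a small key.

store_and_notify(Consumer, Blob) ->
    Key = erlang:unique_integer([positive]),
    ets:insert(large_blobs, {Key, Blob}),
    Consumer ! {blob_ready, Key},   %% only the small key is sent
    Key.

fetch_blob(Key) ->
    [{_, Blob}] = ets:lookup(large_blobs, Key),
    Blob.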

Let It Crash Principle

Erlang’s “Let It Crash” principle is a fundamental philosophy in designing fault-tolerant and resilient systems. Instead of writing defensive code to handle every possible error, Erlang developers embrace failure and rely on supervisor trees to detect and recover from crashes. This approach simplifies code, improves maintainability, and ensures that systems remain robust even in the face of unexpected errors.

Overview and Rationale

In traditional programming, error handling often involves writing extensive try-catch statements and defensive code to anticipate failures. This approach, however, introduces complexity and can lead to hard-to-maintain codebases. Erlang takes a different approach by accepting that failures will happen and focusing on automatic recovery rather than exhaustive error prevention.

The rationale behind "Let It Crash" is:

  • Isolation of failures: Since each Erlang process runs independently, a crash in one process does not affect others.

  • Automatic recovery: Supervisors monitor processes and restart them when they fail.

  • Simpler code: Developers write less defensive code and focus on business logic rather than error handling.

  • Fault containment: By letting processes crash and restart in a controlled manner, errors are prevented from spreading.

This philosophy makes Erlang systems highly resilient, particularly in distributed environments where failures are inevitable.

Exceptions in Erlang

Erlang provides built-in mechanisms for handling exceptions, but instead of focusing on recovering from every error locally, it encourages process termination and restart through supervision.

Types of Exceptions

Erlang has three main types of exceptions:

  • Errors (error:Reason) – Occur due to serious faults like division by zero or calling an undefined function.

  • Throws (throw:Reason) – Used for non-local returns and controlled exits.

  • Exits (exit:Reason) – Occur when a process terminates unexpectedly or intentionally.

Example of Exception Handling

While defensive programming discourages crashes, Erlang allows you to handle exceptions explicitly if needed:

try 1 / 0 of
    Result -> io:format("Result: ~p~n", [Result])
catch
    error:badarith -> io:format("Cannot divide by zero!~n")
end.

This is useful in cases where immediate local handling is required, but most failures in Erlang are left to crash and be handled by supervisors.

Process Exits and Monitoring

If a process crashes, it sends an exit signal to linked processes. You can monitor or trap these exits if needed:

spawn_monitor(fun() -> exit(died) end).

This allows another process to detect failures and react accordingly.
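For example, the monitor reference returned by spawn_monitor/1 can be used to wait for the corresponding 'DOWN' message:

{Pid, MonRef} = spawn_monitor(fun() -> exit(died) end),
receive
    {'DOWN', MonRef, process, Pid, Reason} ->
        io:format("Process ~p terminated: ~p~n", [Pid, Reason])
end.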

Designing Systems with Supervisor Trees

Instead of handling errors inside every function, Erlang applications rely on supervisor trees, a hierarchical structure where supervisors monitor worker processes and restart them upon failure.

Structure of a Supervisor Tree

A supervisor tree consists of:

  • Supervisor: A special process that manages worker processes and other supervisors.

  • Workers: The actual processes performing computations. If they crash, the supervisor decides how to restart them.

-module(my_supervisor).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    {ok, {{one_for_one, 3, 10},
          [{worker1, {my_worker, start_link, []}, permanent, 5000, worker, [my_worker]}]}}.

This supervisor ensures that if my_worker crashes, it will be restarted automatically.
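For reference, the same supervisor written with the map-based flags and child specifications available in modern OTP releases behaves the same way:

init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    ChildSpec = #{id => worker1,
                  start => {my_worker, start_link, []},
                  restart => permanent,
                  shutdown => 5000,
                  type => worker,
                  modules => [my_worker]},
    {ok, {SupFlags, [ChildSpec]}}.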

Supervision Strategies

Supervisors can follow different restart strategies:

  • one_for_one: Restart only the crashed process (most common).

  • one_for_all: Restart all child processes if one fails.

  • rest_for_one: Restart the failed process and all those started after it.

  • simple_one_for_one: Used when dynamically spawning similar worker processes.

Benefits of Using Supervisor Trees
  • Automatic Fault Recovery: If a worker crashes, it is restarted without manual intervention.

  • Scalability: Supervisors can manage thousands of processes efficiently.

  • Separation of Concerns: Business logic stays in workers, and fault recovery is handled separately.

Debugging Tools and Techniques

Debugging is essential when dealing with unexpected behavior in Erlang applications. Several tools exist in the Erlang ecosystem.

The Erlang Debugger (dbg)

The dbg module provides powerful tracing capabilities for debugging live systems with minimal impact on performance.

Getting Started with dbg

To start the dbg tool:

1> dbg:tracer().
{ok,<0.85.0>}

This sets up a tracer process to collect debug information. You can choose different backends for output:

  • dbg:tracer(). → Print to the shell (the default)

  • dbg:tracer(port, dbg:trace_port(file, "trace.log")). → Write to a file via a trace port

Once tracing is enabled, you can attach tracers to processes or functions.

Tracing All Function Calls

dbg:p(all, c). % Trace all function calls in all processes

Tracing a Specific Function

dbg:tpl(my_module, my_function, []). % Trace all calls to my_function, any arity

Setting a Conditional Trace

Trace only when a function argument matches:

dbg:tpl(my_module, my_function, [{[42], [], [{message, "Function called"}]}]). % Only calls where the single argument is 42

Breakpoints are useful when stepping through code execution. Start the graphical debugger:

debugger:start().

Then, mark the module as interpreted and set a breakpoint on a function:

int:i(my_module).
int:break_in(my_module, my_function, Arity).

Once a function is traced, calls and returns are logged.

Example trace output:

(<0.85.0>) call my_module:my_function(42)
(<0.85.0>) returned from my_function -> "Result: 42"

This allows you to track how values change throughout execution.
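When you are done, clear the trace patterns and stop the tracer so that tracing no longer affects the system:

dbg:stop_clear().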

The next-generation debugger: EDB

The Erlang Debugger (EDB) is a modern, feature-rich debugger for Erlang applications. It exposes a Debug Adapter Protocol (DAP) interface for setting breakpoints, inspecting variables, and stepping through code execution. See https://whatsapp.github.io/edb/

In order to use EDB, you currently need to build Erlang from source with EDB support; future Erlang/OTP releases might ship with it out of the box. Here is a guide on how to build EDB and its bundled OTP from source:

git clone https://github.com/WhatsApp/edb.git
cd edb
git submodule update --init
pushd otp
./configure --prefix $(pwd)/../otp-bin
make -j$(nproc)
make -j$(nproc) install
popd
rebar3 escriptize

Then you can start EDB with:

_build/default/bin/edb dap

Crash Dumps in Erlang

Crash dumps provide information for diagnosing failures in Erlang systems. They contain details about system state, memory usage, process information, and call stacks at the time of a crash. Understanding how to interpret these files can significantly speed up debugging and prevent future crashes.

Understanding and Reading Crash Dumps

A crash dump (erl_crash.dump) is a snapshot of the Erlang runtime system (ERTS) at the time of an abnormal termination. It includes:

  • System version and runtime parameters

  • Memory usage statistics

  • Loaded modules

  • Process states and call stacks

  • Port and driver information

By analyzing crash dumps, you can determine why a system crashed—whether due to memory exhaustion, infinite loops, deadlocks, or other failures.

The official documentation provides a detailed explanation of the crash dump format: How to Interpret the Erlang Crash Dumps. We will cover the basics here.

By default, crash dumps are saved in the working directory where the Erlang system was started. The filename is typically:

erl_crash.dump

You can change the location by setting the environment variable:

export ERL_CRASH_DUMP=/var/log/erl_crash.dump

or at runtime, by updating the emulator’s copy of the environment (the variable is read when the dump is written):

os:putenv("ERL_CRASH_DUMP", "/var/log/erl_crash.dump").

Basic Structure of a Crash Dump

A crash dump consists of multiple sections. Below is a truncated example:

=erl_crash_dump:0.5
Sun Feb 18 13:45:52 2025
Slogan: eheap_alloc: Cannot allocate 1048576 bytes of memory (of type "heap").
System version: Erlang/OTP 26 [erts-13.1] [source] [64-bit]
Compiled: Fri Jan 26 14:10:07 2025
Taints: none
Atoms: 18423
Processes: 482
Memory: 2147483648
=memory
total: 2147483648
processes: 1807483648
ets: 107374182
binary: 32212254
code: 5242880

This dump suggests that the system crashed due to a memory allocation failure (Cannot allocate 1048576 bytes of memory).

Key Sections in a Crash Dump
  1. Slogan Indicates the reason for the crash. Common slogans include:

    • eheap_alloc: Cannot allocate X bytes of memory (Memory exhaustion)

    • Init terminating in do_boot () (Probably an error in the boot script)

    • Could not start kernel pid (Probably a bad argument in config)

  2. System Information Contains details about the runtime:

    • System version: The Erlang/OTP version and build details

    • Compiled: When the system was built

    • Taints: Whether external native code (NIFs) are running

  3. Memory Usage Displays the memory distribution:

    • Total: Total memory usage

    • Processes: Memory used by processes (high values suggest memory leaks)

    • ETS: Erlang Term Storage usage (can be a problem if growing uncontrollably)

    • Binary: Memory allocated for binaries (can be a source of leaks)

    • Code: Loaded code memory footprint

  4. Process List Provides details about active processes, this section is crucial for identifying:

    • Processes consuming excessive memory (Stack+Heap size)

    • Processes stuck in an infinite loop (Reductions count abnormally high)

    • Message queue overload (Messages field growing indefinitely)

  5. Ports and Drivers This lists open ports and drivers, which can be useful if external system interactions (files, sockets, databases) are suspected as crash causes.

  6. Loaded Modules This helps determine if dynamically loaded code (e.g., via code:load_file/1) caused the crash.

Analyzing a Crash Dump

Erlang provides a built-in tool for parsing crash dumps: crashdump_viewer.

To start it:

crashdump_viewer:start().

This provides a graphical interface to inspect the crash dump.
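The viewer can also be launched directly from the OS shell with the cdv script that ships in OTP’s bin directory, passing the dump file as an argument:

cdv erl_crash.dump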

Investigating Why Crash Dumps May Not Be Generated

Sometimes, a system crash does not result in an erl_crash.dump file. Here’s why and how to fix it.

Crash Dumps Disabled

Whether a crash dump is written, and how much time the emulator may spend writing it, is controlled by environment variables rather than a system flag. If ERL_CRASH_DUMP_SECONDS is set to 0 (notably the default when the node runs under heart), no dump is produced; a negative value means no time limit. Ensure dumps are enabled and point them at a writable location:

export ERL_CRASH_DUMP_SECONDS=-1
export ERL_CRASH_DUMP=/var/log/erl_crash.dump
Insufficient Permissions

Ensure the process running Erlang has write permissions to the intended dump directory:

sudo chmod 777 /var/log/erl_crash.dump

Check the ownership:

ls -l /var/log/erl_crash.dump

If needed, change ownership:

sudo chown erlang_user /var/log/erl_crash.dump
Crashing Before Dump Can Be Written

If the system runs out of memory before the dump can be completed, you can limit the size of the dump with the ERL_CRASH_DUMP_BYTES environment variable so that at least a truncated dump is produced:

export ERL_CRASH_DUMP_BYTES=200000000

Or increase swap space.

System-Wide Limits

Linux/macOS resource limits may prevent OS core dumps (as opposed to erl_crash.dump files) from being written when the emulator itself crashes, for example inside a NIF. Check the limits with:

ulimit -a

If core file size is 0, enable it:

ulimit -c unlimited

On macOS:

sudo launchctl limit core unlimited
Crash Inside NIFs

If a Native Implemented Function (NIF) crashes, Erlang might not handle it gracefully. In such a case, running the BEAM emulator under a debugger like gdb can help you inspect the state of the system at the point of the crash.

Debugging the Runtime System

Understanding and diagnosing issues within the Erlang runtime system (BEAM) can be challenging due to its complexity. However, utilizing tools like the GNU Debugger (GDB) can significantly aid in this process. This section provides an overview of using GDB to debug the BEAM, including setting up the environment and employing GDB macros to streamline the debugging workflow.

Using GDB

GDB is a powerful tool for debugging applications at the machine level, offering insights into the execution of compiled programs. When applied to the BEAM, GDB allows developers to inspect the state of the Erlang virtual machine during execution or after a crash.

To effectively use GDB with the BEAM, it’s beneficial to compile the Erlang runtime system with debugging symbols. This compilation provides detailed information during debugging sessions.

See [Alternative Beam emulator builds] for instructions on compiling and running a version of Erlang with debugging information.

Once built, you can run the debug version of the BEAM out of the build directory using the cerl launch script:

bin/cerl -debug

The easiest way to run Erlang in GDB is to use the -rgdb flag with cerl, making sure to also select the debug version:

bin/cerl -rgdb -debug

Once you get to the GDB prompt, you can set breakpoints etc., and when you’re ready, start execution of the BEAM with:

(gdb) run

If you have an OS core dump from a crashed execution (not an Erlang crash dump, which is a different thing; see Crash Dumps in Erlang), you can run cerl with the -rcore flag instead to launch GDB:

bin/cerl -rcore <core file>
Note
If you are comfortable with using the Emacs editor, you can use cerl with the flags -gdb and -core (no leading r), which launch an Emacs instance to work as an IDE for the debugging session. (By setting EMACS=emacsclient first, you can even make it run in an existing Emacs if you have done a M-x server-start.) See the Emacs GDB documentation for more details.

If you have a running BEAM already that you want to attach GDB to, you need to find its OS process ID. For example, using a separate shell window to enter

pgrep -l beam

which should print a result like

3140019 beam.debug.smp

you can then launch GDB with the path to the actual executable file in use and the process ID, like this:

gdb bin/x86_64-unknown-linux-gnu/beam.debug.smp 3140019

(Note that you cannot tell GDB to use the path bin/cerl or bin/erl, since those are just shell scripts that set up the proper environment variables for the BEAM executable.)

Your OS might by default restrict attaching to running processes - even those you own. How to reconfigure this is out of scope for this book.

Using GDB Macros

GDB macros can automate repetitive tasks and provide shortcuts for complex commands, enhancing the efficiency of your debugging sessions. The Erlang runtime includes a set of predefined GDB macros, known as the Erlang Pathologist Toolkit (ETP), which facilitate the inspection of various aspects of the BEAM, such as internal BEAM structures, process states, memory allocation, and scheduling information.

The ETP macros can be found in erts/etc/unix/etp-commands. They are automatically loaded into your GDB session when you launch it via the cerl script as described in the previous section. You should then be able to use macros like etp-process-info to retrieve detailed information about a specific Erlang process:

etp-process-info <process_pointer>

Replace <process_pointer> with the actual pointer to the process control block (PCB) you’re interested in. These macros simplify the process of extracting meaningful data from the BEAM’s internal structures.

For a comprehensive guide on debugging the BEAM using GDB and employing these macros, refer to Debugging the BEAM and Debug emulator documentation. These resources provide in-depth instructions and examples to assist you in effectively diagnosing and resolving issues within the Erlang runtime system.

SystemTap and DTrace

SystemTap and DTrace are powerful dynamic tracing frameworks that allow developers to analyze and monitor system behavior in real-time without modifying application code. These tools are particularly useful for investigating performance bottlenecks, debugging issues, and understanding system interactions at a low level. While both tools serve a similar purpose, they are designed for different operating systems—SystemTap is widely used on Linux, while DTrace is predominantly used on Solaris, macOS, and BSD variants.

Using these tools with Erlang can provide deep insights into the behavior of the BEAM virtual machine, process scheduling, garbage collection, and inter-process communication.

Introduction to SystemTap and DTrace

SystemTap and DTrace operate by inserting dynamically generated probes into running kernel and user-space applications. These probes capture real-time data, allowing developers to inspect and analyze program execution without stopping or modifying the application.

  • SystemTap: Developed for Linux, SystemTap enables monitoring of kernel events, user-space programs, and runtime behavior using scripting. It is commonly used for profiling, fault detection, and system introspection.

  • DTrace: Originally developed by Sun Microsystems for Solaris, DTrace provides similar tracing capabilities with a robust scripting language. It is widely used on macOS, FreeBSD, and SmartOS.

Both tools allow developers to measure function execution times, trace system calls, inspect memory usage, and capture event-based data critical for optimizing performance and debugging complex applications.

Using SystemTap and DTrace with Erlang

To use SystemTap and DTrace with Erlang, you need to enable the necessary tracing support in the BEAM runtime system. This allows inserting probes into the virtual machine to monitor function calls, message passing, garbage collection, and scheduling events.

Using SystemTap with Erlang

SystemTap scripts rely on user-space markers embedded in the BEAM emulator. These markers allow SystemTap to hook into various internal events. To use SystemTap with Erlang:

  • Ensure SystemTap is installed (on Linux distributions such as Ubuntu, Fedora, or CentOS):

sudo apt-get install systemtap systemtap-sdt-dev

or

sudo dnf install systemtap systemtap-devel
  • Enable Erlang’s SystemTap probes: The BEAM VM includes support for SystemTap, but it must be compiled with dynamic tracing enabled:

./configure --with-dynamic-trace=systemtap
make
  • List available probes: To check which probes are available in the BEAM runtime:

stap -L 'process("*beam.smp").mark("*")'
  • Write a SystemTap script: The following example traces function calls in the BEAM VM:

probe process("beam.smp").mark("function_entry") {
    printf("Function call in BEAM: %s\n", user_string($arg1))
}
  • Run the script: Execute the script to start tracing:

sudo stap my_script.stp

This allows developers to observe function calls, detect bottlenecks, and debug performance issues in real-time.

Using DTrace with Erlang

DTrace integrates directly with the BEAM runtime, offering deep visibility into system operations. It allows tracing function calls, memory allocation, garbage collection, and inter-process communication.

DTrace works best on Solaris. There is a Linux version bundled with SystemTap, but it is not as powerful as the Solaris version.

On macOS, DTrace is pre-installed. On Ubuntu, it can be installed via:

sudo apt-get install systemtap-sdt-dev

The BEAM VM includes built-in DTrace support. If needed, rebuild Erlang with dynamic tracing enabled:

./configure --with-dynamic-trace=dtrace
make
  • Write a simple DTrace script: The following script fires whenever a BEAM process enters the write system call:

syscall::write:entry
/execname == "beam.smp"/ {
    printf("Erlang process writing output\n");
}
  • Run the script: Execute DTrace to start tracing:

sudo dtrace -s my_script.d

This provides a non-intrusive way to monitor the internal behavior of the BEAM virtual machine in real-time.