This chapter goes into the various methods for finding and fixing bugs without disrupting the services in progress. We will explore testing techniques, tools, and frameworks that aid in testing and debugging your code. We’ll also shed light on some common bug sources, such as deadlocks, message overflow, and memory issues, providing guidance on identifying and resolving these problems.
Debugging is the process of identifying and eliminating errors, or "bugs," from software. While Erlang offers step-by-step debugging tools like the Debugger, the most effective debugging methods often rely on Erlang’s tracing facilities. These facilities will be thoroughly discussed in Chapter [CH-Tracing]. In this chapter we will touch on system-level tracing with DTrace and SystemTap.
This chapter also explores the concept of "Crash Dumps," which are human-readable text files generated by the Erlang Runtime System when an unrecoverable error occurs, such as running out of memory or reaching an emulator limit. Crash Dumps are invaluable for post-mortem analysis of Erlang nodes, and you will learn how to interpret and understand them.
In addition to these topics, this chapter will also discuss different testing methodologies, including EUnit and Common Test, which are crucial for ensuring the reliability and robustness of your code. The importance of mocking in testing will be examined, along with its best practices.
You will become acquainted with the "let it crash" principle and the ways to effectively implement it within your system. You’ll gain insights into the workings of exceptions and supervisor tree design.
By the end of this chapter, you’ll be equipped with the knowledge to systematically test your system and its individual components. You will be able to identify common mistakes and problems, and perhaps even pick up some debugging philosophy along the way.
Debugging is an essential part of software development, and in Erlang, it takes on a unique approach due to the language’s fault-tolerant design. Rather than focusing solely on preventing failures, Erlang encourages a reactive debugging philosophy—detecting, diagnosing, and recovering from errors effectively. Debugging in Erlang involves leveraging systematic approaches, analyzing failures in production, and continuously improving code quality by learning from mistakes.
A structured approach to debugging can significantly reduce the time and effort required to identify and resolve issues. Debugging in Erlang follows a methodical process that involves observation, isolation, and testing.
Before fixing a bug, you need to reproduce it consistently. Some techniques for reproducing issues in Erlang systems include:
- Running the system with detailed logging (lager, logger).
- Using tracing tools like dbg or recon to capture function calls and message passing.
- Simulating failure scenarios with controlled test environments.
Example: Enabling tracing to inspect function calls in a module:
dbg:tracer().
dbg:p(all, c).
dbg:tpl(my_module, my_function, []). % Trace all calls to my_function
If the issue occurs sporadically, running the system under load with tools like PropEr or Common Test can help uncover race conditions.
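For example, a property-based test can hammer a function with random input until a sporadic failure surfaces. The following is a minimal sketch; prop_my_module and my_module:my_function/1 are placeholder names:
-module(prop_my_module).
-include_lib("proper/include/proper.hrl").

%% Property: my_function/1 should never crash, whatever list of integers it
%% receives. PropEr shrinks any failing input to a minimal counterexample.
prop_no_crash() ->
    ?FORALL(Input, list(integer()),
            begin
                _ = my_module:my_function(Input),
                true
            end).
Run it from the shell with proper:quickcheck(prop_my_module:prop_no_crash(), 1000). to execute 1000 random cases.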
Once the issue is reproducible, the next step is isolating the problem to a specific module, process, or function:
- Check process message queues using:
process_info(Pid, messages).
A long message queue could indicate a performance bottleneck (a helper that checks queue lengths across all processes is sketched below).
- Inspect ETS tables and memory usage:
ets:info(my_table, size).
erlang:memory().
- Use selective tracing to focus only on processes related to the issue:
dbg:p(self(), [m]). % Trace only the current process
By isolating the faulty component, you narrow the scope of debugging and avoid unnecessary distractions.
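As a quick way to apply these checks across the whole node, the following shell snippet (a sketch, not tied to any particular application) lists the processes with the longest message queues:
%% Collect {Pid, QueueLen} for all live processes and show the worst N.
%% Processes that died in the meantime (process_info/2 returning undefined)
%% are skipped by the generator pattern.
TopQueues = fun(N) ->
    Lens = [{P, Len} || P <- erlang:processes(),
                        {message_queue_len, Len} <-
                            [erlang:process_info(P, message_queue_len)]],
    lists:sublist(lists:reverse(lists:keysort(2, Lens)), N)
end.
TopQueues(5).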
Logs and crash dumps provide valuable information about system failures. When an Erlang node crashes, it generates an erl_crash.dump
file containing details such as:
- The reason for the crash (e.g., memory exhaustion, infinite loops, deadlocks).
- Process states at the time of failure.
- The call stack of the crashing process.
Example: Checking a crash dump’s memory usage section:
=memory
total: 2147483648
processes: 1807483648
ets: 107374182
binary: 32212254
code: 5242880
If process memory is abnormally high, it could indicate a memory leak.
To inspect a crash dump interactively, use crashdump_viewer:
crashdump_viewer:start().
Erlang provides powerful runtime debugging tools to analyze system behavior:
- Observer GUI (observer:start()) – Interactive process monitoring.
- dbg and recon – Low-level tracing and inspection.
- SystemTap or DTrace – Kernel-level profiling for advanced debugging.
Using the right tool for the job prevents unnecessary code modifications and speeds up debugging.
Every bug presents an opportunity to improve the codebase and prevent future issues. Erlang’s philosophy of resilience and self-healing extends to how developers handle mistakes and refine their systems.
After fixing a critical bug, analyze why it happened and how to prevent it. A post-mortem analysis should answer:
- What was the root cause of the issue?
- How did it impact the system?
- How can similar bugs be prevented?
If a process crashed due to an unexpected message, ensure message filtering is robust:
handle_info(_Unexpected, State) ->
    %% Ignore messages this server does not understand instead of crashing
    {noreply, State}.
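Silently dropping messages can hide real problems, so a variant that logs what was ignored is often preferable; a sketch using the standard logger:
handle_info(Unexpected, State) ->
    %% Keep the server alive, but leave a trace of what was dropped.
    logger:warning("~p ignoring unexpected message: ~p", [?MODULE, Unexpected]),
    {noreply, State}.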
Many issues arise due to insufficient logging and monitoring. Improving system observability includes:
- Using structured logging (lager, logger) with log levels:
logger:log(info, "User logged in: ~p", [UserId]).
- Implementing real-time monitoring:
recon:bin_leak(10). % Detects potential memory leaks.
Better logging helps detect anomalies before they escalate into major failures.
Well-structured code is easier to debug. Following Erlang best practices improves maintainability:
- Use clear function names (handle_request/1 instead of do_it/1).
- Follow the OTP design principles (gen_server, supervisor).
- Write modular code to make debugging easier.
Example: Instead of complex nested case statements:
case Result of
{ok, Data} -> process(Data);
{error, _} -> handle_error()
end.
Use pattern matching for clarity:
process_request({ok, Data}) -> process(Data);
process_request({error, _}) -> handle_error().
Erlang’s Let It Crash philosophy means processes should fail quickly when an error occurs instead of propagating invalid state.
Example: Enforcing fail-fast behavior with guards:
handle_request({ok, Data}) when is_list(Data) ->
process(Data);
handle_request(_) ->
exit(bad_request).
Fail-fast mechanisms prevent silent failures and make debugging easier.
Many production-grade Erlang applications are open source. Studying their debugging practices provides valuable insights:
- RabbitMQ – Uses structured logging and monitoring tools.
- MongooseIM – Implements extensive tracing.
- Riak – Employs distributed fault recovery techniques.
Exploring these projects improves debugging skills and enhances system design knowledge.
Software systems often exhibit recurring types of failures that can impact stability and performance. In Erlang, despite its design for fault tolerance, certain categories of bugs appear frequently. This section explores some of the most common sources of issues in Erlang applications, including deadlocks, mailbox overflow, and memory issues. Understanding these problems and learning how to diagnose and resolve them can help in writing more reliable and efficient Erlang programs.
Deadlocks occur when two or more processes are waiting for each other to release resources, leading to a state where no progress can be made. This is a common problem in concurrent systems, including those built with Erlang’s lightweight processes.
Deadlocks in Erlang typically arise due to:
- Circular dependencies: Two processes each waiting for a resource held by the other.
- Misused locks: When using gen_server or gen_fsm, incorrect ordering of message handling can lead to deadlocks.
- Blocking calls inside gen_server: Calling gen_server:call/2 within a handle_call/3 callback can cause the process to block indefinitely.
To identify deadlocks:
- Process inspection: Use observer:start(). or process_info(Pid, status). to check for stuck processes.
- Tracing with dbg: Enable function call tracing to determine where processes are waiting indefinitely.
- Message queue analysis: If a process is waiting for a message that never arrives, check its mailbox using process_info(Pid, messages).
Use timeouts in blocking operations:
gen_server:call(Server, Request, Timeout).
Setting a reasonable timeout prevents indefinite blocking.
Use asynchronous calls (gen_server:cast/2) or monitors (erlang:monitor/2) to avoid blocking, as sketched below.
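A nested synchronous call is a classic source of deadlock: if server_a calls server_b inside its handle_call/3 while server_b calls back into server_a, both block forever. Making one direction asynchronous breaks the cycle; a sketch with illustrative server names:
%% In server_a: notify server_b without waiting for it, so server_b remains
%% free to call server_a whenever it needs to.
handle_call({update, Data}, _From, State) ->
    gen_server:cast(server_b, {updated, Data}),
    {reply, ok, State}.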
Ensure that all locks are acquired in a consistent order across processes to prevent cyclic dependencies.
Implement periodic checks that monitor process status and forcefully restart deadlocked processes.
Erlang’s message-passing model allows processes to receive messages asynchronously via mailboxes. However, if a process accumulates messages faster than it can process them, the mailbox can grow indefinitely, leading to high memory consumption or crashes.
There are some common causes and symptoms of message overflows:
- Slow message processing: A gen_server that takes too long to handle requests can lead to unprocessed messages piling up.
- Excessive message generation: Processes sending frequent messages without checking backpressure.
- Unhandled out-of-band messages: Failure to handle such messages in the gen_server:handle_info/2 callback.
Symptoms include:
- Increasing memory usage (process_info(Pid, memory))
- Long process message queues (process_info(Pid, message_queue_len))
- Unresponsive processes that appear idle but are overloaded.
Monitor message queue length:
process_info(Pid, message_queue_len).
Use monitoring tools to trigger alerts when queues grow beyond a threshold.
Rate-limiting senders
- Use backpressure mechanisms, such as asking for explicit acknowledgments before sending more messages.
- Implement flow control: Instead of blindly sending messages, a producer can check the consumer’s load (see the sketch below).
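A minimal acknowledgment-based flow control loop could look like the following sketch, where handle_batch/1 and the process structure are assumptions:
%% The producer sends one batch at a time and waits for an ack before sending
%% the next, so the consumer's mailbox cannot grow without bound.
producer(Consumer, [Batch | Rest]) ->
    Consumer ! {batch, self(), Batch},
    receive
        {ack, Consumer} -> producer(Consumer, Rest)
    after 5000 ->
        {error, consumer_overloaded}
    end;
producer(_Consumer, []) ->
    done.

consumer() ->
    receive
        {batch, Producer, Batch} ->
            handle_batch(Batch),
            Producer ! {ack, self()},
            consumer()
    end.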
Use selective receive properly
Avoid patterns like:
receive {specific_message, Data} -> process(Data) end.
which ignores other pending messages, causing an ever-growing mailbox. An exception to this rule is when you use the Ref trick for an RPC-style send and receive, as sketched below. See [Ref-Trick] for more information.
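A sketch of the Ref trick: tagging the request with a unique reference means the selective receive can only match the reply to this particular request (or the corresponding 'DOWN' message), and the runtime can avoid scanning messages that arrived before the reference was created. The message format is illustrative:
call(Server, Request) ->
    Ref = erlang:monitor(process, Server),
    Server ! {request, {self(), Ref}, Request},
    receive
        {reply, Ref, Result} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        {'DOWN', Ref, process, Server, Reason} ->
            {error, Reason}
    after 5000 ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.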
Offload heavy computation:
- Offload expensive operations to worker processes instead of doing them in the main process loop.
- Use gen_server:reply/2 to respond to messages asynchronously after processing, as in the sketch below.
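A sketch of that pattern: the server spawns a short-lived worker for the expensive part and the worker replies via gen_server:reply/2, keeping the server loop responsive. expensive_computation/1 is a placeholder:
handle_call({expensive, Input}, From, State) ->
    %% Do the heavy lifting outside the server loop and reply asynchronously.
    spawn_link(fun() ->
                   Result = expensive_computation(Input),
                   gen_server:reply(From, Result)
               end),
    {noreply, State}.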
Erlang’s memory model relies on per-process heaps, garbage collection, and a binary allocator. While designed for efficiency, improper memory usage can lead to performance degradation.
Memory leaks in Erlang often stem from:
- Long-lived processes accumulating state: ETS tables, large lists, or unprocessed messages.
- Unbounded message queues: Processes that receive but never consume messages.
- Binary data accumulation: Large binaries can cause high memory fragmentation.
Check individual process memory usage:
process_info(Pid, memory).
Use observer:start().
and navigate to the "Processes" tab to identify processes consuming excessive memory.
Inspect allocator-level memory usage with recon:
recon_alloc:memory(allocated_types).
Large binaries are managed separately from process heaps using reference counting. Issues arise when:
- Processes hold onto binary references longer than needed.
- Unused large binaries remain due to delayed garbage collection.
Solutions:
Convert large binaries to smaller chunks:
binary:split(BigBinary, <<"\n">>).
Force garbage collection:
erlang:garbage_collect(Pid).
This reclaims memory used by binaries if the process is no longer referencing them. This can be important for relaying processes that no longer use the binaries but still hold references to them. Remember that binaries are reference counted and can be shared across processes.
Monitor binary memory allocation:
erlang:memory(binary).
Erlang provides several system flags that control heap allocation behavior.
min_heap_size
(Minimum Process Heap Size)
- Defines the initial heap size for a newly created process.
- Helps avoid frequent heap expansions if a process is expected to handle large amounts of data.
- Default is typically 233 words, but increasing it slightly (e.g., 256 or 512) can improve performance for processes that grow quickly.
Example usage: you can configure this setting for a process using:
spawn_opt(fun() -> my_function() end, [{min_heap_size, 512}]).
or apply it globally via:
erl +hms 512
This ensures that all new processes start with a heap of at least 512 words, reducing the need for frequent heap expansions.
min_bin_vheap_size
(Minimum Binary Virtual Heap Size)
- Controls the virtual heap size for reference-counted binaries (binaries > 64 bytes).
- Helps optimize memory allocation for processes dealing with large binary data.
- Default is 46422 words, but for binary-heavy workloads you might increase it, as in the example below.
spawn_opt(fun() -> handle_large_binaries() end, [{min_bin_vheap_size, 100000}]).
This ensures the process starts with enough binary heap space, preventing frequent reallocations.
Optimize full-sweep garbage collection thresholds (fullsweep_after).
Use ETS efficiently
- Regularly clean up unused entries to avoid memory bloat.
- Prefer set tables over bag or ordered_set unless necessary.
Be mindful of passing large terms around, especially if they are long-lived and shared. Instead of sending large terms between processes, use references (e.g., store large data in ETS or a database and send keys), as sketched below.
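A sketch of the reference-passing approach; the large_terms table is assumed to have been created elsewhere with ets:new(large_terms, [named_table, public]):
%% Store the term once and hand out only a small key.
store(LargeTerm) ->
    Key = erlang:unique_integer([positive]),
    true = ets:insert(large_terms, {Key, LargeTerm}),
    Key.

%% A receiver fetches the term on demand instead of getting a full copy in
%% every message. The term is still copied out of ETS when fetched, but only
%% by the process that actually needs it, and only when it needs it.
fetch(Key) ->
    [{Key, LargeTerm}] = ets:lookup(large_terms, Key),
    LargeTerm.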
Erlang’s “Let It Crash” principle is a fundamental philosophy in designing fault-tolerant and resilient systems. Instead of writing defensive code to handle every possible error, Erlang developers embrace failure and rely on supervisor trees to detect and recover from crashes. This approach simplifies code, improves maintainability, and ensures that systems remain robust even in the face of unexpected errors.
In traditional programming, error handling often involves writing extensive try-catch
statements and defensive code to anticipate failures. This approach, however, introduces complexity and can lead to hard-to-maintain codebases. Erlang takes a different approach by accepting that failures will happen and focusing on automatic recovery rather than exhaustive error prevention.
The rationale behind "Let It Crash" is:
- Isolation of failures: Since each Erlang process runs independently, a crash in one process does not affect others.
- Automatic recovery: Supervisors monitor processes and restart them when they fail.
- Simpler code: Developers write less defensive code and focus on business logic rather than error handling.
- Fault containment: By letting processes crash and restart in a controlled manner, errors are prevented from spreading.
This philosophy makes Erlang systems highly resilient, particularly in distributed environments where failures are inevitable.
Erlang provides built-in mechanisms for handling exceptions, but instead of focusing on recovering from every error locally, it encourages process termination and restart through supervision.
Erlang has three main types of exceptions:
- Errors (error:Reason) – Occur due to serious faults like division by zero or calling an undefined function.
- Throws (throw:Reason) – Used for non-local returns and controlled exits.
- Exits (exit:Reason) – Occur when a process terminates unexpectedly or intentionally.
While defensive programming discourages crashes, Erlang allows you to handle exceptions explicitly if needed:
try 1 / 0 of
Result -> io:format("Result: ~p~n", [Result])
catch
error:badarith -> io:format("Cannot divide by zero!~n")
end.
This is useful in cases where immediate local handling is required, but most failures in Erlang are left to crash and be handled by supervisors.
Instead of handling errors inside every function, Erlang applications rely on supervisor trees, a hierarchical structure where supervisors monitor worker processes and restart them upon failure.
A supervisor tree consists of:
- Supervisor: A special process that manages worker processes and other supervisors.
- Workers: The actual processes performing computations. If they crash, the supervisor decides how to restart them.
-module(my_supervisor).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    {ok, {{one_for_one, 3, 10},
          [{worker1, {my_worker, start_link, []}, permanent, 5000, worker, [my_worker]}]}}.
This supervisor ensures that if my_worker
crashes, it will be restarted automatically.
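Since OTP 18 the same supervisor can be written with the map-based specification, which many find easier to read; a sketch equivalent to the tuple version above:
init([]) ->
    SupFlags = #{strategy => one_for_one,
                 intensity => 3,
                 period => 10},
    ChildSpecs = [#{id => worker1,
                    start => {my_worker, start_link, []},
                    restart => permanent,
                    shutdown => 5000,
                    type => worker,
                    modules => [my_worker]}],
    {ok, {SupFlags, ChildSpecs}}.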
Supervisors can follow different restart strategies:
- one_for_one: Restart only the crashed process (most common).
- one_for_all: Restart all child processes if one fails.
- rest_for_one: Restart the failed process and all those started after it.
- simple_one_for_one: Used when dynamically spawning similar worker processes.
Debugging is essential when dealing with unexpected behavior in Erlang applications. Several tools exist in the Erlang ecosystem.
The dbg
module provides powerful tracing capabilities for debugging live systems with minimal impact on performance.
To start the dbg
tool:
1> dbg:tracer().
{ok,<0.85.0>}
This sets up a tracer process to collect debug information. You can choose different backends for output:
- dbg:tracer(). → Print to the shell (the default)
- dbg:tracer(port, dbg:trace_port(file, "trace.log")). → Write to a file
Once tracing is enabled, you can attach tracers to processes or functions.
Tracing All Function Calls
dbg:p(all, c). % Trace all function calls in all processes
Tracing a Specific Function
dbg:tpl(my_module, my_function, []). % Trace calls to my_function (any arity)
Setting a Conditional Trace
Attach a match specification to add information to the trace messages or to make tracing conditional:
dbg:tpl(my_module, my_function, [{'_', [], [{message, "Function called"}]}]).
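To make the trace genuinely conditional, the match specification can bind the arguments and add a guard. The following sketch traces my_function/1 only when its argument is greater than 100 and includes the caller in the trace message:
dbg:tpl(my_module, my_function,
        [{['$1'], [{'>', '$1', 100}], [{message, {caller}}]}]).
Match specifications can also be generated with dbg:fun2ms/1, which lets you write them in a more readable fun-like syntax.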
Breakpoints are useful when stepping through code execution. Start the graphical debugger:
debugger:start().
Then, interpret the module (it must be compiled with debug_info) and set a breakpoint at the first line of a function:
int:i(my_module).
int:break_in(my_module, my_function, Arity).
Once a function is traced, calls are logged (and return values too, if the match specification includes return_trace).
Example trace output:
(<0.85.0>) call my_module:my_function(42)
(<0.85.0>) returned from my_function -> "Result: 42"
This allows you to track how values change throughout execution.
The Erlang Debugger (EDB) is a modern, feature-rich debugger for Erlang applications. It provides a language server interface for setting breakpoints, inspecting variables, and stepping through code execution. See https://whatsapp.github.io/edb/
In order to use EDB, you currently need to build Erlang from source with EDB support; future versions of Erlang/OTP might ship with it. Here is a guide on how to build Erlang from source with EDB support:
git clone https://github.com/WhatsApp/edb.git
cd edb
git submodule update --init
pushd otp
./configure --prefix $(pwd)/../otp-bin
make -j$(nproc)
make -j$(nproc) install
popd
rebar3 escriptize
Then you can start EDB with:
_build/default/bin/edb dap
Crash dumps provide information for diagnosing failures in Erlang systems. They contain details about system state, memory usage, process information, and call stacks at the time of a crash. Understanding how to interpret these files can significantly speed up debugging and prevent future crashes.
A crash dump (erl_crash.dump
) is a snapshot of the Erlang runtime system (ERTS) at the time of an abnormal termination. It includes:
-
System version and runtime parameters
-
Memory usage statistics
-
Loaded modules
-
Process states and call stacks
-
Port and driver information
By analyzing crash dumps, you can determine why a system crashed—whether due to memory exhaustion, infinite loops, deadlocks, or other failures.
The official documentation provides a detailed explanation of the crash dump format: How to Interpret the Erlang Crash Dumps. We will cover the basics here.
By default, crash dumps are saved in the working directory where the Erlang system was started. The filename is typically:
erl_crash.dump
You can change the location by setting the environment variable:
export ERL_CRASH_DUMP=/var/log/erl_crash.dump
or, from within a running node, by updating its environment (the variable is read when the dump is written):
os:putenv("ERL_CRASH_DUMP", "/var/log/erl_crash.dump").
A crash dump consists of multiple sections. Below is a truncated example:
=erl_crash_dump:0.5
Sun Feb 18 13:45:52 2025
Slogan: eheap_alloc: Cannot allocate 1048576 bytes of memory (of type "heap").
System version: Erlang/OTP 26 [erts-14.2] [source] [64-bit]
Compiled: Fri Jan 26 14:10:07 2025
Taints: none
Atoms: 18423
Processes: 482
Memory: 2147483648
=memory
total: 2147483648
processes: 1807483648
ets: 107374182
binary: 32212254
code: 5242880
This dump suggests that the system crashed due to a memory allocation failure (Cannot allocate 1048576 bytes of memory
).
- Slogan: Indicates the reason for the crash. Common slogans include:
  - eheap_alloc: Cannot allocate X bytes of memory (memory exhaustion)
  - Init terminating in do_boot () (probably an error in the boot script)
  - Could not start kernel pid (probably a bad argument in the config)
- System Information: Contains details about the runtime:
  - System version: The Erlang/OTP version and build details
  - Compiled: When the system was built
  - Taints: Whether external native code (NIFs) is running
- Memory Usage: Displays the memory distribution:
  - total: Total memory usage
  - processes: Memory used by processes (high values suggest memory leaks)
  - ets: Erlang Term Storage usage (can be a problem if growing uncontrollably)
  - binary: Memory allocated for binaries (can be a source of leaks)
  - code: Loaded code memory footprint
- Process List: Provides details about active processes. This section is crucial for identifying:
  - Processes consuming excessive memory (Stack+Heap size)
  - Processes stuck in an infinite loop (Reductions count abnormally high)
  - Message queue overload (Messages field growing indefinitely)
- Ports and Drivers: Lists open ports and drivers, which can be useful if external system interactions (files, sockets, databases) are suspected as crash causes.
- Loaded Modules: Helps determine if dynamically loaded code (e.g., via code:load_file/1) caused the crash.
Erlang provides a built-in tool for parsing crash dumps: crashdump_viewer
.
To start it:
crashdump_viewer:start().
This provides a graphical interface to inspect the crash dump.
Sometimes, a system crash does not result in an erl_crash.dump
file. Here’s why and how to fix it.
Crash dump generation is controlled through environment variables rather than a system flag or sys.config. Make sure ERL_CRASH_DUMP_SECONDS is not set to 0 (which disables dump writing) and that ERL_CRASH_DUMP points to a writable location:
export ERL_CRASH_DUMP=/var/log/erl_crash.dump
Ensure the process running Erlang has write permissions to the intended dump directory:
sudo chmod 777 /var/log/erl_crash.dump
Check the ownership:
ls -l /var/log/erl_crash.dump
If needed, change ownership:
sudo chown erlang_user /var/log/erl_crash.dump
If the system runs out of memory before the dump can be written, you can limit the size of the dump with the ERL_CRASH_DUMP_BYTES environment variable, or increase swap space.
Linux/macOS resource limits can also get in the way. Note that the core file size limit applies to OS core dumps (useful when debugging the BEAM with GDB, covered later in this chapter), which are separate from erl_crash.dump files. Check the current limits:
ulimit -a
If core file size is 0, enable core dumps:
ulimit -c unlimited
On macOS:
sudo launchctl limit core unlimited
Understanding and diagnosing issues within the Erlang runtime system (BEAM) can be challenging due to its complexity. However, utilizing tools like the GNU Debugger (GDB) can significantly aid in this process. This section provides an overview of using GDB to debug the BEAM, including setting up the environment and employing GDB macros to streamline the debugging workflow.
GDB is a powerful tool for debugging applications at the machine level, offering insights into the execution of compiled programs. When applied to the BEAM, GDB allows developers to inspect the state of the Erlang virtual machine during execution or after a crash.
To effectively use GDB with the BEAM, it’s beneficial to compile the Erlang runtime system with debugging symbols. This compilation provides detailed information during debugging sessions.
See [Alternative Beam emulator builds] for instructions on compiling and running a version of Erlang with debugging information.
Once built, you can run the debug version of the BEAM out of the build
directory using the cerl
launch script:
bin/cerl -debug
The easiest way to run Erlang in GDB is to use the -rgdb
flag with
cerl
, making sure to also select the debug version:
bin/cerl -rgdb -debug
Once you get to the GDB prompt, you can set breakpoints etc., and when you’re ready, start execution of the BEAM with:
(gdb) run
If you have an OS core dump from a crashed execution (not an Erlang crash
dump, which is a different thing, see Crash Dumps in Erlang), you can
run cerl
with the -rcore
flag instead to launch GDB:
bin/cerl -rcore <core file>
Note: If you are comfortable with using the Emacs editor, you can use cerl with the flags -gdb and -core (no leading r), which launch an Emacs instance to work as an IDE for the debugging session. (By setting EMACS=emacsclient first, you can even make it run in an existing Emacs if you have done an M-x server-start.) See the Emacs GDB documentation for more details.
If you have a running BEAM already that you want to attach GDB to, you need to find its OS process ID. For example, using a separate shell window to enter
pgrep -l beam
which should print a result like
3140019 beam.debug.smp
you can then launch GDB with the path to the actual executable file in use and the process ID, like this:
gdb bin/x86_64-unknown-linux-gnu/beam.debug.smp 3140019
(Note that you cannot tell GDB to use the path bin/cerl
or bin/erl
,
since those are just shell scripts that set up the proper environment
variables for the BEAM executable.)
Your OS might by default restrict attaching to running processes - even those you own. How to reconfigure this is out of scope for this book.
GDB macros can automate repetitive tasks and provide shortcuts for complex commands, enhancing the efficiency of your debugging sessions. The Erlang runtime includes a set of predefined GDB macros, known as the Erlang Pathologist Toolkit (ETP), which facilitate the inspection of various aspects of the BEAM, such as internal BEAM structures, process states, memory allocation, and scheduling information.
The ETP macros can be found in erts/etc/unix/etp-commands
. They are
automatically loaded into your GDB session when you launch it via the
cerl
script as described in the previous section. You should then be able
to use macros like etp-process-info
to retrieve detailed information
about a specific Erlang process:
etp-process-info <process_pointer>
Replace <process_pointer>
with the actual pointer to the process control block (PCB) you’re interested in. These macros simplify the process of extracting meaningful data from the BEAM’s internal structures.
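A typical session might first list all processes to find the pointer of interest and then inspect one of them. This is a sketch assuming the standard ETP macros (etp-processes, etp-process-info, etp-stacktrace) have been loaded via etp-commands; the pointer value is made up:
(gdb) etp-processes
(gdb) etp-process-info 0x00007f8a2c0013c0
(gdb) etp-stacktrace 0x00007f8a2c0013c0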
For a comprehensive guide on debugging the BEAM using GDB and employing these macros, refer to Debugging the BEAM and Debug emulator documentation. These resources provide in-depth instructions and examples to assist you in effectively diagnosing and resolving issues within the Erlang runtime system.
SystemTap and DTrace are powerful dynamic tracing frameworks that allow developers to analyze and monitor system behavior in real-time without modifying application code. These tools are particularly useful for investigating performance bottlenecks, debugging issues, and understanding system interactions at a low level. While both tools serve a similar purpose, they are designed for different operating systems—SystemTap is widely used on Linux, while DTrace is predominantly used on Solaris, macOS, and BSD variants.
Using these tools with Erlang can provide deep insights into the behavior of the BEAM virtual machine, process scheduling, garbage collection, and inter-process communication.
SystemTap and DTrace operate by inserting dynamically generated probes into running kernel and user-space applications. These probes capture real-time data, allowing developers to inspect and analyze program execution without stopping or modifying the application.
-
SystemTap: Developed for Linux, SystemTap enables monitoring of kernel events, user-space programs, and runtime behavior using scripting. It is commonly used for profiling, fault detection, and system introspection.
-
DTrace: Originally developed by Sun Microsystems for Solaris, DTrace provides similar tracing capabilities with a robust scripting language. It is widely used on macOS, FreeBSD, and SmartOS.
Both tools allow developers to measure function execution times, trace system calls, inspect memory usage, and capture event-based data critical for optimizing performance and debugging complex applications.
SystemTap scripts rely on user-space markers embedded in the BEAM emulator. These markers allow SystemTap to hook into various internal events. To use SystemTap with Erlang:
- Ensure SystemTap is installed (on Linux distributions such as Ubuntu, Fedora, or CentOS):
sudo apt-get install systemtap systemtap-sdt-dev
or
sudo dnf install systemtap systemtap-devel
- Enable Erlang’s SystemTap probes: The BEAM VM includes support for SystemTap, but it must be compiled with --enable-systemtap:
./configure --enable-systemtap
make
- List available probes: To check which probes are available in the BEAM runtime:
stap -L 'process("*beam.smp").mark("*")'
- Write a SystemTap script: The following example traces function calls in the BEAM VM:
probe process("beam.smp").mark("function_entry") {
printf("Function call in BEAM: %s\n", user_string($arg1))
}
- Run the script: Execute the script to start tracing:
sudo stap my_script.stp
This allows developers to observe function calls, detect bottlenecks, and debug performance issues in real-time.
DTrace integrates directly with the BEAM runtime, offering deep visibility into system operations. It allows tracing function calls, memory allocation, garbage collection, and inter-process communication.
DTrace works best on Solaris and its descendants. On Linux, the systemtap package bundles a dtrace compatibility tool, but it is not as powerful as the original.
On macOS, DTrace is pre-installed. On Ubuntu, it can be installed via:
sudo apt-get install systemtap-sdt-dev
The BEAM VM includes built-in DTrace support. If needed, rebuild Erlang with DTrace support:
./configure --with-dtrace
make
- Write a simple DTrace script: The following script traces write system calls made by the BEAM:
syscall::write:entry
/execname == "beam.smp"/ {
printf("Erlang process writing output\n");
}
- Run the script: Execute DTrace to start tracing:
sudo dtrace -s my_script.d
This provides a non-intrusive way to monitor the internal behavior of the BEAM virtual machine in real-time.