Skip to content

Latest commit

 

History

History
886 lines (677 loc) · 37.4 KB

c.asciidoc

File metadata and controls

886 lines (677 loc) · 37.4 KB

Chapter: Interfacing other languages and extending BEAM and ERTS

Introduction

Interfacing C, C++, Rust, or assembler provides an opportunity to extend the capabilities of the BEAM. In this chapter, we will use C for most examples, but the methods described can be used to interface almost any other programming language. We will give some examples of Rust and Java in this chapter. In most cases, you can replace C with any other language in the rest of this chapter, but we will just use C for brevity.

By integrating C code, developers can enhance their Erlang applications' performance, especially for computationally intensive tasks requiring direct access to system-level resources. Additionally, interfacing with C allows Erlang applications to interact directly with hardware and system-level resources. This capability is crucial for applications that require low-level operations, such as manipulating memory, accessing specialized hardware, or performing real-time data processing. Another advantage of integrating C with Erlang is using existing C libraries and codebases. Many powerful libraries and tools are available in C, and by interfacing with them, Erlang developers can incorporate these functionalities without having to reimplement them in Erlang.

Furthermore, interfacing with C can help when precise control over execution is necessary. While Erlang’s virtual machine provides excellent concurrency management, certain real-time applications may require more deterministic behavior that can be better achieved with C. By integrating C code, developers can fine-tune the performance and behavior of their applications to meet specific requirements.

C code can also extend ERTS and BEAM since they are written in C.

In previous chapters, we have seen how you can safely interface other applications and services over sockets or ports. This chapter will look at ways to interface low-level code more directly, which also means using it more unsafely.

The official documentation contains a tutorial on interoperability, see Interoperability Tutorial.

Safe Ways of Interfacing C Code

Interfacing C code with Erlang can be done safely using several mechanisms that minimize the risk of destabilizing the BEAM virtual machine. Here are the primary methods.

os:cmd

The os:cmd function allows Erlang processes to execute shell commands and retrieve their output. This method is safe because it runs the command in a separate OS process, isolating it from the BEAM VM. By using os:cmd, developers can interact with external C programs without directly affecting the Erlang runtime environment. It comes with an overhead and the C program is expected to be a standalone program that can be run from the command line and return the result on standard output.

Example:

// time.c
#include <stdio.h>
#include <time.h>

void get_system_time()
{
    time_t rawtime;
    struct tm *timeinfo;

    time(&rawtime);
    timeinfo = localtime(&rawtime);

    printf("Current Date and Time: %s", asctime(timeinfo));
}

int main()
{
    get_system_time();
    return 0;
}
> os:cmd("./time").
"Current Date and Time: Mon May 20 04:46:37 2024\n"
'open_port' 'spawn_executable'

An even safer way to interact with a program, especially when arguments are based on user input, is to use open_port with the spawn_executable argument. This method mitigates the risk of argument injection by passing the arguments directly to the executable without involving an operating system shell. This direct passing prevents the shell from interpreting the arguments, thus avoiding potential injection attacks that could arise from special characters or commands in the user input.

1> Port = open_port({spawn_executable, "./time"}, [{args, []}, exit_status]).
#Port<0.7>
2> receive {Port, {data, R}} -> R after 1000 -> timeout end.
"Current Date and Time: Mon May 20 13:59:32 2024\n"

You can also spawn a port with a program that reads from standard input and writes to standard output, and send and receive data to and from the port.

Example:

Port = open_port({spawn, "./system_time"}, [binary]),
port_command(Port, <<"get_time\n">>).

See [CH-IO] for more details on how to use port drivers to connect to external programs.

Sockets

Sockets provide a straightforward way to enable communication between Erlang and external C programs. By using TCP or UDP sockets, C applications can exchange data with Erlang processes over the network, ensuring that both systems remain isolated. This method is particularly useful for distributed systems and allows for asynchronous communication.

The most common and easiest way is to use a REST-like interface over HTTP or HTTPS. There are Erlang libraries for both client and server implementations, such as httpc for HTTP clients and cowboy for HTTP servers. This approach allows C applications to expose APIs that Erlang processes can call, facilitating interaction over a well-defined protocol.

The next level is to use pure socket communication, which can be more efficient than HTTP/HTTPS but requires you to come up with a protocol yourself or use some other low-level protocol. This method allows for custom data exchange formats and can optimize performance by reducing the overhead associated with higher-level protocols.

See chapter [CH-IO] for details on how sockets works.

Linked-in Drivers

Linked-in drivers in Erlang allow for the integration of C code directly into the Erlang runtime system, enabling high-performance operations and seamless interaction with external resources.

Linked-in drivers offer faster communication between Erlang and C code compared to external ports, as they operate within the same process space, eliminating the overhead of inter-process communication.

For applications with stringent timing constraints, such as hardware interfacing or real-time data processing, linked-in drivers provide the necessary responsiveness.

However, linked-in drivers come with a higher risk of destabilizing the Erlang VM due to direct memory access and potential programming errors.

How to Implement a Linked-In Driver

Lets look at how to implement a linked-in driver in Erlang on three levels, first int the abstract what need to be done, then in some genarl psuedo code and finally in a full example.

To implement a linked-in driver in Erlang, follow these steps:

Step 1. Write the C code for the driver, handling:

  1. Driver initialization

    • Define the erl_drv_entry struct with pointers to callback functions.

    • Register the driver using the DRIVER_INIT macro.

  2. Implement asynchronous operations

    • Handle driver callbacks for start, stop, and output operations.

    • Manage I/O events using functions like driver_select.

  3. Manage resource allocation and deallocation

    • Define a struct to hold driver-specific data.

    • Allocate resources during driver start and release them during stop.

Step 2. Integrate with Erlang

  • Load the driver using erl_ddll:load_driver/2.

  • Create a port in Erlang to communicate with the driver.

Let os look at some psuedo code for a linked in driver going into a bit more detail of the steps above.

Step 1. Write the C code for the driver:

  1. Driver Initialization:

    • Define the erl_drv_entry Struct: This structure contains pointers to callback functions that the Erlang runtime system will invoke.

    • Register the Driver: Implement the DRIVER_INIT macro to initialize and register your driver with the Erlang runtime.

      ```c
      static ErlDrvEntry example_driver_entry = {
          NULL,                   // init
          example_drv_start,      // start
          example_drv_stop,       // stop
          example_drv_output,     // output
          NULL,                   // ready_input
          NULL,                   // ready_output
          "example_drv",          // driver_name
          NULL,                   // finish
          NULL,                   // handle
          NULL,                   // control
          NULL,                   // timeout
          NULL,                   // outputv
          NULL,                   // ready_async
          NULL,                   // flush
          NULL,                   // call
          NULL,                   // event
          ERL_DRV_EXTENDED_MARKER, // extended_marker
          ERL_DRV_EXTENDED_MAJOR_VERSION, // major_version
          ERL_DRV_EXTENDED_MINOR_VERSION, // minor_version
          0,                      // driver_flags
          NULL,                   // handle2
          NULL,                   // process_exit
          NULL                    // stop_select
      };
      DRIVER_INIT(example_drv) {
          return &example_driver_entry;
      }
      ```
  2. Asynchronous Operations:

    • Handle Driver Callbacks: Implement the necessary callback functions such as example_drv_start, example_drv_stop, and example_drv_output to manage driver operations.

    • Manage I/O Events: Utilize functions like driver_select to handle asynchronous I/O operations efficiently.

      ```c
      static ErlDrvData example_drv_start(ErlDrvPort port, char *command) {
          example_data* d = (example_data*)driver_alloc(sizeof(example_data));
          d->port = port;
          return (ErlDrvData)d;
      }
      static void example_drv_stop(ErlDrvData handle) {
          driver_free((char*)handle);
      }
      static void example_drv_output(ErlDrvData handle, char *buff, ErlDrvSizeT bufflen) {
          example_data* d = (example_data*)handle;
          // Process the input and produce output
          driver_output(d->port, output_data, output_len);
      }
      ```
  3. Resource Management:

    • Allocation and Deallocation: Ensure proper memory management by allocating resources during driver start and releasing them during driver stop to prevent memory leaks.

      ```c
      typedef struct {
          ErlDrvPort port;
          // Additional driver-specific data
      } example_data;
      ```

Step 2. Integration with Erlang:

  • Loading the Driver: Use erl_ddll:load_driver/2 to load the shared library containing your driver.

  • Opening the Port: Create a port in Erlang using open_port/2 with the {spawn, DriverName} tuple to communicate with the driver.

    ```erlang
    start(SharedLib) ->
        case erl_ddll:load_driver(".", SharedLib) of
            ok -> ok;
            {error, already_loaded} -> ok;
            _ -> exit({error, could_not_load_driver})
        end,
        Port = open_port({spawn, SharedLib}, []),
        loop(Port).
    ```

Example Implementation Now lets look at a full example of a linked in driver that doubles an integer.

Step 1. write the C Driver.

#include "erl_driver.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    ErlDrvPort port;
} double_data;

// Start function: initialize driver state
static ErlDrvData double_drv_start(ErlDrvPort port, char *command) {
    double_data* d = (double_data*)driver_alloc(sizeof(double_data));
    d->port = port;
    return (ErlDrvData)d;
}

// Stop function: clean up resources
static void double_drv_stop(ErlDrvData handle) {
    driver_free((char*)handle);
}

// Output function: process data sent from Erlang
static void double_drv_output(ErlDrvData handle, char *buff, ErlDrvSizeT bufflen) {
    double_data* d = (double_data*)handle;

    // Convert input buffer to an integer
    int input = atoi(buff);

    // Perform the operation (double the input value)
    int result = input * 2;

    // Convert the result back to a string
    char result_str[32];
    snprintf(result_str, sizeof(result_str), "%d", result);

    // Send the result back to Erlang
    driver_output(d->port, result_str, strlen(result_str));
}

// Define the driver entry struct
static ErlDrvEntry double_driver_entry = {
    NULL,                      // init
    double_drv_start,          // start
    double_drv_stop,           // stop
    double_drv_output,         // output
    NULL,                      // ready_input
    NULL,                      // ready_output
    "double_drv",              // driver_name
    NULL,                      // finish
    NULL,                      // handle
    NULL,                      // control
    NULL,                      // timeout
    NULL,                      // outputv
    NULL,                      // ready_async
    NULL,                      // flush
    NULL,                      // call
    NULL,                      // event
    ERL_DRV_EXTENDED_MARKER,   // extended marker
    ERL_DRV_EXTENDED_MAJOR_VERSION, // major version
    ERL_DRV_EXTENDED_MINOR_VERSION, // minor version
    0,                         // driver flags
    NULL,                      // handle2
    NULL,                      // process_exit
    NULL                       // stop_select
};

// Driver initialization macro
DRIVER_INIT(double_drv) {
    return &double_driver_entry;
}

Step 2 wrtte the Erlang module.

-module(double).
-export([start/0, stop/0, double/1]).

start() ->
      SharedLib = "double_drv",
      case erl_ddll:load_driver(".", SharedLib) of
         ok -> ok;
         {error, already_loaded} -> ok;
         _ -> exit({error, could_not_load_driver})
      end,
      register(double_server, spawn(fun() -> init(SharedLib) end)).

init(SharedLib) ->
      Port = open_port({spawn, SharedLib}, []),
      loop(Port).

loop(Port) ->
      receive
         {double, Caller, N} ->
            Port ! {self(), {command, integer_to_list(N)}},
            receive
                  {Port, {data, Result}} ->
                     Caller ! {double_result, list_to_integer(Result)}
            end,
            loop(Port);
         stop ->
            Port ! {self(), close},
            receive
                  {Port, closed} -> exit(normal)
            end
      end.

double(N) ->
      double_server ! {double, self(), N},
      receive
         {double_result, Result} -> Result
      end.

stop() ->
      double_server ! stop.

The double_drv_output function is responsible for processing data sent from the Erlang runtime to the driver. It begins by converting the input string to an integer using atoi, performs the required operation of doubling the value, and then formats the result back into a string using snprintf. Finally, the result is sent back to the Erlang runtime using the driver_output function, enabling seamless communication between Erlang and the native code.

The double_driver_entry struct acts as the central definition for the driver, linking the Erlang runtime to the driver’s core functionality. It contains pointers to callback functions such as start, stop, and output, which handle various aspects of the driver’s lifecycle and interactions. Additionally, the driver_name field specifies the identifier used when loading the driver from Erlang. To complete the integration, the DRIVER_INIT macro registers the driver with the Erlang runtime by returning a pointer to the double_driver_entry struct, enabling the runtime to recognize and manage the driver effectively.

To test the Linked-In Driver first compile the C code into a shared library:

gcc -o double_drv.so -fPIC -shared double_drv.c -I /path/to/erlang/erts/include

If you are running the code in the devcontainer of the book:

cd /code/c_chapter
gcc -o double_drv.so -fPIC -shared -I /usr/local/lib/erlang/usr/include/ double_drv.c

Load the driver in Erlang:

Start the Erlang shell:

erl

Load the driver:

1> c(double).
{ok,double}
2> double:start().
true

Call the Functionality:

3> double:double(21).
42

Stop the driver

4> double:stop().
stop
5>

Built-In Functions (BIFs)

Built-in functions (BIFs) are a set of predefined, native functions integrated into the BEAM virtual machine. They implement some basic operations within Erlang, such as bignum arithmetic computations, process management, and handling data types like lists and tuples. BIFs are implemented in C.

It is good to have a clear understanding of the differences between beam instructions, BIFs, Operators, and Library Functions.

Beam instructions are the instructions that the BEAM virtual machine executes. The code for beam instructuins is usually kept small and can in later Erlang versions be inlined by the JIT loader. (More on this in the chapter on JIT).

BIFs are implemented in C and are part of the BEAM virtual machine. They are usually either operators such as bignum '+' that are quite big compared to a beam instruction, or they are library functions that are implemented in C for performance reasons such as 'lists:reverse/1'. Some BIFs provide essential low-level functionality that would be challenging to implement purely in Erlang. Like for example 'erlang:send/2' that is used to send messages between processes.

Operators are syntactic language constructs that are mapped to a library function. For example the oprator '' is mapped by the compiler to 'erlang:/2'. These functions can in term be implemented as either beam instructions or BIFs, depending on thier complexity.

Library functions are basic Erlang functions such as 'lists:length/1' that are part of the Erlang language. They could be implemented as BEAM instructions, BIFS or in Erlang.

BIFs are efficienct as they are writeen in C and optimized for performance within the BEAM. The dissadvantages with BIFs are thet long-running BIFs can block BEAM schedulers, affecting system responsiveness. They have limited extensibility as they are fixed within the runtime system.

Most Erlang users will never need to write a BIF, but it is good to know what they are and how they work. Also if you are writing an EEP (Erlang Enhancement Proposal) you might need to write a BIF to implement the new functionality.

The following sections will go through some of the detalils of how to implement a BIF.

Implementation Steps

  1. Creating a BIF:

    • Write the function in C, adhering to the BEAM’s internal API for memory and process safety.

    • Implement a thread-safe, efficient algorithm for the function’s purpose.

    • Add the function to the BEAM source code and compile it with the runtime system.

    • Calculate the number of reductions the BIF should use in relartion to the time complexity of the function.

    • If the function can take a long time consider adding the possibility to yield.

  2. Performance Considerations:

    • Avoid long-running operations within the BIF.

    • Ensure memory safety and proper handling of Erlang terms to maintain system stability.

Example Implementation

Imagine implementing a simple BIF to calculate the factorial of a number. The BIF would: 1. Parse the input arguments to ensure validity. 2. Perform the computation efficiently in C. 3. Return the result to the Erlang environment.

A BIF has a special signature it is defined as:

BIF_RETTYPE [NAME]_[ARITY](BIF_ALIST_[ARITY])

Note that the name contains the module name and the function name, separated by an underscore. The arity is the number of arguments the function takes. The BIF_ALIST_X macro is used to define the arguments. The BIF_RETTYPE macro is used to define the return type of the function.

Code Snippet:

BIF_RETTYPE math_factorial_1(BIF_ALIST_1)
{
    /* Calculate factorial = n!
       for n >= 0
     */
    int i, n, reds;
    long factorial = 1;
    Eterm result;

    /* Check the arguemnt and return an error if it is not a positiv integer.
       When defining a BIF with BIF_ALIST_X the arguments are named
       BIF_ARG_1, BIF_ARG_2 etc.
       To signal ann error use BIF_ERROR(BIF_P, BADARG);
       The macro BIF_P is a pointer to the process struct (PCB) of the process
       that called the BIF, containing all the information about the process,
       such as heap pointers, stack pointers, registers etc.
    */
    if (is_not_small(BIF_ARG_1) || (n = signed_val(BIF_ARG_1)) < 0)
    {
        BIF_ERROR(BIF_P, BADARG);
    }

    /* Calculate the factorial */
    for (i = 1; i <= n; ++i) factorial *= i;

    /* Convert the result to an Erlang term */
    result = erts_make_integer(factorial, BIF_P);

    /* Calculate reductions relative to the number of iterations of the calculation loop.*/
    reds = n / 1000 + 1;

    /* Return the result and a number of reductions.
      The BIF_RET2 macro is used to return the result and the number of reductions.
      The first argument is the result and the second is the number of reductions.
    */
    BIF_RET2(result, reds);
}

Note that this toy example does not handle large numbers well, factorial(66), will overflow a 64 bit integer ('long'). A real implementation would use the bignum library. We will show how to do this in the next section.

Add the BIF to the list of BIFs in the file bif.tab (https://github.com/erlang/otp/blob/master/erts/emulator/beam/bif.tab).

bif math:factorial/1

If you want to make the bif available to the rest of the runtime don’t forget to add the header to e.g. bif.h.

Then add a stub to the Erlang module math.erl (https://github.com/erlang/otp/blob/master/lib/stdlib/src/math.erl):

-doc "The factorial of `X`.".
-doc(#{since => <<"OTP 29.0">>}).
-spec factorial(X) -> integer() when
      X :: pos_integer().
factorial(_) ->
    erlang:nif_error(undef).

Don’t forget to export your function.

Now you should be able to recompile the BEAM and run the new BIF in Erlang.

1> math:factorial(7).
5040

If you want to add functionallity to the kernel modules you need to handle the preloaded modules in the erl_prim_loader.c file. See [CH-Beam_loader] for details.

If you try it out wirth some larger numbers that are bigger than what your machines small integer can handle you will get the wrong result. Let us move on to a more complex example that uses bignums.

BIF_RETTYPE math_factorial_1(BIF_ALIST_1)
{
    /* Calculate factorial = n! for n >= 0 */
    Sint64 i, n, reds;
    Uint64 factorial = 1;
    Eterm result;
    Eterm *hp;
    Eterm big_factorial;
    ErtsDigit *temp_digits_a;
    ErtsDigit *temp_digits_b;
    ErtsDigit *src;
    ErtsDigit *dest;
    ErtsDigit *temp;
    dsize_t temp_size;
    dsize_t curr_size;
    dsize_t new_size;

    if (is_not_integer(BIF_ARG_1) || (!term_to_Sint64(BIF_ARG_1, &n)) || n < 0)
    {
        BIF_ERROR(BIF_P, BADARG);
    }

    /* We do not want to use heap space for intermediate resultst sine we are
       going to do a lot of them. So we use a temporary buffer.
       We create two buffers one to keep (n-1)! and one to keep n!.

    */

    /* Initial allocation of buffers, this will be large enough for most n,
       that is up to ~ 50.000. Then we will start increasing the buffer.  */
    temp_size = 2 * (n + 1);
    temp_digits_a = (ErtsDigit *)erts_alloc(ERTS_ALC_T_TMP, temp_size * sizeof(ErtsDigit));
    temp_digits_b = (ErtsDigit *)erts_alloc(ERTS_ALC_T_TMP, temp_size * sizeof(ErtsDigit));
    src = temp_digits_a;
    dest = temp_digits_b;

    for (i = 1; i <= n; ++i)
    {
        /* We  want to avoid heap allocation for small numbers.
           These we do on the C stack and just return a small_int. */
        if (!IS_USMALL(0, factorial * i))
        {
            // Initial conversion to bignum in our temp buffer
            hp = (Eterm *)src;
            big_factorial = uint_to_big(factorial, hp);
            curr_size = BIG_SIZE(big_val(big_factorial));

            for (; i <= n; ++i)
            {
                new_size = curr_size + 1;

                if (new_size > temp_size)
                {
                    dsize_t alloc_size = new_size * 2;
                    src = (ErtsDigit *)erts_realloc(ERTS_ALC_T_TMP, src, alloc_size * sizeof(ErtsDigit));
                    dest = (ErtsDigit *)erts_realloc(ERTS_ALC_T_TMP, dest, alloc_size * sizeof(ErtsDigit));
                    temp_size = alloc_size;
                }

                big_factorial = big_times_small(big_factorial, i, (Eterm *)dest);

                temp = src;
                src = dest;
                dest = temp;
            }

            // We are done.
            // Only now allocate on process heap and copy the final result
            hp = HAlloc(BIF_P, BIG_SIZE(big_val(big_factorial)) + 1);
            sys_memcpy(hp, big_val(big_factorial), (BIG_SIZE(big_val big_factorial)) + 1) * sizeof(Eterm));
            result = make_big(hp);

            erts_free(ERTS_ALC_T_TMP, temp_digits_a);
            erts_free(ERTS_ALC_T_TMP, temp_digits_b);

            goto done;
        }
        factorial *= i;
    }

    result = make_small(factorial);

done:
    reds = n / 1000 + 1;
    BIF_RET2(result, reds);
}

Now we can calcualte really large factorials. like 1000!

1> math:factorial(1000).
402387260077093773543702433923003985719374864210714632543799910429938512398629020592044208486969404800479988610197196058631666872994808558901323829669944590997424504087073759918823627727188732519779505950995276120874975462497043601418278094646496291056393887437886487337119181045825783647849977012476632889835955735432513185323958463075557409114262417474349347553428646576611667797396668820291207379143853719588249808126867838374559731746136085379534524221586593201928090878297308431392844403281231558611036976801357304216168747609675871348312025478589320767169132448426236131412508780208000261683151027341827977704784635868170164365024153691398281264810213092761244896359928705114964975419909342221566832572080821333186116811553615836546984046708975602900950537616475847728421889679646244945160765353408198901385442487984959953319101723355556602139450399736280750137837615307127761926849034352625200015888535147331611702103968175921510907788019393178114194545257223865541461062892187960223838971476088506276862967146674697562911234082439208160153780889893964518263243671616762179168909779911903754031274622289988005195444414282012187361745992642956581746628302955570299024324153181617210465832036786906117260158783520751516284225540265170483304226143974286933061690897968482590125458327168226458066526769958652682272807075781391858178889652208164348344825993266043367660176999612831860788386150279465955131156552036093988180612138558600301435694527224206344631797460594682573103790084024432438465657245014402821885252470935190620929023136493273497565513958720559654228749774011413346962715422845862377387538230483865688976461927383814900140767310446640259899490222221765904339901886018566526485061799702356193897017860040811889729918311021171229845901641921068884387121855646124960798722908519296819372388642614839657382291123125024186649353143970137428531926649875337218940694281434118520158014123344828015051399694290153483077644569099073152433278288269864602789864321139083506217095002597389863554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

We can even calcualte 100000! but it will take some time. In order to not block the BEAM scheduler we should yield every now and then.

We can do this by calling BIF_TRAP1 with the state we want to save and the function we want to return to. In this case we want to return to the math_factorial_trap_1 function.

We need to define a structure to hold the state we want to save and the function we want to return to in math_factorial_trap_export. We also need to initialize this export structure. We do this in the math_factorial_trap_init function by calling erts_init_trap_export. Then we need to ad a call to math_factorial_trap_init in the erl_init function in erl_init.c. We also need to add the new atom 'am_math_factorial_trap' to atom.names.

When we yield we store the state in a tuple with the current i, the current factorial and the original n. So the argument to the trap function is a tuple with these values. In the original BIF 'math_factorial_1' we create this tuple and call 'math_factorial_trap_1'.

Now when we come back from a yield we might already have a bignum so we restructure the code a bit to handle this case.

We check our iterator and every 1000 interations we save the state and yield. We also save the state and yield when we have a bignum.

static Export math_factorial_trap_export;

static BIF_RETTYPE math_factorial_trap_1(BIF_ALIST_1)
{
    Sint64 n, i;
    Uint64 factorial;
    Eterm big_factorial = THE_NON_VALUE;
    ErtsDigit *temp_digits_a = NULL;
    ErtsDigit *temp_digits_b = NULL;
    ErtsDigit *src;
    ErtsDigit *dest;
    ErtsDigit *temp;
    dsize_t temp_size;
    dsize_t curr_size;
    dsize_t new_size;
    dsize_t big_size;
    Eterm result;
    Eterm *hp;
    Eterm *big_hp;
    Eterm *tp = tuple_val(BIF_ARG_1);
    Eterm state;

    // Extract current state
    i = signed_val(tp[1]); // Current i
    n = signed_val(tp[3]); // Original n

    temp_size = (n * 4);
    temp_digits_a = (ErtsDigit *)erts_alloc(ERTS_ALC_T_TMP, temp_size * sizeof(ErtsDigit));
    temp_digits_b = (ErtsDigit *)erts_alloc(ERTS_ALC_T_TMP, temp_size * sizeof(ErtsDigit));
    src = temp_digits_a;
    dest = temp_digits_b;

    // Handle both small and bignum factorial cases
    if (is_small(tp[2]))
    {
        factorial = unsigned_val(tp[2]);
    }
    else
    {
        // We're already in bignum territory, skip the small number loop
        big_factorial = tp[2];
        goto bignum_loop;
    }


    for (; i <= n; ++i)
    {
        if (!IS_USMALL(0, factorial * i))
        {
            // Initial conversion to bignum in our temp buffer
            big_factorial = uint_to_big(factorial, (Eterm *)src);
            goto bignum_loop;
        }
        factorial *= i;
    }

    if (temp_digits_a)
        erts_free(ERTS_ALC_T_TMP, temp_digits_a);
    if (temp_digits_b)
        erts_free(ERTS_ALC_T_TMP, temp_digits_b);

    BIF_RET(make_small(factorial));

bignum_loop:
    // Continue multiplying using bignums

    for (; i <= n; ++i)
    {
        curr_size = BIG_SIZE(big_val(big_factorial));
        new_size = curr_size + 1;
        if (new_size > temp_size)
        {
            dsize_t alloc_size = new_size * 2;
            src = (ErtsDigit *)erts_realloc(ERTS_ALC_T_TMP, src,
                                            alloc_size * sizeof(ErtsDigit));
            dest = (ErtsDigit *)erts_realloc(ERTS_ALC_T_TMP, dest,
                                             alloc_size * sizeof(ErtsDigit));
            temp_size = alloc_size;
        }

        big_factorial = big_times_small(big_factorial, i, (Eterm *)dest);

        if ((i % 1000) == 0)
        {
            // Increment i so we don't get stuck in a loop. with i % 1000 == 0.
            // Save state: {(i+1), i!, n} and yield.

            // Store the tuple with the sate on the heap
            big_size = BIG_SIZE(big_val(big_factorial));
            hp = HAlloc(BIF_P, 4 + big_size + 1);
            big_hp = hp + 4;
            sys_memcpy(big_hp, big_val(big_factorial), (big_size + 1) * sizeof(Eterm));
            state = TUPLE3(hp, make_small(i + 1), make_big(big_hp), tp[3]);

            // Free the temporary buffers
            erts_free(ERTS_ALC_T_TMP, temp_digits_a);
            erts_free(ERTS_ALC_T_TMP, temp_digits_b);

            // Yield giving the state to the trap function
            BIF_TRAP1(&math_factorial_trap_export, BIF_P, state);
        }

        temp = src;
        src = dest;
        dest = temp;
    }

    hp = HAlloc(BIF_P, BIG_SIZE(big_val(big_factorial)) + 1);
    sys_memcpy(hp, big_val(big_factorial),
               (BIG_SIZE(big_val(big_factorial)) + 1) * sizeof(Eterm));
    result = make_big(hp);

    erts_free(ERTS_ALC_T_TMP, temp_digits_a);
    erts_free(ERTS_ALC_T_TMP, temp_digits_b);

    BIF_RET(result);
}

BIF_RETTYPE math_factorial_1(BIF_ALIST_1)
{
    Sint64 n;
    Eterm *hp;
    Eterm state;
    Eterm args[1];

    if (is_not_integer(BIF_ARG_1) || (!term_to_Sint64(BIF_ARG_1, &n)) || n < 0)
    {
        BIF_ERROR(BIF_P, BADARG);
    }

    // Create initial state
    hp = HAlloc(BIF_P, 4);
    state = TUPLE3(hp, make_small(1), make_small(1), BIF_ARG_1); // {i, factorial, n}

    // Call the continuation function with initial state
    args[0] = state;
    return math_factorial_trap_1(BIF_P, args, A__I);
}

void erts_init_math_factorial(void)
{
    erts_init_trap_export(&math_factorial_trap_export,
                          am_erts_internal, am_math_factorial_trap, 1,
                          &math_factorial_trap_1);
    return;
}

Now we can handle really large factorials without blocking the BEAM scheduler.

One Calm Night: A Tale of term_to_binary and Non-Yielding BIFs

It was a calm night, one of those rare occasions where everything in the system seemed to hum along perfectly. But then, without warning, the unthinkable happened—the system crashed.

In the aftermath, the investigation revealed a peculiar sequence of events. The BEAM schedulers, designed to gracefully balance work and sleep, had been too eager to shut down when the workload diminished. Only a single scheduler remained active, handling the entirety of the system’s operations.

As fate would have it, this lone scheduler stumbled upon a task: converting a 4MB term into a binary for logging purposes. This seemingly routine operation invoked the term_to_binary function—a built-in function (BIF). Instead of breaking its workload into manageable chunks, it perfomed all the work until done and then reported it had used only 20 reductions.

This non-yielding nature proved to be a fatal flaw. The single scheduler became entirely consumed, unable to handle other tasks or respond to the system’s needs. Erlang’s HEART mechanism, ever vigilant, detected the unresponsiveness and concluded that the node was beyond saving. With grim efficiency, it killed the node.

Fixing the Problem

Fix 1: Keep Schedulers Alive

The first step was to ensure that schedulers remained alert, even during periods of low activity. Introduced in June 2013, this fix involved tweaking Erlang’s startup flags:

  • '+sfwi 50': Adjusted the scheduler forced wakeup interval to keep them responsive.

  • '+scl false': Prevented scheduler compaction, maintaining an even workload distribution.

  • '+sub true': Ensured utilization spread across the system.

These adjustments breathed new life into the schedulers, making them resilient to lulls in activity.

See [CH-Scheduling] and [CH-Tweak] for detials on tweaking your scheduler settings.

Fix 2: Rewriting 'term_to_binary'

The second and more intricate fix targeted the 'term_to_binary' function itself. Developers rewrote the BIF to yield periodically, spreading its workload over multiple reductions. This approach ensured that the function played nicely with the scheduler, avoiding prolonged monopolization of system resources.

The fix, implemented in the same period, introduced chunked processing to the function. It also depended on advancements in garbage collection within BIFs. See https://github.com/erlang/otp/commit/47d6fd3ccf35a4d921591dd0a9b5e69b9804b5b0

Summary of Built In Functions (BIFs)

Most developers will never have to write or even read the code of a BIF. But it is good to know what they are and how they work. Also if you ever look into the Erlang code of some standard libraries you might now know why they don’t contain any real code, only stubs.

NIFs

A slightly more common, and safer, task for Erlang developers is writing NIFs (Native Implemented Functions). NIFs are functions implemented in C or C++ (or other languages) that are called from Erlang code. They provide a way to dynamically extend Erlang with native code for performance-critical operations or interfacing with external libraries. Improper use can still destabilize the VM, so they should be used cautiously.

How to use Nifs is nowadays well documented in the Interoperability Tutorial in the Nif section, but we will go through the basics here.

Implementing a Simple NIF

A simple C-based NIF for fast integer addition:

C Code (nif_add.c):

#include "erl_nif.h"

static ERL_NIF_TERM nif_add(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
    int a, b;
    if (!enif_get_int(env, argv[0], &a) || !enif_get_int(env, argv[1], &b)) {
        return enif_make_badarg(env);
    }
    return enif_make_int(env, a + b);
}

static ErlNifFunc nif_funcs[] = {
    {"add", 2, nif_add}
};

ERL_NIF_INIT(my_nif, nif_funcs, NULL, NULL, NULL, NULL);

Erlang Wrapper Module (my_nif.erl):

-module(my_nif).
-export([add/2, load/0]).

-on_load(load/0).

load() ->
    erlang:load_nif("./my_nif", 0).

add(_A, _B) ->
    erlang:nif_error("NIF not loaded").

Compiling and Running the NIF

gcc -shared -fPIC -o my_nif.so nif_add.c -I /usr/lib/erlang/usr/include

Then in Erlang:

> my_nif:add(10, 20).
30

Keep NIF execution short to avoid blocking the BEAM scheduler. Use NIFs for performance-critical operations and interfacing with external libraries. Use dirty schedulers for long-running operations. Always validate inputs to prevent crashes.