Skip to content

Immutable SCC algorithm for Python #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mjp41 opened this issue Apr 2, 2025 · 2 comments
Open

Immutable SCC algorithm for Python #54

mjp41 opened this issue Apr 2, 2025 · 2 comments
Assignees

Comments

@mjp41
Copy link
Owner

mjp41 commented Apr 2, 2025

This issues contains a detailed design of implementing the Reference Counting Deeply Immutable Data
Structures with Cycles
for Python.

Motivation

Python 3.12 associates each heap object with an interpreter. These objects are part of a doubly linked list on the interpreter that created the object. If we were to share objects across interpreters and a different interpreter decrement the final reference, then it would have to modify the doubly linked list for a different interpreter. This concurrent modification could form a bottle neck and would require locking.

The Immutable SCC paper means that immutable state can be safely reference counted and does not require a backup cycle detection.

Challenges with respect to the paper

There are a couple of things that need to differ from the paper for the Python context.

External references

We want to allow freezing a component, which still has incoming edges. Consider the following graph:

%%{init: {'theme': 'neutral', 'themeVariables': { 'fontSize': '16px' }}}%%
graph TD
id1[frame]
id1--> |z| id3[rc=1]
id3--> |f| id5[rc=1]
id4--> |f| id6[rc=2]
id5--> |g| id6
id5--> |f| id4[rc=3]
id1--> |x| id4
id6--> |g| id4
id1
id5
id4
id6
id3
subgraph LocalReg[ ]
  id1
  id3
  id4
  id5
  id6
end
style LocalReg fill:#eefcdd
classDef unreachable stroke-width:2px,stroke:red
classDef highlight stroke-width:4px,stroke:yellow
classDef error stroke-width:4px,stroke:red
classDef tainted fill:#afa8db
classDef tainted_immutable stroke-width:4px,stroke:#afa8db
classDef immutable fill:#94f7ff
Loading

If we freeze(x), then we should result in

%%{init: {'theme': 'neutral', 'themeVariables': { 'fontSize': '16px' }}}%%
graph TD
id1[frame]
id1--> |z| id3[rc=1]
id3--> |f| id5[rc=1]
id4--> |f| id6[rc=3]:::immutable
id5--> |g| id6
id5--> |f| id4[rc=.]:::immutable
id1--> |x| id4
id6--> |g| id4
id4-.-> |root| id6
id1
id5
id4
id6
id3
subgraph LocalReg[ ]
  id1
  id3
  id4
  id5
  id6
end
style LocalReg fill:#eefcdd
classDef unreachable stroke-width:2px,stroke:red
classDef highlight stroke-width:4px,stroke:yellow
classDef error stroke-width:4px,stroke:red
classDef tainted fill:#afa8db
classDef tainted_immutable stroke-width:4px,stroke:#afa8db
classDef immutable fill:#94f7ff
Loading

This requires us to use the existing reference counts to work out what the reference count of the SCC should be. This requires additional storage that was not required in the paper.

Failure

The paper did not consider the possibility of the algorithm failing. We would like that to be possible without breaking the invariant that objects can only reach other objects.

Implementation

Python 3.12 has two pointer sized elements that are used for the GCs doubly linked list of objects. We can use this state to implement the SCC algorithm in Python 3.12 with minimal disruption. This does not exist for all Python objects: those that cannot participate in cycles do not have this space allocated. This means we need to be able to handle atomic reference counting on these types of objects without that additional space. See _PyType_PreHeaderSize.

We need to borrow parts of the reference count to represent the following states:

  • Mutable
  • Immutable direct (either the root of an SCC, or an immutable object without GC space)
  • Immutable indirect (part of an SCC and not the root.)
  • Pending in an SCC that is currently being calculated (this should not persist beyond a freeze).

This can be borrowed in the top parts of the reference count. (Consider 32bit, with Immortal are we out of space?)

Union-find

We use the two GC fields for handling tracking the SCCs using a cyclic singly-linked-list of all the elements in the SCC, and a parent pointing tree to efficiently implement the union find structure. The root of the SCC in the parent pointing tree can be used to carry the rank of the SCC (its maximum possible depth). The rank only really requires a maximum of 7-bits as it is the maximum depth of a mostly balanced tree.

Algorithm

The following is a recursive function that implements the freeze operation

def freeze(r):
  pending = []

  def freeze_inner(x):
    match x.status
    | MUTABLE =>
      # Check if the GC space exists for SCC calc
      if _PyObject_IS_GC(x):
        # Make this pending as it could be part of a non-trivial SCC.
        x.status = PENDING;
        pending.push(x);
        # Traverse children
        for each f in x
          freeze_inner(x.f);
        # On postorder traversal check if this is a completed SCC.
        if (pending.peek() == x)
          # A completed SCC
          pending.pop();
          r = find(x)
          # Accumulate the rcs in each element of the SCC,
          # internal edges will have been decremented already
          rc = 0;
          for (s : scc(x)):
            rc += s.rc
            # Optimise to point directly at the root
            s.root = root
            s.status = IMMUTABLE_INDIRECT
          # Set up root with the summed rc, and set its status as the root.
          r.rc = rc
          r.status = IMMUTABLE_DIRECT
      else:
        # No space for SCC stuff, just make immutable and recurse.
        # A cycle through this will never be collected.
        r.status = IMMUTABLE_DIRECT
        for each f in x
          freeze_inner(x.f);
    | PENDING =>
      x.decref() # Internal edge so remove RC.
      while (union(x, pending.peek()))
        pending.pop();
    | IMMUTABLE_DIRECT =>
      incref(x);
    | IMMUTABLE_INDIRECT =>
      incref(find(x))

  freeze_inner(r)

The code above is written as a recursive function. This is not practical, and a DFS stack should be used to prevent stack overflow. This also needs to handle the postorder step correctly. The Verona runtime achieves this by borrowing a bit to indicate that this is either a pre- or post-order visit to the node: https://github.com/microsoft/verona-rt/blob/7deb5873937354e1fa64c9879c4e1e077b0f327d/src/rt/region/freeze.h#L181-L196

Keeping the cyclic list in the union find means this could be extended to handle failure without breaking the invariant.

@xFrednet
Copy link
Collaborator

xFrednet commented Apr 3, 2025

Python 3.12 has two pointer sized elements that are used for the GCs doubly linked list of objects. We can use this state to implement the SCC algorithm in Python 3.12 with minimal disruption. This does not exist for all Python objects: those that cannot participate in cycles do not have this space allocated. [...]

Question regarding this cannot: Is there any mechanism to prevent a cycle though this kind of object? Or is this a theoretical possibility, and it just means that the memory will never be reclaimed?

@mjp41
Copy link
Owner Author

mjp41 commented Apr 9, 2025

If you had a cycle through an object that the cycle detector does not understand, then it will never be collect in Python unless the cycle is manually broken. We are not making things worse in this regard, so I view it as acceptable design.

@mjp41 mjp41 self-assigned this Apr 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants