
Commit

added documentation
m-fila committed Apr 24, 2024
1 parent d438c5d commit 16f8a7d
Showing 3 changed files with 58 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -11,6 +11,7 @@
- [Data Model Syntax](./doc/datamodel_syntax.md)
- [Examples](./doc/examples.md)
- [Advanced Topics](./doc/advanced_topics.md)
- [Python Interface](./doc/python.md)
- [Contributing](./doc/contributing.md)

<!-- Browse the API documentation created with Doxygen at -->
1 change: 1 addition & 0 deletions doc/index.rst
@@ -17,5 +17,6 @@ Welcome to PODIO's documentation!
userdata.md
advanced_topics.md
templates.md
python.md
cpp_api/api
py_api/modules
56 changes: 56 additions & 0 deletions doc/python.md
@@ -0,0 +1,56 @@
# Python interface for data models

Podio provides a Python interface for the generated data models. Following the [design choice](design.md) to make the Python interface resemble the C++ interface, the Python bindings are generated from the C++ interface using [cppyy](https://cppyy.readthedocs.io/en/latest/index.html).

It's important to note that cppyy loads the bindings and presents them lazily at runtime to the Python interpreter, rather than writing Python interface files. Consequently, the Python bindings have a runtime dependency on both cppyy and the data model's C++ interface.

To load the Python bindings from a generated C++ model dictionary, first make sure the model's library and headers can be found in `LD_LIBRARY_PATH` and `ROOT_INCLUDE_PATH` respectively, then:

```python
import ROOT

res = ROOT.gSystem.Load('libGeneratedModelDict.so')
if res < 0:
    raise RuntimeError('Failed to load libGeneratedModelDict.so')
```

For reference usage, see the [Python module](https://github.com/key4hep/EDM4hep/blob/main/python/edm4hep/__init__.py) of the EDM4hep data model.

## Pythonizations

Python uses different constructs and conventions than C++, so perfectly fine C++ code translated one-to-one into Python can feel clunky by Python's standards. cppyy offers a mechanism called [pythonizations](https://cppyy.readthedocs.io/en/latest/pythonizations.html) to make the resulting bindings more pythonic. Some basic pythonizations are applied automatically (for instance, `operator[]` is translated to `__getitem__`), but others can be specified by the user.
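As a rough illustration of what a pythonization does, the following toy sketch runs without cppyy. `FakeCollection` and `pythonize_len` are hypothetical names invented for this example; only the `(class, name)` callback shape mirrors what cppyy expects, where a real callback would be registered with `cppyy.py.add_pythonization(callback, namespace)` instead of being called by hand.

```python
class FakeCollection:
    """Hypothetical stand-in for a bound C++ collection with size() and at()."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def at(self, index):
        return self._items[index]


def pythonize_len(klass, name):
    """Callback in the (class, name) shape used by cppyy pythonizations."""
    # Only touch classes whose name marks them as collections.
    if name.endswith("Collection"):
        klass.__len__ = lambda self: self.size()


# With cppyy this call would happen automatically when the class is first used.
pythonize_len(FakeCollection, "FakeCollection")

hits = FakeCollection(["a", "b", "c"])
number_of_hits = len(hits)  # now works through the injected __len__
```

The callback patches the class object itself, so every instance created before or after the call picks up the new behaviour.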

Podio comes with its own set of pythonizations that are useful for the data models generated with it. To apply all the provided pythonizations to a `model_namespace` namespace:

```python
from podio.pythonizations import load_pythonizations

load_pythonizations("model_namespace")
```

If only specific pythonizations should be applied:

```python
from podio.pythonizations import collection_subscript # specific pythonization

collection_subscript.CollectionSubscriptPythonizer.register("model_namespace")
```

### Developing new pythonizations

To be discovered by `load_pythonizations`, any new pythonization should be placed in `podio.pythonizations` and be derived from the abstract class `podio.pythonizations.utils.pythonizer.Pythonizer`.

A pythonization class should implement the following three class methods:

- `priority`: The priority of the pythonization. The `load_pythonizations` function applies the pythonizations in increasing order of their `priority`.
- `filter`: A predicate that selects the classes to which the pythonization should be applied. See the [cppyy documentation](https://cppyy.readthedocs.io/en/latest/pythonizations.html#python-callbacks).
- `modify`: Applies the modifications to the selected classes.
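The three methods above could fit together roughly as in the sketch below. It deliberately does not import podio or cppyy: the `Pythonizer` base class here is a hypothetical stand-in for `podio.pythonizations.utils.pythonizer.Pythonizer`, the `(class_, name)` signatures are an assumption modelled on cppyy's callback convention, and `callback`, `BoundsCheckedSubscript`, `ToyCollection` and `ToyObject` are invented for illustration.

```python
from abc import ABC, abstractmethod


class Pythonizer(ABC):
    """Hypothetical stand-in for podio's abstract Pythonizer (signatures assumed)."""

    @classmethod
    @abstractmethod
    def priority(cls):
        """Lower values are applied first."""

    @classmethod
    @abstractmethod
    def filter(cls, class_, name):
        """Return True if the class should be pythonized."""

    @classmethod
    @abstractmethod
    def modify(cls, class_, name):
        """Patch the class in place."""

    @classmethod
    def callback(cls, class_, name):
        # Assumed glue: modify only the classes accepted by the filter.
        if cls.filter(class_, name):
            cls.modify(class_, name)


class BoundsCheckedSubscript(Pythonizer):
    @classmethod
    def priority(cls):
        return 50  # arbitrary value chosen for this example

    @classmethod
    def filter(cls, class_, name):
        return name.endswith("Collection")

    @classmethod
    def modify(cls, class_, name):
        def get_item(self, index):
            if not 0 <= index < self.size():
                raise IndexError("collection index out of range")
            return self.at(index)

        class_.__getitem__ = get_item


class ToyCollection:
    """Toy class mimicking a bound collection with size() and at()."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def at(self, index):
        return self._items[index]


class ToyObject:
    """Toy class that the filter should reject."""


for klass in (ToyCollection, ToyObject):
    BoundsCheckedSubscript.callback(klass, klass.__name__)
```

After the loop, `ToyCollection` supports bounds-checked subscripting while `ToyObject` is left untouched, which is the division of labour between `filter` and `modify`.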

### Considerations

The cppyy pythonizations come with some considerations:

- The general cppyy idea of lazily loading only what is needed applies only partially to the pythonizations. For instance, a pythonization modifying `collection[]` will be applied the first time a class of `collection` is used, regardless of whether `collection[]` is actually used.
- Each pythonization is applied to all the entities in a namespace and relies on a conditional mechanism (the `filter` method) inside the pythonization to select the entities it modifies. With a large number of pythonizations, the overhead adds up and slows down the usage of any class from a pythonized namespace.
- The cppyy bindings hook into the C++ routines and are fast compared to ordinary Python code. The pythonizations, however, are written in Python and run at ordinary Python speed.
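The second point can be made concrete with a small counting sketch. It uses plain Python only; the class names and the suffix-based filters are hypothetical, and the nested loop merely imitates every registered callback being invoked for every class of a namespace.

```python
from collections import Counter

stats = Counter()


def make_callback(suffix):
    """Build a toy pythonization whose filter is a name-suffix check."""

    def callback(klass, name):
        stats["filter calls"] += 1  # the filter runs for every class
        if name.endswith(suffix):
            stats["modified"] += 1  # only matching classes are changed
            klass.pythonized = True

    return callback


callbacks = [make_callback(suffix) for suffix in ("Collection", "Data", "Obj")]
class_names = ["HitCollection", "Hit", "MutableHit", "HitData", "HitObj"]

for name in class_names:
    klass = type(name, (), {})  # empty toy class standing in for a bound class
    for callback in callbacks:
        callback(klass, name)
```

With three callbacks and five classes the filter runs fifteen times although only three classes are modified, so the filtering cost grows with the product of the number of pythonizations and the number of classes used.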
