diff --git a/README.md b/README.md
index 445924871..75c89fdec 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@
  - [Data Model Syntax](./doc/datamodel_syntax.md)
  - [Examples](./doc/examples.md)
  - [Advanced Topics](./doc/advanced_topics.md)
+ - [Python Interface](./doc/python.md)
  - [Contributing](./doc/contributing.md)
diff --git a/doc/index.rst b/doc/index.rst
index 9c56def77..706e039d0 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -17,5 +17,6 @@ Welcome to PODIO's documentation!
  userdata.md
  advanced_topics.md
  templates.md
+ python.md
  cpp_api/api
  py_api/modules
diff --git a/doc/python.md b/doc/python.md
new file mode 100644
index 000000000..12989d2b2
--- /dev/null
+++ b/doc/python.md
@@ -0,0 +1,56 @@
+# Python interface for data models
+
+Podio provides a Python interface for the generated data models. Following the [design choice](design.md) to make the Python interface resemble the C++ interface, the Python bindings are generated from the C++ interface using
+[cppyy](https://cppyy.readthedocs.io/en/latest/index.html).
+
+Note that cppyy does not write Python interface files. Instead, it loads the bindings lazily at runtime and presents them to the Python interpreter. Consequently, the Python bindings have a runtime dependency on both cppyy and the data model's C++ interface.
+
+To load the Python bindings from a generated C++ model dictionary, first make sure the model's library and headers can be found in `LD_LIBRARY_PATH` and `ROOT_INCLUDE_PATH` respectively, then:
+
+```python
+import ROOT
+
+res = ROOT.gSystem.Load('libGeneratedModelDict.so')
+if res < 0:  # a negative return value signals a failed load
+    raise RuntimeError('Failed to load libGeneratedModelDict.so')
+```
+
+For reference usage, see the [Python module](https://github.com/key4hep/EDM4hep/blob/main/python/edm4hep/__init__.py) of the EDM4hep data model.
+
+## Pythonizations
+
+Python uses different constructs and conventions than C++, so perfectly fine C++ code translated one-to-one to Python can be clunky by Python's standards. cppyy offers a mechanism called [pythonizations](https://cppyy.readthedocs.io/en/latest/pythonizations.html) to make the resulting bindings more pythonic. Some basic pythonizations are included automatically (for instance, `operator[]` is translated to `__getitem__`), but others can be specified by a user.
+
+Podio comes with its own set of pythonizations useful for the data models generated with it. To apply all the provided pythonizations to a `model_namespace` namespace:
+
+```python
+from podio.pythonizations import load_pythonizations
+
+load_pythonizations("model_namespace")
+```
+
+If only specific pythonizations should be applied:
+
+```python
+from podio.pythonizations import collection_subscript # specific pythonization
+
+collection_subscript.CollectionSubscriptPythonizer.register("model_namespace")
+```
+
+### Developing new pythonizations
+
+To be discovered by `load_pythonizations`, any new pythonization should be placed in `podio.pythonizations` and derive from the abstract class `podio.pythonizations.utils.pythonizer.Pythonizer`.
+
+A pythonization class should implement the following three class methods:
+
+- `priority`: The `load_pythonizations` function applies the pythonizations in increasing order of their `priority`.
+- `filter`: A predicate that selects the classes to which the pythonization should be applied. See the [cppyy documentation](https://cppyy.readthedocs.io/en/latest/pythonizations.html#python-callbacks).
+- `modify`: Applies the modifications to the selected classes.
+
+### Considerations
+
+The cppyy pythonizations come with some considerations:
+
+- cppyy's general approach of lazily loading only what is needed applies only partially to pythonizations. For instance, a pythonization modifying `collection[]` is applied the first time the `collection` class is used, regardless of whether `collection[]` is ever called.
+- Each pythonization is invoked for every entity in a namespace and relies on a conditional mechanism (its `filter` method) to select the entities it modifies. With a large number of pythonizations, the overhead adds up and slows down the usage of any class from a pythonized namespace.
+- The cppyy bindings that hook into the C++ routines are fast compared to ordinary Python code. The pythonizations, however, are written in Python and run at ordinary Python speed.
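
To make the three class methods from "Developing new pythonizations" more concrete, here is a minimal, self-contained sketch of how `priority`, `filter` and `modify` might fit together. It does not import podio or cppyy. The `Pythonizer` base class below is a hypothetical stand-in for `podio.pythonizations.utils.pythonizer.Pythonizer`, and the `(class_, name)` signatures, the `callback` helper, `SafeSubscriptPythonizer`, `HitCollection` and `Hit` are all assumptions made for illustration, not podio's actual code.

```python
from abc import ABC, abstractmethod


class Pythonizer(ABC):
    """Hypothetical stand-in for podio's abstract Pythonizer class."""

    @classmethod
    @abstractmethod
    def priority(cls):
        """Return a number; lower values are applied first."""

    @classmethod
    @abstractmethod
    def filter(cls, class_, name):
        """Return True if `class_` should be modified."""

    @classmethod
    @abstractmethod
    def modify(cls, class_, name):
        """Modify `class_` in place."""

    @classmethod
    def callback(cls, class_, name):
        """Assumed glue: run `modify` only on classes accepted by `filter`."""
        if cls.filter(class_, name):
            cls.modify(class_, name)


class SafeSubscriptPythonizer(Pythonizer):
    """Illustrative pythonizer that bounds-checks `collection[index]`."""

    @classmethod
    def priority(cls):
        return 50

    @classmethod
    def filter(cls, class_, name):
        # Select collection classes by name (an assumed naming convention).
        return name.endswith("Collection")

    @classmethod
    def modify(cls, class_, name):
        original = class_.__getitem__

        def checked_getitem(self, index):
            if not 0 <= index < len(self):
                raise IndexError(f"{name} index {index} out of range")
            return original(self, index)

        class_.__getitem__ = checked_getitem


# Plain Python classes standing in for the classes cppyy would provide.
class HitCollection:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


class Hit:
    pass


# Apply the pythonizers in increasing order of priority to every class;
# the filter decides which classes are actually touched.
pythonizers = sorted([SafeSubscriptPythonizer], key=lambda p: p.priority())
for class_ in (HitCollection, Hit):
    for pythonizer in pythonizers:
        pythonizer.callback(class_, class_.__name__)
```

In this sketch every pythonizer is called for every class, and only `filter` keeps `Hit` untouched. This mirrors the overhead described in the considerations above: the predicate itself runs as ordinary Python code for each entity in the namespace.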