BOATParser (Business Objects AdminTools Parser) allows you to take the output of queries run in AdminTools and convert them into a pandas DataFrame. This facilitatates analysis of your metadata and also allows you to easily output that information into common sharable formats like csv and Excel.
If you are new to Python and need a more in-depth walk through of how to use this, feel free to reference my posting on the SAP Community website.
Clone the repository git@github.com:WillAyd/BOATParser.git
Build the source distribution and install via pip:
python setup.py sdist pip install dist/BOATParser-X.Y.tar.gz
Run any query in the AdminTools of your Business Objects environment. After the query completes, do a "Save As" on the page and be sure to save the page source locally.
Import the BOAdminToolsParser class and point it to your locally saved file:
from BOATParser import BOAdminToolsParser bp = BOAdminToolsParser() df = bp.frame_from_file(<YOUR_HTML_FILE>)
Optionally, if your file contains folder path information (i.e. you included SI_KIND='Folder' in the WHERE clause of your query) use the expand_paths function to parse out an absolute directory listing:
# Assuming df contains folder info, with SI_PATH and SI_NAME columns df['expanded_path'] = bp.expand_paths(df)
The output of any AdminTools query in Business Objects looks as follows:
Business Objects Business Intelligence platform - Query BuilderNumber of InfoObject(s) returned: 3 |
1/3 | top |
Properties | |||||||||||||||
SI_NAME | baz | ||||||||||||||
SI_ID | 999999 | ||||||||||||||
SI_CUID | ACUID_FOR_BAZ | ||||||||||||||
SI_PATH |
|
2/3 | top |
Properties | |||||||||
SI_NAME | bar | ||||||||
SI_ID | 888888 | ||||||||
SI_CUID | ACUID_FOR_BAR | ||||||||
SI_PATH |
|
3/3 | top |
Properties | |||
SI_NAME | foo | ||
SI_ID | 777777 | ||
SI_CUID | ACUID_FOR_FOO | ||
SI_PATH |
|
The BOAdminToolsParser class contained within the BOATParser module uses BeautifulSoup to parse and focus just on table elements (this can greatly improve performance for large documents). Each table is converted into a dict entry, where the key is the first column and the value is the second. After parsing the entire file, the class converts the dict into a DataFrame, where each table parsed becomes its own record. The value in the first table column maps to the column name of the DataFrame, and the second column from the table becomes the value.
In cases where tables are nested (see SI_PATH in the test.html table provided) the BOAdminToolsParser class will parse recursively. The value for that given entry becomes a nested dict. The class also provides a convenience function called expand_paths. Using the example above, after parsing it into a DataFrame you can call that function to get the full file path of a given record.
The frame_from_file method may also accept an arbirtrary number of keyword arguments to be passed to Python's built-in open command. This may be especially useful if the encoding of the file you are trying to parse differs from the default system encoding.
Bringing this altogether, here's are the steps for parsing the above table:
from BOATParser import BOAdminToolsParser bp = BOAdminToolsParser() df = bp.frame_from_file('test.html', encoding='utf-8') df['expanded_path'] = bp.expand_paths(df)
Yielding the following DataFrame:
SI_NAME | SI_ID | SI_CUID | SI_PATH | expanded_path | |
---|---|---|---|---|---|
0 | baz | 999999 | ACUID_FOR_BAZ | {'SI_FOLDER_ID2': '888888', 'SI_FOLDER_ID1': '... | foo/bar/baz |
1 | bar | 888888 | ACUID_FOR_BAR | {'SI_FOLDER_ID1': '777777', 'SI_NUM_FOLDERS': ... | foo/bar |
2 | foo | 777777 | ACUID_FOR_FOO | {'SI_NUM_FOLDERS': '0'} | foo |