This repository contains Python scripts and modules for various tooling purposes.
Before running the scripts in this repository, make sure you have the following:
- Python 3.11 or above
- A virtual environment - covered in the installation instructions below
I recommend using venv, and the instructions for this are as follows:
- Set up venv, preferably by running this command in your data_analysis directory:
python3 -m venv .venv
- Activate your environment:
source .venv/bin/activate
- Install the requirements from requirements.txt into this project-specific virtual environment:
python3 -m pip install -r requirements.txt
Alternatively, you can use the provided requirements.txt file and install directly into your base Python environment:
pip3 install -r requirements.txt
retrieve_schema_fields.py
To run this script from within the tooling folder, based on your Python distribution:
python3 retrieve_schema_fields.py
Outputs:
- A CSV file in the outputs directory containing the schemas and the fields within them
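For a rough illustration of what this involves, here is a minimal sketch that pulls schema/field pairs out of an OpenAPI document and writes them to a CSV; the input file name, output path and column headers are assumptions rather than what the script necessarily uses:

```python
# Minimal sketch: extract schema/field pairs from an OpenAPI document
# and write them to a CSV. The file names and column headers here are
# illustrative, not necessarily those used by retrieve_schema_fields.py.
import csv
import json

with open("api-docs.json") as f:  # hypothetical input file
    spec = json.load(f)

schemas = spec.get("components", {}).get("schemas", {})

with open("outputs/schema_fields.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Schema", "Field"])
    for schema_name, schema in schemas.items():
        for field in schema.get("properties", {}):
            writer.writerow([schema_name, field])
```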
generate_schema_diagram.py
There are several options for running this script:
- Search for one schema
- Search for multiple schemas
- No search option - generate the full diagram
To run this script from within the tooling folder, based on your Python distribution:
- Search for one schema, where the argument is a schema name as a string:
python3 generate_schema_diagram.py "AddressDto"
- Search for multiple schemas, where the arguments are all strings of schema names:
python3 generate_schema_diagram.py "AddressDto" "SentenceCalcDates"
- No search option, generating a full diagram:
python3 generate_schema_diagram.py
Outputs:
- A CSV file with the parent-child relations of schemas.
- A diagram of the schema relations in .dot format, renderable in PlantUML or a local Graphviz renderer.
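As an illustration of the .dot output, a minimal sketch that turns parent-child relations into a Graphviz digraph might look like this (the relation rows and output path are illustrative only):

```python
# Minimal sketch: emit a Graphviz .dot digraph from (parent, field, child)
# schema relations. The example row and output path are hypothetical.
relations = [
    ("OffenderDto", "address", "AddressDto"),  # hypothetical example row
]

lines = ["digraph schemas {"]
for parent, field, child in relations:
    lines.append(f'    "{parent}" -> "{child}" [label="{field}"];')
lines.append("}")

with open("outputs/schema_diagram.dot", "w") as f:
    f.write("\n".join(lines) + "\n")
```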
discover_schema_endpoints.py
This script:
- Takes as input a CSV file of parent-child schema relations (defaults to the schema_parent_child.csv file generated by generate_schema_diagram.py above)
- Note that the input file MUST contain, as a subset and in any order, the following columns:
Parent_Schema, Field, Child_Schema, Searched_bool
- Returns as a CSV file a table of relevant endpoints for all parent and child schemas in the provided file. The endpoints are for successful response types only (i.e. 2XX)
- The output file is tabulated with the following columns:
Path, HTTP_method, HTTP_response, Schema
To run this script from within the data_analysis folder:
python3 discover_schema_endpoints.py
OR you can manually specify a file to load in with:
python3 discover_schema_endpoints.py "outputs/some_other_file.csv"
Note that the file must contain the expected columns.
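As an illustration of the input contract above, a minimal sketch of loading the file and validating the required columns might look like the following; pandas and the exact error message are assumptions, not necessarily how the script does it:

```python
# Minimal sketch: load the input CSV and check that the required
# columns are present, in any order, before processing.
import sys

import pandas as pd

REQUIRED_COLUMNS = {"Parent_Schema", "Field", "Child_Schema", "Searched_bool"}

path = sys.argv[1] if len(sys.argv) > 1 else "outputs/schema_parent_child.csv"
df = pd.read_csv(path)

missing = REQUIRED_COLUMNS - set(df.columns)
if missing:
    raise SystemExit(f"Input file is missing required columns: {sorted(missing)}")
```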
search_apis_for_phrase.py
This script:
- Takes a search term or phrase and searches every API listed in the published APIs of Structurizr
- The search is not sensitive to case or spacing (see the sketch below), and works by scanning only the paths and schemas of the api-docs (which is where the relevant information will be)
- It generates an in-memory data frame of search results for both the Schema and the Path search, and returns these data frames as CSV tables
python3 search_apis_for_phrase.py "search phrase"
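For illustration, the case- and spacing-insensitive matching described above could be sketched as follows; the real script's matching logic may differ:

```python
# Minimal sketch of a case- and spacing-insensitive match; the
# function names are hypothetical.
def normalise(text: str) -> str:
    """Lower-case the text and strip all whitespace."""
    return "".join(text.split()).lower()

def matches(phrase: str, candidate: str) -> bool:
    return normalise(phrase) in normalise(candidate)

# e.g. matches("sentence calc", "/api/SentenceCalcDates") -> True
```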
Limitations:
- The script will handle timeouts and other common API errors (see the error-handling sketch after this list)
- If there is a limitation on accessing a URL from a non-MoJ device, this script will be limited in the same way
- In this regard, all of the links only work when running the search on an MoJ device
- The URLs are currently hardcoded, as there is no obvious way of retrieving the api-docs dynamically
- A potential upgrade could be a web scraper, but I don't know if the computational complexity is worth the effort
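As a sketch of the kind of timeout/error handling mentioned in the list above (the actual script's handling may differ, and the fetch_api_docs name is hypothetical):

```python
# Minimal sketch: fetch an api-docs URL and skip it gracefully on
# timeouts, HTTP errors and connection failures.
import requests

def fetch_api_docs(url: str):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.RequestException as exc:  # timeouts, HTTP and connection errors
        print(f"Skipping {url}: {exc}")
        return None
```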
This repository contains the following Python modules:
There is a constants directory, initialised as a module, containing constants and common objects used by the scripts. This allows you to edit constants in one location without having to amend each script in the base data_analysis directory. Feel free to explore them for more functionality.
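As a hypothetical illustration of the pattern, a file in the constants directory might define shared values once (the file name and constant names here are illustrative only):

```python
# constants/common.py (hypothetical file and names): shared values
# defined once so individual scripts do not hardcode them.
OUTPUT_DIR = "outputs"
API_DOC_URLS = [
    "https://example.justice.gov.uk/v3/api-docs",  # illustrative URL only
]
```

A script in the base data_analysis directory would then import these from one place, e.g. from constants.common import OUTPUT_DIR, API_DOC_URLS.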
Contributions to this tooling section are welcome, as long as they can be executed in Python. Auto-documentation is a potential future addition, and the desire is to keep that option open as this tooling section expands.