Data Analysis Tooling

Requirements

Before running the scripts in this repository, make sure you have the following:

Python 3.11 or above
Virtual environment - included in Installation instructions below

Installation

Method 1 - to a local, project-specific Python virtual environment

I recommend using venv. The instructions are as follows:

Set up venv, preferably running this command in your data_analysis directory:

python3 -m venv .venv

Activate your environment:

source .venv/bin/activate

Install the requirements from requirements.txt into this project-specific virtual environment:

python3 -m pip install -r requirements.txt
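If you want to confirm the setup before running any scripts, a check along the following lines may help. It is a minimal sketch and not part of this repository: the names MIN_VERSION, in_virtualenv and version_ok are made up for illustration, and it relies only on the standard sys module.

```python
import sys

MIN_VERSION = (3, 11)  # minimum version stated under Requirements


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def version_ok(version=None) -> bool:
    """Return True when the given (or running) version meets the minimum."""
    current = tuple(version) if version is not None else tuple(sys.version_info)
    return current[:2] >= MIN_VERSION


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}: "
    f"{'ok' if version_ok() else 'too old'}; "
    f"virtual environment active: {in_virtualenv()}"
)
```

Run it with the same python3 you intend to use for the scripts, after activating .venv.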

Method 2 - directly to your base Python distribution

To install without a virtual environment, use the provided requirements.txt file to install directly into your base Python environment:

pip3 install -r requirements.txt

Scripts

Retrieve Schemas and their fields from Prison API

retrieve_schema_fields.py

To run this script from within the tooling folder, based on your Python distribution:

python3 retrieve_schema_fields.py

Outputs:

A CSV file in the outputs directory containing the schemas and the fields within them
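To illustrate the kind of extraction this involves, here is a minimal sketch, assuming the api-docs follow the OpenAPI layout in which schemas sit under components.schemas and fields under properties. The function names, the CSV headers and the example fields are all made up for illustration; this is not the code in retrieve_schema_fields.py.

```python
import csv
import io


def schema_field_rows(api_doc: dict) -> list[tuple[str, str]]:
    """Return (schema, field) pairs from an OpenAPI-style document."""
    schemas = api_doc.get("components", {}).get("schemas", {})
    rows = []
    for schema_name, schema in schemas.items():
        # Schemas without a "properties" key contribute no rows.
        for field_name in schema.get("properties", {}):
            rows.append((schema_name, field_name))
    return rows


def rows_to_csv(rows) -> str:
    """Serialise the pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Schema", "Field"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written document for illustration only; not real Prison API data.
example_doc = {
    "components": {
        "schemas": {
            "AddressDto": {"properties": {"street": {}, "town": {}}},
        }
    }
}
example_csv = rows_to_csv(schema_field_rows(example_doc))
```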

Generate a Schema space diagram, output child-parent relations

generate_schema_diagram.py

There are several options for running this script:

Search for one schema
Search for multiple schemas
No search option - generate the full diagram

To run this script from within the tooling folder, based on your Python distribution:

Search for one schema, where the argument is a schema name in string format:

python3 generate_schema_diagram.py "AddressDto"

Search for multiple schemas, where the arguments are all strings of schema names:

python3 generate_schema_diagram.py "AddressDto" "SentenceCalcDates"

No search option, generating a full diagram:

python3 generate_schema_diagram.py

Outputs:

A CSV file with the parent-child relations of the schemas.

A diagram of the schema relations in .dot format, renderable in PlantUML or a local Graphviz renderer
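For readers unfamiliar with the .dot format, the sketch below shows how a list of parent-child pairs can be written out as a directed graph. It is an illustration under assumptions, not the logic of generate_schema_diagram.py: the function name relations_to_dot and the schema names ParentSchema and ChildSchema are hypothetical.

```python
def relations_to_dot(relations, graph_name="schemas"):
    """Render (parent, child) pairs as DOT text for a directed graph."""
    lines = [f"digraph {graph_name} {{"]
    for parent, child in relations:
        # Node names are quoted so that characters such as dots are accepted.
        # This sketch assumes the names contain no double quotes themselves.
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relation, for illustration only.
example_dot = relations_to_dot([("ParentSchema", "ChildSchema")])
```

Saving example_dot to a file with a .dot extension gives something a Graphviz renderer can draw as two nodes joined by one arrow.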

Retrieve Endpoints for provided list of schemas

discover_schema_endpoints.py

This script:

Takes as input a CSV file of parent-child schema relations (defaults to the schema_parent_child.csv file generated by another script)
- Note that the input file MUST contain, as a subset, the following columns in any order: Parent_Schema, Field, Child_Schema, Searched_bool
Returns as a CSV file a table of the endpoints relevant to all parent and child schemas in the provided file. The endpoints are for successful response types only (i.e. 2XX)
- The output file is tabulated with the following columns: Path, HTTP_method, HTTP_response, Schema

To run this script from within the data_analysis folder:

python3 discover_schema_endpoints.py

OR you can manually specify a file to load in with:

python3 discover_schema_endpoints.py "outputs/some_other_file.csv"

Note that the file must contain the expected columns listed above.
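If you are supplying your own file, you may want to check its header first. The following is a minimal sketch using only the standard csv module; the function name missing_columns is made up for illustration and is not taken from discover_schema_endpoints.py.

```python
import csv
import io

# Column names the input file must contain, in any order.
REQUIRED_COLUMNS = {"Parent_Schema", "Field", "Child_Schema", "Searched_bool"}


def missing_columns(csv_file) -> set[str]:
    """Return the required columns absent from the file's header row."""
    reader = csv.DictReader(csv_file)
    # fieldnames is None for an empty file, hence the fallback to [].
    present = set(reader.fieldnames or [])
    return REQUIRED_COLUMNS - present


# Extra columns and a different order are both acceptable.
good_header = io.StringIO("Field,Extra,Parent_Schema,Child_Schema,Searched_bool\n")
bad_header = io.StringIO("Parent_Schema,Field\n")
good_result = missing_columns(good_header)
bad_result = missing_columns(bad_header)
```

For a real file, pass an open file object instead, for example open("outputs/some_other_file.csv", newline="").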

Search all published APIs for search phrase

search_apis_for_phrase.py

This script:

Takes a search term or phrase and searches every API listed in the published APIs of Structurizr
The search is insensitive to case and to space delimiters, and works by scanning only the paths and schemas of the api-docs (which is where the relevant information is).
It generates an in-memory data frame of search results for each of the Schema and Path searches, and returns these data frames as CSV tables.

python3 search_apis_for_phrase.py "search phrase"
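One way to picture matching that ignores case and delimiters is to normalise both sides before comparing. The sketch below assumes that whitespace, underscores and hyphens are the delimiters to ignore; the script's actual rules may differ. The function names and the example path are hypothetical.

```python
import re


def normalise(text: str) -> str:
    """Lowercase the text and drop whitespace, underscores and hyphens."""
    return re.sub(r"[\s_\-]+", "", text).lower()


def matches(phrase: str, candidate: str) -> bool:
    """Return True when the normalised phrase occurs in the candidate."""
    needle = normalise(phrase)
    if not needle:
        # An empty phrase would otherwise match every candidate.
        return False
    return needle in normalise(candidate)


# Schema names from this README; the path is made up for illustration.
schema_hit = matches("sentence calc", "SentenceCalcDates")
path_hit = matches("Address Dto", "/api/address_dto/{id}")
no_hit = matches("offender", "AddressDto")
```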

Limitations:

The script will handle timeouts and other common API errors.
If access to a URL is restricted on a non-MoJ device, this script is restricted in the same way
- In practice, all of the links only work when running the search on an MoJ device
The URLs are currently hardcoded because there is no obvious way of retrieving the api-docs dynamically
- A potential upgrade could be a web scraper, but I am not sure the added complexity is worth the effort
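As a rough picture of what handling timeouts and common API errors can look like, here is a minimal sketch built on the standard urllib module. It is an assumption about the general approach, not the script's own code: the function name fetch_api_docs, the opener parameter and the 10 second default are all made up for illustration.

```python
import json
import urllib.error
import urllib.request


def fetch_api_docs(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the parsed JSON document at url, or None if the request fails."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError) as error:
        # HTTPError is a subclass of URLError, so 4XX and 5XX responses
        # are caught here along with connection failures.
        print(f"Skipping {url}: {error}")
        return None
```

Returning None rather than raising lets a loop over the hardcoded URLs carry on when one API is unreachable, which is the behaviour you would want on a device that can only reach some of the links. The opener parameter exists so the function can be exercised without network access.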

Modules

This repository contains the following Python modules:

Constants

There is a constants directory, initialised to be a module directory, containing constants and common objects used by the scripts. This allows you to edit constants in one location, without having to amend other scripts in the base data_analysis directory. Feel free to explore them for more functionality.

Contributing

Contributions to this tooling section are welcome, as long as they can be executed in Python. Autodocumentation is a possibility, and the aim is to keep this option open as the tooling section expands.
