- Description: a deployment framework for secure, autoscaling, highly available microservices on AWS
- Provided microservice: In this example, we use an API generating Knowledge Graphs from arxiv.org
- Use:
- with the provided Flask app: no modifications required
- with your own Flask app (a minimal example is sketched below):
  - replace the content of the ./app/ folder with your own microservice
  - update requirements.txt
  - update the Dockerfile
  - update the Terraform files config.tf and variables.tf
- That's it!
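For orientation, a drop-in replacement only needs to expose a Flask app; the examples in this README assume it listens on port 5000. Below is a minimal, hypothetical sketch (the route and response are placeholders, not part of the provided API):

```python
# app/app.py -- minimal drop-in sketch (hypothetical route, for illustration only)
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def health():
    # simple health-check style response
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    # listen on all interfaces so the container port mapping (5000:5000) works
    app.run(host="0.0.0.0", port=5000)
```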
NB: Feel free to contribute to this project by creating issues :)
This project deploys an API on AWS according to the following workflow:
./app folder structure
- downloads/ --------- temp folder for PDFs downloaded from arxiv
- models/ ------------ helper functions for app.py
- ontologies/ -------- stores the generated ontology world.owl
- templates/ --------- HTML templates for rendering in a web browser (not supported in this version)
- tests/ ------------- test scripts for pytest (not supported in this version)
- uploads/ ----------- folder to manually stage PDF documents for upload (not supported in this version)
- app.py ------------- Flask app and main routes
- Dockerfile --------- builds the container
- requirements.txt --- project dependencies, generated with the pipreqs package
- Web Scraping
  - queries through the Python arxiv API
- Natural Language Processing
  - uses spaCy with the pre-trained en_core_web_sm model
- Ontology / Knowledge Graph (see the pipeline sketch below)
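For orientation, here is a rough sketch of how such a pipeline can be wired together. It is illustrative only: the actual code in models/ may differ, and the package versions and query API are assumptions.

```python
# Illustrative sketch of the scraping + NLP stage (not the repository's actual code).
# Assumes the arxiv and spacy packages and the en_core_web_sm model are installed;
# the query API differs slightly between arxiv package versions.
import arxiv
import spacy

nlp = spacy.load("en_core_web_sm")

client = arxiv.Client()
search = arxiv.Search(query="knowledge graph", max_results=3)

for paper in client.results(search):
    # run NER on the abstract; entities are one possible input for the ontology step
    doc = nlp(paper.summary)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(paper.title)
    print(entities[:10])
```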
Global dependencies: (please refer to links for installation tutorials if necessary)
Clone the repository and move into the newly created directory:
$ git clone <project https address>
$ cd KnowledgeGraph-Terraform-Flask-app
Create a deployment virtualenv and activate it:
# for UNIX systems:
$ python -m venv deploy_venv
$ source deploy_venv/bin/activate
# for Windows systems:
$ python -m venv deploy_venv
$ deploy_venv\Scripts\activate
Install requirements from txt file:
$ pip install -r requirements.txt
Two DB options are available:
- local DynamoDB, for integration testing
- hosted AWS DynamoDB, for production
Select the desired option by commenting/uncommenting the related lines in models/model.py (illustrated by the sketch below).
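For illustration, the switch in models/model.py might look like the lines below; the variable names and exact structure are assumptions, not the repository's actual code:

```python
# Hypothetical illustration of the DB switch in models/model.py
# (variable names and structure in the actual repository may differ).
import boto3

# local DynamoDB, for integration testing:
dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")

# hosted AWS DynamoDB, for production (uncomment this line and comment the one above):
# dynamodb = boto3.resource("dynamodb", region_name="eu-west-3")

table = dynamodb.Table("arxivTable")
```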
If you wish to use a local DynamoDB, configure it using the following commands (refer to this tutorial for details):
- download the DynamoDB .zip package from the tutorial
- extract the package to a location of your choice
- from a bash shell at this location, launch DynamoDBLocal.jar with:
$ java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
- keep this shell window open to use your DB
- in another shell tab, create your table:
$ aws dynamodb create-table --table-name arxivTable --attribute-definitions AttributeName=_id,AttributeType=S --key-schema AttributeName=_id,KeyType=HASH --billing-mode PAY_PER_REQUEST --endpoint-url http://localhost:8000
- check that the table exists:
$ aws dynamodb list-tables --endpoint-url http://localhost:8000
- when needed, you can destroy the table using:
$ aws dynamodb delete-table --table-name arxivTable --endpoint-url http://localhost:8000
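As an optional check, you can also exercise the local table from Python with boto3. This is a hedged sketch, assuming DynamoDB Local is running on port 8000 and the arxivTable table created above:

```python
# Optional smoke test of the local table (illustrative only).
# Assumes DynamoDB Local is running on port 8000 and arxivTable has been created.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="eu-west-3",
                          endpoint_url="http://localhost:8000")
table = dynamodb.Table("arxivTable")

# write a dummy item keyed on the _id hash key defined at table creation
table.put_item(Item={"_id": "test-0001", "title": "dummy entry"})

# read it back to confirm the table is usable
response = table.get_item(Key={"_id": "test-0001"})
print(response.get("Item"))
```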
Launch the Flask app locally:
$ cd app/
$ python app.py
Open http://localhost:5000 in a browser to interact with the API
Build and run the container using the following commands:
$ docker build -t knowledgegraph-terraform-flask-app .
$ docker run -d -p 5000:5000 knowledgegraph-terraform-flask-app
$ curl http://localhost:5000
Resulting architecture generated in AWS:
Refer to this tutorial for more details. Use the commands below to ensure proper deployment.
NB: This step assumes you already have configured programmatic CLI access to an active AWS account. Refer to this tutorial for more details.
Make sure to select the proper DB endpoint (AWS-hosted DynamoDB) in models/model.py before building your container.
Create repository on AWS ECR:
$ aws ecr create-repository --repository-name knowledgegraph-terraform-flask-app --image-scanning-configuration scanOnPush=true --region eu-west-3
NB: Insert your actual AWS ID in place of <AWS_ID> in the following command lines.
Get credentials:
$ aws ecr get-login-password --region eu-west-3 | docker login --username AWS --password-stdin <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app
From your browser, open the AWS Console, go to Services, then Elastic Container Registry.
Select the knowledgegraph-terraform-flask-app repository; its ECR URI will be needed later on.
Back in the shell, tag the image and push it to ECR (use your own AWS_ID):
$ docker tag knowledgegraph-terraform-flask-app:latest <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app:latest
$ docker push <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app:latest
Initialize Terraform from the terraform/ folder:
$ cd ../terraform
$ terraform init
The Terraform code will deploy the following configuration:
- IAM: Identity and Access Management policy configuration
- VPC: Public and private subnets, routes, and a NAT Gateway
- EC2: Autoscaling implementation
- ECS: Cluster configuration
- ALB: Load balancer configuration
- DynamoDB: Table configuration
- CloudWatch: Alert metrics configuration
# check configuration files:
$ terraform validate
# prepare and review execution plan:
# this command prompts for a valid ECR URI (see AWS console)
$ terraform plan
# deploy plan to AWS:
# this command prompts for a valid ECR URI (see AWS console)
# then type 'yes' when prompted to launch execution
$ terraform apply
The execution may take a while. If successful, the output will be the newly created URI for our API endpoint. Copy and paste this URI into your browser to access the API.
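As a quick sanity check from Python, you can also hit the returned endpoint with requests; the endpoint value below is a placeholder for the URI printed by terraform apply:

```python
# Quick reachability check of the deployed endpoint (illustrative only).
import requests

# replace with the URI printed by `terraform apply`
endpoint = "http://<your-api-endpoint>"

response = requests.get(endpoint, timeout=30)
print(response.status_code)
print(response.text[:200])
```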
Delete the API completely from AWS:
$ terraform destroy
You can finally delete the ECR repository directly from the AWS Console in your browser.
In case of errors during deletion, check manually in the AWS Console for services that are still up and running.
An API contract is provided through Postman API Platform, based on OpenAPI specifications.
See API contracts for information on the KnowledgeGraph-Terraform-Flask-app API and available routes:
See these resources for more content on how to document APIs.
To Do: programmatic access for testers in the fully hosted scenario? --> AWS IAM role and associated access keys for DynamoDB?
- Go to the provided endpoint
- Security, access restriction: TBD
- Upload a single (unit) file
- Batch upload not supported
- Generate the ontology
OR
- Follow the Deploy section
- with your endpoint, same steps as for the fully hosted microservice
- launch the API from your machine to perform batch imports
OR
- launch a local API instance (with a local DynamoDB instance)
- with your endpoint, same steps as for the fully hosted microservice
- perform batch imports (for instance, batch size = increasing multiples of 10)
NB:
- The fully hosted Flask app relies extensively on network connectivity (timeouts may occur)
- Always prefer to launch batch imports from a local API instance
- An area of improvement could be to offload long-running imports to a task queue such as Celery.
- Another option would be to tweak parameters of the architecture, especially limitations on:
- Internet Gateway,
- NAT Gateway,
- Application Load Balancer.
Example of a successful batch request from a local API instance: 10 documents, elapsed time: 3 min
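For illustration, such a batch import from a local instance could be scripted roughly as follows; the /import route, payload format, and arXiv IDs are hypothetical placeholders, so adapt them to the routes documented in the API contract:

```python
# Hypothetical batch import script against a local API instance.
# The /import route, payload format, and arXiv IDs are placeholders; check the
# API contract for the actual routes exposed by app.py.
import requests

BASE_URL = "http://localhost:5000"
arxiv_ids = ["2101.00001", "2101.00002", "2101.00003"]  # example IDs only

for arxiv_id in arxiv_ids:
    # one request per document keeps individual failures and timeouts isolated
    response = requests.post(f"{BASE_URL}/import", json={"arxiv_id": arxiv_id}, timeout=120)
    print(arxiv_id, response.status_code)
```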
Testing is not yet maintained in this version. Intended tech stack:
black: automatically formats app files using the black package
$ black <filename>.py
pylint: rates code quality and suggests improvements
$ python -m pylint <filename>.py
pytest: runs unit tests from the tests/ folder and checks coverage (a minimal test sketch follows below)
$ python -m pytest --cov
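If testing is reinstated, a minimal test in tests/ might look like the sketch below; it assumes app.py exposes a Flask instance named app and that the module is importable from the test environment, which may require adjusting paths:

```python
# tests/test_app.py -- minimal pytest sketch (hypothetical; adjust the import
# and route to the real module layout and API).
import pytest

from app import app  # assumes app.py defines a Flask instance named `app`


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as test_client:
        yield test_client


def test_root_route_responds(client):
    # smoke test: the app should answer without a server error
    response = client.get("/")
    assert response.status_code < 500
```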
Monitor your microservice from AWS CloudWatch.
Follow this tutorial to implement monitoring.
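Alarm states can also be checked programmatically; below is a minimal sketch with boto3, assuming the credentials configured earlier and the eu-west-3 region used elsewhere in this README (the alarm names depend on the Terraform CloudWatch configuration):

```python
# List CloudWatch alarms and their current state (illustrative only).
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-3")

for alarm in cloudwatch.describe_alarms()["MetricAlarms"]:
    print(alarm["AlarmName"], "->", alarm["StateValue"])
```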