- Description: a deployment framework for secure, autoscaling, highly available microservices on AWS
- Provided microservice: In this example, we use an API generating Knowledge Graphs from arxiv.org
- Use:
- with the provided Flask app: no modifications required
- with your own Flask app (a minimal example is sketched below):
  - replace the content of the ./app/ folder with your own microservice
  - update requirements.txt
  - update the Dockerfile
  - update the Terraform files config.tf and variables.tf
- That's it!
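For orientation, a drop-in replacement only needs to expose a Flask app; the examples in this README assume it listens on port 5000. Below is a minimal, hypothetical sketch (the route and response are placeholders, not part of the provided API):

```python
# app/app.py -- minimal drop-in sketch (hypothetical route, for illustration only)
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def health():
    # simple health-check style response
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    # listen on all interfaces so the container port mapping (5000:5000) works
    app.run(host="0.0.0.0", port=5000)
```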
NB: Feel free to contribute to this project by creating issues :)
This project deploys an API on AWS according to the following workflow:
./app folder structure
- downloads/ --------- temp folder for PDFs downloaded from arxiv
- models/ ------------ helper functions for app.py
- ontologies/ -------- stores the generated ontology world.owl
- templates/ --------- HTML templates for rendering in a web browser (not supported in this version)
- tests/ ------------- test scripts for pytest (not supported in this version)
- uploads/ ----------- folder to manually stage PDF documents for upload (not supported in this version)
- app.py ------------- Flask app and main routes
- Dockerfile --------- builds the container
- requirements.txt --- project dependencies, generated with the pipreqs package
- Web Scraping
  - queries through the Python arxiv API
- Natural Language Processing
  - uses spaCy with the pre-trained en_core_web_sm model
- Ontology / Knowledge Graph (see the pipeline sketch below)
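For orientation, here is a rough sketch of how such a pipeline can be wired together. It is illustrative only: the actual code in models/ may differ, and the package versions and query API are assumptions.

```python
# Illustrative sketch of the scraping + NLP stage (not the repository's actual code).
# Assumes the arxiv and spacy packages and the en_core_web_sm model are installed;
# the query API differs slightly between arxiv package versions.
import arxiv
import spacy

nlp = spacy.load("en_core_web_sm")

client = arxiv.Client()
search = arxiv.Search(query="knowledge graph", max_results=3)

for paper in client.results(search):
    # run NER on the abstract; entities are one possible input for the ontology step
    doc = nlp(paper.summary)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(paper.title)
    print(entities[:10])
```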
Global dependencies: (please refer to links for installation tutorials if necessary)
Clone the repository and move into the newly created directory:
$ git clone <project https address>
$ cd KnowledgeGraph-Terraform-Flask-app
Create a deployment virtualenv and activate it:
# for UNIX systems:
$ python -m venv deploy_venv
$ source deploy_venv/bin/activate
# for Windows systems:
$ python -m venv deploy_venv
$ deploy_venv\Scripts\activate
Install requirements from txt file:
$ pip install -r requirements.txt
Two DB options are available:
- local DynamoDB, for integration testing
- hosted AWS DynamoDB, for production
Select the desired option by commenting/uncommenting the related lines in models/model.py (illustrated by the sketch below).
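For illustration, the switch in models/model.py might look like the lines below; the variable names and exact structure are assumptions, not the repository's actual code:

```python
# Hypothetical illustration of the DB switch in models/model.py
# (variable names and structure in the actual repository may differ).
import boto3

# local DynamoDB, for integration testing:
dynamodb = boto3.resource("dynamodb", endpoint_url="http://localhost:8000")

# hosted AWS DynamoDB, for production (uncomment this line and comment the one above):
# dynamodb = boto3.resource("dynamodb", region_name="eu-west-3")

table = dynamodb.Table("arxivTable")
```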
If you wish to use a local DynamoDB, configure it using the following commands (refer to this tutorial for details):
- download the DynamoDB .zip package from the tutorial
- extract the package to a location of your choice
- from a bash shell at this location, launch DynamoDBLocal.jar with:
$ java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
- keep this shell window open to use your DB
- in another shell tab, create your table:
$ aws dynamodb create-table --table-name arxivTable --attribute-definitions AttributeName=_id,AttributeType=S --key-schema AttributeName=_id,KeyType=HASH --billing-mode PAY_PER_REQUEST --endpoint-url http://localhost:8000
- check that the table exists:
$ aws dynamodb list-tables --endpoint-url http://localhost:8000
- when needed, you can destroy the table using:
$ aws dynamodb delete-table --table-name arxivTable --endpoint-url http://localhost:8000
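As an optional check, you can also exercise the local table from Python with boto3. This is a hedged sketch, assuming DynamoDB Local is running on port 8000 and the arxivTable table created above:

```python
# Optional smoke test of the local table (illustrative only).
# Assumes DynamoDB Local is running on port 8000 and arxivTable has been created.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="eu-west-3",
                          endpoint_url="http://localhost:8000")
table = dynamodb.Table("arxivTable")

# write a dummy item keyed on the _id hash key defined at table creation
table.put_item(Item={"_id": "test-0001", "title": "dummy entry"})

# read it back to confirm the table is usable
response = table.get_item(Key={"_id": "test-0001"})
print(response.get("Item"))
```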
Launch the Flask app locally:
$ cd app/
$ python app.py
Open http://localhost:5000 in a browser to interact with the API
Build and run the container using the following commands:
$ docker build -t knowledgegraph-terraform-flask-app .
$ docker run -d -p 5000:5000 knowledgegraph-terraform-flask-app
$ curl http://localhost:5000
Resulting architecture generated in AWS:
Refer to this tutorial for more details. Use the commands below to ensure proper deployment.
NB: This step assumes you already have configured programmatic CLI access to an active AWS account. Refer to this tutorial for more details.
Make sure to select the proper DB endpoint (AWS-hosted DynamoDB) in models/model.py before building your container.
Create repository on AWS ECR:
$ aws ecr create-repository --repository-name knowledgegraph-terraform-flask-app --image-scanning-configuration scanOnPush=true --region eu-west-3
NB: Insert your actual AWS ID in place of <AWS_ID> in the following command lines.
Get credentials:
$ aws ecr get-login-password --region eu-west-3 | docker login --username AWS --password-stdin <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app
From your browser, open the AWS Console, go to Services, then Elastic Container Registry.
Select the knowledgegraph-terraform-flask-app repository; its ECR URI will be needed later on.
Back in the shell, tag the image and push it to ECR (use your own AWS_ID):
$ docker tag knowledgegraph-terraform-flask-app:latest <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app:latest
$ docker push <AWS_ID>.dkr.ecr.eu-west-3.amazonaws.com/knowledgegraph-terraform-flask-app:latest
Initialize Terraform from the terraform/ folder:
$ cd ../terraform
$ terraform init
The Terraform code will deploy the following configuration:
- IAM: Identity and Access Management policy configuration
- VPC: Public and private subnets, routes, and a NAT Gateway
- EC2: Autoscaling implementation
- ECS: Cluster configuration
- ALB: Load balancer configuration
- DynamoDB: Table configuration
- CloudWatch: Alert metrics configuration
# check configuration files:
$ terraform validate
# prepare and review execution plan:
# this command prompts for a valid ECR URI (see AWS console)
$ terraform plan
# deploy plan to AWS:
# this command prompts for a valid ECR URI (see AWS console)
# then type 'yes' when prompted to launch execution
$ terraform apply
The execution may take a while. If successful, the output will be the newly created URI for our API endpoint. Copy and paste this URI into your browser to access the API.
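As a quick sanity check from Python, you can also hit the returned endpoint with requests; the endpoint value below is a placeholder for the URI printed by terraform apply:

```python
# Quick reachability check of the deployed endpoint (illustrative only).
import requests

# replace with the URI printed by `terraform apply`
endpoint = "http://<your-api-endpoint>"

response = requests.get(endpoint, timeout=30)
print(response.status_code)
print(response.text[:200])
```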
Delete the API completely from AWS:
$ terraform destroy
You can finally delete the ECR repository directly from the AWS Console in your browser.
In case of errors during deletion, check manually in the AWS Console for services that are still up and running.
An API contract is provided through Postman API Platform, based on OpenAPI specifications.
See API contracts for information on the KnowledgeGraph-Terraform-Flask-app API and available routes:
See these resources for more content on how to document APIs.
To Do: programmatic access for testers in the fully hosted scenario? --> AWS IAM role and associated access keys for DynamoDB?
- Go to the provided endpoint
- Security, access restriction: TBD
- Upload a single (unit) file
- Batch upload not supported
- Generate the ontology
OR
- Follow the Deploy section
- with your endpoint, same steps as for the fully hosted microservice
- launch the API from your machine to perform batch imports
OR
- launch a local API instance (with a local DynamoDB instance)
- with your endpoint, same steps as for the fully hosted microservice
- perform batch imports (for instance, batch size = increasing multiples of 10)
NB:
- The fully hosted Flask app relies extensively on network connectivity (timeouts may occur)
- Always prefer to launch batch imports from a local API instance
- An area of improvement could be to offload long-running imports to a task queue such as Celery.
- Another option would be to tweak parameters of the architecture, especially limitations on:
- Internet Gateway,
- NAT Gateway,
- Application Load Balancer.
Example of a successful batch request from a local API instance: 10 documents, elapsed time: 3 min
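For illustration, such a batch import from a local instance could be scripted roughly as follows; the /import route, payload format, and arXiv IDs are hypothetical placeholders, so adapt them to the routes documented in the API contract:

```python
# Hypothetical batch import script against a local API instance.
# The /import route, payload format, and arXiv IDs are placeholders; check the
# API contract for the actual routes exposed by app.py.
import requests

BASE_URL = "http://localhost:5000"
arxiv_ids = ["2101.00001", "2101.00002", "2101.00003"]  # example IDs only

for arxiv_id in arxiv_ids:
    # one request per document keeps individual failures and timeouts isolated
    response = requests.post(f"{BASE_URL}/import", json={"arxiv_id": arxiv_id}, timeout=120)
    print(arxiv_id, response.status_code)
```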
Testing is not yet maintained in this version. Intended tech stack:
black: automatically formats app files using the black package
$ black <filename>.py
pylint: rates code quality and suggests improvements
$ python -m pylint <filename>.py
pytest: runs unit tests from the tests/ folder and checks coverage (a minimal test sketch follows below)
$ python -m pytest --cov
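If testing is reinstated, a minimal test in tests/ might look like the sketch below; it assumes app.py exposes a Flask instance named app and that the module is importable from the test environment, which may require adjusting paths:

```python
# tests/test_app.py -- minimal pytest sketch (hypothetical; adjust the import
# and route to the real module layout and API).
import pytest

from app import app  # assumes app.py defines a Flask instance named `app`


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as test_client:
        yield test_client


def test_root_route_responds(client):
    # smoke test: the app should answer without a server error
    response = client.get("/")
    assert response.status_code < 500
```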
Monitor your microservice from AWS CloudWatch.
Follow this tutorial to implement monitoring.
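Alarm states can also be checked programmatically; below is a minimal sketch with boto3, assuming the credentials configured earlier and the eu-west-3 region used elsewhere in this README (the alarm names depend on the Terraform CloudWatch configuration):

```python
# List CloudWatch alarms and their current state (illustrative only).
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-3")

for alarm in cloudwatch.describe_alarms()["MetricAlarms"]:
    print(alarm["AlarmName"], "->", alarm["StateValue"])
```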