Commit eb53623

aaltonja (Joona Halonen) and Joona Halonen authored
EWC backups and restore & token renewal cron job (#10)
* install etcdctl, add api6 snapshot script, move common logic to own file
* add apisix backup job, use common namespace for shared secret and use role and role binding to read the values
* vault backup changes, local value for etcd cluster host address
* add vault, and apisix related variables
* use general image ment for all cron jobs
* fix vault service account reference
* revert to namespace specific secrets, common one with roles and bindings not worked as expected
* change backup bucket paths
* change kubernetes host to match cluster DNS search paths
* add postgresql-client
* add keycloak snapshot script
* add keycloak backup cron job to dev-portal module
* pass new vars to dev-portal module
* mv cron-jobs to jobs, add action to build & push jobs image to registry
* name cron_jobs to jobs since restore jobs not cron
* update job scripts
* update restore jobs
* add initial disaster recovery part
* update vault keys need when recreate for data restore
* update
* renaming resources, cleaning up
* parameterize replicaCounts, apisix helm chart etc.
* run fmt
* parameterize keycloak stuff
* use single s3 backup base path var and construct the rest in job def
* add renewal job, change role name to match more general purpose
* parameterize more
* use parameterized apisix and vault values
* parameterize vault helm release name
* fix api6 helm release name
* use paramaterized helm release name
* use release name from param
* use apisix helm release name
* supress default stdout, quite noisy. check init status for each pod
* compress or decompress snapshot files, improve logging and error handling
* update
* remove hard-coded role and reference from terraform resource, separate policies and gather in role
* Pass renewable token to job as an array
* fix token var
* Store secret as array

---------

Co-authored-by: Joona Halonen <joona.halonen@cgi.com>
1 parent 5cb47b3 commit eb53623

21 files changed

+1838 -21 lines changed
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+name: Upload Jobs Image to Registry
+
+env:
+  DOCKER_REGISTRY: ghcr.io
+  DOCKER_IMAGE_NAME: jobs
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+    paths:
+      - 'ewc/jobs/Dockerfile'
+      - 'ewc/jobs/*.sh'
+      - '.github/workflows/upload_jobs_image.yml'
+
+jobs:
+  build-and-push-docker-image:
+    runs-on: ubuntu-latest
+
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Add Docker metadata to the image
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          # list of Docker images to use as base name for tags
+          images: |
+            ${{ env.DOCKER_REGISTRY }}/${{ github.repository }}/${{ env.DOCKER_IMAGE_NAME }}
+          # generate Docker tags based on the following events/attributes
+          tags: |
+            type=schedule,pattern={{date 'YYYYMMDD-HHmmss'}}
+            type=sha
+            type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', 'main') }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to Docker registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v6
+        with:
+          file: ./ewc/jobs/Dockerfile
+          context: ./ewc/jobs
+          platforms: linux/amd64
+          push: true
+          # for now cache to github actions
+          # might need some tuning
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+          tags: ${{ steps.meta.outputs.tags }}

README.md

Lines changed: 146 additions & 1 deletion
@@ -41,7 +41,9 @@ vault_unseal_keys = <sensitive>
 ```
 
 > [!IMPORTANT]
-> Make sure to store `vault_root_token` `vault_unseal_keys` and `dev-portal_keycloak_secret` somewhere safe.
+> Make sure to store `vault_root_token`, `vault_unseal_keys`, and `dev-portal_keycloak_secret` somewhere safe.
+>
+> If Vault is recreated for a data restore operation, do not delete the previous `vault_unseal_keys`. Continue using the old unseal keys and ignore the new ones. The new `vault_root_token` is still needed to access the cluster. For more details, see the [Vault Restore](#vault-restore) section.
 
 You can access sensitive values using commands:
 ```bash
@@ -77,3 +79,146 @@ vault_pod_ready_statuses_before_init = [
 ```bash
 terraform output dev-portal_keycloak_secret
 ```
+
+## Vault token renewals
+
+APISIX and the Dev Portal use service tokens to communicate with Vault. These tokens have a maximum TTL of 768 hours (32 days). To keep them from expiring and being revoked, a cron job runs on the 1st and 15th of each month and resets the token period.
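
As a quick check on the renewal schedule, the sketch below computes the longest gap between consecutive runs on the 1st and 15th and compares it with the 768-hour limit. The cron expression, including its time of day, is an illustrative assumption and is not taken from the Terraform code.

```sh
#!/bin/sh
# Hypothetical cron expression for "1st and 15th of each month";
# the hour and minute are assumptions for illustration only.
RENEWAL_SCHEDULE="0 3 1,15 * *"

MAX_TTL_HOURS=768
MAX_TTL_DAYS=$((MAX_TTL_HOURS / 24))

# Gap from the 1st to the 15th is always 14 days.
GAP_1_TO_15=14
# Gap from the 15th to the 1st of the next month is longest in a 31-day month.
LONGEST_MONTH_DAYS=31
GAP_15_TO_1=$((LONGEST_MONTH_DAYS - 15 + 1))

if [ "$GAP_15_TO_1" -gt "$GAP_1_TO_15" ]; then
  LONGEST_GAP_DAYS=$GAP_15_TO_1
else
  LONGEST_GAP_DAYS=$GAP_1_TO_15
fi

HEADROOM_DAYS=$((MAX_TTL_DAYS - LONGEST_GAP_DAYS))

echo "schedule: $RENEWAL_SCHEDULE"
echo "max TTL: ${MAX_TTL_DAYS} days"
echo "longest gap between renewals: ${LONGEST_GAP_DAYS} days"
echo "headroom: ${HEADROOM_DAYS} days"
```

Assuming each successful run restores the full 32-day window, the longest normal gap (17 days) leaves about two weeks of headroom. A single skipped run stretches the gap to at most 31 days, which would still fit, but only barely.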
+
+## Disaster Recovery
+
+The disaster recovery plan covers backing up application databases and logical data and restoring them from snapshot files. Backup and restore use database-specific tools such as `pg_dump` and `pg_restore`.
+
+### Backups
+
+Each application's database (Keycloak PostgreSQL, APISIX etcd, Vault Raft) has a dedicated cron job for backups. The backup schedule can be adjusted in Terraform if needed. Backups are currently saved to an AWS S3 bucket. If files older than a certain number of days do not need to be kept, bucket retention policies can be used to expire them.
+
+### Restore
+
+Each application has one or more dedicated jobs to restore data from snapshots. These jobs are started with separate commands, but the job templates are managed in Terraform.
+
+> [!IMPORTANT]
+> Ensure that the Terraform state and the actual cluster state are aligned before running restore jobs to avoid potential issues.
+>
+> Consider taking a manual snapshot of the affected database(s) before attempting a restore.
+
+#### Keycloak Restore
+
+```sh
+export KUBECONFIG="$HOME/.kube/config" # Replace with the path to your kubeconfig file
+
+###########################################
+# Optional manual backup before the restore
+###########################################
+
+JOB_NAME=$(kubectl create job --from=cronjob/keycloak-backup keycloak-backup-$(date +%s) -n keycloak -o jsonpath='{.metadata.name}')
+POD_NAME=$(kubectl get pods -n keycloak -l job-name=$JOB_NAME -o jsonpath='{.items[0].metadata.name}')
+# Optionally, tail the logs
+kubectl logs -f $POD_NAME -n keycloak
+# Optionally, delete the job and its resources after completion
+kubectl delete job $JOB_NAME -n keycloak
+
+###########################################
+# Restore
+###########################################
+
+export SNAPSHOT_NAME="specific_snapshot.db.gz" # Optionally provide a specific snapshot name if you need to restore a snapshot other than the latest one
+
+# Create the restore job and capture the job name
+JOB_NAME=$(kubectl get configmap keycloak-restore-backup -n keycloak -o jsonpath='{.data.job-template\.yaml}' | envsubst | kubectl create -f - -o name)
+
+# Optionally, tail the logs
+kubectl logs -f $JOB_NAME -n keycloak
+
+# Optionally, delete the job and its resources after completion
+kubectl delete $JOB_NAME -n keycloak
+```
+
+#### Vault Restore
+
+**Note:** Vault restore requires the `UNSEAL_KEYS` that were in use when the backup snapshot was taken. `VAULT_TOKEN` is the most recent root token that was created and used. If the Vault cluster becomes unresponsive or is wiped out completely, the existing cluster might need to be removed and a new one initialized. The new cluster will have new tokens and unseal keys. Accessing the cluster requires the new token, but unsealing it requires the unseal keys that belong to the data in the snapshot.
+
+```sh
+export KUBECONFIG="$HOME/.kube/config" # Replace with the path to your kubeconfig file
+
+###########################################
+# Optional manual backup before the restore
+###########################################
+
+JOB_NAME=$(kubectl create job --from=cronjob/vault-backup vault-backup-$(date +%s) -n vault -o jsonpath='{.metadata.name}')
+POD_NAME=$(kubectl get pods -n vault -l job-name=$JOB_NAME -o jsonpath='{.items[0].metadata.name}')
+# Optionally, tail the logs
+kubectl logs -f $POD_NAME -n vault
+# Optionally, delete the job and its resources after completion
+kubectl delete job $JOB_NAME -n vault
+
+###########################################
+# Restore
+###########################################
+
+export SNAPSHOT_NAME="specific_snapshot.snap.gz" # Optionally provide a specific snapshot name if you need to restore a snapshot other than the latest one
+
+JOB_TEMPLATE=$(kubectl get configmap vault-restore-backup -n vault -o jsonpath='{.data.job-template\.yaml}')
+
+# Pass the unseal keys and Vault token; adjust the file locations and the logic that fetches them to your setup
+# Create the restore job and capture the job name
+JOB_NAME=$(
+  UNSEAL_KEYS=$(jq -r '. | join(",")' ~/path-to/unseal_keys.txt) \
+  VAULT_TOKEN=$(cat ~/path-to/vault_token.txt) \
+  envsubst <<< "$JOB_TEMPLATE" | \
+  kubectl create -f - -o name
+)
+# Optionally, tail the logs
+kubectl logs -f $JOB_NAME -n vault
+# Optionally, delete the job and its resources after completion
+kubectl delete $JOB_NAME -n vault
+```
+
+#### APISIX Restore
+
+```sh
+export KUBECONFIG="$HOME/.kube/config" # Replace with the path to your kubeconfig file
+
+###########################################
+# Optional manual backup before the restore
+###########################################
+
+JOB_NAME=$(kubectl create job --from=cronjob/apisix-backup apisix-backup-$(date +%s) -n apisix -o jsonpath='{.metadata.name}')
+POD_NAME=$(kubectl get pods -n apisix -l job-name=$JOB_NAME -o jsonpath='{.items[0].metadata.name}')
+# Optionally, tail the logs
+kubectl logs -f $POD_NAME -n apisix
+# Optionally, delete the job and its resources after completion
+kubectl delete job $JOB_NAME -n apisix
+
+###########################################
+# Restore
+###########################################
+
+export SNAPSHOT_NAME="specific_snapshot.snap.gz" # Optionally provide a specific snapshot name if you need to restore a snapshot other than the latest one
+
+# Create the pre-restore job and capture the job name
+PRE_JOB_NAME=$(kubectl get configmap apisix-restore-backup -n apisix -o jsonpath='{.data.pre-job-template\.yaml}' | envsubst | kubectl create -f - -o name)
+
+# Optionally, tail the logs of the pre-restore job
+kubectl logs -f $PRE_JOB_NAME -n apisix
+
+# Optionally, delete the pre-restore job and its resources after completion
+kubectl delete $PRE_JOB_NAME -n apisix
+
+# Create the main restore job and capture the job name
+MAIN_JOB_NAME=$(kubectl get configmap apisix-restore-backup -n apisix -o jsonpath='{.data.job-template\.yaml}' | envsubst | kubectl create -f - -o name)
+
+# Optionally, tail the logs
+kubectl logs -f $MAIN_JOB_NAME -n apisix
+
+# Optionally, delete the main restore job and its resources after completion
+kubectl delete $MAIN_JOB_NAME -n apisix
+
+# Create the post-restore job and capture the job name
+POST_JOB_NAME=$(kubectl get configmap apisix-restore-backup -n apisix -o jsonpath='{.data.post-job-template\.yaml}' | envsubst | kubectl create -f - -o name)
+
+# Optionally, tail the logs of the post-restore job
+kubectl logs -f $POST_JOB_NAME -n apisix
+
+# Optionally, delete the post-restore job and its resources after completion
+kubectl delete $POST_JOB_NAME -n apisix
+```
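
When pointing `KUBECONFIG` at a file under the home directory, as the restore snippets do, one shell detail is worth keeping in mind: a POSIX shell does not expand `~` inside double quotes, while `$HOME` is expanded there. The following minimal sketch illustrates the difference. The variable names `quoted_tilde` and `with_home` are hypothetical and exist only for this example.

```sh
#!/bin/sh
# A tilde inside double quotes is kept as a literal character.
quoted_tilde="~/.kube/config"
# $HOME is expanded inside double quotes, giving an absolute path.
with_home="$HOME/.kube/config"

echo "quoted tilde: $quoted_tilde"
echo "with \$HOME:   $with_home"

# Confirm that the quoted form still begins with a literal "~".
case "$quoted_tilde" in
  "~"*) echo "the quoted form was not expanded by the shell" ;;
esac
```

Whether a given tool interprets a literal `~` on its own is tool-specific, so an absolute path (or `$HOME`) is the safer choice for `KUBECONFIG`.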
