Merge pull request #26 from CDOT-CV/develop
Updates from CDOT-CV Fork
Michael7371 authored Jan 25, 2025
2 parents 196b67d + 7183174 commit f36c6cf
Showing 65 changed files with 164 additions and 4,090 deletions.
29 changes: 19 additions & 10 deletions .github/workflows/docker.yml
@@ -4,8 +4,8 @@ on:
pull_request:
types: [opened, synchronize, reopened]

jobs:
jpo-deduplicator:
jobs:
jpo-jikkou:
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -15,12 +15,21 @@ jobs:
- name: Build
uses: docker/build-push-action@v3
with:
context: jpo-deduplicator
build-args: |
MAVEN_GITHUB_TOKEN_NAME=${{ vars.MAVEN_GITHUB_TOKEN_NAME }}
MAVEN_GITHUB_TOKEN=${{ secrets.MAVEN_GITHUB_TOKEN }}
MAVEN_GITHUB_ORG=${{ github.repository_owner }}
secrets: |
MAVEN_GITHUB_TOKEN: ${{ secrets.MAVEN_GITHUB_TOKEN }}
context: jikkou
file: jikkou/Dockerfile.jikkou
cache-from: type=gha
cache-to: type=gha,mode=max
cache-to: type=gha,mode=max

jpo-kafka-connect:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Build
uses: docker/build-push-action@v3
with:
context: kafka-connect
cache-from: type=gha
cache-to: type=gha,mode=max
44 changes: 32 additions & 12 deletions .github/workflows/dockerhub.yml
@@ -7,8 +7,8 @@ on:
- "master"
- "release/*"

jobs:
dockerhub-jpo-deduplicator:
jobs:
dockerhub-jpo-jikkou:
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -21,21 +21,41 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Replcae Docker tag
- name: Replace Docker tag
id: set_tag
run: echo "TAG=$(echo ${GITHUB_REF##*/} | sed 's/\//-/g')" >> $GITHUB_ENV

- name: Build
uses: docker/build-push-action@v3
with:
context: jpo-deduplicator
file: jikkou/Dockerfile.jikkou
push: true
tags: usdotjpoode/jpo-deduplicator:${{ env.TAG }}
build-args: |
MAVEN_GITHUB_TOKEN_NAME=${{ vars.MAVEN_GITHUB_TOKEN_NAME }}
MAVEN_GITHUB_TOKEN=${{ secrets.MAVEN_GITHUB_TOKEN }}
MAVEN_GITHUB_ORG=${{ github.repository_owner }}
secrets: |
MAVEN_GITHUB_TOKEN: ${{ secrets.MAVEN_GITHUB_TOKEN }}
tags: usdotjpoode/jpo-jikkou:${{ env.TAG }}
cache-from: type=gha
cache-to: type=gha,mode=max
cache-to: type=gha,mode=max

dockerhub-jpo-kafka-connect:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to DockerHub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Replace Docker tag
id: set_tag
run: echo "TAG=$(echo ${GITHUB_REF##*/} | sed 's/\//-/g')" >> $GITHUB_ENV

- name: Build
uses: docker/build-push-action@v3
with:
context: kafka-connect
push: true
tags: usdotjpoode/jpo-kafka-connect:${{ env.TAG }}
cache-from: type=gha
cache-to: type=gha,mode=max
70 changes: 0 additions & 70 deletions README.md
@@ -21,10 +21,6 @@ The JPO ITS utilities repository serves as a central location for deploying open
- [Configuration](#configuration)
- [Configure Kafka Connector Creation](#configure-kafka-connector-creation)
- [Quick Run](#quick-run-2)
- [5. jpo-deduplicator](#5-jpo-deduplicator)
- [Deduplication Config](#deduplication-config)
- [Generate a Github Token](#generate-a-github-token)
- [Quick Run](#quick-run-3)
- [Security Notice](#security-notice)


@@ -190,72 +186,6 @@ The following environment variables can be used to configure Kafka Connectors:
3. Click `OdeBsmJson`, and now you should see your message!
8. Feel free to test this with other topics or by producing to these topics using the [ODE](https://github.com/usdot-jpo-ode/jpo-ode)


<a name="deduplicator"></a>

## 5. jpo-deduplicator
The JPO-Deduplicator is a Kafka Java spring-boot application designed to reduce the number of messages stored and processed in the ODE system. This is done by reading in messages from an input topic (such as topic.ProcessedMap) and outputting a subset of those messages on a related output topic (topic.DeduplicatedProcessedMap). Functionally, this is done by removing deduplicate messages from the input topic and only passing on unique messages. In addition, each topic will pass on at least 1 message per hour even if the message is a duplicate. This behavior helps ensure messages are still flowing through the system. The following topics currently support deduplication.

- topic.ProcessedMap -> topic.DeduplicatedProcessedMap
- topic.ProcessedMapWKT -> topic.DeduplicatedProcessedMapWKT
- topic.OdeMapJson -> topic.DeduplicatedOdeMapJson
- topic.OdeTimJson -> topic.DeduplicatedOdeTimJson
- topic.OdeRawEncodedTIMJson -> topic.DeduplicatedOdeRawEncodedTIMJson
- topic.OdeBsmJson -> topic.DeduplicatedOdeBsmJson
- topic.ProcessedSpat -> topic.DeduplicatedProcessedSpat

### Deduplication Config

When running the jpo-deduplication as a submodule in jpo-utils, the deduplicator will automatically turn on deduplication for a topic when that topic is created. For example if the KAFKA_TOPIC_CREATE_GEOJSONCONVERTER environment variable is set to true, the deduplicator will start performing deduplication for ProcessedMap, ProcessedMapWKT, and ProcessedSpat data.

To manually configure deduplication for a topic, the following environment variables can also be used.

| Environment Variable | Description |
|---|---|
| `ENABLE_PROCESSED_MAP_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap message Deduplication |
| `ENABLE_PROCESSED_MAP_WKT_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap WKT message Deduplication |
| `ENABLE_ODE_MAP_DEDUPLICATION` | `true` / `false` - Enable ODE MAP message Deduplication |
| `ENABLE_ODE_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE TIM message Deduplication |
| `ENABLE_ODE_RAW_ENCODED_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE Raw Encoded TIM Deduplication |
| `ENABLE_PROCESSED_SPAT_DEDUPLICATION` | `true` / `false` - Enable ProcessedSpat Deduplication |
| `ENABLE_ODE_BSM_DEDUPLICATION` | `true` / `false` - Enable ODE BSM Deduplication |
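As a sketch only (this README section is being removed by this commit), the flags in the table above would typically appear as `.env` entries like the following; the variable names come from the table, while the `true`/`false` values here are illustrative examples, not documented defaults:

```shell
# Illustrative .env entries for manual deduplication control.
# Names are taken from the table above; values are examples only.
ENABLE_PROCESSED_MAP_DEDUPLICATION=true
ENABLE_PROCESSED_MAP_WKT_DEDUPLICATION=true
ENABLE_ODE_MAP_DEDUPLICATION=true
ENABLE_ODE_TIM_DEDUPLICATION=true
ENABLE_ODE_RAW_ENCODED_TIM_DEDUPLICATION=false
ENABLE_PROCESSED_SPAT_DEDUPLICATION=true
ENABLE_ODE_BSM_DEDUPLICATION=false
```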

### Generate a Github Token

A GitHub token is required to pull artifacts from GitHub repositories. This is required to obtain the jpo-deduplicator jars and must be done before attempting to build this repository.

1. Log into GitHub.
2. Navigate to Settings -> Developer settings -> Personal access tokens.
3. Click "New personal access token (classic)".
1. As of now, GitHub does not support `Fine-grained tokens` for obtaining packages.
4. Provide a name and expiration for the token.
5. Select the `read:packages` scope.
6. Click "Generate token" and copy the token.
7. Copy the token name and token value into your `.env` file.

For local development the following steps are also required
8. Create a copy of [settings.xml](jpo-deduplicator/jpo-deduplicator/settings.xml) and save it to `~/.m2/settings.xml`
9. Update the variables in your `~/.m2/settings.xml` with the token value and target jpo-ode organization.

### Quick Run
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `MAVEN_GITHUB_TOKEN` to a github token used for downloading jar file dependencies. For full instructions on how to generate a token please see here:
3. Set the password for `MONGO_ADMIN_DB_PASS` and `MONGO_READ_WRITE_PASS` environmental variables to a secure password.
4. Set the `COMPOSE_PROFILES` variable to: `kafka,kafka_ui,kafka_setup, jpo-deduplicator`
5. Navigate back to the root directory and run the following command: `docker compose up -d`
6. Produce a sample message to one of the sink topics by using `kafka_ui` by:
1. Go to `localhost:8001`
2. Click local -> Topics
3. Select `topic.OdeMapJson`
4. Select `Produce Message`
5. Copy in sample JSON for a Map Message
6. Click `Produce Message` multiple times
7. View the synced message in `kafka_ui` by:
1. Go to `localhost:8001`
2. Click local -> Topics
3. Select `topic.DeduplicatedOdeMapJson`
4. You should now see only one copy of the map message sent.
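Condensed into commands, the Quick Run steps above look roughly like this sketch; file names and the profile list are taken from the steps, the secrets are placeholders you must supply, and the side-effecting commands are left commented:

```shell
# Quick Run sketch (illustrative; docker commands commented out to stay side-effect free).
# cp sample.env .env                  # step 1: copy sample.env and rename it
# In .env, set MAVEN_GITHUB_TOKEN, MONGO_ADMIN_DB_PASS, MONGO_READ_WRITE_PASS (steps 2-3)
COMPOSE_PROFILES="kafka,kafka_ui,kafka_setup,jpo-deduplicator"   # step 4
export COMPOSE_PROFILES
echo "$COMPOSE_PROFILES"
# docker compose up -d                # step 5: start the stack from the root directory
```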

[Back to top](#toc)

## Security Notice
## Security Notice
5 changes: 5 additions & 0 deletions docker-compose-connect.yml
@@ -24,8 +24,10 @@ services:
depends_on:
mongo:
condition: service_healthy
required: false
kafka:
condition: service_healthy
required: false
environment:
CONNECT_BOOTSTRAP_SERVERS: ${KAFKA_BOOTSTRAP_SERVERS}
CONNECT_REST_ADVERTISED_HOST_NAME: connect
@@ -54,6 +56,7 @@ services:
- all
- kafka_connect
- kafka_connect_standalone
- kafka_connect_setup
image: jpo-jikkou
build:
context: jikkou
@@ -68,13 +71,15 @@
depends_on:
kafka-connect:
condition: service_healthy
required: false
environment:
CONNECT_URL: ${CONNECT_URL}
CONNECT_TASKS_MAX: ${CONNECT_TASKS_MAX}
CONNECT_CREATE_ODE: ${CONNECT_CREATE_ODE}
CONNECT_CREATE_GEOJSONCONVERTER: ${CONNECT_CREATE_GEOJSONCONVERTER}
CONNECT_CREATE_CONFLICTMONITOR: ${CONNECT_CREATE_CONFLICTMONITOR}
CONNECT_CREATE_DEDUPLICATOR: ${CONNECT_CREATE_DEDUPLICATOR}
CONNECT_CREATE_MECDEPOSIT: ${CONNECT_CREATE_MECDEPOSIT}
MONGO_CONNECTOR_USERNAME: ${MONGO_ADMIN_DB_USER}
MONGO_CONNECTOR_PASSWORD: ${MONGO_ADMIN_DB_PASS:?}
MONGO_DB_IP: ${MONGO_IP}
41 changes: 0 additions & 41 deletions docker-compose-deduplicator.yml

This file was deleted.

35 changes: 23 additions & 12 deletions docker-compose-kafka.yml
@@ -19,7 +19,7 @@ services:
deploy:
resources:
limits:
cpus: '1'
cpus: "1"
memory: 4G
volumes:
- kafka:/bitnami
@@ -40,10 +40,9 @@
KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: "false"
logging:
options:
max-size: "10m"
max-size: "10m"
max-file: "5"


kafka-setup:
profiles:
- all
@@ -58,11 +57,12 @@
deploy:
resources:
limits:
cpus: '0.5'
cpus: "0.5"
memory: 1G
depends_on:
kafka:
condition: service_healthy
required: false
environment:
KAFKA_BOOTSTRAP_SERVERS: ${KAFKA_BOOTSTRAP_SERVERS}
KAFKA_TOPIC_PARTITIONS: ${KAFKA_TOPIC_PARTITIONS}
@@ -74,12 +74,13 @@
KAFKA_TOPIC_CREATE_GEOJSONCONVERTER: ${KAFKA_TOPIC_CREATE_GEOJSONCONVERTER}
KAFKA_TOPIC_CREATE_CONFLICTMONITOR: ${KAFKA_TOPIC_CREATE_CONFLICTMONITOR}
KAFKA_TOPIC_CREATE_DEDUPLICATOR: ${KAFKA_TOPIC_CREATE_DEDUPLICATOR}
MONGO_CONNECTOR_USERNAME: ${MONGO_ADMIN_DB_USER}
MONGO_CONNECTOR_PASSWORD: ${MONGO_ADMIN_DB_PASS}
MONGO_DB_IP: ${MONGO_IP}
MONGO_DB_NAME: ${MONGO_DB_NAME}
KAFKA_TOPIC_CREATE_MECDEPOSIT: ${KAFKA_TOPIC_CREATE_MECDEPOSIT}
volumes:
- ${KAFKA_TOPIC_CONFIG_RELATIVE_PATH:-./jikkou/kafka-topics-values.yaml}:/app/kafka-topics-values.yaml
logging:
options:
max-size: "10m"
max-file: "5"

kafka-schema-registry:
profiles:
@@ -91,11 +92,12 @@
deploy:
resources:
limits:
cpus: '0.5'
cpus: "0.5"
memory: 1G
depends_on:
kafka:
condition: service_healthy
required: false
ports:
- "8081:8081"
environment:
@@ -108,6 +110,10 @@
interval: 30s
timeout: 10s
retries: 4
logging:
options:
max-size: "10m"
max-file: "5"

kafka-ui:
profiles:
@@ -120,19 +126,24 @@
deploy:
resources:
limits:
cpus: '0.5'
cpus: "0.5"
memory: 1G
ports:
- 8001:8080
depends_on:
kafka:
condition: service_healthy
required: false
environment:
DYNAMIC_CONFIG_ENABLED: 'true'
DYNAMIC_CONFIG_ENABLED: "true"
KAFKA_CLUSTERS_0_NAME: local
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: ${KAFKA_BOOTSTRAP_SERVERS}
KAFKA_CLUSTERS_0_KAFKACONNECT_0_NAME: kafka-connect
KAFKA_CLUSTERS_0_KAFKACONNECT_0_ADDRESS: ${CONNECT_URL}
logging:
options:
max-size: "10m"
max-file: "5"

volumes:
kafka:
kafka:
