Commit 8eb69a7

Merge pull request #28 from usdot-jpo-ode/release/2025-q1

Merge Release/2025 q1 into master

SaikrishnaBairamoni authored Jan 27, 2025
2 parents 99c3652 + dd61d2e commit 8eb69a7
Showing 26 changed files with 1,596 additions and 338 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,35 @@
name: Docker build

on:
pull_request:
types: [opened, synchronize, reopened]

jobs:
jpo-jikkou:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Build
uses: docker/build-push-action@v3
with:
context: jikkou
file: jikkou/Dockerfile.jikkou
cache-from: type=gha
cache-to: type=gha,mode=max

jpo-kafka-connect:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Build
uses: docker/build-push-action@v3
with:
context: kafka-connect
cache-from: type=gha
cache-to: type=gha,mode=max
62 changes: 62 additions & 0 deletions .github/workflows/dockerhub.yml
@@ -0,0 +1,62 @@
name: "DockerHub Build and Push"

on:
push:
branches:
- "develop"
- "master"
- "release/*"

jobs:
dockerhub-jpo-jikkou:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to DockerHub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Replace Docker tag
id: set_tag
run: echo "TAG=$(echo ${GITHUB_REF##*/} | sed 's/\//-/g')" >> $GITHUB_ENV

- name: Build
uses: docker/build-push-action@v3
with:
context: jikkou
file: jikkou/Dockerfile.jikkou
push: true
tags: usdotjpoode/jpo-jikkou:${{ env.TAG }}
cache-from: type=gha
cache-to: type=gha,mode=max

dockerhub-jpo-kafka-connect:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to DockerHub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Replace Docker tag
id: set_tag
run: echo "TAG=$(echo ${GITHUB_REF##*/} | sed 's/\//-/g')" >> $GITHUB_ENV

- name: Build
uses: docker/build-push-action@v3
with:
context: kafka-connect
push: true
tags: usdotjpoode/jpo-kafka-connect:${{ env.TAG }}
cache-from: type=gha
cache-to: type=gha,mode=max
4 changes: 3 additions & 1 deletion .gitignore
@@ -1 +1,3 @@
**/.env
**/.env

**/target
69 changes: 47 additions & 22 deletions README.md
@@ -19,7 +19,9 @@ The JPO ITS utilities repository serves as a central location for deploying open
- [Quick Run](#quick-run-1)
- [4. MongoDB Kafka Connect](#4-mongodb-kafka-connect)
- [Configuration](#configuration)
- [Configure Kafka Connector Creation](#configure-kafka-connector-creation)
- [Quick Run](#quick-run-2)
- [Security Notice](#security-notice)


<a name="base-configuration"></a>
@@ -88,7 +90,7 @@ An optional `kafka-init`, `schema-registry`, and `kafka-ui` instance can be depl

### Configure Topic Creation

The Kafka topics created by the `kafka-setup` service are configured in the [kafka-topics-values.yaml](kafka/kafka-topics-values.yaml) file. The topics in that file are organized by the application, and sorted into "Stream Topics" (those with `cleanup.policy` = `delete`) and "Table Topics" (with `cleanup.policy` = `compact`).
The Kafka topics created by the `kafka-setup` service are configured in the [kafka-topics-values.yaml](jikkou/kafka-topics-values.yaml) file. The topics in that file are organized by the application, and sorted into "Stream Topics" (those with `cleanup.policy` = `delete`) and "Table Topics" (with `cleanup.policy` = `compact`).
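
As a rough illustration only, an entry in that values file might group topics as shown below; the structure and names here are hypothetical, and the authoritative schema is whatever the jikkou templates in this repository expect.

``` yaml
# Hypothetical illustration; consult jikkou/kafka-topics-values.yaml for the
# real schema expected by the jikkou templates
exampleApp:
  streamTopics:                 # cleanup.policy = delete
    - name: topic.ExampleStream
  tableTopics:                  # cleanup.policy = compact
    - name: topic.ExampleTable
```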

The following environment variables can be used to configure Kafka Topic creation.

@@ -103,8 +105,7 @@ The following environment variables can be used to configure Kafka Topic creation
| `KAFKA_TOPIC_MIN_INSYNC_REPLICAS` | Minimum number of in-sync replicas (for use with ack=all) |
| `KAFKA_TOPIC_RETENTION_MS` | Retention time for stream topics, milliseconds |
| `KAFKA_TOPIC_DELETE_RETENTION_MS` | Tombstone retention time for compacted topics, milliseconds |
| `KAFKA_TOPIC_CONFIG_RELATIVE_PATH` | Relative path to the Kafka topic yaml configuration file; upper-level directories are supported |
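
For example, a development `.env` might set the following (values are illustrative only):

``` shell
# Illustrative .env values for topic creation; tune for your cluster
KAFKA_TOPIC_MIN_INSYNC_REPLICAS=2
KAFKA_TOPIC_RETENTION_MS=300000
KAFKA_TOPIC_DELETE_RETENTION_MS=3600000
KAFKA_TOPIC_CONFIG_RELATIVE_PATH=./jikkou/kafka-topics-values.yaml
```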

### Quick Run

@@ -121,34 +122,49 @@ The following environment variables can be used to configure Kafka Topic creation
<a name="mongodb-kafka-connect"></a>

## 4. MongoDB Kafka Connect
The mongo-connector service connects to specified Kafka topics (as defined in the mongo-connector/connect_start.sh script) and deposits these messages to separate collections in the MongoDB Database. The codebase that provides this functionality comes from Confluent using their community licensed [cp-kafka-connect image](https://hub.docker.com/r/confluentinc/cp-kafka-connect). Documentation for this image can be found [here](https://docs.confluent.io/platform/current/connect/index.html#what-is-kafka-connect).
The mongo-connector service connects to specified Kafka topics and deposits these messages to separate collections in the MongoDB Database. The codebase that provides this functionality comes from Confluent using their community licensed [cp-kafka-connect image](https://hub.docker.com/r/confluentinc/cp-kafka-connect). Documentation for this image can be found [here](https://docs.confluent.io/platform/current/connect/index.html#what-is-kafka-connect).

### Configuration
Provided in the mongo-connector directory is a sample configuration shell script ([connect_start.sh](./kafka-connect/connect_start.sh)) that can be used to create Kafka connectors to MongoDB. Connectors in Kafka Connect are defined in the following format:

``` shell
declare -A config_name=([name]="topic_name" [collection]="mongo_collection_name"
[convert_timestamp]=true [timefield]="timestamp" [use_key]=true [key]="key" [add_timestamp]=true)
```

The format above describes the basic configuration for a sink connector; it should be placed at the beginning of the connect_start.sh file. In general, we recommend keeping the MongoDB collection name the same as the topic name to avoid confusion. Additionally, if there is a top-level time field, set `convert_timestamp` to true and specify the time field name that appears in the message. This allows MongoDB to transform that message into a date object, which enables TTL creation and reduces message size. To override MongoDB's default message `_id` field, set `use_key` to true and then set the `key` property to "key". The `add_timestamp` field defines whether the connector will add an auto-generated timestamp to each document, which allows for the creation of Time To Live (TTL) indexes on the collections to help limit collection size growth.

After the sink connector is configured above, call the `createSink` function with the name of the configuration, like so:

``` shell
createSink config_name
```

This call must be placed after the `createSink` function definition. To use a different `connect_start.sh` script, pass in the relative path of the new script by overriding the `CONNECT_SCRIPT_RELATIVE_PATH` environment variable.
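
Putting these pieces together, a complete entry might look like the sketch below; the topic, collection, and time-field names are hypothetical:

``` shell
# Hypothetical sink configuration; names are illustrative
declare -A OdeBsmJson=([name]="topic.OdeBsmJson" [collection]="OdeBsmJson"
  [convert_timestamp]=true [timefield]="odeReceivedAt" [use_key]=true [key]="key" [add_timestamp]=true)

# Register it (after the createSink function definition in the script)
createSink OdeBsmJson
```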
Kafka connectors are managed by the `kafka-connect-setup` service defined in [docker-compose-connect](docker-compose-connect.yml).

Set the `COMPOSE_PROFILES` environment variable as follows (a usage example follows this list):

- `kafka_connect` will only spin up the `kafka-connect` service in [docker-compose-connect](docker-compose-connect.yml)
- `kafka_connect` will only spin up the `kafka-connect` and `kafka-init` services in [docker-compose-connect](docker-compose-connect.yml)
- NOTE: This implies that you will be using a separate Kafka and MongoDB cluster
- `kafka_connect_standalone` will run the following:
1. `kafka-connect` service from [docker-compose-connect](docker-compose-connect.yml)
2. `kafka` service from [docker-compose-kafka](docker-compose-kafka.yml)
3. `mongo` and `mongo-setup` services from [docker-compose-mongo](docker-compose-mongo.yml)
2. `kafka-init` service from [docker-compose-connect](docker-compose-connect.yml)
3. `kafka` service from [docker-compose-kafka](docker-compose-kafka.yml)
4. `mongo` and `mongo-setup` services from [docker-compose-mongo](docker-compose-mongo.yml)
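
A usage sketch for the standalone profile (assuming a `.env` file in the working directory and that the compose files are combined with `-f` flags; your deployment may differ):

``` shell
# Illustrative invocation; the exact set of compose files to pass
# may differ in your deployment
COMPOSE_PROFILES=kafka_connect_standalone docker compose \
  -f docker-compose-kafka.yml \
  -f docker-compose-mongo.yml \
  -f docker-compose-connect.yml \
  up -d
```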

### Configure Kafka Connector Creation

The Kafka connectors created by the `kafka-connect-setup` service are configured in the [kafka-connectors-values.yaml](jikkou/kafka-connectors-values.yaml) file. The connectors in that file are organized by application and given parameters that define each Kafka -> MongoDB sink connector (a sketch follows the table below):

| Connector Variable | Required | Condition | Description |
|---|---|---|---|
| `topicName` | Yes | Always | The name of the Kafka topic to sync from |
| `collectionName` | Yes | Always | The name of the MongoDB collection to write to |
| `generateTimestamp` | No | Optional | Enable or disable adding a timestamp to each message (true/false) |
| `connectorName` | No | Optional | Override the name of the connector from the `collectionName` to this field instead |
| `useTimestamp` | No | Optional | Converts the `timestampField` field at the top level of the value to a BSON date |
| `timestampField` | No | Required if `useTimestamp` is `true` | The name of the timestamp field at the top level of the message |
| `useKey` | No | Optional | Override the document `_id` field in MongoDB to use a specified `keyField` from the message |
| `keyField` | No | Required if `useKey` is `true` | The name of the key field |
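
For illustration, a connector entry using these parameters might look like the sketch below; the wrapping structure and all names are assumptions, and the authoritative format is the kafka-connectors-values.yaml file itself.

``` yaml
# Hypothetical connector entry; the surrounding structure and names are
# illustrative, not the repository's actual schema
connectors:
  - topicName: topic.OdeBsmJson      # Kafka topic to sync from
    collectionName: OdeBsmJson       # MongoDB collection to write to
    generateTimestamp: true          # add an auto-generated timestamp
    useTimestamp: true               # convert timestampField to a BSON date
    timestampField: odeReceivedAt    # top-level time field in the message
```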

The following environment variables can be used to configure Kafka Connectors:

| Environment Variable | Description |
|---|---|
| `CONNECT_URL` | Kafka connect API URL |
| `CONNECT_LOG_LEVEL` | Kafka connect log level (`OFF`, `ERROR`, `WARN`, `INFO`) |
| `CONNECT_TASKS_MAX` | Number of concurrent tasks to configure on kafka connectors |
| `CONNECT_CREATE_ODE` | Whether to create kafka connectors for the ODE |
| `CONNECT_CREATE_GEOJSONCONVERTER` | Whether to create kafka connectors for the GeojsonConverter |
| `CONNECT_CREATE_CONFLICTMONITOR` | Whether to create kafka connectors for the Conflict Monitor |
| `CONNECT_CREATE_DEDUPLICATOR` | Whether to create kafka connectors for the Deduplicator |
| `CONNECT_CONFIG_RELATIVE_PATH` | Relative path to the Kafka connector yaml configuration file; upper-level directories are supported |
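
For example, a development `.env` might set the following (values are illustrative; the URL assumes the `kafka-connect` service name and port 8083 from docker-compose-connect.yml):

``` shell
# Illustrative .env values for the kafka-connect-setup service
CONNECT_URL=http://kafka-connect:8083
CONNECT_LOG_LEVEL=INFO
CONNECT_TASKS_MAX=1
CONNECT_CREATE_ODE=true
CONNECT_CREATE_GEOJSONCONVERTER=true
CONNECT_CREATE_CONFLICTMONITOR=false
CONNECT_CREATE_DEDUPLICATOR=false
```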

### Quick Run

@@ -171,3 +187,12 @@ Set the `COMPOSE_PROFILES` environment variable as follows:
8. Feel free to test this with other topics or by producing to these topics using the [ODE](https://github.com/usdot-jpo-ode/jpo-ode)

[Back to top](#toc)

## Security Notice

While default passwords are provided for development convenience, it is **strongly recommended** to:

1. Change all passwords before deploying to any environment
2. Never use default passwords in production
3. Use secure password generation and management practices
4. Consider using Docker secrets or environment management tools for production deployments (one illustrative approach is sketched below)
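
As one illustrative approach to item 4 (a sketch only, not part of this repository's compose files), Docker Compose secrets with the official mongo image might look like this; the service, secret, and file names are hypothetical:

``` yaml
# Illustrative only: Docker Compose secrets with the official mongo image
services:
  mongo:
    image: mongo:7.0
    secrets:
      - mongo_admin_password
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      # The official mongo image reads the password from this file
      MONGO_INITDB_ROOT_PASSWORD_FILE: /run/secrets/mongo_admin_password

secrets:
  mongo_admin_password:
    file: ./secrets/mongo_admin_password.txt
```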
61 changes: 51 additions & 10 deletions docker-compose-connect.yml
Original file line number Diff line number Diff line change
@@ -16,32 +16,73 @@ services:
memory: 4G
ports:
- "8083:8083"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8083/connectors"]
interval: 30s
timeout: 10s
retries: 4
depends_on:
mongo:
condition: service_healthy
required: false
kafka:
condition: service_healthy
required: false
environment:
MONGO_URI: ${MONGO_URI}
MONGO_DB_NAME: ${MONGO_DB_NAME}
CONNECT_BOOTSTRAP_SERVERS: ${KAFKA_BOOTSTRAP_SERVERS}
CONNECT_REST_ADVERTISED_HOST_NAME: connect
CONNECT_REST_PORT: 8083
CONNECT_GROUP_ID: kafka-connect-group
CONNECT_CONFIG_STORAGE_TOPIC: topic.kafka-connect-configs
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_CONFIG_STORAGE_CLEANUP_POLICY: compact
# Topics are created with jikkou in the kafka-setup service
CONNECT_CONFIG_STORAGE_TOPIC: topic.KafkaConnectConfigs
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: -1
CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
CONNECT_OFFSET_STORAGE_TOPIC: topic.kafka-connect-offsets
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_STORAGE_TOPIC: topic.KafkaConnectOffsets
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: -1
CONNECT_OFFSET_STORAGE_CLEANUP_POLICY: compact
CONNECT_STATUS_STORAGE_TOPIC: topic.kafka-connect-status
CONNECT_STATUS_STORAGE_TOPIC: topic.KafkaConnectStatus
CONNECT_STATUS_STORAGE_CLEANUP_POLICY: compact
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: -1
CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_LOG4J_ROOT_LOGLEVEL: ${CONNECT_LOG_LEVEL}
CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=${CONNECT_LOG_LEVEL},org.reflections=${CONNECT_LOG_LEVEL},com.mongodb.kafka=${CONNECT_LOG_LEVEL}"
CONNECT_PLUGIN_PATH: /usr/share/confluent-hub-components

kafka-connect-setup:
profiles:
- all
- kafka_connect
- kafka_connect_standalone
- kafka_connect_setup
image: jpo-jikkou
build:
context: jikkou
dockerfile: Dockerfile.jikkou
entrypoint: ./kafka_connector_init.sh
restart: on-failure
deploy:
resources:
limits:
cpus: '0.5'
memory: 1G
depends_on:
kafka-connect:
condition: service_healthy
required: false
environment:
CONNECT_URL: ${CONNECT_URL}
CONNECT_TASKS_MAX: ${CONNECT_TASKS_MAX}
CONNECT_CREATE_ODE: ${CONNECT_CREATE_ODE}
CONNECT_CREATE_GEOJSONCONVERTER: ${CONNECT_CREATE_GEOJSONCONVERTER}
CONNECT_CREATE_CONFLICTMONITOR: ${CONNECT_CREATE_CONFLICTMONITOR}
CONNECT_CREATE_DEDUPLICATOR: ${CONNECT_CREATE_DEDUPLICATOR}
CONNECT_CREATE_MECDEPOSIT: ${CONNECT_CREATE_MECDEPOSIT}
MONGO_CONNECTOR_USERNAME: ${MONGO_ADMIN_DB_USER}
MONGO_CONNECTOR_PASSWORD: ${MONGO_ADMIN_DB_PASS:?}
MONGO_DB_IP: ${MONGO_IP}
MONGO_DB_NAME: ${MONGO_DB_NAME}
volumes:
- ${CONNECT_SCRIPT_RELATIVE_PATH}:/scripts/connect_start.sh
- ${CONNECT_CONFIG_RELATIVE_PATH-./jikkou/kafka-connectors-values.yaml}:/app/kafka-connectors-values.yaml