Add better architecture diagram #165

Status: Closed · wants to merge 4 commits
14 changes: 7 additions & 7 deletions README.md
@@ -3,10 +3,10 @@
<div align="center">

<h4 align="center">
<a href="https://double.cloud/services/doublecloud-transfer/">Double Cloud Transfer</a> |
<a href="./docs/getting_started.md">Documentation</a> |
<a href="./docs/benchmarks.md">Benchmarking</a> |
<a href="./roadmap/roadmap_2024.md">Roadmap</a>
<a href="https://doublecloud.github.io/transfer/">Double Cloud Transfer</a> |
<a href="https://doublecloud.github.io/transfer/docs/getting_started.html">Documentation</a> |
<a href="https://doublecloud.github.io/transfer/docs/benchmarks.html">Benchmarking</a> |
<a href="https://doublecloud.github.io/transfer/docs/roadmap">Roadmap</a>
</h4>


@@ -186,18 +186,18 @@ More details [here](./docs/deploy_k8s.md).
## ⚡ Performance


[Naive-s3-vs-airbyte](./docs/benchmark_vs_airbyte.md)
[Naive-s3-vs-airbyte](https://medium.com/@laskoviymishka/transfer-s3-connector-vs-airbyte-s3-connector-360a0da084ae)

</div>

![Naive-s3-vs-airbyte](./assets/bench_s3_vs_airbyte.png)
![Naive-s3-vs-airbyte](./docs/_assets/bench_s3_vs_airbyte.png)

<div align="center">

## 📐 Architecture


<img src="./assets/logo.png" alt="transfer" />
<img src="./docs/_assets/architecture.png" alt="transfer" />

</div>

Binary file removed assets/bench_s3_vs_airbyte.png
Binary file not shown.
Binary file added docs/_assets/architecture.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added docs/_assets/bench_s3_vs_airbyte.png
28 changes: 24 additions & 4 deletions docs/architecture-overview.md
@@ -14,7 +14,8 @@ The white paper is structured as follows:

1. **Introduction**: An overview of the purpose and goals of the system.
2. **Systems Overview**: A brief overview of current systems that require different approaches for data synchronization.
3. **Replication Techniques**: Description of the main replication techniques and application scenarios.
3. **Architecture Overview**: High-level principles behind the {{ data-transfer-name }} architecture.
4. **Replication Techniques**: Description of the main replication techniques and application scenarios.
5. **Data Integrity**: Discussion of how to achieve data integrity across different types of storages and possible design decisions.
6. **Challenges**: Description of the main challenges encountered while building the system as a service.
7. **Case Studies**: In-depth analysis of some case studies where the system was used and how it helped.
@@ -82,8 +83,6 @@ The first step in overcoming the above challenges was to formulate the requireme

After careful analysis, we crystallized our requirements for the future **{{ data-transfer-name }}** product as follows:



* **Minimize Delivery Lag**: The system must guarantee that data is delivered with a freshness lag of only a few seconds to be considered useful.
* **Guarantee Quality of Data**: The system must provide data with inferred schema from source tables and guarantee consistency between storages, with eventual consistency being acceptable.
* **Serializable Intermediate Format**: The system must provide a uniform intermediate format to transport data, with the option to route traffic through a persistent queue.
@@ -108,11 +107,32 @@ To minimize development efforts and system complexity, we must have some univers

Many middlewares exist between the source and sink for metrics collection, transformers application and logging.


## Architecture Overview

The system is built around a **core module** that acts as the central part of the application, managing its internal logic and facilitating communication between components. Users can interact with the system through either a **Command-Line Interface (CLI)** or via a **Component Development Kit (CDK)**, which serves as a library of interfaces for embedding functionality into external systems.

*Architecture*:
![Transfer architecture](_assets/architecture.png "Transfer architecture")


At its heart, the application follows a **plugin-based architecture**, enabling extensibility and modularity.
Plugins are integrated at **compile time** as Go dependencies, ensuring tight integration and optimal performance.
The system is implemented as a **Go monolith**, providing a streamlined and cohesive runtime environment.

The core connects to **plugins** in various domains, such as **connectors** (e.g., S3, PostgreSQL, ClickHouse), or **transformers** (e.g., renaming or SQL transformations).
These plugins are further glued together by **middlewares**, enabling data processing and transformations to be seamlessly chained.
A shared **data model** ensures consistent communication between components, while **connectors** handle all database-specific logic, **transformers** perform computations based on the shared **data model**, and a **coordinator** manages state tracking and coordination between nodes of **{{data-transfer-name}}** deployments.

This modular approach allows the system to remain flexible, robust, and scalable while adhering to Go's principles of simplicity and high performance.
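The plugin and middleware wiring described above can be sketched in Go. This is a minimal illustration, not the actual Transfer CDK: `Item`, `Sink`, `SinkFunc`, `Middleware`, and `WithLogging` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Item is a stand-in for the shared data model that connectors
// and transformers exchange (the real model is richer).
type Item struct {
	Table string
	Value string
}

// Sink is a hypothetical sink-side plugin interface.
type Sink interface {
	Push(items []Item) error
}

// SinkFunc adapts a plain function to the Sink interface.
type SinkFunc func(items []Item) error

func (f SinkFunc) Push(items []Item) error { return f(items) }

// Middleware wraps a Sink, so metrics, logging, and transformers
// can be chained in front of any connector.
type Middleware func(Sink) Sink

// WithLogging is an example middleware: it logs batch sizes
// before delegating to the underlying sink.
func WithLogging(next Sink) Sink {
	return SinkFunc(func(items []Item) error {
		fmt.Printf("pushing %d items\n", len(items))
		return next.Push(items)
	})
}

// Chain applies middlewares right-to-left around a terminal sink,
// so the first middleware listed is the outermost wrapper.
func Chain(s Sink, mws ...Middleware) Sink {
	for i := len(mws) - 1; i >= 0; i-- {
		s = mws[i](s)
	}
	return s
}

func main() {
	var got []Item
	terminal := SinkFunc(func(items []Item) error {
		got = append(got, items...)
		return nil
	})
	sink := Chain(terminal, WithLogging)
	_ = sink.Push([]Item{{Table: "users", Value: "a"}})
	fmt.Println(len(got)) // → 1
}
```

Because plugins are compiled in as Go dependencies, a deployment simply imports the connectors and middlewares it needs; no dynamic loading is involved.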

*Dataplane overview*:

![Dataplane architecture](_assets/dp_architecture.png "Dataplane architecture")


We must handle each delivery as a separate entity or resource to be able to configure it in a centralized way. This realization led us to make the runtime engine pluggable so it can use any IaaS cloud provider or container management service (like k8s). Each runtime here is a simple stateless worker executor running a job binary with provided options.

The data plane can track the status of a job and process commands sent to it from our coordinator service.
13 changes: 6 additions & 7 deletions docs/benchmarks.md
@@ -13,7 +13,6 @@ This guide outlines the steps to benchmark database transfer services using a ro

2. **Prepare the Source Database**
- Set up a source database on ec2 instance.
- Use a pre-production serverless runtime environment.

3. **Load the Data**
Import the prepared dataset into the source database, ensuring it aligns with the benchmarking scenario.
@@ -35,7 +34,7 @@ Baselines provide reference points to measure performance.
- Document the metrics from this initial transfer.
- Key metric: Rows per second for single-core throughput.
- Example: Measure total transfer time or use metrics like rows/sec.
- ![bench_key_metrics.png](../assets/bench_key_metrics.png)
- ![bench_key_metrics.png](_assets/bench_key_metrics.png)

---

@@ -45,22 +44,22 @@ After setting baselines, fine-tune the transfer settings for better performance.

### Optimization Steps:
1. **Activate the Transfer**
Deploy the transfer via [helm](./deploy_k8s.md) in your k8s cluster.
Deploy the transfer via [helm](deploy_k8s.html) in your k8s cluster.

2. **Expose pprof for Profiling**
- Expose the pprof port for profiling, by default `--run-profiler` is true.

3. **Download the pprof File**
- CPU profiles are accessible at `http://localhost:{EXPOSED_PORT}/debug/pprof/`.
- ![bench_key_metrics.png](../assets/bench_pprof_lens.png)
- ![bench_key_metrics.png](_assets/bench_pprof_lens.png)
- Profiles typically sample for 30 seconds.
- ![bench_key_metrics.png](../assets/bench_pprof_prifle.png)
- ![bench_key_metrics.png](_assets/bench_pprof_prifle.png)

4. **Visualize the Profile**
- Use tools like [Speedscope](https://www.speedscope.app/).
- ![bench_key_metrics.png](../assets/bench_speedscope_init.png)
- ![bench_key_metrics.png](_assets/bench_speedscope_init.png)
- Upload the profile to analyze call stacks.
- ![bench_key_metrics.png](../assets/bench_results.png)
- ![bench_key_metrics.png](_assets/bench_results.png)
- Use the "Left-Heavy" view to identify high-time-consuming paths.

---
10 changes: 10 additions & 0 deletions docs/index.yaml
@@ -12,6 +12,16 @@ description:
- >-
It supports several data transfer scenarios, with every scenario run at the logical level. This allows you to keep your source database running and
minimize the downtime of applications that use the service.
- >-
At its heart, the application follows a <strong>plugin-based architecture</strong>, enabling extensibility and modularity.
<img style="width: 100%" src="_assets/architecture.png" alt="alt_text" title="image_tooltip">
<br/>
The system is built around a <strong>core module</strong> that acts as the central part of the application,
managing its internal logic and facilitating communication between components.
Users can interact with the system through either a <strong>Command-Line Interface (CLI)</strong> or via a
<strong>Component Development Kit (CDK)</strong>,
which serves as a library of interfaces for embedding functionality into external systems.
meta:
title: "{{product-name}}"
links:
11 changes: 11 additions & 0 deletions docs/roadmap/index.md
@@ -0,0 +1,11 @@
---
title: "Transfer roadmaps"
description: "Explore the active and past {{ data-transfer-name }} roadmaps."
---

# Transfer roadmaps

Transfer aims to have publicly visible roadmaps for future improvements; here is a list of active and past roadmaps.

* [{#T}](roadmap_2024.md)
* [{#T}](roadmap_2025.md)
56 changes: 56 additions & 0 deletions docs/roadmap/roadmap_2024.md
@@ -0,0 +1,56 @@
# Roadmap 2024

## Key Goals

1. **K8s Operator for Multi-Transfer Deployments**
2. **Delta Sink**
3. **Iceberg Sink**
4. **Clickhouse Exactly Once Support**

---

## 1. E2E Testing for Main Connectors

### Objective:
Set up comprehensive **end-to-end tests** in the CI pipeline for the following main connectors:
- **Postgres**
- **MySQL**
- **Clickhouse**
- **Yandex Database (YDB)**
- **YTsaurus (YT)**

### Steps:
- [x] Configure test environments in CI for each connector.
- [x] Design E2E test scenarios covering various transfer modes (snapshot, replication, etc.).
- [x] Automate test execution for all supported connectors.
- [x] Set up reporting and logs for test failures.

### Milestone:
Achieve **fully automated E2E testing** across all major connectors to ensure continuous integration stability.

---

## 2. Helm Deployment Documentation

### Objective:
Provide detailed documentation on deploying the transfer engine using **Helm** on Kubernetes clusters.

### Steps:
- [x] Create Helm chart for easy deployment of the transfer engine.
- [x] Write comprehensive **Helm deployment guide**.
- [x] Define key parameters for customization (replicas, resources, etc.).
- [x] Instructions for various environments (local, cloud).
- [x] Test Helm deployment process on common platforms (GKE, EKS, etc.).

### Milestone:
Enable seamless deployment of the transfer engine via Helm with clear and accessible documentation.

---

## Summary

- **Q2-Q3**: Focus on **E2E testing** for core connectors.
- **Q3**: Publish **Helm deployment** documentation and final testing.
- **Q3-Q4**: Develop and release the **Kubernetes operator** for multi-transfer management.

This roadmap aims to enhance testing, simplify deployment, and provide advanced scalability options for the transfer engine.
30 changes: 30 additions & 0 deletions docs/roadmap/roadmap_2025.md
@@ -0,0 +1,30 @@
# Roadmap 2025

## Key Goals

1. **K8s Operator for Multi-Transfer Deployments**
2. **Delta Sink**
3. **Iceberg Sink**
4. **Clickhouse Exactly Once Support**

---

## 1. Kubernetes Operator for Multi-Transfer Deployments

### Objective:
Develop a **Kubernetes operator** to manage multiple data transfers, simplifying the process for large-scale environments.

### Steps:
- [ ] Define CRD (Custom Resource Definitions) for transfer configurations.
- [ ] Implement operator logic for scaling and managing multi-transfer deployments.
- [ ] Add support for monitoring, scaling, and error recovery.
- [ ] Write user documentation for deploying and managing transfers via the operator.

### Milestone:
Provide a scalable solution for managing multiple data transfers in Kubernetes environments with an operator.

---

## Summary

TODO
10 changes: 10 additions & 0 deletions docs/toc.yaml
@@ -94,7 +94,17 @@ items:
- name: Connect Prometheus to Transfer
href: integrations/connect-prometheus-to-transfer.md

- name: Plans
items:
- name: Overview
href: roadmap/index.md
- name: "Roadmap 2024"
href: roadmap/roadmap_2024.md
- name: "Roadmap 2025"
href: roadmap/roadmap_2025.md
- name: Resolve issues with Transfer
href: transfer-self-help.md
- name: Questions and answers
href: transfer-faq.md
- name: Benchmarking
href: benchmarks.md
71 changes: 0 additions & 71 deletions roadmap/roadmap_2024.md

This file was deleted.
