This repository demonstrates how to modernize your data infrastructure by combining the power of dlt, Apache Iceberg, and Lakekeeper.
Check out the YouTube talk for a quick walkthrough of this modernization: 📺 YouTube: Upgrade Your Infrastructure to Iceberg with dlt + Lakekeeper
This repository contains two configuration files:
- `dlt_warehouse.yml` – the original configuration before modernization, representing a traditional Snowflake-based warehouse setup.
- `dlt.yml` – the modernized configuration, using Apache Iceberg as the destination via Lakekeeper.
Refer to each file to see how to transition from a legacy data warehouse pipeline to a modern, open table format with Iceberg.
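Conceptually, the upgrade boils down to swapping the pipeline's destination. Below is a minimal Python sketch of that idea, assuming dlt's `snowflake` destination and the dlt+ `iceberg` destination; the `dataset_name` is illustrative, not taken from the repo's manifests:

```python
import dlt

# Before: load into a traditional Snowflake warehouse
warehouse_pipeline = dlt.pipeline(
    pipeline_name="loading_events",
    destination="snowflake",
    dataset_name="github_events",
)

# After: load into Iceberg tables registered in a Lakekeeper catalog
lake_pipeline = dlt.pipeline(
    pipeline_name="loading_events",
    destination="iceberg",
    dataset_name="github_events",
)
```

The source and schema stay the same; only the destination (and its credentials) change.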
- Install uv:

  ```sh
  pip install uv
  ```
- Clone this repo and run:

  ```sh
  make dev
  ```
- Put your Lakekeeper token into `dlt_portable_data_lake_demo/.dlt/secrets.toml` (a quick way to check that the token resolves is sketched below):

  ```toml
  [destination.iceberg_lake.credentials]
  credential = "your-token"
  ```
- Download an archive with the data:

  ```sh
  make download-gh
  ```
- Add your dlt+ license to `secrets.toml`:

  ```toml
  [runtime]
  license = "..."
  ```

  💡 The license is needed when using dlt+ features, sources, or destinations like Iceberg in this demo. Don’t have one yet? Join the waiting list to request it.
- Run the pipeline:

  ```sh
  uv run dlt pipeline loading_events run
  ```
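  After the run, you can also inspect what happened from Python; a sketch using dlt's attach mechanism:

  ```python
  import dlt

  # Attach to the pipeline created by the CLI run above
  pipeline = dlt.attach("loading_events")

  # The trace summarizes the last extract / normalize / load steps
  print(pipeline.last_trace)
  ```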
- You can see the data in Lakekeeper: https://you.hosted.lakekeeper.app/catalog
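  Besides the Lakekeeper UI, you can browse the catalog programmatically. A sketch using pyiceberg's REST catalog client (pyiceberg is an assumption here, not a dependency this demo declares; substitute your own catalog URI and token):

  ```python
  from pyiceberg.catalog import load_catalog

  catalog = load_catalog(
      "lakekeeper",
      **{
          "uri": "https://you.hosted.lakekeeper.app/catalog",
          "token": "your-token",
      },
  )

  # List the namespaces (datasets) the pipeline created
  print(catalog.list_namespaces())
  ```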
- (Optional) Run transformations. You need to specify credentials for a Snowflake warehouse or change the warehouse type to DuckDB:

  ```sh
  dlt transformation aggregate_issues run
  ```
  Example Snowflake credentials in `secrets.toml`:

  ```toml
  [destination.snowflake.credentials]
  database = "dlt_data"
  password = "<password>"
  username = "loader"
  host = "your-host"
  warehouse = "COMPUTE_WH"
  role = "DLT_LOADER_ROLE"
  ```
  ➡️ See the dlt Snowflake destination docs for more.
- To access your data, check `access.ipynb` (a pyiceberg sketch of the same kind of access follows below).
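For a quick look without the notebook, reading an Iceberg table through pyiceberg looks roughly like this (the namespace and table names are hypothetical; use whatever the pipeline actually created):

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakekeeper",
    **{"uri": "https://you.hosted.lakekeeper.app/catalog", "token": "your-token"},
)

# "github_events.issues_event" is a hypothetical namespace.table identifier
table = catalog.load_table("github_events.issues_event")

# Scan a handful of rows into pandas
print(table.scan(limit=10).to_pandas())
```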