Commit: Implementation of pipeline to run LST-Bench on Trino in Azure (#242)

Closes #238

Showing 32 changed files with 768 additions and 76 deletions.
<!--
{% comment %}
Copyright (c) Microsoft Corporation.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->
# Azure Pipelines Deployment for LST-Bench on Trino 420

This directory comprises the necessary tooling for executing LST-Bench on Trino 420 with different LSTs using Azure Pipelines. The included tooling consists of:
- `run-lst-bench.yml`:
  An Azure Pipelines script designed to deploy Trino and execute LST-Bench.
- `sh/`:
  A directory containing shell scripts and engine configuration files supporting the deployment of Trino and the execution of experiments.
- `config/`:
  A directory with LST-Bench configuration files necessary for executing the experiments that are part of the results.
## Prerequisites
- Automation for deploying the infrastructure in Azure to run LST-Bench is not implemented. As a result, the Azure Pipelines script expects the following setup (see the deployment-job sketch after this list for how these machines can be targeted):
  - A VM named 'lst-bench-client', connected to the pipeline environment, to run the LST-Bench client.
  - A VM named 'lst-bench-head', also connected to the pipeline environment, to run the coordinator node of the Trino cluster.
  - A VMSS cluster that will serve as the Trino worker nodes, within the same VNet as the coordinator node.
  - An Azure Storage Account accessible by both the VMSS and the coordinator node.
  - An Azure SQL Database (or SQL Server-flavored RDBMS) that will run the Hive Metastore.
    The Hive Metastore schema for version 2.3.9 should already be installed in the instance.
- Prior to running the pipeline, several variables need to be defined in your Azure Pipelines setup (see the sketch after this list for how secrets are mapped into a step's environment):
  - `data_storage_account`: Name of the Azure Blob Storage account where the source data for the experiment is stored.
  - `data_storage_account_shared_key` (secret): Shared key for that storage account.
  - `data_storage_account_container`: Name of the container in that storage account where the source data is stored.
  - `hms_jdbc_driver`: JDBC driver for the Hive Metastore.
  - `hms_jdbc_url`: JDBC URL for the Hive Metastore.
  - `hms_jdbc_user`: Username for the Hive Metastore.
  - `hms_jdbc_password` (secret): Password for the Hive Metastore.
  - `hms_storage_account`: Name of the Azure Blob Storage account where the Hive Metastore will store data associated with the catalog (can be the same as `data_storage_account`).
  - `hms_storage_account_shared_key` (secret): Shared key for that storage account.
  - `hms_storage_account_container`: Name of the container in that storage account where the Hive Metastore will store catalog data.
- The LSTs to run experiments on can be modified via pipeline input parameters, either in the Azure Pipelines YAML file or from the web UI; default values are assigned to these parameters. Parameters also include the experiment scale factor, machine type, and cluster size. Note that these parameters are not used to deploy the data or the infrastructure, as that process is not automated in the pipeline. Instead, they are recorded in the experiment telemetry for proper categorization and visualization of the results later on. Their declaration is also sketched below.
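As a rough illustration of how the pipeline can address the named VMs: in Azure Pipelines, a deployment job can target a VM registered in an environment by its resource name. A minimal sketch follows; the environment name (`lst-bench`) and the script path are assumptions for illustration, not values taken from this commit.

```yaml
# Minimal sketch (assumed names): running a step on the VM registered as
# 'lst-bench-client' in an Azure Pipelines environment.
jobs:
- deployment: run_lst_bench_client
  environment:
    name: lst-bench               # assumed environment name
    resourceType: VirtualMachine
    resourceName: lst-bench-client
  strategy:
    runOnce:
      deploy:
        steps:
        - script: ./sh/run.sh     # hypothetical entry point
          displayName: Run LST-Bench client
```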
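One Azure Pipelines detail worth noting: variables marked as secret (such as `hms_jdbc_password` or the shared keys) are not exposed to scripts automatically; they must be mapped into a step's environment explicitly. A minimal sketch, with an assumed invocation:

```yaml
# Sketch: mapping pipeline variables, including secrets, into a step's
# environment so that configuration placeholders can be resolved.
- script: ./sh/run-lst-bench.sh   # hypothetical invocation
  env:
    HMS_JDBC_URL: $(hms_jdbc_url)
    HMS_JDBC_USER: $(hms_jdbc_user)
    HMS_JDBC_PASSWORD: $(hms_jdbc_password)                              # secret: explicit mapping required
    DATA_STORAGE_ACCOUNT_SHARED_KEY: $(data_storage_account_shared_key)  # secret: explicit mapping required
```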
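As for the input parameters, Azure Pipelines declares them in the YAML and surfaces them in the web UI when a run is queued. A sketch under assumed names and defaults (the actual parameter names in `run-lst-bench.yml` may differ):

```yaml
# Sketch: experiment knobs as pipeline parameters. Names and defaults
# here are illustrative, not the ones used by run-lst-bench.yml.
parameters:
- name: lsts
  displayName: LSTs to run experiments on
  type: object
  default: [delta, iceberg]
- name: exp_scale_factor
  type: string
  default: '1000'                 # illustrative default
- name: exp_machine
  type: string
  default: Standard_E8s_v5        # illustrative VM SKU
- name: exp_cluster_size
  type: number
  default: 8
```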
## Additional Notes

For workloads within LST-Bench that include an `optimize` step, particularly those involving partitioned tables, a [custom task](/docs/workloads.md#custom-tasks) is used to execute this step. The task divides the `optimize` operation into batches, each containing up to 100 partitions (the batch size is configurable). This approach was implemented to address issues where Trino would crash if the optimization step was applied to the entire table at once. A hypothetical sketch of such a task definition follows.
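For orientation only, a batched `optimize` task in a workload file could look like the following; the executor class name and the batch-size parameter key are assumptions modeled on the custom-tasks pattern, not verbatim contents of this commit:

```yaml
# Hypothetical sketch of a batched 'optimize' custom task.
# The executor class and the batch-size parameter name are assumed;
# see docs/workloads.md#custom-tasks for the actual contract.
- id: optimize_batched
  custom_task_executor: com.microsoft.lst_bench.task.custom.DependentTaskExecutor  # assumed class name
  parameter_values:
    batch_size: 100   # at most 100 partitions per optimize batch
```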
### Connections configuration

```yaml
# Description: Connections Configuration
---
version: 1
connections:
- id: trino_0
  driver: io.trino.jdbc.TrinoDriver
  url: jdbc:trino://${TRINO_MASTER_HOST}:8080
  username: admin
  password: ''
```
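LST-Bench therefore reaches the cluster through Trino's JDBC driver on port 8080, authenticating as `admin` with an empty password; `${TRINO_MASTER_HOST}` is a placeholder that is presumably resolved to the coordinator's address (for example, from the environment) when the pipeline invokes LST-Bench.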
### run/trino-420/azure-pipelines/config/experiment_config-cow-delta.yaml

```yaml
# Description: Experiment Configuration
---
version: 1
id: "${EXP_NAME}"
repetitions: 1
# Metadata accepts any key-value that we want to register together with the experiment run.
metadata:
  system: trino
  system_version: 420
  table_format: delta
  table_format_version: undefined
  scale_factor: "${EXP_SCALE_FACTOR}"
  mode: cow
  machine: "${EXP_MACHINE}"
  cluster_size: "${EXP_CLUSTER_SIZE}"
# The following parameter values will be used to replace the variables in the workload statements.
parameter_values:
  external_catalog: hive
  external_database: "external_tpcds_sf_${EXP_SCALE_FACTOR}"
  external_table_format: textfile
  external_data_path: "abfss://${DATA_STORAGE_ACCOUNT_CONTAINER}@${DATA_STORAGE_ACCOUNT}.dfs.core.windows.net/tpc-ds/csv/sf_${EXP_SCALE_FACTOR}/"
  external_options_suffix: ''
  external_tblproperties_suffix: ", textfile_field_separator=',', null_format='', skip_header_line_count=1"
  catalog: delta
  database: "delta_${EXP_NAME}"
  table_format: delta
  data_path: 'abfss://${DATA_STORAGE_ACCOUNT_CONTAINER}@${DATA_STORAGE_ACCOUNT}.dfs.core.windows.net/tpc-ds/run/delta/sf_${EXP_SCALE_FACTOR}/'
  options_suffix: ''
  tblproperties_suffix: ''
  partition_spec_keyword: 'partitioned_by'
```
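The `${...}` placeholders in this file are resolved at run time. A minimal sketch, assuming a hypothetical wrapper script and illustrative experiment values, of how a pipeline step could export them:

```yaml
# Sketch: exporting the values expected by the ${...} placeholders above.
# The script path and experiment values are illustrative.
- script: ./sh/run-experiment.sh
  env:
    EXP_NAME: cow_delta_sf1000    # illustrative experiment id
    EXP_SCALE_FACTOR: '1000'      # illustrative scale factor
    EXP_MACHINE: $(exp_machine)
    EXP_CLUSTER_SIZE: $(exp_cluster_size)
    DATA_STORAGE_ACCOUNT: $(data_storage_account)
    DATA_STORAGE_ACCOUNT_CONTAINER: $(data_storage_account_container)
```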
### run/trino-420/azure-pipelines/config/experiment_config-mor-iceberg.yaml

```yaml
# Description: Experiment Configuration
---
version: 1
id: "${EXP_NAME}"
repetitions: 1
# Metadata accepts any key-value that we want to register together with the experiment run.
metadata:
  system: trino
  system_version: 420
  table_format: iceberg
  table_format_version: undefined
  scale_factor: "${EXP_SCALE_FACTOR}"
  mode: mor
  machine: "${EXP_MACHINE}"
  cluster_size: "${EXP_CLUSTER_SIZE}"
# The following parameter values will be used to replace the variables in the workload statements.
parameter_values:
  external_catalog: hive
  external_database: "external_tpcds_sf_${EXP_SCALE_FACTOR}"
  external_table_format: textfile
  external_data_path: "abfss://${DATA_STORAGE_ACCOUNT_CONTAINER}@${DATA_STORAGE_ACCOUNT}.dfs.core.windows.net/tpc-ds/csv/sf_${EXP_SCALE_FACTOR}/"
  external_options_suffix: ''
  external_tblproperties_suffix: ", textfile_field_separator=',', null_format='', skip_header_line_count=1"
  catalog: iceberg
  database: "iceberg_${EXP_NAME}"
  table_format: iceberg
  data_path: 'abfss://${DATA_STORAGE_ACCOUNT_CONTAINER}@${DATA_STORAGE_ACCOUNT}.dfs.core.windows.net/tpc-ds/run/iceberg/sf_${EXP_SCALE_FACTOR}/'
  options_suffix: ''
  tblproperties_suffix: ''
  partition_spec_keyword: 'partitioning'
```
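Side by side with the Delta configuration above, this file differs only in the experiment `mode` (`mor` instead of `cow`), the catalog, database, format, and data-path names, and the partition clause keyword: Trino's Delta Lake connector expects `partitioned_by` in the table's `WITH` clause, while the Iceberg connector expects `partitioning`, which is why `partition_spec_keyword` is parameterized per LST.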
### run/trino-420/azure-pipelines/config/setup_experiment_config.yaml

```yaml
# Description: Experiment Configuration
---
version: 1
id: setup_experiment
repetitions: 1
# Metadata accepts any key-value that we want to register together with the experiment run.
metadata:
  system: trino
  system_version: 420
  scale_factor: "${EXP_SCALE_FACTOR}"
  machine: "${EXP_MACHINE}"
  cluster_size: "${EXP_CLUSTER_SIZE}"
# The following parameter values will be used to replace the variables in the workload statements.
parameter_values:
  external_catalog: hive
  external_database: "external_tpcds_sf_${EXP_SCALE_FACTOR}"
  external_table_format: textfile
  external_data_path: "abfss://${DATA_STORAGE_ACCOUNT_CONTAINER}@${DATA_STORAGE_ACCOUNT}.dfs.core.windows.net/tpc-ds/csv/sf_${EXP_SCALE_FACTOR}/"
  external_options_suffix: ''
  external_tblproperties_suffix: ", textfile_field_separator=',', null_format='', skip_header_line_count=1"
```
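Note that, unlike the per-LST files above, this configuration only parameterizes the external TPC-DS tables over the CSV source data, so a single setup run can prepare the external tables shared by all of the LST experiments.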
### run/trino-420/azure-pipelines/config/telemetry_config.yaml

```yaml
# Description: Telemetry Configuration
---
version: 1
connection:
  id: duckdb_0
  driver: org.duckdb.DuckDBDriver
  url: jdbc:duckdb:./telemetry-trino-420
execute_ddl: true
ddl_file: 'src/main/resources/scripts/logging/duckdb/ddl.sql'
insert_file: 'src/main/resources/scripts/logging/duckdb/insert.sql'
# The following parameter values will be used to replace the variables in the logging statements.
parameter_values:
  data_path: ''
```
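Telemetry is thus recorded in a local DuckDB database file (`./telemetry-trino-420`) using the DDL and insert scripts bundled with LST-Bench; the resulting file can then be copied off the client VM for analysis and visualization of the results.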