test(bigquery): add integration tests for dataset admin operations #1914

alvarowolfx · 2025-04-25T13:34:45Z

Add integration tests for basic admin operations on BigQuery datasets. The cleanup_stale_datasets method can later be reused to clean up resources in future BigQuery integration tests: a Dataset is the highest level in the resource hierarchy, and deleting one can also delete all of its child resources.

Towards #1773

codecov · 2025-04-25T13:49:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.24%. Comparing base (0d16378) to head (bc3a53a).
Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1914      +/-   ##
==========================================
- Coverage   95.68%   95.24%   -0.44%     
==========================================
  Files          57       58       +1     
  Lines        2061     2102      +41     
==========================================
+ Hits         1972     2002      +30     
- Misses         89      100      +11


coryan

Drive-by ...

src/auth/integration-tests/Cargo.toml

src/integration-tests/Cargo.toml

coryan · 2025-04-27T16:00:34Z

src/integration-tests/src/bigquery.rs

+    let rand_suffix: String = rand::rng()
+        .sample_iter(&Alphanumeric)
+        .take(8)
+        .map(char::from)
+        .collect();
+
+    let ds_name = format!("rust_bq_test_dataset_{rand_suffix}");


In general we prefer using labels to identify test resources. Can we do that here?
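
A minimal sketch of what label-based identification could look like, using a plain `HashMap` as a stand-in for a dataset's labels. The key `integration-test` and value `true` are assumptions for illustration, as are the helper names `test_labels` and `is_test_resource`; none of them is an established convention of this repository or of the BigQuery client.

```rust
use std::collections::HashMap;

/// Hypothetical label key and value marking resources created by integration tests.
const TEST_LABEL_KEY: &str = "integration-test";
const TEST_LABEL_VALUE: &str = "true";

/// Builds the label map to attach to a dataset when the test creates it.
fn test_labels() -> HashMap<String, String> {
    HashMap::from([(TEST_LABEL_KEY.to_string(), TEST_LABEL_VALUE.to_string())])
}

/// Returns true only if the labels mark the resource as a test resource.
fn is_test_resource(labels: &HashMap<String, String>) -> bool {
    labels.get(TEST_LABEL_KEY).map(String::as_str) == Some(TEST_LABEL_VALUE)
}
```

With labels, cleanup code does not need to infer ownership from a naming pattern: a dataset without the label is never considered for deletion, whatever its name.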

coryan · 2025-04-27T16:05:39Z

src/integration-tests/src/bigquery.rs

+    assert!(!list.datasets.is_empty());
+    assert!(list.datasets.len() > 1);
+    assert!(
+        list.datasets
+            .iter()
+            .find(|v| v.id.contains(&ds_name))
+            .is_some()
+    );


All of this can be changed to:

Suggested change

-    assert!(!list.datasets.is_empty());
-    assert!(list.datasets.len() > 1);
-    assert!(
-        list.datasets
-            .iter()
-            .find(|v| v.id.contains(&ds_name))
-            .is_some()
-    );
+    assert!(
+        list.datasets
+            .iter()
+            .find(|v| v.id.contains(&ds_name))
+            .is_some(),
+        "{:?}", list.datasets
+    );

Separately, why "contains"? Can we make that a more specific predicate?

alvarowolfx · 2025-04-29

The Dataset id contains the fully qualified ID of the dataset, something like projects/{projectId}/datasets/{datasetId}, which is why I used contains.

coryan · 2025-04-30

Yes, I was suggesting we do something more specific, such as:

v.id.strip_suffix(&ds_name).and(Some(true)).unwrap_or_default()

or

v.id == format!("projects/{project_id}/datasets/{ds_name}")

or (gross):

v.id.strip_prefix("projects/").and_then(|s| s.strip_prefix(&project_id)).and_then(|s| s.strip_prefix("/datasets/")).is_some_and(|s| s == ds_name)
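
To make the difference from `contains` concrete, here is a small standalone sketch of the exact-match predicate. It assumes the id has the form `projects/{projectId}/datasets/{datasetId}` described in the reply above; the function name `is_dataset` is hypothetical.

```rust
/// Returns true if `id` is exactly `projects/{project_id}/datasets/{ds_name}`.
/// The id layout is an assumption taken from the discussion in this thread.
fn is_dataset(id: &str, project_id: &str, ds_name: &str) -> bool {
    id.strip_prefix("projects/")
        .and_then(|s| s.strip_prefix(project_id))
        .and_then(|s| s.strip_prefix("/datasets/"))
        .is_some_and(|s| s == ds_name)
}
```

Unlike `contains`, this rejects a dataset in another project, and it rejects a dataset whose name merely embeds `ds_name` as a substring.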

coryan · 2025-04-27T16:33:41Z

src/integration-tests/src/bigquery.rs

+            return client
+                .get_dataset(
+                    project_id,
+                    v.dataset_reference.as_ref().map_or("", |v| &v.dataset_id),


This deserves some comment explaining what is going on. If the dataset_reference is not set it seems we cannot make a successful call to .get_dataset()? Maybe returning None is more appropriate? or maybe using v.id and stripping the project_id?

Maybe we should have something like:

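    fn extract_dataset_id(project_id: &str, v: &bigquery::model::ListFormatDataset) -> Option<String> {
        match &v.dataset_reference {
            // Prefer the explicit reference when the service returns one.
            Some(r) => Some(r.dataset_id.clone()),
            // Otherwise fall back to parsing the fully qualified id.
            None => v
                .id
                .strip_prefix("projects/")
                .and_then(|s| s.strip_prefix(project_id))
                .and_then(|s| s.strip_prefix("/datasets/"))
                .map(str::to_owned),
        }
    }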

and then we issue the request only if we have Some()...

coryan · 2025-04-27T16:47:15Z

src/integration-tests/src/bigquery.rs

+    let list = client.list_datasets(project_id).send().await?;
+    let pending_all_datasets = list
+        .datasets
+        .iter()
+        .map(|v| {
+            return client
+                .get_dataset(
+                    project_id,
+                    v.dataset_reference.as_ref().map_or("", |v| &v.dataset_id),
+                )
+                .send();
+        })
+        .collect::<Vec<_>>();
+
+    let stale_datasets = futures::future::join_all(pending_all_datasets)
+        .await
+        .into_iter()
+        .filter_map(|r| {
+            if r.as_ref()
+                .is_ok_and(|ds| ds.creation_time < stale_deadline && ds.id.contains("bq_rust"))
+            {
+                return r.ok();
+            }
+            return None;
+        })
+        .collect::<Vec<_>>();


This is a case where using into_stream() and StreamExt should help.

To start with, we should be able to write:

    use futures::StreamExt;
    ...
    let list = client.list_datasets(project_id).send().await?;
    let results = futures::stream::iter(list.datasets.iter())
        // Skip entries for which no dataset id can be determined.
        .filter_map(|item| async move { extract_dataset_id(project_id, item) })
        .map(|id| async move {
            let dataset = client.get_dataset(project_id, &id).send().await?;
            // Blegh, this needs some refactoring
            if dataset.creation_time < stale_deadline
                && dataset.labels.get("integration-test").map(String::as_str) == Some("true")
            {
                client.delete_dataset(project_id, &id).send().await?;
            }
            Ok(())
        })
        .buffer_unordered(16) // limit the number of concurrent requests
        .collect::<Vec<Result<()>>>()
        .await;

Finally you may need to collect all the errors.
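
Collecting the errors could be as simple as the following sketch, which splits a list of per-dataset results into a success count and the list of errors. The helper name `collect_errors` is hypothetical, and the function is generic so that it does not assume anything about the client's actual result or error types.

```rust
/// Splits per-dataset results into the number of successes and the errors,
/// preserving the order in which the errors appeared.
fn collect_errors<T, E>(results: Vec<Result<T, E>>) -> (usize, Vec<E>) {
    let mut ok = 0;
    let mut errors = Vec::new();
    for r in results {
        match r {
            Ok(_) => ok += 1,
            Err(e) => errors.push(e),
        }
    }
    (ok, errors)
}
```

The cleanup can then decide whether to fail the test run or just log the errors, rather than stopping at the first failed request.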

coryan

A small nit, please fix and merge.

coryan · 2025-04-30T18:07:29Z

src/integration-tests/tests/driver.rs

+    #[test_case(bigquery::client::DatasetService::builder(); "default")]
+    #[test_case(bigquery::client::DatasetService::builder().with_tracing(); "with tracing enabled")]
+    #[test_case(bigquery::client::DatasetService::builder().with_retry_policy(retry_policy()); "with retry enabled")]


I think we do not need all three cases for each library. One is enough, maybe this?

Suggested change

-    #[test_case(bigquery::client::DatasetService::builder(); "default")]
-    #[test_case(bigquery::client::DatasetService::builder().with_tracing(); "with tracing enabled")]
-    #[test_case(bigquery::client::DatasetService::builder().with_retry_policy(retry_policy()); "with retry enabled")]
+    #[test_case(bigquery::client::DatasetService::builder().with_retry_policy(retry_policy()).with_tracing(); "with retry and tracing enabled")]

alvarowolfx added 2 commits April 24, 2025 23:38

feat(wkt): allow empty/null responses to be treated as Empty

411e587

test(bigquery): add basic dataset admin integration tests

4bc7210

product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Apr 25, 2025

coryan reviewed Apr 27, 2025

alvarowolfx added 4 commits April 28, 2025 22:04

fix: lint issues

34050b2

Merge branch 'main' into test-bigquery-dataset-admin

427b138

fix: revert empty.rs file

3d7181c

fix: address pr comments

7a5ff65

alvarowolfx marked this pull request as ready for review April 30, 2025 17:52

alvarowolfx requested a review from a team as a code owner April 30, 2025 17:52

coryan approved these changes Apr 30, 2025

test: reduce amount of test cases for bigquery

bc3a53a
