Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an assert_batches_eq! macro #7106

Open
ttencate opened this issue Feb 10, 2025 · 0 comments
Open

Add an assert_batches_eq! macro #7106

ttencate opened this issue Feb 10, 2025 · 0 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@ttencate
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

For unit tests, I want to compare two sets of record batches for equality, and report a readable error message if they aren't. For example:

let batch = record_batch!(
    ("column", Int32, [1, 2]),
);
assert_batches_eq!(
    &[batch],
    [
        "+--------+",
        "| column |",
        "+--------+",
        "| 1      |",
        "| 2      |",
        "+--------+",
    ];
);

Because this diffs the pretty-printed output, the output of a test failure can directly be copied into the source code as the expected result.

Describe the solution you'd like

Such a macro already exists in DataFusion. The implementation does not depend on DataFusion internals, so it could be copied into arrow-rs with only small adjustments, for example like this:

#[macro_export]
macro_rules! assert_batches_eq {
    ($left:expr, [$($right:tt)*] $(,)*) => {
        let expected_lines: Vec<&str> = [$($right)*].iter().map(|&s| s).collect();

        let formatted = ::arrow::util::pretty::pretty_format_batches($left)
            .unwrap()
            .to_string();

        let actual_lines: Vec<&str> = formatted.trim().lines().collect();

        assert!(
            expected_lines == actual_lines,
            "record batches are not equal:\nleft:\n{:#?}\nright:\n{:#?}",
            expected_lines,
            actual_lines,
        );
    };
    ([$($left:tt)*], $right:expr, $(,)*) => {
        assert_batches_eq!($right, [$($left)*])
    };
}

Here, I've already changed it to accept its arguments in either order, and to not require the syntactic noise of & or vec! in the call.

Describe alternatives you've considered

Copying the macro into my own code.

Additional context

Additional ergonomics, for bonus points, could include:

  • Accepting two Vec<RecordBatch>es instead of one Vec<RecordBatch> and one [String]
  • Accepting IntoIterator<Item = RecordBatch> instead of just Vec<RecordBatch>
  • Accepting individual RecordBatches instead of only sequences of them

All of these would require a new trait such as PrettyFormat, which would have to be public because it's called from within the macro expansion. It would be implemented for Iterator<Item = RecordBatch>, Vec<String> and so on.

@ttencate ttencate added the enhancement Any new improvement worthy of a entry in the changelog label Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

1 participant