Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue Querying MotherDuck After v0.12.2 Upgrade #662

Open
enlore opened this issue Jan 31, 2025 · 8 comments
Open

Issue Querying MotherDuck After v0.12.2 Upgrade #662

enlore opened this issue Jan 31, 2025 · 8 comments

Comments

@enlore
Copy link

enlore commented Jan 31, 2025

Summary

Dependencies

Dependency versions:

  "dependencies": {
    "@motherduck/wasm-client": "^0.6.6",
    "@uwdata/vgplot": "^0.12.2",
    "apache-arrow": "^17.0.0",
    "react": "^18.3.1",
    "react-dom": "^18.3.1"
  },
➜  debug-motherduck-mosaic-v_0_12_2 git:(main)  node --version
v21.0.0

Errors

When attempting to build plots using data fetched from MotherDuck, one of two errors occurs.

ERROR 1: getFieldInfo throws:

field-info.js:75 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'column_type')
   at getFieldInfo (field-info.js:75:19)
   at async Promise.all (:5173/index 1)
   at async queryFieldInfo (field-info.js:50:13)
   at async Coordinator.initializeClient (Coordinator.js:251:24)

This happens when attempting to build a plot by querying data from MotherDuck.

ERROR 2: arrowToColumns throws:

hook.js:608 TypeError: data.toColumns is not a function
  at arrowToColumns (to-data-columns.js:35:35)
  at toDataColumns (to-data-columns.js:24:7)
  at ConnectedMark.queryResult (Mark.js:177:17)
  at Coordinator.js:207:24

While exploring this issue, I validated that I can execute DESCRIBE queries against the plot's source table directly using vg.coordinator().query().

If I execute a DESCRIBE query with the same exact quotes as generated by Mosaic's query building logic before attempting to create the plot, this error is thrown instead of ERROR 1. See src/App.tsx line 28 for sample code.

Repro

See reference repo here: https://github.com/enlore/debug-motherduck-mosaic-v_0_12_2

It should be good to go so you can clone, npm i, and npm run dev.

The repo includes a read only token so the app can connect to my personal MotherDuck free tier account for convenience.

  • src/App.tsx contains the relevant logic
  • Set CONNECT_TO_MOTHERDUCK to true to see ERROR 1
  • Uncomment line 45 to see ERROR 2
  • Set CONNECT_TO_MOTHERDUCK to false to validate that Mosaic can build a line plot from a browser local DuckDB Wasm instance without issue
  • DuckDB version number is logged to the console for both code paths

Let me know if I can provide additional information.

Thank you.

@enlore enlore changed the title Issue Querying Motherduck After v0.12.2 Upgrade Issue Querying MotherDuck After v0.12.2 Upgrade Jan 31, 2025
@domoritz
Copy link
Member

I suspect the issue is that mosaic now uses flechette instead of arrow js. Can motherduck just return the binary blob instead of an arrow table object?

@enlore
Copy link
Author

enlore commented Feb 3, 2025

I will find out.

What would be the idea from that point? Hand the binary blob to Flechette to turn into a Table?

@domoritz
Copy link
Member

domoritz commented Feb 3, 2025

Yeah, something like

: query.type === 'arrow' ? decodeIPC(await res.arrayBuffer())
.

@jheer
Copy link
Member

jheer commented Feb 3, 2025

Note that the decodeIPC method (which parses IPC bytes to a Flechette table) is exported for reuse:

export { decodeIPC } from './util/decode-ipc.js';

Personally, I would prefer if data tools defaulted to proving Arrow bytes rather than a specific Arrow implementation, especially as the default (apache-arrow package) implementation does not (last time I checked) support all of the data types returned in DuckDB arrow results.

@enlore
Copy link
Author

enlore commented Feb 4, 2025

Chatted with the MotherDuck team, they suggested an approach for getting at the IPC data.

I implemented their suggestion into my sample project. See summary below and https://github.com/enlore/debug-motherduck-mosaic-v_0_12_2/blob/main/src/App.tsx for details.

Good news: seems to work as far as ensuring we are creating Flechette tables from query result byte streams.

Bad news: Mosaic is still choking on getFieldInfo.

Below are the key elements of getting at the IPC data directly.

  // We need connection ID to poll against later and the MDConnection does not 
  //   provide a useUnsafe method
  //
  // const connection = MDConnection.create({
  //   mdToken: token,
  // });

  // So we drop down a level to the DuckDB instance provided by MotherDuck's 
  //   getAsyncDuckDb and get a connection from it
  const duckDb = await getAsyncDuckDb({
    mdToken: token,
  })

  // Cast to the DuckDB Wasm version of the AsyncDuckDBConnection type
  const duckDbConn = await duckDb.connect() as unknown as AsyncDuckDBConnection
/**
 * Start a query and poll it to see when it finishes. Get the IPC data and build a Uint8Array from it.
 * @param conn AsyncDuckDBConnection
 * @param sql string A DuckDB SQL query string
 * @returns Promise<Uint8Array | undefined> The query results IPC data stream all buffered into one place
 */
async function getMDArrowIPC(conn: AsyncDuckDBConnection, sql: string): Promise<Uint8Array | undefined> {
  return await conn.useUnsafe(async (bindings: AsyncDuckDB, connId: number) => {
    try {
      let header = await bindings.startPendingQuery(connId, sql);
      
      while (header == null) {
          header = await bindings.pollPendingQuery(connId);
      }

      // see the duckdb-wasm AsyncResultStreamIterator: https://github.com/duckdb/duckdb-wasm/blob/41c03bef54cbc310ba2afa2e5f71f7a22b38364f/packages/duckdb-wasm/src/parallel/async_connection.ts#L111-L158 
      const iter = new AsyncResultStreamIterator(bindings, connId, header);
      
      const buffer = await accumulateIPCBuffer(iter);
     
      return buffer;

    } catch (error) {
        // TODO blearg
        console.error(error)
    }
  });
}

Finally we

function decodeIPCToFlechette(result: Uint8Array): Table {
  const table = decodeIPC(result)
  return table
}

@enlore
Copy link
Author

enlore commented Feb 4, 2025

In the below screenshot, an error is thrown because desc is undefined.

Image

Errors:

Image

Possibly related behavior: if I run a DESCRIBE query using the exact same quotes as the query generated by Mosaic before I build the plot, error goes away. Chart still seems busted though.

Code the runs the describe query:

const d2 = await vg.coordinator().query('DESCRIBE SELECT "tpep_pickup_datetime" AS "column" FROM "sample_data"."nyc"."taxi" AS "source"') as Table
console.info('d2', d2.toArray())

Screenshot:

Image

Further, if I run a DESCRIBE query prior to building the plot without worrying about the quotes: error.

Code to run the describe query:

// Confirming that we can DESCRIBE successfully
const d1 = await vg.coordinator().query('DESCRIBE SELECT tpep_pickup_datetime AS column FROM sample_data.nyc.taxi AS source') as Table
console.info('d1', d1.toArray())

Screenshot of error, no plot:

Image

@enlore
Copy link
Author

enlore commented Feb 4, 2025

Possibly related?

duckdb/duckdb#16017

@jheer
Copy link
Member

jheer commented Feb 4, 2025

Possibly related?

duckdb/duckdb#16017

Hmm, that definitely looks like it could break Mosaic in some cases. Thanks for flagging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants