fix(go/adbc/driver/snowflake): use one session for connection #2494
Conversation
@davlee1972 @Zan-L @Niivii Could all three of you try testing with this PR and confirm that it eliminates the issue you were having with adbc_ingest?
How often are nightly builds run? I'm using Python, so the driver is a Linux shared library object.
@davlee1972 you would probably need to pull down this PR and build it yourself to test it. If you need, I can build it and upload the shared lib and wheel.
This would really help me at least. I do not have enough knowledge in Go to build everything from scratch 🥲 I pulled the nightly 1.5 easily, but I saw it was 8 days old, so I guess it won't have this change!
@Niivii here's a wheel I built using my branch. This should be installable and testable (I hope).
I installed the 1.5.0dev wheel and COPY INTO operations are still getting truncated. Three COPY INTO operations were running for 30 seconds before they were killed by the SELECT COUNT(*) call. Error messages:

```
time="2025-01-29T18:19:06-05:00" level=error msg="failed to get response. err: context canceled" func=gosnowflake.postRestfulQueryHelper file="restful.go:290"
time="2025-01-29T18:19:06-05:00" level=error msg="error: context canceled" func="gosnowflake.(*snowflakeConn).queryContextInternal" file="connection.go:413"
time="2025-01-29T18:19:06-05:00" level=error msg="error: context canceled" func="gosnowflake.(*snowflakeConn).queryContextInternal" file="connection.go:413"
time="2025-01-29T18:19:06-05:00" level=error msg="error: context canceled" func="gosnowflake.(*snowflakeConn).queryContextInternal" file="connection.go:413"
dbapi.py:937 return _blocking_call(self._stmt.execute_update, (), {}, self._stmt.cancel)
InternalError: INTERNAL: some files not loaded by COPY command, 96 files remain after 5 retries
```
Well, "failed to get response" is at least a different error. Is there any way you can give me a reproducer with synthetic data? I know that you're testing with internal stuff, but I haven't been able to recreate the truncated copies myself. |
Hello, this didn't solve the issue; I still get some remaining COPY commands. This seems to happen more often with large amounts of data. A couple million rows go through without issue.
@Niivii would you be able to share your reproducer? If you can't share the data, can you generate synthetic data that I would be able to reproduce it with?
I will make a small Python project to simulate what is done and generate some dummy data. Not sure I will have the time before the weekend, though. Will let you know.
@davlee1972 @Niivii checking in again to see if either of you can provide a reproducer that I can use to debug and test this.
Hello @zeroshade, this should do the trick (Python >= 3.13):

```python
import adbc_driver_snowflake.dbapi
import numpy as np
import polars as pl


def main() -> None:
    big_df = generate_big_dataframe(num_rows=5000000)
    with create_snowflake_connection() as snowflake_connection:
        big_df.write_database(
            table_name="my_table",
            connection=snowflake_connection,
            if_table_exists="append",
            engine="adbc",
        )


def generate_big_dataframe(num_rows: int) -> pl.DataFrame:
    rng = np.random.default_rng(12345)
    data = {
        "column_int": rng.integers(low=-10000, high=10000, size=num_rows),
        "column_float": rng.random(size=num_rows),
        "column_string": rng.choice(["A", "b", "cdefghijkl", "âozZEzaéXZez"], size=num_rows),
    }
    return pl.DataFrame(data=data)


def create_snowflake_connection() -> adbc_driver_snowflake.dbapi.Connection:
    return adbc_driver_snowflake.dbapi.connect(db_kwargs={
        "username": "myusername",
        "adbc.snowflake.sql.account": "myaccount",
        "adbc.snowflake.sql.auth_type": "auth_ext_browser",
        "adbc.snowflake.sql.warehouse": "mywarehouse",
        "adbc.snowflake.sql.role": "myrole",
        "adbc.snowflake.sql.db": "mydb",
        "adbc.snowflake.sql.schema": "myschema",
    })


if __name__ == "__main__":
    main()
```

Let me know if you can reproduce.
Thanks @Niivii, with the help of that script I was able to narrow down and determine the precise cause of the issue. While this change is needed, it will ultimately take another solution to fully fix that issue (I'm going to start working on it now). In the meantime, this does provide a good fix for some issues, including #2517. (I've updated the PR description so that it marks the correct issue as fixed.) @lidavidm Any further comments on this, or do you think it's good to merge as-is while I work on the actual solution for the COPY INTO problem?
@Niivii @davlee1972 I think I figured it out without needing a larger refactor... Here's a built version that you can try; hopefully it fixes the issue on your end (it fixed my reproducer at least). Please report back!
@Niivii Okay, I think i spotted where the other potential race condition was:
That's my theory as to how it still happened in your test, so I've updated it. If that still doesn't fix things, then I'm probably going to have to just rework and refactor the entire process here, which I'm trying to avoid...
Is there a cleaner architecture we can achieve by refactoring? At some point it feels a bit brittle 😅 (That said, I'm happy to merge this for the time being if it does fix everything.)
Unfortunately this fix didn't work on my end after installing the manylinux whl file. The first time, it aborted an active COPY INTO. Unfortunately I'm having a corporate issue between my personal GitHub and Copilot GitHub accounts, so here is a camera photo of the error. The error I'm seeing is "64 files remain after 5 retries". I'm wondering why we have all these COPY INTO operations with no rows loaded, and whether, if there happen to be 5 of these in a row, it will abort prematurely?
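To illustrate how such an abort could happen, here is a minimal Go sketch that assumes the ingest path retries COPY INTO with a fixed budget; the function name, signature, and constant below are assumptions for illustration, not the driver's actual code:

```go
package snowflake

import (
	"context"
	"fmt"
)

// maxCopyRetries is an assumed, fixed retry budget for illustration only.
const maxCopyRetries = 5

// copyRemainingFiles keeps issuing COPY INTO until every staged file has been
// loaded or the retry budget runs out. If several consecutive copies load zero
// files (for example because uploads are still in flight), the budget is
// exhausted even though no data has actually been lost.
func copyRemainingFiles(ctx context.Context, runCopy func(context.Context) (int, error), remaining int) error {
	for attempt := 0; attempt < maxCopyRetries && remaining > 0; attempt++ {
		loaded, err := runCopy(ctx)
		if err != nil {
			return err
		}
		remaining -= loaded // zero-file copies burn retries without making progress
	}
	if remaining > 0 {
		return fmt.Errorf("some files not loaded by COPY command, %d files remain after %d retries", remaining, maxCopyRetries)
	}
	return nil
}
```

Under that assumption, a run of copies that each load zero files would surface the "files remain after 5 retries" error even though the data could still load eventually.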
Can you give me any more information about the shape of your data? How many rows/columns? I want to see if I can reproduce the failure you're seeing somehow so I can try to debug how it's happening.
Should we merge this for the time being while we continue investigating? It seems it fixes at least a couple of issues, if not all.
@lidavidm I agree, let's merge this to fix the issues it does fix, and I'll continue investigating.
@davlee1972 Please feel free to reply here if you can give me more information (number of rows / columns / batches, etc.) about your dataset that can help me try to reproduce your failure (also, please try to build it fresh locally with the updated changes and see if that helps). In the meantime, I have some ideas about reducing the extraneous copy calls that have 0 rows inserted.
From his photo, you can see the number of inserted rows in Snowflake on the right. Otherwise, try to saturate your RAM with the biggest dataframe possible and see how it goes 🤷
Finally got my dual-login-to-GitHub issues resolved (for the next 90 days)... Here's my test code using the NYC Taxi Trip Data: https://www.kaggle.com/datasets/elemento/nyc-yellow-taxi-trip-data?resource=download It aborted after coming up with 5 empty COPY INTOs.
Nice, I'll see if I can reproduce with the same data. Hopefully it'll be consistent and I can fix this. Thanks @davlee1972
Instead of creating a separate connection for simple queries like SELECT COUNT and the metadata queries, we'll keep using the single connection to maintain the existing session throughout. This should avoid the problem mentioned in #2128 where a new session cancels the remaining COPY INTO queries. It also simplifies some of the work on metadata queries so that we don't need to explicitly propagate catalog, schema and database names between connections.
Fixes #2517
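For context on the approach, here is a minimal Go sketch of the single-session idea, using a hypothetical `connection` type (the names are illustrative, not the driver's actual internals): metadata and COUNT queries reuse the same `*sql.Conn`, and therefore the same Snowflake session, as the ingestion statements, so issuing them cannot start a second session that cancels in-flight COPY INTO work.

```go
package snowflake

import (
	"context"
	"database/sql"
)

// connection is a hypothetical stand-in for the driver's connection type.
// It holds a single *sql.Conn, so every statement (bulk ingest, metadata
// queries, SELECT COUNT(*)) runs in the same Snowflake session.
type connection struct {
	cn *sql.Conn
}

// countRows runs on the shared connection instead of opening a new one,
// so it cannot tear down the session that pending COPY INTO queries rely on.
func (c *connection) countRows(ctx context.Context, table string) (int64, error) {
	var n int64
	err := c.cn.QueryRowContext(ctx, "SELECT COUNT(*) FROM "+table).Scan(&n)
	return n, err
}
```

Keeping everything on one connection also means the current catalog, schema, and database are already set when metadata queries run, which is why they no longer need to be propagated explicitly between connections.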