-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Most BCP improvements from NCBI, as of May 2024 #558
base: master
Are you sure you want to change the base?
Conversation
e6facc4
to
7b71bdb
Compare
I've force-pushed a corrected version superseding #570. |
Meanwhile, I'm looking to extend this series with BCP-specific tuneups superseding #573. To that end, I wanted to compare behavior for three cases: normal BCP in, BCP into an alternate version of the table where column types are all On the DBLIB front, I found that all three cases fully succeeded when built against this branch, but
Any idea what I might be missing? I took care to predefine UPDATE: Those errors turned out to stem from the use of |
Here are my summarized comparison results (comparing against version 16.0 of Sybase's client libraries), where CTLIB BCP truncation differences w/CS layer messages ignored (see below for corrected version):
DBLIB BCP truncation differences (also corrected below):
|
I'll turn back to other PRs for now, but do still plan to add a proper replacement for #573 at some point. |
Ah, |
CTLIB BCP truncation differences w/CS layer messages printed, and more corner cases examinedGeneral notes
Notably, all errors under Sybase officially come from the CT/BLK layer, whereas FreeTDS only ever issues CS-layer errors that offer less context and may be likelier to slip through the cracks. That should be possible to remedy by letting |
Corrected DBLIB truncation differences (modulo formatting details along the above lines and FreeTDS's support for additional client types):
|
7b71bdb
to
468812f
Compare
468812f
to
8739305
Compare
My latest version of this series adds a commit to address a corner case I recently encountered by (re)allowing BCP into computed columns in the context of a |
Picking up a commit to replace one work I was doing, specifically ae519f8. |
Another commit merged! I'm rebasing the initial merge on master to get an updated version. See https://github.com/freddy77/freetds/tree/pr/558/ncbi-2024-05-bcp-improvements. Some considerations
|
src/tds/bulk.c
Outdated
if (s[n] == '\0') { | ||
return s; /* already all uppercase */ | ||
} else { | ||
char * result = malloc(n + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about this ? Isn't this function converting ASCII to upper case but stopping at first lower case character ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yeah, I needed to reset n
to the length before proceeding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, the calling code could use strcasestr
, but that's not portable, so we'd need to add an implementation to replacements
.
const char STD_DATETIME_FMT[] = "%b %e %Y %I:%M%p"; | ||
const char STD_DATETIME_FMT[] = "%Y-%m-%d %H:%M:%S.%z"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure useful for BCP but I suppose it breaks a lot of tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry, I checked for regressions and didn't find any.
* Read defaults, send them back explicitly as needed, and clean up their storage afterwards. * tds5_bcp_add_variable_columns: ** Send spaces in lieu of fully empty SYB(VAR)CHAR data. ** Send correct textpos values in all cases. ** When writing the adjustment table, avoid a potential Sybase "row format error" by eliding the (effective) column count when it would have repeated the largest adjustment table entry. * tds_deinit_bcpinfo: Free and reset current_row for transfers in. Avoid potentially trying to interpret fragments of outgoing TDS data as BLOB pointers. Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
Under these protocol versions, MS SQL Server describes these columns as having NVARCHAR(n) types rather than traditional DATETIME types, due to DATETIME's range limitations. (Likewise for TIME, which however still needs to come in as a preconverted UTF-16 string to compensate for combined API and protocol limitations.) - ctlib/blk.c: Perform character conversion on strings converted from fixed-width types such as CS_DATETIME_TYPE when a comparison of column-size limits appears to indicate an encoding difference. - tds/config.c: Adjust STD_DATETIME_FMT to match MS's (ISO-ish) tastes; in particular, ensure that the entire date portion fits within the first 10 characters. Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
* Extend _cs_convert to take an additional handle argument (allowed to be NULL) to provide for the possibility of replacing the caller's original output buffer with the one tdsconvert allocated itself (as it always does for blobs), rather than copying the blob data. * blk.c (_blk_get_col_data): Provide a non-NULL handle, both for efficiency's sake and because tds_alloc_bcp_column_data imposes a 4 KiB cap with the expectation that callers will take this approach. (dblib already does.) Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
To that end, factor a static _blk_clean_desc function out of blk_done's CS_BLK_ALL case. Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
* tds.h (struct tds_bcpinfo): Add more fields to enable the necessary bookkeeping. * ctlib/blk.c: Implement blk_textfer; adjust other functions accordingly. * tds/{bulk,data}.c: Allow for deferred, and possibly piecemeal, blob submission; in particular, make tds_bcp_send_record resumable. Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
Adjust existing failure modes' boundaries and introduce the possibility of proceeding with truncated input with or without a message (with the latter occurring in some cases that require no conversion). As for messages, arrange to emit them via _ctclient_msg rather than _csclient_msg with the same (English) wording and context as Sybase yields, modulo formal punctuation differences and the usual discrepancies in layer and origin numbering. To that end: * Add internal BLK_CONV_* constants that serve both as (new) client message numbers and as possible return values for _cs_convert. * Cover those numbers in _ct_get_user_api_layer_error. * Have _cs_convert take the presence of a non-NULL handle (which only _blk_get_col_data supplies) to indicate that the caller wants bulk-insertion semantics, including in particular the possible use of those return values. * Keep a running count of rows sent for reporting purposes (determining the column number to cite by a linear scan of the bindinfo's columns array). Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
The destination view may well have an INSTEAD OF INSERT trigger that makes meaningful use of the supplied values. Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
Signed-off-by: Aaron M. Ucko <ucko@ncbi.nlm.nih.gov>
8739305
to
1c45955
Compare
Thanks for all that and for taking another look in general, and sorry for not getting back to you sooner -- I work for a US federal agency, and the new administration imposed a broad communications ban until recently.
Is there some other name you'd prefer?
Fair point. I was also using the presence of a non-
Good question. I was preserving the status quo here, but perhaps some simplification is possible.
Wouldn't that be a subtle API break?
By my reading of https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-ver16, it defaults to a legacy 8-bit encoding (e.g. CP-1252), but the Windows version can be directed to use UTF-16 instead. I'm not sure we have any installations here I can use for empirical testing. At any rate, I'm specifically concerned with bulk insertion.
Perhaps, but that's also out of scope here. |
This PR roughly corresponds to the primarily BCP-related commits from #555, adding 02aa63c per dependency considerations and omitting a couple of commits that can receive individual PRs, which I'll open shortly.