SQLServer Extended Event Handlers #20229

Merged 136 commits on May 12, 2025

@azhou-datadog azhou-datadog commented May 6, 2025

What does this PR do?

Implements the SQL Server Extended Events (XE) handlers, which enable query deobfuscation and query error visibility. This is a large PR, so I will describe each component at a high level. See the RFC here

Configuration

  • Adds a new xe_collection config section with two handlers: query_completions and query_errors
  • Each handler has enabled and collection_interval settings
  • Updates the documentation for collect_raw_query_statement to mention XE event support. To have RQT events sourced from the XE collection, this config must be enabled.
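
As a rough illustration of the config shape described above, the sketch below reads an xe_collection section from an instance config dict. The key names xe_collection, query_completions, query_errors, enabled, collection_interval, and collect_raw_query_statement come from this description. The helper name parse_xe_config and the default values (disabled, 10 seconds) are assumptions made for illustration and may not match the integration's actual defaults.

```python
# Hypothetical helper for illustration; not the integration's real code.
DEFAULT_INTERVAL = 10  # seconds; assumed default, not taken from the PR
HANDLERS = ("query_completions", "query_errors")


def parse_xe_config(instance):
    """Return {handler: {"enabled": bool, "collection_interval": float}}."""
    xe_config = instance.get("xe_collection") or {}
    parsed = {}
    for handler in HANDLERS:
        section = xe_config.get(handler) or {}
        parsed[handler] = {
            "enabled": bool(section.get("enabled", False)),
            "collection_interval": float(
                section.get("collection_interval", DEFAULT_INTERVAL)
            ),
        }
    return parsed


example_instance = {
    "xe_collection": {
        "query_completions": {"enabled": True, "collection_interval": 5},
        "query_errors": {"enabled": True},
    },
    # Needed if RQT events should be sourced from the XE collection.
    "collect_raw_query_statement": {"enabled": True},
}
settings = parse_xe_config(example_instance)
```

Each handler is resolved independently, so one can be enabled without the other.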

XE Handler

  • Implements XESessionBase for all XE session interaction
  • Handles connection to SQL Server and efficient XML event processing
  • Reads events from the ring buffer. Event file reading is not yet fully implemented and is left as a future improvement.
  • Provides standardized event normalization and payload generation
  • Runs SQL obfuscation and signature generation on relevant fields
  • Includes RQT (Raw Query Text) event generation for raw SQL collection
  • Uses timestamp-based filtering to avoid duplicates
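
The ring buffer read and timestamp-based filtering listed above can be sketched roughly as follows. This is a simplified illustration, not the XESessionBase implementation: it uses the standard library's xml.etree.ElementTree in place of lxml, the function names read_new_events and parse_timestamp are hypothetical, and the sample XML is a trimmed, hand-written approximation of ring buffer output.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from io import BytesIO


def parse_timestamp(value):
    # Timestamps here look like 2025-05-06T12:00:00.000Z; normalize the Z suffix.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def read_new_events(xml_data, last_seen=None):
    """Return (events, newest_timestamp) for events newer than last_seen."""
    events = []
    newest = last_seen
    stream = BytesIO(xml_data.encode("utf-8"))
    # iterparse streams the document so each <event> can be cleared after use.
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "event":
            continue
        ts = parse_timestamp(elem.get("timestamp"))
        if last_seen is None or ts > last_seen:
            fields = {"name": elem.get("name"), "timestamp": ts}
            for child in elem:  # <data> and <action> children carry the fields
                fields[child.get("name")] = child.findtext("value")
            events.append(fields)
            if newest is None or ts > newest:
                newest = ts
        elem.clear()
    return events, newest


SAMPLE = """<RingBufferTarget>
  <event name="sql_batch_completed" timestamp="2025-05-06T12:00:00.000Z">
    <data name="duration"><value>1500</value></data>
    <action name="sql_text"><value>SELECT 1</value></action>
  </event>
  <event name="sql_batch_completed" timestamp="2025-05-06T12:00:05.000Z">
    <data name="duration"><value>900</value></data>
    <action name="sql_text"><value>SELECT 2</value></action>
  </event>
</RingBufferTarget>"""

first_batch, checkpoint = read_new_events(SAMPLE)
# Re-reading the same buffer with the checkpoint yields no duplicates.
second_batch, _ = read_new_events(SAMPLE, last_seen=checkpoint)
```

Because the ring buffer keeps returning old events until they are overwritten, carrying the newest timestamp forward between collections is what prevents re-emitting them.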

Events emitted

Currently collects three types of query completion events, emitted as dbm_type=query_completion:

  • SQL batch completions
  • RPC completions
  • Module/procedure completions
  • Eventually we will also collect sp_statement_completed and sql_statement_completed.

Collects two types of error events, emitted as dbm_type=query_error:

  • SQL query errors with severity >= 11
  • Attention signals (query cancellations)
  • Eventually we will bring deadlock monitoring into this as well, but that is out of scope for now
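
A minimal sketch of how events might be routed to the two dbm_type values above. The XE event names used here (sql_batch_completed, rpc_completed, module_end, error_reported, attention) are assumed to correspond to the event types listed, and classify_event is a hypothetical helper. In practice the severity filter may be applied in the XE session definition rather than in Python.

```python
# Assumed mapping from XE event names to the dbm_type values named above.
COMPLETION_EVENTS = {"sql_batch_completed", "rpc_completed", "module_end"}
MIN_ERROR_SEVERITY = 11


def classify_event(event):
    """Return the dbm_type for an event dict, or None if it should be skipped."""
    name = event.get("name")
    if name in COMPLETION_EVENTS:
        return "query_completion"
    if name == "attention":
        # Attention signals (query cancellations) are reported as errors.
        return "query_error"
    if name == "error_reported":
        severity = int(event.get("severity", 0))
        return "query_error" if severity >= MIN_ERROR_SEVERITY else None
    return None
```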

Emits RQT (Raw Query Text) events when collect_raw_query_statement.enabled is true:

  • Contains original unobfuscated SQL statements with proper rate limiting
  • Includes both obfuscated and raw query signatures for future query correlation
  • Collects metadata about tables, commands, and query structure
  • Available for both query_completion and query_error events
  • RQT events have a "statement" field, which holds the SQL that was executed. Some event types have several fields that could be read as the SQL statement. Each event type's _get_primary_sql_field implementation decides which field is its primary SQL field, and that field's value fills the statement field.
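
To make the primary SQL field idea concrete, here is a hedged sketch. The candidate field names per event type, the payload keys query_signature and raw_query_signature, the SHA-256 based signature, and the regex-based toy_obfuscate are all stand-ins chosen for illustration. The real choices live in each handler's _get_primary_sql_field and in the agent's own obfuscation and signature code.

```python
import hashlib
import re

# Assumed priority order per event type, for illustration only.
PRIMARY_FIELD_CANDIDATES = {
    "sql_batch_completed": ("batch_text", "sql_text"),
    "rpc_completed": ("statement", "sql_text"),
    "module_end": ("statement", "sql_text"),
}


def get_primary_sql_field(event):
    """Return the name of the first populated candidate field, or None."""
    for field in PRIMARY_FIELD_CANDIDATES.get(event["name"], ("sql_text",)):
        if event.get(field):
            return field
    return None


def signature(text):
    # Stand-in hash; not the agent's real signature function.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def build_rqt_event(event, obfuscate):
    """Build an RQT-style payload; `obfuscate` is any str -> str callable."""
    field = get_primary_sql_field(event)
    if field is None:
        return None
    raw = event[field]
    return {
        "statement": raw,  # original, unobfuscated SQL
        "query_signature": signature(obfuscate(raw)),
        "raw_query_signature": signature(raw),
    }


def toy_obfuscate(sql):
    # Toy obfuscator: replaces numeric literals with "?".
    return re.sub(r"\d+", "?", sql)


rqt_a = build_rqt_event({"name": "sql_batch_completed", "batch_text": "SELECT 1"}, toy_obfuscate)
rqt_b = build_rqt_event({"name": "sql_batch_completed", "batch_text": "SELECT 2"}, toy_obfuscate)
```

In this sketch, two statements that differ only in literals share an obfuscated signature but keep distinct raw signatures, which is what makes later correlation between the two possible.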

Testing

  • Unit tests with XML fixtures covering the full XE collection pipeline
  • Integration tests verifying actual XE session interaction
  • Updated SQL scripts to create required XE sessions in test environment
  • Added validation of payload structure and field values

Motivation

Provides query error and deobfuscated query visibility for SQL Server. This is a targeted feature for Rockstar, but it also greatly strengthens DBM's SQL Server offering.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov bot commented May 6, 2025

Codecov Report

Attention: Patch coverage is 88.04745% with 131 lines in your changes missing coverage. Please review.

Project coverage is 91.14%. Comparing base (ea04835) to head (c370dab).
Report is 16 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
sqlserver 91.09% <88.04%> (+4.94%) ⬆️


@azhou-datadog azhou-datadog changed the title WIP: SQLServer Extended Event Handlers SQLServer Extended Event Handlers May 7, 2025
@azhou-datadog azhou-datadog force-pushed the allen.zhou/sqlserver_xe_deobf branch from 550ae52 to 19a6803 Compare May 8, 2025 14:01
@sethsamuel sethsamuel left a comment

Overall looks good, really appreciate the thorough commenting. A few questions/comments then LGTM

filtered_events = []
try:
    # Convert string to bytes for lxml
    xml_stream = BytesIO(xml_data.encode('utf-8'))
Is the data returned always utf-8?

Contributor Author

The raw data is taken at raw_xml = str(row[0]), which already yields a properly decoded string. I just chose an encoding here to convert it to bytes for lxml parsing. Just in case, though, I can add backup logic to encode in utf-16 if we hit encoding errors.
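
For reference, a Python str can always be encoded to UTF-8 unless it contains lone surrogate code points, and those fail under UTF-16 as well. The sketch below therefore swaps in a different fallback than the utf-16 one mentioned above: it retries with errors="replace" so the parser still receives bytes. The function name encode_xml_for_parser is hypothetical.

```python
from io import BytesIO


def encode_xml_for_parser(xml_data):
    """Encode an already-decoded XML string to bytes for a bytes-based parser."""
    try:
        return xml_data.encode("utf-8")
    except UnicodeEncodeError:
        # Only reachable for lone surrogates; substitute "?" rather than fail.
        return xml_data.encode("utf-8", errors="replace")


xml_stream = BytesIO(encode_xml_for_parser("<event>caf\u00e9</event>"))
```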

datadog-datadog-prod-us1 bot commented May 12, 2025

Datadog Summary

✅ Code Quality    ✅ Code Security    ❌ Dependencies


@azhou-datadog azhou-datadog added this pull request to the merge queue May 12, 2025
Merged via the queue into master with commit 75be8a6 May 12, 2025
42 checks passed
@azhou-datadog azhou-datadog deleted the allen.zhou/sqlserver_xe_deobf branch May 12, 2025 21:03