Commit 75be8a6

SQLServer Extended Event Handlers (#20229)
* poc test first pass
* log events
* logging
* run_job_loop, not start
* params correction
* rpc_events xml parsing basic
* batch_events and share utils
* timestamp and timing implementation
* event file implement
* fix file path
* return complete xml
* parse xml on client side
* time parsing and query section seperately
* convert string to bytes
* now test sqlserver parsing
* remove sqlserver parsing version
* missing statement from rpc_events
* print event payload
* fix json parsing
* add event source to event payload
* implement error events
* remove config
* test start time timestamp calculation
* make allen test check more loose
* log host and session id as well
* delete log
* delete correct log
* use resolved hostname
* try to detect ring buffer event loss
* more visibility on timestamp gaps
* do not limit max events for testing
* temp increase of max events
* fill in dbm_type based on event session name
* implement sql statement events
* implement sp statement events
* combine query completions to a single event session
* refactors
* implement attention events
* remove joined event handlers, add query start timing data
* clean up
* clean up
* more clean up
* RQT and obfuscate queries first pass
* get query completion timestamp into rqt event
* better timing data
* add more logging
* remove caching for now to get visibility for debugging
* calculate raw query signature
* normalize timestamps
* add xe_type
* fix event_name for error events
* add query_signature to non-RQT event
* refactor obfuscating logic
* clean up dead code
* consolidate more code
* normalize timestamp for timestamp filtering
* simplify timestamp filtering
* fix timestamp gap logging
* simplify event logging
* omit duration and query_start from query error RQT
* omit in XE event too
* refactors
* missed path fix
* add sql fields back
* explicitly state sql fields expected for each event session
* move raw query signature calculation
* implement configuration
* unit test first pass
* change imports
* import change
* add handlers test
* fix stub import
* don't mock event handler
* mock keys return dict
* fix tests
* timestamp mock fixes
* TimeMock class
* avoid mocking time.time
* refactors
* fix expected types in rqt event
* module end test
* space in file name!!
* add attention test
* fix attention test
* add integration test
* send events to datadog
* check if sleep makes test consistent
* debug test
* fix cursor call
* grant select to datadog user
* grant to bob
* wrong setup
* delete extra vars
* log all calls
* run check
* follow activity.py pattern
* fix event type
* debug logging
* fix config
* refactor test
* remove sleep
* enable cache, add timestamp test
* fix happy path test
* linter fixes part 1
* linters part 2
* concat strings for linter
* delete statement level event files
* Add database instance to events
* batch events for query_completion and query_errors
* fix unit test serialization and add test for checking batching logic
* add method tracking and code clean up
* add change log
* fix conditional logging
* remove timing data now that we have tracked methods
* log ANY first rqt event
* validate config
* fix import
* license fix
* validate models
* make collection interval a number, not int
* fix unit tests
* update all setup scripts to set up XE sessions
* add query visibility into error
* clean up code
* add raw query signature to query completion and error
* revert to execute with retries
* debug pipeline, only run on 2022 sqlserver
* use convert syntax for adodbapi
* add back 2019 sqlserver version
* address review comments
* delete dead code
* parse XML only once
* linter
* add configurable max events
* linter
* validate config
1 parent 0cefae1 commit 75be8a6

26 files changed: +3941 -2 lines changed

sqlserver/assets/configuration/spec.yaml

Lines changed: 72 additions & 1 deletion
@@ -885,7 +885,9 @@ files:
         display_default: false
 - name: collect_raw_query_statement
   description: |
-    Configure the collection of raw query statements in query activity and execution plans.
+    Configure the collection of raw query statements in query activity, execution plans, and XE events.
+    To collect raw query statements from XE events, set `xe_collection.query_completions.enabled` and
+    `xe_collection.query_errors.enabled` to `true`.
     Raw query statements and execution plans may contain sensitive information (e.g., passwords)
     or personally identifiable information in query text.
     Enabling this option will allow the collection and ingestion of raw query statements and
@@ -997,6 +999,75 @@ files:
   value:
     example: false
     type: boolean
+- name: xe_collection
+  description: |
+    Configure the collection of events from XE (Extended Events) sessions. Requires `dbm: true`.
+
+    Set `collect_raw_query_statement.enabled` to `true` to collect the raw query statements for each event.
+  options:
+    - name: debug_sample_events
+      description: |
+        Set the maximum number of XE events to log in debug mode per collection. Used for troubleshooting.
+        This only affects logging when debug mode is enabled. Defaults to 3.
+      hidden: true
+      value:
+        type: integer
+        example: 3
+        display_default: 3
+    - name: query_completions
+      description: |
+        Configure the collection of completed queries from the `datadog_query_completions` XE session.
+
+        Set `query_completions.enabled` to `true` to enable the collection of query completion events.
+
+        Use `query_completions.collection_interval` to set the interval (in seconds) for the collection of
+        query completion events. Defaults to 10 seconds. If you intend on updating this value,
+        it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
+
+        Use `query_completions.max_events` to set the maximum number of query completion events to process
+        per collection. Note that SQL Server's ring buffer has a maximum of 1000 events per query,
+        so values above 1000 will still be capped at 1000 by the database engine. Defaults to 1000.
+      value:
+        type: object
+        properties:
+          - name: enabled
+            type: boolean
+            example: false
+          - name: collection_interval
+            type: number
+            example: 10
+            display_default: 10
+          - name: max_events
+            type: integer
+            example: 1000
+            display_default: 1000
+    - name: query_errors
+      description: |
+        Configure the collection of query errors from the `datadog_query_errors` XE session.
+
+        Set `query_errors.enabled` to `true` to enable the collection of query error events.
+
+        Use `query_errors.collection_interval` to set the interval (in seconds) for the collection of
+        query error events. Defaults to 10 seconds. If you intend on updating this value,
+        it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
+
+        Use `query_errors.max_events` to set the maximum number of query error events to process
+        per collection. Note that SQL Server's ring buffer has a maximum of 1000 events per query,
+        so values above 1000 will still be capped at 1000 by the database engine. Defaults to 1000.
+      value:
+        type: object
+        properties:
+          - name: enabled
+            type: boolean
+            example: false
+          - name: collection_interval
+            type: number
+            example: 10
+            display_default: 10
+          - name: max_events
+            type: integer
+            example: 1000
+            display_default: 1000
 - name: deadlocks_collection
   description: |
     Configure the collection of deadlock data.
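
The option descriptions above imply a small amount of default handling: each session is off unless enabled, the interval falls back to 10 seconds, and `max_events` gains nothing above 1000. The sketch below illustrates that reading in Python. `resolve_xe_session_settings` and its constants are hypothetical names made up for this example and are not part of the commit; the spec says the database engine applies the cap, so the client-side clamp here only mirrors the documented ceiling, and the disabled-by-default behavior is an assumption based on `example: false`.

```python
# Hypothetical helper, not part of the commit: resolves one XE session's
# settings from an instance config using the defaults documented in spec.yaml.
RING_BUFFER_MAX_EVENTS = 1000  # ceiling described in the option docs
DEFAULT_COLLECTION_INTERVAL = 10  # seconds, per the option docs


def resolve_xe_session_settings(instance, session_key):
    """Return (enabled, collection_interval, max_events) for one session."""
    xe_config = instance.get("xe_collection", {}) or {}
    session = xe_config.get(session_key, {}) or {}
    # Assumption: a session is disabled unless explicitly enabled.
    enabled = bool(session.get("enabled", False))
    interval = float(session.get("collection_interval", DEFAULT_COLLECTION_INTERVAL))
    requested = int(session.get("max_events", RING_BUFFER_MAX_EVENTS))
    # Values above 1000 gain nothing, so clamp to the documented ceiling.
    max_events = min(requested, RING_BUFFER_MAX_EVENTS)
    return enabled, interval, max_events


instance = {
    "dbm": True,
    "xe_collection": {
        "query_completions": {"enabled": True, "max_events": 5000},
        "query_errors": {"enabled": True, "collection_interval": 30},
    },
}
completions = resolve_xe_session_settings(instance, "query_completions")
errors = resolve_xe_session_settings(instance, "query_errors")
```

In this sketch a requested `max_events` of 5000 resolves to 1000, and an instance with no `xe_collection` section resolves to a disabled session with the default interval.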

sqlserver/changelog.d/20229.added

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Added SQLServer Extended Event Handlers
+

sqlserver/datadog_checks/sqlserver/config.py

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ def __init__(self, init_config, instance, log):
         self.activity_config: dict = instance.get('query_activity', {}) or {}
         self.schema_config: dict = instance.get('schemas_collection', {}) or {}
         self.deadlocks_config: dict = instance.get('deadlocks_collection', {}) or {}
+        self.xe_collection_config: dict = instance.get('xe_collection', {}) or {}
         self.cloud_metadata: dict = {}
         aws: dict = instance.get('aws', {}) or {}
         gcp: dict = instance.get('gcp', {}) or {}
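
The `instance.get(key, {}) or {}` idiom used for `xe_collection_config` covers two cases at once: the key being absent, and the key being present with a null value (for example a YAML `xe_collection:` line with nothing under it, which loads as `None`). A minimal standalone sketch of that behavior follows; the `section` helper is a hypothetical name used only for illustration.

```python
def section(instance, key):
    # Same pattern as in config.py: fall back to {} for a missing key
    # and also for a key whose value is None.
    return instance.get(key, {}) or {}


missing = section({}, "xe_collection")
null_value = section({"xe_collection": None}, "xe_collection")
populated = section({"xe_collection": {"query_errors": {"enabled": True}}}, "xe_collection")

# Without the trailing `or {}`, the explicit-null case would yield None.
without_fallback = {"xe_collection": None}.get("xe_collection", {})
```

Downstream code can then call `.get(...)` on the result without first checking for `None`.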

sqlserver/datadog_checks/sqlserver/config_models/instance.py

Lines changed: 31 additions & 0 deletions
@@ -347,6 +347,36 @@ class SchemasCollection(BaseModel):
     max_execution_time: Optional[float] = None


+class QueryCompletions(BaseModel):
+    model_config = ConfigDict(
+        arbitrary_types_allowed=True,
+        frozen=True,
+    )
+    collection_interval: Optional[float] = Field(None, examples=[10])
+    enabled: Optional[bool] = Field(None, examples=[False])
+    max_events: Optional[int] = Field(None, examples=[1000])
+
+
+class QueryErrors(BaseModel):
+    model_config = ConfigDict(
+        arbitrary_types_allowed=True,
+        frozen=True,
+    )
+    collection_interval: Optional[float] = Field(None, examples=[10])
+    enabled: Optional[bool] = Field(None, examples=[False])
+    max_events: Optional[int] = Field(None, examples=[1000])
+
+
+class XeCollection(BaseModel):
+    model_config = ConfigDict(
+        arbitrary_types_allowed=True,
+        frozen=True,
+    )
+    debug_sample_events: Optional[int] = None
+    query_completions: Optional[QueryCompletions] = None
+    query_errors: Optional[QueryErrors] = None
+
+
 class InstanceConfig(BaseModel):
     model_config = ConfigDict(
         validate_default=True,
@@ -406,6 +436,7 @@ class InstanceConfig(BaseModel):
     tags: Optional[tuple[str, ...]] = None
     use_global_custom_queries: Optional[str] = None
     username: Optional[str] = None
+    xe_collection: Optional[XeCollection] = None

     @model_validator(mode='before')
     def _initial_validation(cls, values):
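
The models above share two traits: every field is optional with a `None` default, and each model is frozen. The sketch below illustrates the same shape using stdlib `dataclasses` as a stand-in, so that it runs without pydantic. The `*Sketch` class names are hypothetical, and the exception raised on mutation differs from pydantic's; only the nesting and immutability are meant to carry over.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


# Stand-in for the pydantic QueryCompletions model (hypothetical name).
@dataclass(frozen=True)
class QueryCompletionsSketch:
    collection_interval: Optional[float] = None
    enabled: Optional[bool] = None
    max_events: Optional[int] = None


# Stand-in for the pydantic XeCollection model (hypothetical name).
@dataclass(frozen=True)
class XeCollectionSketch:
    debug_sample_events: Optional[int] = None
    query_completions: Optional[QueryCompletionsSketch] = None
    query_errors: Optional[QueryCompletionsSketch] = None


config = XeCollectionSketch(query_completions=QueryCompletionsSketch(enabled=True))

# Frozen instances reject attribute assignment after construction.
try:
    config.query_completions.enabled = False
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because unset fields stay `None`, code reading such a model has to supply its own defaults for `collection_interval` and `max_events` when they are omitted.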

sqlserver/datadog_checks/sqlserver/data/conf.yaml.example

Lines changed: 39 additions & 1 deletion
@@ -643,7 +643,9 @@ instances:
         #
         # keep_identifier_quotation: false

-    ## Configure the collection of raw query statements in query activity and execution plans.
+    ## Configure the collection of raw query statements in query activity, execution plans, and XE events.
+    ## To collect raw query statements from XE events, set `xe_collection.query_completions.enabled` and
+    ## `xe_collection.query_errors.enabled` to `true`.
     ## Raw query statements and execution plans may contain sensitive information (e.g., passwords)
     ## or personally identifiable information in query text.
     ## Enabling this option will allow the collection and ingestion of raw query statements and
@@ -797,6 +799,42 @@ instances:
     #
     # propagate_agent_tags: false

+    ## Configure the collection of events from XE (Extended Events) sessions. Requires `dbm: true`.
+    ##
+    ## Set `collect_raw_query_statement.enabled` to `true` to collect the raw query statements for each event.
+    #
+    # xe_collection:
+
+        ## @param query_completions - mapping - optional
+        ## Configure the collection of completed queries from the `datadog_query_completions` XE session.
+        ##
+        ## Set `query_completions.enabled` to `true` to enable the collection of query completion events.
+        ##
+        ## Use `query_completions.collection_interval` to set the interval (in seconds) for the collection of
+        ## query completion events. Defaults to 10 seconds. If you intend on updating this value,
+        ## it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
+        ##
+        ## Use `query_completions.max_events` to set the maximum number of query completion events to process
+        ## per collection. Note that SQL Server's ring buffer has a maximum of 1000 events per query,
+        ## so values above 1000 will still be capped at 1000 by the database engine. Defaults to 1000.
+        #
+        # query_completions: {}
+
+        ## @param query_errors - mapping - optional
+        ## Configure the collection of query errors from the `datadog_query_errors` XE session.
+        ##
+        ## Set `query_errors.enabled` to `true` to enable the collection of query error events.
+        ##
+        ## Use `query_errors.collection_interval` to set the interval (in seconds) for the collection of
+        ## query error events. Defaults to 10 seconds. If you intend on updating this value,
+        ## it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
+        ##
+        ## Use `query_errors.max_events` to set the maximum number of query error events to process
+        ## per collection. Note that SQL Server's ring buffer has a maximum of 1000 events per query,
+        ## so values above 1000 will still be capped at 1000 by the database engine. Defaults to 1000.
+        #
+        # query_errors: {}
+
     ## Configure the collection of deadlock data.
     #
     # deadlocks_collection:

sqlserver/datadog_checks/sqlserver/sqlserver.py

Lines changed: 26 additions & 0 deletions
@@ -53,6 +53,7 @@
 from datadog_checks.sqlserver.statements import SqlserverStatementMetrics
 from datadog_checks.sqlserver.stored_procedures import SqlserverProcedureMetrics
 from datadog_checks.sqlserver.utils import Database, construct_use_statement, parse_sqlserver_major_version
+from datadog_checks.sqlserver.xe_collection.registry import get_xe_session_handlers

 try:
     import datadog_agent
@@ -157,6 +158,9 @@ def __init__(self, name, init_config, instances):
         self.agent_history = SqlserverAgentHistory(self, self._config)
         self.deadlocks = Deadlocks(self, self._config)

+        # XE Session Handlers
+        self.xe_session_handlers = []
+
         # _database_instance_emitted: limit the collection and transmission of the database instance metadata
         self._database_instance_emitted = TTLCache(
             maxsize=1,
@@ -169,6 +173,7 @@ def __init__(self, name, init_config, instances):
         self.check_initializations.append(self.load_static_information)
         self.check_initializations.append(self.config_checks)
         self.check_initializations.append(self.make_metric_list_to_collect)
+        self.check_initializations.append(self.initialize_xe_session_handlers)

         # Query declarations
         self._query_manager = None
@@ -177,6 +182,13 @@ def __init__(self, name, init_config, instances):

         self._schemas = Schemas(self, self._config)

+    def initialize_xe_session_handlers(self):
+        """Initialize the XE session handlers without starting them"""
+        # Initialize XE session handlers if not already initialized
+        if not self.xe_session_handlers:
+            self.xe_session_handlers = get_xe_session_handlers(self, self._config)
+            self.log.debug("Initialized %d XE session handlers", len(self.xe_session_handlers))
+
     def cancel(self):
         self.statement_metrics.cancel()
         self.procedure_metrics.cancel()
@@ -185,6 +197,13 @@ def cancel(self):
         self._schemas.cancel()
         self.deadlocks.cancel()

+        # Cancel all XE session handlers
+        for handler in self.xe_session_handlers:
+            try:
+                handler.cancel()
+            except Exception as e:
+                self.log.error("Error canceling XE session handler for %s: %s", handler.session_name, e)
+
     def config_checks(self):
         if self._config.autodiscovery and self.instance.get("database"):
             self.log.warning(
@@ -810,6 +829,13 @@ def check(self, _):
                 self.sql_metadata.run_job_loop(self.tags)
                 self._schemas.run_job_loop(self.tags)
                 self.deadlocks.run_job_loop(self.tags)
+
+                # Run XE session handlers
+                for handler in self.xe_session_handlers:
+                    try:
+                        handler.run_job_loop(self.tags)
+                    except Exception as e:
+                        self.log.error("Error running XE session handler for %s: %s", handler.session_name, e)
         else:
             self.log.debug("Skipping check")
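
Both the `cancel` and `check` hunks wrap each handler call in its own `try`/`except`, so a failure in one XE session handler is logged and does not stop the remaining handlers. The sketch below shows that isolation pattern on its own. `StubHandler` and `run_handlers` are hypothetical stand-ins written for this example; the real handlers come from `get_xe_session_handlers`, whose implementation is not shown in this excerpt.

```python
import logging

log = logging.getLogger("xe_sketch")


class StubHandler:
    """Hypothetical stand-in for an XE session handler."""

    def __init__(self, session_name, fail=False):
        self.session_name = session_name
        self.fail = fail
        self.runs = 0

    def run_job_loop(self, tags):
        if self.fail:
            raise RuntimeError("simulated failure")
        self.runs += 1


def run_handlers(handlers, tags):
    # Mirrors the loop in check(): each handler is isolated, so one
    # exception is logged and the loop moves on to the next handler.
    for handler in handlers:
        try:
            handler.run_job_loop(tags)
        except Exception as e:
            log.error("Error running XE session handler for %s: %s", handler.session_name, e)


handlers = [
    StubHandler("datadog_query_completions", fail=True),
    StubHandler("datadog_query_errors"),
]
run_handlers(handlers, ["env:test"])
```

Here the first stub raises, yet the second still runs once, which is the behavior the per-handler `try`/`except` is there to guarantee.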

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# (C) Datadog, Inc. 2025-present
+# All rights reserved
+# Licensed under a 3-clause BSD style license (see LICENSE)
